transformer_lens.model_bridge.supported_architectures.granite_moe_hybrid module

Granite MoE Hybrid architecture adapter.

Hybrid Mamba2 + Attention with Sparse MoE. Most layers are Mamba SSM blocks; a few are standard attention layers, as determined by config.layer_types. Every layer has a shared MLP and a sparse MoE block.

Both attention and Mamba components are mapped as optional: each is present only on layers of its respective type. Mamba hooks expose in_proj, conv1d, and inner_norm.
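The layer_types-driven component selection described above can be sketched as follows. This is an illustrative model, not the bridge's actual API: the layer_components helper, the component tuples, and the "mamba"/"attention" type strings are assumptions for the sketch.

```python
# Illustrative sketch: which hookable components a hybrid layer exposes,
# selected by the per-layer entry in config.layer_types.
# Component names follow the module docs; the helper itself is hypothetical.

MAMBA_COMPONENTS = ("in_proj", "conv1d", "inner_norm")  # Mamba SSM block hooks
ATTENTION_COMPONENTS = ("attention",)                   # standard attention block
UNIVERSAL_COMPONENTS = ("shared_mlp", "moe")            # present on every layer


def layer_components(layer_type: str) -> tuple:
    """Return the components present on a layer of the given type."""
    if layer_type == "mamba":
        specific = MAMBA_COMPONENTS
    elif layer_type == "attention":
        specific = ATTENTION_COMPONENTS
    else:
        raise ValueError(f"unknown layer type: {layer_type!r}")
    return specific + UNIVERSAL_COMPONENTS


# A hypothetical hybrid stack: mostly Mamba, with one attention layer.
layer_types = ["mamba", "mamba", "attention", "mamba"]
stack = [layer_components(t) for t in layer_types]
```

Because attention and Mamba are mapped as optional, a hook that exists on one layer type is simply absent on the other, while shared_mlp and the MoE hooks appear at every layer index.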

class transformer_lens.model_bridge.supported_architectures.granite_moe_hybrid.GraniteMoeHybridArchitectureAdapter(cfg: Any)

Bases: GraniteArchitectureAdapter

Hybrid Mamba2 + Attention with Sparse MoE.

Attention is optional (absent on Mamba layers); shared_mlp and the MoE block are present on every layer. Config handling and attention bridge construction are inherited from GraniteArchitectureAdapter.
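The inheritance relationship can be illustrated with toy stand-ins for the real adapter classes; every class internal below is an assumption made for the sketch, not the actual TransformerLens implementation.

```python
# Toy sketch of the adapter hierarchy: the hybrid adapter reuses the Granite
# adapter's config handling and attention bridge construction, and marks
# attention as optional. All internals here are illustrative placeholders.

class GraniteAdapterSketch:
    def __init__(self, cfg: dict):
        self.cfg = cfg  # config handling, inherited unchanged by the subclass

    def build_attention_bridge(self, layer: int) -> str:
        # Reused as-is by the hybrid adapter on its attention layers.
        return f"attention_bridge[{layer}]"


class GraniteMoeHybridAdapterSketch(GraniteAdapterSketch):
    def build_layer(self, layer: int) -> dict:
        layer_type = self.cfg["layer_types"][layer]
        return {
            # Attention is optional: absent (None) on Mamba layers.
            "attention": (
                self.build_attention_bridge(layer)
                if layer_type == "attention"
                else None
            ),
            # shared_mlp and the MoE block exist on every layer.
            "shared_mlp": f"shared_mlp[{layer}]",
            "moe": f"moe[{layer}]",
        }


adapter = GraniteMoeHybridAdapterSketch({"layer_types": ["mamba", "attention"]})
```

The sketch shows why the subclass only needs to override per-layer component mapping: the parent already knows how to read the config and build an attention bridge, so the hybrid adapter's job reduces to deciding, per layer, whether that bridge is present.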