transformer_lens.model_bridge.supported_architectures.granite_moe_hybrid module¶
Granite MoE Hybrid architecture adapter.
Hybrid Mamba2 + Attention with Sparse MoE. Most layers are Mamba SSM blocks; a few are standard attention (determined by config.layer_types). Every layer has a shared MLP and optional sparse MoE.
Both attention and Mamba are mapped as optional; each is present only on its respective layer type. Mamba hooks expose in_proj, conv1d, and inner_norm.
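The per-layer mapping described above can be sketched as follows. This is a minimal illustrative model, not the actual TransformerLens adapter code: the `LayerPlan` dataclass, the `plan_layers` helper, and the string values `"mamba"` / `"attention"` for `config.layer_types` entries are assumptions made for the example.

```python
# Hypothetical sketch of per-layer component mapping driven by
# config.layer_types, per the module docstring. LayerPlan and
# plan_layers are illustrative names, not TransformerLens API.
from dataclasses import dataclass
from typing import List

@dataclass
class LayerPlan:
    index: int
    has_attention: bool   # present only on attention layers
    has_mamba: bool       # present only on Mamba SSM layers
    has_shared_mlp: bool  # the shared MLP is on every layer

def plan_layers(layer_types: List[str]) -> List[LayerPlan]:
    """Map each config.layer_types entry to the components its layer exposes."""
    plans = []
    for i, kind in enumerate(layer_types):
        plans.append(LayerPlan(
            index=i,
            has_attention=(kind == "attention"),
            has_mamba=(kind == "mamba"),
            has_shared_mlp=True,  # universal, per the docstring
        ))
    return plans

# Mostly Mamba layers with a few attention layers, as described above.
plans = plan_layers(["mamba", "mamba", "attention", "mamba"])
```

Because attention and Mamba are each optional, downstream code checks the layer type rather than assuming every layer carries both component sets.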
- class transformer_lens.model_bridge.supported_architectures.granite_moe_hybrid.GraniteMoeHybridArchitectureAdapter(cfg: Any)¶
Bases: GraniteArchitectureAdapter

Hybrid Mamba2 + Attention with Sparse MoE.
Attention is optional (absent on Mamba layers); shared_mlp and the MoE are present on every layer. Config handling and attention bridge construction are inherited from the Granite adapter.