transformer_lens.model_bridge.supported_architectures.granite_moe module
Granite MoE architecture adapter.
- class transformer_lens.model_bridge.supported_architectures.granite_moe.GraniteMoeArchitectureAdapter(cfg: Any)
Bases: GraniteArchitectureAdapter
Architecture adapter for IBM Granite MoE models.
Identical to the dense Granite adapter, except that the gated MLP is replaced by a sparse Mixture of Experts block (block_sparse_moe) that uses batched expert parameters and top-k routing.
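To illustrate what top-k routing over batched expert parameters means, here is a minimal pure-Python sketch for a single token. It is not the adapter's or the model's actual code: the function names (top_k_route, matvec, sparse_moe) are hypothetical, each expert is simplified to one linear map rather than a gated MLP, and applying the softmax over only the k selected logits is an assumption about the routing rule.

```python
import math

def top_k_route(router_logits, k):
    """Pick the k largest logits and softmax over just those (assumed routing rule)."""
    order = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    peak = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]

def matvec(matrix, vec):
    """Multiply a [rows][cols] nested list by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def sparse_moe(token, router_weight, expert_weights, k):
    """Toy sparse MoE forward pass for one token.

    router_weight:  [num_experts][d_model]
    expert_weights: [num_experts][d_model][d_model]; all experts live in one
                    batched structure instead of separate per-expert modules.
    """
    logits = matvec(router_weight, token)
    indices, weights = top_k_route(logits, k)
    out = [0.0] * len(token)
    for idx, weight in zip(indices, weights):
        expert_out = matvec(expert_weights[idx], token)  # only selected experts run
        out = [o + weight * e for o, e in zip(out, expert_out)]
    return out, indices, weights

# Hypothetical toy values: 3 experts, d_model = 2, top-2 routing.
token = [1.0, 2.0]
router_weight = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
expert_weights = [
    [[1.0, 0.0], [0.0, 1.0]],  # expert 0: identity
    [[2.0, 0.0], [0.0, 2.0]],  # expert 1: doubles the input
    [[0.0, 0.0], [0.0, 0.0]],  # expert 2: zero map (never selected here)
]
out, indices, weights = sparse_moe(token, router_weight, expert_weights, k=2)
print(indices, weights, out)
```

With these toy values the router logits are [1.0, 2.0, -3.0], so experts 1 and 0 are selected and expert 2 is skipped entirely, which is the source of the block's sparsity.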