transformer_lens.model_bridge.supported_architectures.granite_moe module

Granite MoE architecture adapter.

class transformer_lens.model_bridge.supported_architectures.granite_moe.GraniteMoeArchitectureAdapter(cfg: Any)

Bases: GraniteArchitectureAdapter

Architecture adapter for IBM Granite MoE models.

Identical to the dense Granite adapter, except that the gated MLP is replaced by a sparse Mixture of Experts block (block_sparse_moe) that uses batched expert parameters and top-k routing.
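
To illustrate what "batched expert parameters and top-k routing" means, the following is a minimal pure-Python sketch, not the adapter's or the model's actual code. The names top_k_route, sparse_moe, router_weight and expert_weight are hypothetical. Each expert is simplified to a single linear map (the real experts are assumed to be gated MLPs), and the gate is assumed to be a softmax over only the selected logits, which is one common variant.

```python
import math


def top_k_route(router_logits, k):
    """Return [(expert_index, gate_weight), ...] for the k highest logits."""
    # Select the k largest logits; ties go to the lower expert index.
    top = sorted(range(len(router_logits)),
                 key=lambda i: (-router_logits[i], i))[:k]
    # Softmax over the selected logits only, so the gate weights sum to 1.
    # (Assumption: other implementations normalise before selecting.)
    peak = max(router_logits[i] for i in top)
    exps = [math.exp(router_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def sparse_moe(x, router_weight, expert_weight, k):
    """Apply a toy sparse MoE layer to one token vector.

    x:             [d_in]
    router_weight: [n_experts][d_in]
    expert_weight: [n_experts][d_out][d_in]  (all experts stacked together)
    """
    # One router logit per expert.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_weight]
    out = [0.0] * len(expert_weight[0])
    for expert, gate in top_k_route(logits, k):
        # "Batched" parameters: index into one stacked structure instead of
        # calling a separate module object per expert.
        for j, row in enumerate(expert_weight[expert]):
            out[j] += gate * sum(w * xi for w, xi in zip(row, x))
    return out


if __name__ == "__main__":
    x = [1.0, 0.0]
    router_weight = [[0.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
    expert_weight = [[[10.0, 0.0]], [[2.0, 0.0]], [[4.0, 0.0]]]
    print(sparse_moe(x, router_weight, expert_weight, k=2))
```

Only the k selected experts contribute to the output, which is what makes the block sparse. Storing every expert in one stacked structure is what lets an implementation index or batch over experts rather than iterate over separate submodules.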