transformer_lens.model_bridge.supported_architectures.mixtral module¶
Mixtral architecture adapter.
- class transformer_lens.model_bridge.supported_architectures.mixtral.MixtralArchitectureAdapter(cfg: Any)¶
Bases: ArchitectureAdapter
Architecture adapter for Mixtral models.
Mixtral uses a pre-norm architecture with RMSNorm, rotary position embeddings (RoPE), and a Sparse Mixture of Experts MLP. Key features:
- Pre-norm: RMSNorm applied BEFORE attention and BEFORE the MLP.
- Rotary embeddings: stored at model.rotary_emb and passed per forward call.
- Sparse MoE: batched expert parameters (gate_up_proj and down_proj as 3D tensors).
- MixtralAttention.forward() requires position_embeddings and attention_mask arguments.
- Optional GQA (n_key_value_heads may differ from n_heads).
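The pre-norm point above is the key structural fact: normalization runs before each sublayer, not after. A minimal pure-Python sketch of the RMSNorm step (illustrative only, not TransformerLens or Mixtral source code):

```python
import math

def rms_norm(x, gain, eps=1e-6):
    """RMSNorm: scale each element by the reciprocal root-mean-square of x,
    then apply a learned per-element gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gain, x)]

# In a pre-norm block, the residual stream is normalized BEFORE the sublayer:
#   h = h + attention(rms_norm(h, attn_gain))
#   h = h + moe_mlp(rms_norm(h, mlp_gain))
print(rms_norm([3.0, 4.0], [1.0, 1.0]))
```

Unlike LayerNorm, RMSNorm has no mean-subtraction and no bias term, which is why the adapter only needs to map a single weight (the gain) per normalization layer.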
- __init__(cfg: Any) → None¶
Initialize the Mixtral architecture adapter.
- setup_component_testing(hf_model: Any, bridge_model: Any = None) → None¶
Set up rotary embedding references for Mixtral component testing.
Mixtral uses RoPE (Rotary Position Embeddings), stored once at the model level rather than per layer. This method sets the rotary_emb reference on every attention bridge instance so that each bridge can be tested in isolation.
- Parameters:
hf_model – The HuggingFace Mixtral model instance
bridge_model – The TransformerBridge model (if available)
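Since Mixtral keeps a single rotary embedding module at the model level, the setup above amounts to sharing that one instance with every per-layer attention bridge. A minimal stand-in sketch of the pattern (class names here are hypothetical placeholders, not the TransformerLens API):

```python
class RotaryEmbedding:
    """Stand-in for the model-level rotary embedding module (hypothetical)."""
    pass

class AttentionBridge:
    """Stand-in for a per-layer attention bridge (hypothetical)."""
    def __init__(self):
        self.rotary_emb = None  # unset until component testing is configured

# One rotary embedding lives at the model level; component testing needs
# every attention bridge to reference that SAME instance, not a copy.
shared_rope = RotaryEmbedding()
bridges = [AttentionBridge() for _ in range(4)]
for bridge in bridges:
    bridge.rotary_emb = shared_rope  # share the reference, don't duplicate

assert all(b.rotary_emb is shared_rope for b in bridges)
```

Sharing the reference matters because the position embeddings are computed once per forward call and passed into each layer's attention; a per-layer copy could drift out of sync with the model-level module.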