transformer_lens.model_bridge.supported_architectures.mixtral module

Mixtral architecture adapter.

class transformer_lens.model_bridge.supported_architectures.mixtral.MixtralArchitectureAdapter(cfg: Any)

Bases: ArchitectureAdapter

Architecture adapter for Mixtral models.

Mixtral uses a pre-norm architecture with RMSNorm, rotary position embeddings (RoPE), and a sparse Mixture-of-Experts (MoE) MLP. Key features:

  • Pre-norm: RMSNorm applied BEFORE attention and BEFORE MLP.

  • Rotary embeddings: stored at model.rotary_emb and passed per-forward-call.

  • Sparse MoE: batched expert parameters (gate_up_proj, down_proj as 3D tensors).

  • MixtralAttention.forward() requires position_embeddings and attention_mask args.

  • Optional GQA (n_key_value_heads may differ from n_heads).
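The sparse-MoE forward pass summarized above can be sketched in NumPy. This is an illustrative sketch only: the top-2 routing, the SiLU-gated expert MLP, and all tensor shapes are assumptions for demonstration, not the exact HuggingFace Mixtral implementation. It does, however, use batched 3D expert weights named after the gate_up_proj / down_proj parameters mentioned in the bullet list.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def moe_forward(x, router_w, gate_up_proj, down_proj, top_k=2):
    """Sketch of a sparse top-k MoE MLP with batched expert parameters.

    x:            (tokens, d_model)
    router_w:     (d_model, n_experts)
    gate_up_proj: (n_experts, d_model, 2 * d_ff)  -- 3D batched expert weights
    down_proj:    (n_experts, d_ff, d_model)      -- 3D batched expert weights
    """
    n_experts = router_w.shape[1]
    logits = x @ router_w                          # (tokens, n_experts)
    topk = np.argsort(-logits, axis=1)[:, :top_k]  # indices of selected experts
    # Renormalize router probabilities over the selected experts only.
    sel = np.take_along_axis(logits, topk, axis=1)
    weights = softmax(sel, axis=1)                 # (tokens, top_k)

    out = np.zeros_like(x)
    for e in range(n_experts):
        # Find which tokens routed to expert e, and in which top-k slot.
        token_idx, slot_idx = np.nonzero(topk == e)
        if token_idx.size == 0:
            continue
        h = x[token_idx] @ gate_up_proj[e]         # (n, 2 * d_ff)
        gate, up = np.split(h, 2, axis=-1)
        act = (gate / (1.0 + np.exp(-gate))) * up  # SiLU-gated (assumed)
        y = act @ down_proj[e]                     # (n, d_model)
        out[token_idx] += weights[token_idx, slot_idx, None] * y
    return out
```

Because each expert's weights live along the leading axis of a single 3D tensor, indexing `gate_up_proj[e]` selects one expert without any per-expert submodules.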

__init__(cfg: Any) → None

Initialize the Mixtral architecture adapter.

setup_component_testing(hf_model: Any, bridge_model: Any = None) → None

Set up rotary embedding references for Mixtral component testing.

Mixtral uses RoPE (Rotary Position Embeddings). We set the rotary_emb reference on all attention bridge instances for component testing.

Parameters:
  • hf_model – The HuggingFace Mixtral model instance

  • bridge_model – The TransformerBridge model (if available)
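Conceptually, this setup step shares one rotary-embedding object across every attention bridge so each can be tested in isolation. The sketch below illustrates that reference-sharing pattern only; the class names (DummyRotary, DummyAttentionBridge) and the helper are invented stand-ins, not the actual TransformerLens API.

```python
class DummyRotary:
    """Stand-in for a rotary embedding module (e.g. model.rotary_emb)."""
    def __init__(self, dim):
        self.dim = dim

class DummyAttentionBridge:
    """Stand-in for an attention bridge that needs a rotary_emb reference."""
    def __init__(self):
        self.rotary_emb = None

def setup_rotary_refs(rotary_emb, attention_bridges):
    # Point every bridge at the SAME rotary module (a shared reference,
    # not a copy), mirroring how position embeddings are computed once
    # and passed to each attention layer per forward call.
    for bridge in attention_bridges:
        bridge.rotary_emb = rotary_emb

bridges = [DummyAttentionBridge() for _ in range(3)]
rope = DummyRotary(dim=128)
setup_rotary_refs(rope, bridges)
assert all(b.rotary_emb is rope for b in bridges)
```

Sharing a single reference (rather than copying the module per layer) keeps the rotary cache consistent across layers during component testing.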