transformer_lens.model_bridge.supported_architectures.llama module¶
Llama architecture adapter.
- class transformer_lens.model_bridge.supported_architectures.llama.LlamaArchitectureAdapter(cfg: Any)¶
Bases: ArchitectureAdapter
Architecture adapter for Llama models.
Optional Parameters (may not exist in state_dict):¶
LLaMA models do NOT have biases on attention and MLP projections:
blocks.{i}.attn.b_Q - No bias on query projection
blocks.{i}.attn.b_K - No bias on key projection
blocks.{i}.attn.b_V - No bias on value projection
blocks.{i}.attn.b_O - No bias on output projection
blocks.{i}.mlp.b_in - No bias on MLP input (up_proj)
blocks.{i}.mlp.b_gate - No bias on MLP gate projection
blocks.{i}.mlp.b_out - No bias on MLP output (down_proj)
blocks.{i}.ln1.b - RMSNorm has no bias
blocks.{i}.ln2.b - RMSNorm has no bias
ln_final.b - RMSNorm has no bias
Weight processing must handle these missing biases gracefully, either via ProcessWeights._safe_get_tensor() or by checking for None values.
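A minimal sketch of the None-checking pattern described above; the helper name get_bias_or_zeros and the example tensor shapes are illustrative assumptions, not part of the TransformerLens API:

```python
import torch


def get_bias_or_zeros(state_dict: dict, key: str, shape: tuple) -> torch.Tensor:
    """Return the bias if present, otherwise a zero tensor of the right shape.

    Hypothetical helper; mirrors the behavior documented above.
    """
    bias = state_dict.get(key)  # Llama state_dicts omit these keys entirely
    return bias if bias is not None else torch.zeros(shape)


# blocks.0.attn.b_Q does not exist in a Llama state_dict, so zeros come back.
sd = {"blocks.0.attn.W_Q": torch.randn(32, 4096, 128)}  # shapes are illustrative
b_Q = get_bias_or_zeros(sd, "blocks.0.attn.b_Q", (32, 128))
```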
- __init__(cfg: Any) None¶
Initialize the Llama architecture adapter.
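A hypothetical construction sketch; the signature only promises cfg: Any, so the use of a HuggingFace AutoConfig and the model id below are assumptions, not the documented contract:

```python
from transformers import AutoConfig

from transformer_lens.model_bridge.supported_architectures.llama import (
    LlamaArchitectureAdapter,
)

# Assumed config source; any object satisfying the adapter's expectations works.
cfg = AutoConfig.from_pretrained("meta-llama/Llama-2-7b-hf")
adapter = LlamaArchitectureAdapter(cfg)
```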
- setup_component_testing(hf_model: Any, bridge_model: Any = None) None¶
Set up rotary embedding references for Llama component testing.
Llama uses RoPE (Rotary Position Embeddings). We set the rotary_emb reference on all attention bridge instances for component testing.
- Parameters:
hf_model – The HuggingFace Llama model instance
bridge_model – The TransformerBridge model (if provided, rotary_emb is set on the actual attention bridge instances)
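A usage sketch continuing from the construction example above; the model id is illustrative, and the bridge_model call is commented out since constructing a TransformerBridge instance is outside this module's scope:

```python
from transformers import AutoModelForCausalLM

hf_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# Sets rotary_emb references for component testing, per the docs above.
adapter.setup_component_testing(hf_model)

# With a TransformerBridge instance available (hypothetical `bridge`):
# adapter.setup_component_testing(hf_model, bridge_model=bridge)
```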