transformer_lens.model_bridge.supported_architectures.llama module

Llama architecture adapter.

class transformer_lens.model_bridge.supported_architectures.llama.LlamaArchitectureAdapter(cfg: Any)

Bases: ArchitectureAdapter

Architecture adapter for Llama models.

Optional parameters (may be absent from the state_dict):

Llama models do not have biases on their attention and MLP projections:

  • blocks.{i}.attn.b_Q - No bias on query projection

  • blocks.{i}.attn.b_K - No bias on key projection

  • blocks.{i}.attn.b_V - No bias on value projection

  • blocks.{i}.attn.b_O - No bias on output projection

  • blocks.{i}.mlp.b_in - No bias on MLP input (up_proj)

  • blocks.{i}.mlp.b_gate - No bias on MLP gate projection

  • blocks.{i}.mlp.b_out - No bias on MLP output (down_proj)

  • blocks.{i}.ln1.b - RMSNorm has no bias

  • blocks.{i}.ln2.b - RMSNorm has no bias

  • ln_final.b - RMSNorm has no bias

Weight processing must handle these missing biases gracefully using ProcessWeights._safe_get_tensor() or by checking for None values.
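For illustration, a minimal sketch of that pattern, assuming a plain dict-like state_dict; the helper name, the example shapes, and the ProcessWeights behaviour it stands in for are assumptions, not the actual transformer_lens implementation:

    import torch

    def _bias_or_zeros(state_dict, key, shape, dtype=torch.float32):
        # Hypothetical stand-in for ProcessWeights._safe_get_tensor():
        # return the stored bias, or zeros when the checkpoint omits it.
        tensor = state_dict.get(key)
        return tensor if tensor is not None else torch.zeros(shape, dtype=dtype)

    # Llama checkpoints ship W_Q but no b_Q, so the lookup falls back to zeros.
    state_dict = {"blocks.0.attn.W_Q": torch.randn(32, 4096, 128)}
    b_Q = _bias_or_zeros(state_dict, "blocks.0.attn.b_Q", (32, 128))
    assert torch.all(b_Q == 0)

Substituting zeros keeps downstream matrix arithmetic uniform across architectures that do and do not carry biases.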

__init__(cfg: Any) → None

Initialize the Llama architecture adapter.

setup_component_testing(hf_model: Any, bridge_model: Any = None) → None

Set up rotary embedding references for Llama component testing.

Llama uses RoPE (Rotary Position Embedding). This method sets the rotary_emb reference on all attention bridge instances for component testing; a conceptual sketch follows the parameter list.

Parameters:
  • hf_model – The HuggingFace Llama model instance

  • bridge_model – The TransformerBridge model; if provided, rotary_emb is set on its actual attention bridge instances
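As a conceptual sketch of this wiring, assuming the HuggingFace Llama model exposes a single shared rotary_emb module under hf_model.model and the bridge exposes per-block attn components (both attribute paths are assumptions about the respective layouts):

    def _attach_rotary_for_testing(hf_model, bridge_model=None):
        # Share the HF rotary embedding module with every attention bridge
        # so component tests compute RoPE exactly as the source model does.
        rotary_emb = hf_model.model.rotary_emb  # assumed HF attribute path
        if bridge_model is not None:
            for block in bridge_model.blocks:  # assumed bridge layout
                block.attn.rotary_emb = rotary_emb  # a reference, not a copy

Sharing one module rather than copying weights keeps the bridge and the source model in sync on the rotary frequency tables.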