transformer_lens.model_bridge.supported_architectures.qwen2 module

Qwen2 architecture adapter.

class transformer_lens.model_bridge.supported_architectures.qwen2.Qwen2ArchitectureAdapter(cfg: Any)

Bases: ArchitectureAdapter

Architecture adapter for Qwen2 models.

Optional Parameters (may not exist in state_dict):

Qwen2 models do NOT have biases on any linear layers:

  • blocks.{i}.attn.b_Q - No bias on query projection

  • blocks.{i}.attn.b_K - No bias on key projection

  • blocks.{i}.attn.b_V - No bias on value projection

  • blocks.{i}.attn.b_O - No bias on output projection

  • blocks.{i}.mlp.b_in - No bias on MLP input (up_proj)

  • blocks.{i}.mlp.b_gate - No bias on MLP gate projection

  • blocks.{i}.mlp.b_out - No bias on MLP output (down_proj)

  • blocks.{i}.ln1.b - RMSNorm has no bias

  • blocks.{i}.ln2.b - RMSNorm has no bias

  • ln_final.b - RMSNorm has no bias

Weight processing must handle these missing biases gracefully using ProcessWeights._safe_get_tensor() or by checking for None values.
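The graceful-handling requirement above can be sketched as follows. This is a minimal illustration, not the real TransformerLens code: `safe_get_tensor` is a hypothetical stand-in for `ProcessWeights._safe_get_tensor()`, whose actual signature may differ, and the state_dict here uses plain lists in place of tensors.

```python
# Hedged sketch: handling biases that are absent from a Qwen2-style state_dict.
# `safe_get_tensor` is a hypothetical stand-in for ProcessWeights._safe_get_tensor().

def safe_get_tensor(state_dict, key, default=None):
    """Return state_dict[key] if present, otherwise `default` (e.g. None or zeros)."""
    return state_dict.get(key, default)

# A Qwen2-style state_dict: the weight is present, the bias is intentionally missing.
state_dict = {
    "blocks.0.attn.W_Q": [[0.1, 0.2], [0.3, 0.4]],
    # "blocks.0.attn.b_Q" does not exist for Qwen2
}

w_q = safe_get_tensor(state_dict, "blocks.0.attn.W_Q")
b_q = safe_get_tensor(state_dict, "blocks.0.attn.b_Q")  # missing key, returns None

# Downstream weight processing checks for None before using the bias.
if b_q is None:
    pass  # skip the bias add, which is equivalent to a zero bias
```

The key point is that a missing bias is an expected condition for this architecture, so lookup must never raise a KeyError.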

__init__(cfg: Any) → None

Initialize the Qwen2 architecture adapter.

setup_component_testing(hf_model: Any, bridge_model: Any = None) → None

Set up rotary embedding references for Qwen2 component testing.

Qwen2 uses RoPE (Rotary Position Embeddings), so this method sets the rotary_emb reference on every attention bridge instance used in component testing.

Parameters:
  • hf_model – The HuggingFace Qwen2 model instance

  • bridge_model – The TransformerBridge model; if provided, rotary_emb is set on its actual attention bridge instances
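Conceptually, what this setup step does is share one rotary-embedding module across all attention bridges so that component tests see identical RoPE values. The sketch below illustrates that idea with illustrative stand-in classes; the real TransformerLens and HuggingFace class names and attribute paths differ.

```python
# Hedged sketch of the rotary-embedding sharing performed for component testing.
# RotaryEmbedding, AttentionBridge, and Block are illustrative stand-ins, not
# the real TransformerLens or HuggingFace classes.

class RotaryEmbedding:
    def __init__(self, dim):
        self.dim = dim  # rotary dimension, e.g. the per-head dimension

class AttentionBridge:
    def __init__(self):
        self.rotary_emb = None  # filled in by the setup step below

class Block:
    def __init__(self):
        self.attn = AttentionBridge()

def setup_component_testing(hf_rotary_emb, bridge_blocks):
    # Point every attention bridge at the single shared rotary module, so all
    # blocks compute positions with exactly the same RoPE tables.
    for block in bridge_blocks:
        block.attn.rotary_emb = hf_rotary_emb

rope = RotaryEmbedding(dim=128)
blocks = [Block() for _ in range(2)]
setup_component_testing(rope, blocks)
```

Sharing a single module (rather than copying it per block) keeps the bridge's attention outputs bit-for-bit comparable against the HF model's.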