transformer_lens.model_bridge.supported_architectures.qwen2 module¶
Qwen2 architecture adapter.
- class transformer_lens.model_bridge.supported_architectures.qwen2.Qwen2ArchitectureAdapter(cfg: Any)¶
Bases: ArchitectureAdapter
Architecture adapter for Qwen2 models.
Optional Parameters (may not exist in state_dict):¶
Qwen2 models do NOT have biases on any linear layers:
blocks.{i}.attn.b_Q - No bias on query projection
blocks.{i}.attn.b_K - No bias on key projection
blocks.{i}.attn.b_V - No bias on value projection
blocks.{i}.attn.b_O - No bias on output projection
blocks.{i}.mlp.b_in - No bias on MLP input (up_proj)
blocks.{i}.mlp.b_gate - No bias on MLP gate projection
blocks.{i}.mlp.b_out - No bias on MLP output (down_proj)
blocks.{i}.ln1.b - RMSNorm has no bias
blocks.{i}.ln2.b - RMSNorm has no bias
ln_final.b - RMSNorm has no bias
Weight processing must handle these missing biases gracefully using ProcessWeights._safe_get_tensor() or by checking for None values.
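The graceful-handling requirement above can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name `safe_get_tensor` and the toy state dict are hypothetical stand-ins for `ProcessWeights._safe_get_tensor()` and real model tensors.

```python
# Hypothetical sketch of graceful missing-bias handling for Qwen2.
# The real adapter uses ProcessWeights._safe_get_tensor(), whose exact
# behavior may differ from this stand-in.
from typing import Any, Dict, Optional


def safe_get_tensor(state_dict: Dict[str, Any], key: str) -> Optional[Any]:
    """Return the tensor stored under `key`, or None when Qwen2 omits it."""
    return state_dict.get(key)  # absent bias keys simply yield None


# Toy state dict: Qwen2 ships the weight but no bias for this projection.
state_dict = {"blocks.0.attn.W_Q": "weight_tensor"}

w_q = safe_get_tensor(state_dict, "blocks.0.attn.W_Q")
b_q = safe_get_tensor(state_dict, "blocks.0.attn.b_Q")  # -> None for Qwen2

if b_q is None:
    # Caller substitutes zeros or skips the bias term entirely,
    # rather than raising a KeyError on the missing parameter.
    pass
```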
- __init__(cfg: Any) → None¶
Initialize the Qwen2 architecture adapter.
- setup_component_testing(hf_model: Any, bridge_model: Any = None) → None¶
Set up rotary embedding references for Qwen2 component testing.
Qwen2 uses RoPE (Rotary Position Embeddings). We set the rotary_emb reference on all attention bridge instances for component testing.
- Parameters:
hf_model – The HuggingFace Qwen2 model instance
bridge_model – The TransformerBridge model; if provided, rotary_emb is set on the actual attention bridge instances
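The behavior described above can be illustrated with a small sketch. The dummy classes below are hypothetical stand-ins for the HuggingFace rotary-embedding module and the TransformerBridge attention blocks; they only show the sharing pattern, not the library's actual types.

```python
# Illustrative sketch (not the library's code): propagating one shared
# rotary-embedding object onto every attention block, mirroring what
# setup_component_testing does for Qwen2 component testing.
class DummyRotaryEmb:
    """Stand-in for the HF model's rotary position embedding module."""


class DummyAttentionBridge:
    """Stand-in for an attention bridge; starts with no rotary reference."""

    def __init__(self) -> None:
        self.rotary_emb = None


class DummyBridgeModel:
    """Stand-in for a TransformerBridge with n_layers attention bridges."""

    def __init__(self, n_layers: int) -> None:
        self.blocks = [DummyAttentionBridge() for _ in range(n_layers)]


def setup_rotary_references(rotary_emb, bridge_model) -> None:
    # Every attention bridge shares the single rotary embedding instance,
    # so RoPE is applied identically across all layers during testing.
    for block in bridge_model.blocks:
        block.rotary_emb = rotary_emb


shared = DummyRotaryEmb()
model = DummyBridgeModel(n_layers=2)
setup_rotary_references(shared, model)
```

Note that the blocks hold references to one shared object rather than per-layer copies, which matches how a single RoPE module serves all attention layers.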