transformer_lens.model_bridge.supported_architectures.qwen3_5 module

Qwen3.5 architecture adapter.

Hybrid architecture mixing linear attention (GatedDeltaNet) and full attention, with a dense gated MLP: three linear-attention layers for every full-attention layer. Extends the Qwen3 base with per-layer optional attention mapping and with fold_ln disabled.
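The 3:1 layer interleaving above can be sketched as a small helper. This is a minimal illustration, not the adapter's actual code; the exact positions of the full-attention layers within each group of four are an assumption for illustration:

```python
# Hypothetical sketch of the 3-linear : 1-full layer pattern described above.
# The assumption that every 4th layer (0-indexed 3, 7, 11, ...) is full
# attention is illustrative; the real pattern comes from the model config.
def layer_type(layer_idx: int) -> str:
    return "full_attention" if layer_idx % 4 == 3 else "linear_attention"

# First 8 layers under this assumed pattern: 6 linear, 2 full.
pattern = [layer_type(i) for i in range(8)]
```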

class transformer_lens.model_bridge.supported_architectures.qwen3_5.Qwen3_5ArchitectureAdapter(cfg: Any)

Bases: Qwen3ArchitectureAdapter

Hybrid linear-attention + full-attention with dense gated MLP.

Inherits the Qwen3 config, attention, and MLP structure. Differences:

- Attention and linear_attn components are optional (selected per layer by layer type)
- Gated q_proj (2x wide) is sliced by preprocess_weights for weight analysis

prepare_loading(model_name: str, model_kwargs: dict) → None

Swap multimodal Qwen3_5Config for text-only Qwen3_5TextConfig.

Published checkpoints carry architectures=['Qwen3_5ForConditionalGeneration']. We replace the config with its text_config so that AutoModelForCausalLM loads the text-only Qwen3_5ForCausalLM.
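The config swap can be sketched with plain dicts standing in for HuggingFace config objects. This is a hedged illustration of the idea, not the adapter's implementation; the dict shapes and the helper name are assumptions:

```python
# Illustrative sketch: replace a multimodal config with its text-only
# sub-config, as prepare_loading does. Plain dicts stand in for HF configs.
def prepare_loading_sketch(model_kwargs: dict) -> dict:
    config = model_kwargs.get("config")
    # The multimodal config's text_config field holds the text-only settings
    # that AutoModelForCausalLM needs to instantiate the causal-LM class.
    if config is not None and "text_config" in config:
        model_kwargs = dict(model_kwargs, config=config["text_config"])
    return model_kwargs

kwargs = {
    "config": {
        "architectures": ["Qwen3_5ForConditionalGeneration"],
        "text_config": {"architectures": ["Qwen3_5ForCausalLM"]},
    }
}
out = prepare_loading_sketch(kwargs)
```

After the swap, out["config"] carries only the text-model settings, so the generic causal-LM loader picks the text-only class.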

preprocess_weights(state_dict: dict[str, Tensor]) → dict[str, Tensor]

Slice query half from gated q_proj.weight for weight-space analysis.

In processed mode, W_Q is the pure query projection (useful for composition scores and the logit lens). The gate signal remains available in unprocessed mode on full-attention layers via blocks.N.attn.hook_q_gate.
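The slicing step can be sketched with plain nested lists standing in for weight tensors. This is an illustrative sketch, not the adapter's code; the assumption that the query half occupies the first half of the rows (with the gate in the second half) is for illustration only:

```python
# Hedged sketch: the gated q_proj weight is 2x wide. We assume (for
# illustration) the first half of its rows is the query projection and the
# second half is the gate; preprocess_weights keeps only the query half.
def slice_query_half(q_proj_weight):
    # q_proj_weight: list of rows, shape (2 * n_heads * d_head, d_model).
    n_rows = len(q_proj_weight)
    assert n_rows % 2 == 0, "gated q_proj must have an even row count"
    return q_proj_weight[: n_rows // 2]  # keep the query half as W_Q

w = [[float(i)] for i in range(8)]  # toy (8, 1) gated weight
w_q = slice_query_half(w)          # query half: the first 4 rows
```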