transformer_lens.model_bridge.supported_architectures.qwen3_5_multimodal module

Qwen3.5 multimodal (vision-language) adapter for Qwen3_5ForConditionalGeneration.

Reuses the text-only Qwen3.5 hybrid backbone, nested under model.language_model, and adds the vision tower (model.visual) and merger. The HF model runs the vision computation during its forward pass; this adapter only supplies the component mapping (hooks and weights).

class transformer_lens.model_bridge.supported_architectures.qwen3_5_multimodal.Qwen3_5MultimodalArchitectureAdapter(cfg: Any)

Bases: Qwen3ArchitectureAdapter

Full vision-language adapter for Qwen3_5ForConditionalGeneration.

component_mapping: ComponentMapping | None

preprocess_weights(state_dict: dict[str, Tensor]) → dict[str, Tensor]

Slice the query half out of each gated q_proj.weight. The key matcher is path-prefix-agnostic, so it matches q_proj.weight keys regardless of their parent path.
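The slicing step can be illustrated with a minimal, dependency-free sketch. It is not the adapter's implementation: nested lists stand in for weight tensors, the helper names are hypothetical, and the row layout (for each head, head_dim query rows followed by head_dim gate rows) is an assumption that should be checked against the actual checkpoint.

```python
from typing import Dict, List

Matrix = List[List[float]]  # stand-in for a 2-D weight tensor (rows = output features)


def slice_query_rows(weight: Matrix, n_heads: int, head_dim: int) -> Matrix:
    """Keep only the query rows of a fused query/gate projection.

    ASSUMPTION: rows are grouped per head as head_dim query rows followed
    by head_dim gate rows. Verify this against the real weight layout.
    """
    if len(weight) != 2 * n_heads * head_dim:
        raise ValueError("expected 2 * n_heads * head_dim rows")
    query_rows: Matrix = []
    for head in range(n_heads):
        start = head * 2 * head_dim
        query_rows.extend(weight[start : start + head_dim])
    return query_rows


def preprocess_weights_sketch(
    state_dict: Dict[str, Matrix], n_heads: int, head_dim: int
) -> Dict[str, Matrix]:
    """Return a shallow copy of state_dict with every q_proj.weight sliced."""
    out = dict(state_dict)
    for key, weight in state_dict.items():
        # Suffix match only: the parent path of the key is irrelevant.
        if key == "q_proj.weight" or key.endswith(".q_proj.weight"):
            out[key] = slice_query_rows(weight, n_heads, head_dim)
    return out


# Toy input: 2 heads with head_dim 2, so the fused projection has 8 rows.
toy_weight = [[float(i)] for i in range(8)]
toy_state = {
    "model.language_model.layers.3.self_attn.q_proj.weight": toy_weight,
    "model.language_model.layers.3.self_attn.k_proj.weight": toy_weight,
}
toy_result = preprocess_weights_sketch(toy_state, n_heads=2, head_dim=2)
```

Matching on the key suffix is what lets the same rule apply whether the language model sits at the top level or under model.language_model.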

required_libraries: list[str] = ['torchvision']

required_libraries_group: str = 'multimodal'
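How the bridge consumes these two attributes is not described here. As a rough sketch, a caller could check the listed libraries before loading the model; missing_libraries is a hypothetical helper, not part of TransformerLens.

```python
from importlib.util import find_spec
from typing import List


def missing_libraries(required: List[str]) -> List[str]:
    """Return the top-level module names in `required` that cannot be found.

    Hypothetical helper for illustration; it only checks importability
    and does not verify versions.
    """
    return [name for name in required if find_spec(name) is None]


# Mirrors the attribute value documented above.
missing = missing_libraries(["torchvision"])
if missing:
    print("Missing optional libraries:", ", ".join(missing))
```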

setup_component_testing(hf_model: Any, bridge_model: Any = None) → None

Set eager attention and rotary_emb references for the nested language model.

The backbone is hybrid: only full-attention layers have self_attn/attn, so linear-attention layers are skipped.
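The skip logic for a hybrid stack can be sketched as follows. This is an illustration under assumptions, not the adapter's code: the layers are plain stand-in objects, the linear_attn attribute name is hypothetical, and assigning rotary_emb directly on the attention module is just one possible way to hand over the reference.

```python
from types import SimpleNamespace
from typing import Any, List


def attach_rotary_refs(layers: List[Any], rotary_emb: Any) -> List[int]:
    """Attach a shared rotary_emb reference to every full-attention layer.

    Returns the indices of the layers that were touched. Layers without a
    self_attn attribute (linear-attention layers) are skipped.
    """
    touched: List[int] = []
    for index, layer in enumerate(layers):
        attn = getattr(layer, "self_attn", None)
        if attn is None:
            continue  # linear-attention layer: nothing to set up
        attn.rotary_emb = rotary_emb
        touched.append(index)
    return touched


# Stand-in hybrid stack: three linear-attention layers, then one full-attention layer.
shared_rotary = object()
stack = [
    SimpleNamespace(linear_attn=SimpleNamespace()),  # hypothetical attribute name
    SimpleNamespace(linear_attn=SimpleNamespace()),
    SimpleNamespace(linear_attn=SimpleNamespace()),
    SimpleNamespace(self_attn=SimpleNamespace()),
]
touched_indices = attach_rotary_refs(stack, shared_rotary)
```

Checking for the attribute, rather than relying on a fixed layer pattern, keeps the sketch independent of how full-attention and linear-attention layers are interleaved.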

uses_split_attention: bool

weight_processing_conversions: dict