transformer_lens.model_bridge.supported_architectures.qwen3_5_multimodal module
Qwen3.5 multimodal (vision-language) adapter for Qwen3_5ForConditionalGeneration.
Reuses the text-only Qwen3.5 hybrid backbone nested under model.language_model and adds
the vision tower (model.visual) + merger. The HF model runs the vision computation during
forward; this adapter only supplies the component mapping (hooks + weights).
- class transformer_lens.model_bridge.supported_architectures.qwen3_5_multimodal.Qwen3_5MultimodalArchitectureAdapter(cfg: Any)
Bases: Qwen3ArchitectureAdapter
Full vision-language adapter for Qwen3_5ForConditionalGeneration.
- component_mapping: ComponentMapping | None
- preprocess_weights(state_dict: dict[str, Tensor]) → dict[str, Tensor]
Slice the query half out of each gated q_proj.weight. The key matcher is path-prefix-agnostic.
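The slicing can be sketched as follows. This is a minimal illustration under stated assumptions, not the adapter's actual code: it assumes the gated weight stores, for each head, the query rows followed by the gate rows, and that "path-prefix-agnostic" means keys are matched by suffix. The names `slice_query_half`, `Q_PROJ_SUFFIX` and `n_heads`, the example key paths, and the list-of-rows stand-in for a tensor are all hypothetical.

```python
from typing import Dict, List

Matrix = List[List[float]]  # list of rows; stands in for a 2-D tensor

# Assumed key suffix; matching on the suffix ignores whatever prefix precedes it.
Q_PROJ_SUFFIX = "self_attn.q_proj.weight"


def slice_query_half(state_dict: Dict[str, Matrix], n_heads: int) -> Dict[str, Matrix]:
    """Return a copy in which every gated q_proj weight keeps only its query rows.

    Assumed layout (hypothetical): rows are grouped per head as
    [query rows of head 0, gate rows of head 0, query rows of head 1, ...].
    """
    out: Dict[str, Matrix] = {}
    for key, weight in state_dict.items():
        if not key.endswith(Q_PROJ_SUFFIX):
            out[key] = weight  # every other entry passes through unchanged
            continue
        rows_per_head = len(weight) // n_heads  # query rows + gate rows
        head_dim = rows_per_head // 2
        kept: Matrix = []
        for head in range(n_heads):
            start = head * rows_per_head
            kept.extend(weight[start:start + head_dim])  # query rows only
        out[key] = kept
    return out


# Toy input: 2 heads with head_dim = 1, so the gated weight has 4 rows.
toy = {
    "model.language_model.layers.3.self_attn.q_proj.weight": [[1.0], [2.0], [3.0], [4.0]],
    "model.language_model.layers.3.self_attn.k_proj.weight": [[9.0]],
}
sliced = slice_query_half(toy, n_heads=2)
```

If the real checkpoint instead stores all query rows first and all gate rows second, the per-head loop would collapse to a single slice of the first half of the rows; the layout should be checked against the model before relying on either form.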
- required_libraries: list[str] = ['torchvision']
- required_libraries_group: str = 'multimodal'
- setup_component_testing(hf_model: Any, bridge_model: Any = None) → None
Set eager attention and rotary_emb references for the nested language model.
The model is hybrid: only full-attention layers have self_attn/attn, so linear-attention layers are skipped.
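The skip rule for hybrid stacks can be illustrated with stand-in objects. This is a sketch under assumptions, not the adapter's implementation: `full_attention_layers` is a hypothetical helper, `SimpleNamespace` objects stand in for decoder layers, and the layer pattern is invented for the example. Only the `self_attn` attribute name comes from the description above.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List, Tuple


def full_attention_layers(layers: Iterable[Any]) -> List[Tuple[int, Any]]:
    """Return (index, attention module) pairs for layers that expose `self_attn`."""
    found: List[Tuple[int, Any]] = []
    for idx, layer in enumerate(layers):
        attn = getattr(layer, "self_attn", None)
        if attn is None:
            continue  # linear-attention layer: nothing to configure here
        found.append((idx, attn))
    return found


# Toy stack (illustrative pattern): three linear-attention layers, then one
# full-attention layer. `linear_attn` is a placeholder attribute name.
layers = [
    SimpleNamespace(linear_attn="linear-0"),
    SimpleNamespace(linear_attn="linear-1"),
    SimpleNamespace(linear_attn="linear-2"),
    SimpleNamespace(self_attn="full-3"),
]
selected = full_attention_layers(layers)
```

Per-layer setup, such as attaching a rotary embedding reference, would then run only over `selected`.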
- uses_split_attention: bool
- weight_processing_conversions: dict