transformer_lens.model_bridge.supported_architectures.qwen3_next module¶
Qwen3Next architecture adapter.
Hybrid architecture combining linear attention (GatedDeltaNet) and full attention with a sparse MoE MLP, at a ratio of 3 linear-attention layers per full-attention layer. Extends the Qwen3 base adapter with an optional attention mapping, an MoE MLP, and fold_ln disabled.
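The 3:1 interleaving described above can be sketched as a small helper. This is an illustrative sketch, not part of the adapter's API; the function name `layer_attention_type` and the assumption that each group of four layers ends with the full-attention layer are hypothetical.

```python
def layer_attention_type(layer_idx: int) -> str:
    """Illustrative sketch of the 3:1 hybrid layout: in each group of
    four layers, the first three use linear attention (GatedDeltaNet)
    and the fourth uses full attention. The exact placement within the
    group is an assumption for illustration."""
    return "full" if layer_idx % 4 == 3 else "linear"

# First eight layers under this assumed layout:
# linear, linear, linear, full, linear, linear, linear, full
pattern = [layer_attention_type(i) for i in range(8)]
```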
- class transformer_lens.model_bridge.supported_architectures.qwen3_next.Qwen3NextArchitectureAdapter(cfg: Any)¶
Bases: Qwen3ArchitectureAdapter
Hybrid linear-attention + full-attention with sparse MoE MLP.
Same hybrid design as Qwen3.5 but with MoE instead of dense MLP.
- preprocess_weights(state_dict: dict[str, Tensor]) → dict[str, Tensor]¶
Slice query half from gated q_proj.weight for weight-space analysis.
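A minimal sketch of what such a preprocessing step might look like, assuming the gated q_proj stores the query rows and the gate rows concatenated along the output dimension, with the query rows first. The function name `slice_query_half_sketch`, the `q_dim` parameter, and the key-matching rule are hypothetical, for illustration only; they are not the adapter's actual implementation.

```python
import torch


def slice_query_half_sketch(
    state_dict: dict[str, torch.Tensor], q_dim: int
) -> dict[str, torch.Tensor]:
    """Hypothetical sketch: keep only the query rows of each gated
    q_proj.weight (assumed to be the first q_dim output rows), leaving
    all other weights untouched, so the projection matches the shape
    expected for weight-space analysis."""
    out: dict[str, torch.Tensor] = {}
    for name, weight in state_dict.items():
        if name.endswith("q_proj.weight"):
            # Assumed layout: rows [0, q_dim) are the query projection,
            # rows [q_dim, ...) are the gate projection.
            out[name] = weight[:q_dim]
        else:
            out[name] = weight
    return out
```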