transformer_lens.model_bridge.supported_architectures.qwen3_next module

Qwen3Next architecture adapter.

Hybrid linear-attention (GatedDeltaNet) + full-attention architecture with a sparse MoE MLP: three linear-attention layers per full-attention layer. Extends the Qwen3 base adapter with an optional attention mapping, an MoE MLP, and fold_ln disabled.
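The 3:1 layer schedule above can be sketched as a small helper. This is a hypothetical illustration, not the adapter's actual code; it assumes the repeating block places the full-attention layer last (i.e. every fourth layer), which is an assumption not stated in the source.

```python
def layer_kind(layer_idx: int) -> str:
    """Return the attention kind for a layer under a repeating 3:1 schedule.

    Assumption: each block of four layers is three linear-attention
    (GatedDeltaNet) layers followed by one full-attention layer.
    """
    return "full_attention" if layer_idx % 4 == 3 else "linear_attention"

# For an 8-layer slice, layers 3 and 7 would be full attention:
kinds = [layer_kind(i) for i in range(8)]
```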

class transformer_lens.model_bridge.supported_architectures.qwen3_next.Qwen3NextArchitectureAdapter(cfg: Any)

Bases: Qwen3ArchitectureAdapter

Hybrid linear-attention + full-attention with sparse MoE MLP.

Same hybrid design as Qwen3.5, but with an MoE MLP instead of a dense one.

preprocess_weights(state_dict: dict[str, Tensor]) → dict[str, Tensor]

Slice the query half from the gated q_proj.weight so that weight-space analysis operates on the query projection alone.
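A minimal sketch of this preprocessing step, under the assumption that each linear-attention layer fuses the query and gate projections into a single q_proj.weight of shape [2 * q_dim, hidden] with the query rows stored first. The key suffix, the half ordering, and the function name are assumptions for illustration, not the adapter's actual implementation.

```python
import torch


def slice_query_half(state_dict: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
    """Keep only the query half of each fused q_proj weight (hypothetical sketch).

    Assumption: gated q_proj.weight concatenates [query; gate] along the
    output (row) dimension, so the first half of the rows is the query.
    """
    out = {}
    for name, tensor in state_dict.items():
        if name.endswith("q_proj.weight"):
            q_dim = tensor.shape[0] // 2
            out[name] = tensor[:q_dim]  # drop the gate rows, keep the query rows
        else:
            out[name] = tensor
    return out


# Example: a 4-row fused weight yields a 2-row query weight.
sd = {"layers.0.linear_attn.q_proj.weight": torch.arange(8.0).reshape(4, 2)}
sliced = slice_query_half(sd)
```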