transformer_lens.model_bridge.generalized_components.glm_moe_dsa_attention module
GLM-MoE-DSA attention bridge component.
- class transformer_lens.model_bridge.generalized_components.glm_moe_dsa_attention.GlmMoeDsaAttentionBridge(name: str, config: Any, submodules: Dict[str, GeneralizedComponent] | None = None, **kwargs: Any)
Bases: MLAAttentionBridge
Bridge for GLM-5 DeepSeek Sparse Attention.
GLM-MoE-DSA extends MLA with a learned top-k token indexer and returns
(attn_output, attn_weights, topk_indices_or_none) to feed shared top-k indices into later layers.
- forward(*args: Any, **kwargs: Any) → Any
Reimplemented MLA forward with hooks at each computation stage.
Follows the DeepseekV3Attention forward path, calling into HF submodules individually and firing hooks at each meaningful stage.
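The two ideas described above, top-k token selection and hooks fired at each computation stage, can be illustrated with a small standalone sketch. This is not the bridge's implementation and does not use the TransformerLens or Hugging Face APIs: the function `sparse_attention_step`, its parameters, and the hook names (`hook_topk_indices`, `hook_attn_scores`, `hook_pattern`, `hook_attn_output`) are hypothetical, the indexer scores are passed in rather than learned, and a single query vector with plain scaled dot-product attention stands in for MLA. As a further simplification, the sketch always returns the indices, whereas the bridge may return None in that position.

```python
import math
from typing import Callable, List, Optional, Tuple

Vector = List[float]
Hook = Callable[[str, object], None]  # hypothetical hook signature: (stage name, value)


def topk_indices(scores: Vector, k: int) -> List[int]:
    """Return the indices of the k largest scores, sorted by position."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_attention_step(
    query: Vector,
    keys: List[Vector],
    values: List[Vector],
    indexer_scores: Vector,
    k: int,
    shared_indices: Optional[List[int]] = None,
    hook: Optional[Hook] = None,
) -> Tuple[Vector, Vector, List[int]]:
    """Attend over only the selected keys, firing a hook after each stage."""
    fire = hook or (lambda name, value: None)

    # Stage 1: pick key positions, or reuse indices shared by an earlier layer.
    if shared_indices is not None:
        selected = shared_indices
    else:
        selected = topk_indices(indexer_scores, k)
    fire("hook_topk_indices", selected)

    # Stage 2: scaled dot-product scores, computed for the selected keys only.
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * x for q, x in zip(query, keys[i])) for i in selected]
    fire("hook_attn_scores", scores)

    # Stage 3: normalize the scores into attention weights.
    weights = softmax(scores)
    fire("hook_pattern", weights)

    # Stage 4: weighted sum of the selected value vectors.
    dim = len(values[0])
    output = [sum(w * values[i][d] for w, i in zip(weights, selected)) for d in range(dim)]
    fire("hook_attn_output", output)

    return output, weights, selected


# Usage: four tokens, keep the two with the highest indexer scores.
seen: List[str] = []
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0], [9.0, 9.0]]
out, weights, idx = sparse_attention_step(
    query=[1.0, 0.0],
    keys=keys,
    values=values,
    indexer_scores=[0.9, 0.1, 0.8, 0.2],
    k=2,
    hook=lambda name, value: seen.append(name),
)

# A later layer can skip its own selection by reusing the returned indices.
out2, weights2, idx2 = sparse_attention_step(
    query=[0.0, 1.0],
    keys=keys,
    values=values,
    indexer_scores=[0.0, 0.0, 0.0, 0.0],
    k=2,
    shared_indices=idx,
)
```

Returning the selected indices alongside the output and weights is what lets the second call reuse them; the hook callback receives a stage name and the intermediate value, so each stage can be inspected without changing the computation.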