transformer_lens.model_bridge.generalized_components.glm_moe_dsa_attention module

GLM-MoE-DSA attention bridge component.

class transformer_lens.model_bridge.generalized_components.glm_moe_dsa_attention.GlmMoeDsaAttentionBridge(name: str, config: Any, submodules: Dict[str, GeneralizedComponent] | None = None, **kwargs: Any)

Bases: MLAAttentionBridge

Bridge for GLM-5 DeepSeek Sparse Attention.

GLM-MoE-DSA extends Multi-head Latent Attention (MLA) with a learned top-k token indexer. Its forward pass returns the tuple (attn_output, attn_weights, topk_indices_or_none) so that the shared top-k indices can be fed into later layers.
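
As a rough illustration of what a top-k token indexer does, the following pure-Python sketch picks the k keys with the highest indexer scores for a single query and applies softmax over only those keys. It is not the bridge's or the Hugging Face implementation: `topk_indices`, `sparse_attention_weights`, and the example scores are hypothetical names and values chosen for illustration, and the real indexer operates on batched tensors.

```python
import math
from typing import List, Tuple


def topk_indices(index_scores: List[float], k: int) -> List[int]:
    """Return the positions of the k largest indexer scores, highest first."""
    k = min(k, len(index_scores))
    order = sorted(range(len(index_scores)),
                   key=lambda i: index_scores[i], reverse=True)
    return order[:k]


def sparse_attention_weights(
    attn_logits: List[float], index_scores: List[float], k: int
) -> Tuple[List[float], List[int]]:
    """Softmax over only the keys selected by the indexer; all others get 0."""
    keep = topk_indices(index_scores, k)
    # Subtract the max logit for numerical stability before exponentiating.
    max_logit = max(attn_logits[i] for i in keep)
    exps = {i: math.exp(attn_logits[i] - max_logit) for i in keep}
    total = sum(exps.values())
    weights = [exps[i] / total if i in exps else 0.0
               for i in range(len(attn_logits))]
    return weights, keep


# One query attending over four keys (hypothetical values).
attn_logits = [0.5, 2.0, -1.0, 1.0]
index_scores = [0.1, 0.9, 0.3, 0.8]  # stand-in for the learned indexer output
weights, keep = sparse_attention_weights(attn_logits, index_scores, k=2)
```

Here `keep` plays the role of the selected token indices: keys outside it receive zero attention weight, which is the sense in which the attention is sparse.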

forward(*args: Any, **kwargs: Any) → Any

Reimplemented MLA forward with hooks at each computation stage.

Follows the DeepseekV3Attention forward path, calling into HF submodules individually and firing hooks at each meaningful stage.
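
The general pattern of firing a hook after each stage of a forward pass can be sketched with a toy example. Everything below is an assumption made for illustration: `ToyHookPoint`, the stage names `q`, `scores`, and `out`, and the scalar arithmetic stand in for the real hook points and tensor operations, which this page does not list.

```python
from typing import Callable, Dict, List, Optional

# A hook receives the stage value and the stage name; returning a value
# replaces the stage output, returning None leaves it unchanged.
HookFn = Callable[[float, str], Optional[float]]


class ToyHookPoint:
    """Hypothetical stand-in for a hook point; not the library's class."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.fns: List[HookFn] = []

    def add_hook(self, fn: HookFn) -> None:
        self.fns.append(fn)

    def __call__(self, value: float) -> float:
        for fn in self.fns:
            result = fn(value, self.name)
            if result is not None:
                value = result
        return value


STAGES = ("q", "scores", "out")  # hypothetical stage names


def make_hooks() -> Dict[str, ToyHookPoint]:
    return {name: ToyHookPoint(name) for name in STAGES}


def toy_forward(x: float, hooks: Dict[str, ToyHookPoint]) -> float:
    """Run each stage separately and pass its result through a hook."""
    q = hooks["q"](x * 2.0)            # stand-in for a query projection
    scores = hooks["scores"](q + 1.0)  # stand-in for attention scores
    return hooks["out"](scores * 3.0)  # stand-in for the output projection


# Record every intermediate value without modifying it.
cache: Dict[str, float] = {}


def record(value: float, name: str) -> None:
    cache[name] = value


hooks = make_hooks()
for hook_point in hooks.values():
    hook_point.add_hook(record)
result = toy_forward(2.0, hooks)
```

Because each stage is computed separately rather than inside one opaque call, a hook can either observe an intermediate value (as `record` does) or replace it and change everything downstream.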