transformer_lens.model_bridge.supported_architectures.glm4_moe module
GLM-4.5 MoE architecture adapter.
Supports GLM-4.5/4.6/4.7 mixture-of-experts families (Glm4MoeForCausalLM).
Key features:
- RMSNorm with partial pre-norm layout.
- RoPE-style rotary embeddings (partial RoPE supported by Hugging Face model logic).
- Q/K normalization blocks (q_norm, k_norm) and GQA / MQA handling.
- Sparse MoE block in model.layers[i].mlp, with optional dense-prefix layers.
- QKVO rearrangements for bridge-side attention hooks.
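The GQA / MQA item above can be illustrated with a minimal sketch of how query heads map onto shared key/value heads. It assumes the common contiguous-grouping convention, where consecutive query heads share one key/value head; the function name and the head counts are hypothetical and are not taken from the adapter.

```python
def kv_head_for_query_head(q_head: int, n_heads: int, n_kv_heads: int) -> int:
    """Return the key/value head index that a given query head attends with.

    Assumes contiguous grouping: query heads are split into n_kv_heads
    equal-sized groups, and each group shares one key/value head.
    """
    if n_heads % n_kv_heads != 0:
        raise ValueError("n_heads must be divisible by n_kv_heads")
    if not 0 <= q_head < n_heads:
        raise ValueError("q_head out of range")
    group_size = n_heads // n_kv_heads
    return q_head // group_size


# GQA: 8 query heads sharing 2 key/value heads (illustrative sizes).
gqa_mapping = [kv_head_for_query_head(h, n_heads=8, n_kv_heads=2) for h in range(8)]

# MQA is the special case with a single key/value head.
mqa_mapping = [kv_head_for_query_head(h, n_heads=4, n_kv_heads=1) for h in range(4)]
```

With these sizes, the first four query heads share key/value head 0 and the last four share head 1; in the MQA case every query head uses head 0.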
Optional Parameters (may not exist in state_dict):
blocks.{i}.mlp.gate - absent on dense-prefix layers before sparse MoE starts.
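Because the gate may be missing, code that reads the state_dict should check for it per layer rather than assume it exists. The following is a minimal sketch of such a check on a plain list of key names. Only the `blocks.{i}.mlp.gate` prefix comes from this page; the helper name and every other key in the example are hypothetical.

```python
import re
from typing import Iterable, Set

# Matches "blocks.<i>.mlp.gate" exactly or followed by a ".<param>" suffix,
# so a name such as "gate_proj" is not mistaken for the router gate.
GATE_KEY = re.compile(r"^blocks\.(\d+)\.mlp\.gate(\.|$)")


def layers_with_gate(keys: Iterable[str]) -> Set[int]:
    """Return the layer indices whose state_dict keys include an MoE gate."""
    layers: Set[int] = set()
    for key in keys:
        match = GATE_KEY.match(key)
        if match:
            layers.add(int(match.group(1)))
    return layers


# Illustrative key names only: layer 0 is a dense-prefix layer without a gate.
example_keys = [
    "blocks.0.attn.q.weight",
    "blocks.1.attn.q.weight",
    "blocks.1.mlp.gate.weight",
    "blocks.2.attn.q.weight",
    "blocks.2.mlp.gate.weight",
]

sparse_layers = layers_with_gate(example_keys)
dense_layers = {0, 1, 2} - sparse_layers
```

In practice the keys would come from `state_dict().keys()` on the loaded model, and the set of all layer indices would come from the model configuration instead of a literal.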
- class transformer_lens.model_bridge.supported_architectures.glm4_moe.Glm4MoeArchitectureAdapter(cfg: Any)
Bases: ArchitectureAdapter
Architecture adapter for GLM-4.5 / 4.6 / 4.7 MoE decoder models.
GLM-4x MoE families use RMSNorm, RoPE, and sparse routing; some checkpoints begin with a few dense-MLP layers. On those dense layers the mlp sub-module is still present but has no routing gate (mlp.gate), so it exposes fewer sub-components than a sparse MoE layer.
- __init__(cfg: Any) → None
Initialize the GLM-4 MoE architecture adapter.
- setup_component_testing(hf_model: Any, bridge_model: Any = None) → None
Set up rotary embedding references for GLM-4 MoE component testing.