transformer_lens.model_bridge.supported_architectures.xglm module¶
XGLM architecture adapter.
Supports XGLMForCausalLM (facebook/xglm-*). Assumes add_cross_attention=False (all published XGLM checkpoints).
- class transformer_lens.model_bridge.supported_architectures.xglm.XGLMArchitectureAdapter(cfg: Any)¶
Bases: ArchitectureAdapter
Architecture adapter for XGLM models.
XGLM uses pre-norm LayerNorm (applied before each sub-layer), sinusoidal positional embeddings with no learnable weights, standard multi-head attention with separate q_proj/k_proj/v_proj/out_proj, and a two-layer MLP (fc1/fc2) that lives directly on the decoder block rather than inside an mlp sub-module.
All attention projections and fc1/fc2 carry biases. lm_head has no bias. Embeddings are scaled by sqrt(d_model) at runtime in XGLMScaledWordEmbedding.
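Because the positional embeddings are sinusoidal rather than learned, they carry no state_dict weights and can be recomputed on the fly. A minimal sketch of one row of a standard interleaved sinusoidal table (illustrative only; the actual XGLM/fairseq variant instead concatenates all sines followed by all cosines and applies a position offset):

```python
import math

def sinusoidal_position(pos: int, d_model: int) -> list[float]:
    # One row of a standard sinusoidal table: even dims use sin, odd dims cos.
    # Sketch only -- the exact XGLM layout differs as noted above.
    return [
        math.sin(pos / 10000 ** (i / d_model)) if i % 2 == 0
        else math.cos(pos / 10000 ** ((i - 1) / d_model))
        for i in range(d_model)
    ]
```

Position 0 yields alternating 0/1 values regardless of width, which is a quick sanity check for any reimplementation.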
Optional Parameters (may not exist in state_dict):¶
None — all published XGLM checkpoints include all parameters listed above.
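For illustration, a sketch of the per-block parameter names this layout implies, assuming the Hugging Face checkpoint naming scheme (`model.layers.N...`); the adapter's actual mapping may differ in details:

```python
block = "model.layers.0"  # assumed HF-style prefix for decoder block 0

# Four separate attention projections, each carrying weight + bias:
attn_params = [
    f"{block}.self_attn.{proj}.{t}"
    for proj in ("q_proj", "k_proj", "v_proj", "out_proj")
    for t in ("weight", "bias")
]

# fc1/fc2 sit directly on the decoder block, not under an `mlp` sub-module:
mlp_params = [
    f"{block}.{fc}.{t}"
    for fc in ("fc1", "fc2")
    for t in ("weight", "bias")
]
```

Note the absence of any `.mlp.` segment in the feed-forward names, which is the main structural difference from GPT-2-style blocks.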
- __init__(cfg: Any) → None¶
Initialize the XGLM architecture adapter.
- setup_hook_compatibility(bridge: Any) → None¶
Scale hook_embed by sqrt(d_model) to match XGLMScaledWordEmbedding.forward().
XGLMScaledWordEmbedding multiplies the embedding lookup by embed_scale = sqrt(d_model) at runtime. Without this override, hook_embed would capture the raw (unscaled) table output, diverging from actual model activations.
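A minimal sketch of the scaling this override applies to hook_embed, using 1024 (the hidden size of xglm-564M) as an assumed example value for d_model:

```python
import math

d_model = 1024                    # assumed example: xglm-564M hidden size
embed_scale = math.sqrt(d_model)  # 32.0 -- applied at runtime, not baked into weights

# An illustrative raw embedding row (not real checkpoint values):
raw_row = [0.5, -0.25, 0.125]

# What hook_embed should capture after the override:
scaled_row = [x * embed_scale for x in raw_row]
```

Because the scale is applied in forward() rather than stored in the weights, any hook reading the embedding table directly must replicate this multiplication to match the model's true activations.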