transformer_lens.model_bridge.supported_architectures.apertus module
Apertus architecture adapter.
- class transformer_lens.model_bridge.supported_architectures.apertus.ApertusArchitectureAdapter(cfg: Any)
Bases: ArchitectureAdapter
Architecture adapter for Apertus models.
Apertus uses a pre-norm architecture with RMSNorm, Q/K normalization in attention, rotary position embeddings (RoPE with LLaMA-3 scaling), grouped query attention (GQA), non-gated MLP (XiELU activation), and no biases on any projections.
Apertus is similar to Qwen3 (pre-norm RMSNorm, QK-norm, GQA, RoPE) but uses a non-gated MLP (up_proj -> XiELU -> down_proj) instead of a gated MLP.
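The difference between the two MLP layouts can be sketched in plain Python. This is an illustration only, not the adapter's code: the helper names (`linear`, `non_gated_mlp`, `gated_mlp`) are hypothetical, and ReLU stands in for XiELU, whose exact form is not given here.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]  # shape: [out_features][in_features]


def linear(weight: Matrix, x: Vector) -> Vector:
    """Bias-free projection y = W @ x (Apertus projections have no biases)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def relu(x: Vector) -> Vector:
    """Stand-in activation (assumption); the real Apertus MLP uses XiELU."""
    return [max(0.0, v) for v in x]


def non_gated_mlp(
    x: Vector,
    up_proj: Matrix,
    down_proj: Matrix,
    act: Callable[[Vector], Vector] = relu,
) -> Vector:
    """Apertus-style layout: up_proj -> activation -> down_proj."""
    return linear(down_proj, act(linear(up_proj, x)))


def gated_mlp(
    x: Vector,
    gate_proj: Matrix,
    up_proj: Matrix,
    down_proj: Matrix,
    act: Callable[[Vector], Vector] = relu,
) -> Vector:
    """Gated layout (as in Qwen3): down_proj(act(gate_proj(x)) * up_proj(x))."""
    gate = act(linear(gate_proj, x))
    up = linear(up_proj, x)
    return linear(down_proj, [g * u for g, u in zip(gate, up)])
```

The non-gated variant needs one fewer weight matrix per layer (no gate_proj), so an adapter for it would have one fewer MLP projection to map than an adapter for a gated model.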
Note: Apertus uses different layer norm names than most Llama-family models:
- attention_layernorm (instead of input_layernorm)
- feedforward_layernorm (instead of post_attention_layernorm)
- __init__(cfg: Any) → None
Initialize the Apertus architecture adapter.
- prepare_loading(model_name: str, model_kwargs: dict) → None
Patch XIELUActivation to defer eager .item() calls for meta tensor compatibility.
Transformers v5 uses meta tensors during from_pretrained, but XIELUActivation.__init__ eagerly calls .item() on the beta/eps buffers to precompute _beta_scalar/_eps_scalar for the CUDA kernel path. This fails on the meta device, because meta tensors carry no data. Once upstream fixes this (transformers PR #43473), this patch can be removed.
Instead of reimplementing __init__, we wrap it to catch the meta tensor failure and defer scalar computation to forward() time.
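The wrap-and-defer pattern described above can be sketched with stand-in classes. Everything here is hypothetical and uses no transformers code: `FakeBuffer`, `FakeActivation`, `MetaTensorError`, and `patch_deferred_scalars` are illustrative names, and the real patch may detect the failure and recompute the scalars differently.

```python
import functools


class MetaTensorError(RuntimeError):
    """Stand-in for the error raised by .item() on a meta tensor (assumption)."""


class FakeBuffer:
    """Hypothetical buffer that may or may not hold real data."""

    def __init__(self, value: float, on_meta: bool = False) -> None:
        self.value = value
        self.on_meta = on_meta

    def item(self) -> float:
        if self.on_meta:
            raise MetaTensorError("cannot call .item() on a meta tensor")
        return self.value


class FakeActivation:
    """Hypothetical activation whose __init__ eagerly calls .item()."""

    def __init__(self, beta: FakeBuffer) -> None:
        self.beta = beta
        self._beta_scalar = self.beta.item()  # raises on the meta device

    def forward(self, x: float) -> float:
        return x * self._beta_scalar


def patch_deferred_scalars(cls):
    """Wrap __init__ and forward so the scalar is computed lazily if needed."""
    original_init = cls.__init__
    original_forward = cls.forward

    @functools.wraps(original_init)
    def patched_init(self, *args, **kwargs):
        try:
            original_init(self, *args, **kwargs)
        except MetaTensorError:
            # Buffers were assigned before the failure; only the scalar is missing.
            self._beta_scalar = None

    @functools.wraps(original_forward)
    def patched_forward(self, x):
        if self._beta_scalar is None:
            # By forward() time the buffer is assumed to hold real data.
            self._beta_scalar = self.beta.item()
        return original_forward(self, x)

    cls.__init__ = patched_init
    cls.forward = patched_forward
    return cls


patch_deferred_scalars(FakeActivation)
act = FakeActivation(FakeBuffer(0.5, on_meta=True))  # no longer raises
act.beta.on_meta = False  # simulate weights being materialized after loading
result = act.forward(4.0)
```

Wrapping rather than reimplementing __init__ keeps the original constructor logic intact, so the patch only takes effect when the eager path fails.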
- setup_component_testing(hf_model: Any, bridge_model: Any = None) → None
Set up rotary embedding references for Apertus component testing.
Apertus uses RoPE (Rotary Position Embeddings). We set the rotary_emb on all attention bridge instances for component testing.
We also force the HF model to use “eager” attention to match the bridge’s implementation. The bridge uses “eager” to support output_attentions for hooks.
- Parameters:
hf_model – The HuggingFace Apertus model instance
bridge_model – The TransformerBridge model. If provided, rotary_emb is set on its actual attention bridge instances.
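A rough sketch of the two steps described for this method, using plain stand-in objects instead of real models. The attribute paths (`config._attn_implementation`, `model.rotary_emb`, a per-bridge `rotary_emb` attribute) and the helper name `share_rotary_emb` are assumptions made for illustration and may not match the actual adapter or bridge classes.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def share_rotary_emb(hf_model: Any, attention_bridges: Iterable[Any]) -> Any:
    """Force eager attention and point every attention bridge at one rotary module."""
    # Assumed attribute: the HF config records the attention implementation here.
    hf_model.config._attn_implementation = "eager"
    # Assumed location of the shared rotary embedding module on the HF model.
    rotary_emb = hf_model.model.rotary_emb
    for attn in attention_bridges:
        attn.rotary_emb = rotary_emb
    return rotary_emb


# Stand-in objects (hypothetical); real code would pass actual model instances.
fake_hf_model = SimpleNamespace(
    config=SimpleNamespace(_attn_implementation="sdpa"),
    model=SimpleNamespace(rotary_emb=object()),
)
fake_bridges = [SimpleNamespace(rotary_emb=None) for _ in range(3)]
shared = share_rotary_emb(fake_hf_model, fake_bridges)
```

Sharing a single rotary module means the HF attention and the bridge attention compute position embeddings from the same object, so a component-level comparison does not diverge on RoPE.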