transformer_lens.model_bridge.supported_architectures.stablelm module¶
StableLM architecture adapter.
- class transformer_lens.model_bridge.supported_architectures.stablelm.StableLmArchitectureAdapter(cfg: Any)¶
Bases: ArchitectureAdapter
Architecture adapter for StableLM models.
StableLM uses a Llama-like architecture with separate Q/K/V projections and gated MLP, but differs in using standard LayerNorm (not RMSNorm) and partial rotary embeddings (25% of head dimensions by default).
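For concreteness, here is a minimal sketch of partial rotation under the rotate-half RoPE convention. The helper is illustrative rather than the adapter's actual code; rotary_pct mirrors the HF StableLM config field, and the cos/sin tensors are assumed broadcastable to the rotated slice:

    import torch

    def apply_partial_rotary(q, cos, sin, rotary_pct=0.25):
        """Rotate only the first rotary_pct of head dims; pass the rest through.

        q: [batch, n_heads, seq, d_head].
        """
        rotary_dim = int(q.shape[-1] * rotary_pct)
        q_rot, q_pass = q[..., :rotary_dim], q[..., rotary_dim:]
        x1, x2 = q_rot.chunk(2, dim=-1)           # rotate-half pairing
        rotated = torch.cat((-x2, x1), dim=-1)
        return torch.cat((q_rot * cos + rotated * sin, q_pass), dim=-1)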
Supports optional features:
- Grouped Query Attention (num_key_value_heads != num_attention_heads)
- QKV bias (use_qkv_bias=True on some models like stable-code-3b)
- Parallel residual connections (use_parallel_residual=True)
- Per-head QK LayerNorm (qk_layernorm=True)
Optional Parameters (may not exist in state_dict):¶
blocks.{i}.attn.b_Q - Only present when use_qkv_bias=True
blocks.{i}.attn.b_K - Only present when use_qkv_bias=True
blocks.{i}.attn.b_V - Only present when use_qkv_bias=True
blocks.{i}.attn.b_O - No bias on output projection
blocks.{i}.mlp.b_in - No bias on MLP up_proj
blocks.{i}.mlp.b_gate - No bias on MLP gate_proj
blocks.{i}.mlp.b_out - No bias on MLP down_proj
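Because the bias entries are conditional, code that consumes a converted state dict should probe for them rather than assume their presence. A small sketch using the parameter names listed above (the state_dict argument is any TL-style parameter dict):

    def has_qkv_bias(state_dict, layer=0):
        """True when the checkpoint was exported with use_qkv_bias=True."""
        return f"blocks.{layer}.attn.b_Q" in state_dict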
- __init__(cfg: Any) → None¶
Initialize the StableLM architecture adapter.
- setup_component_testing(hf_model: Any, bridge_model: Any = None) → None¶
Set up rotary embedding references for StableLM component testing.
StableLM uses RoPE (Rotary Position Embeddings) with partial rotation. We set the rotary_emb reference on all attention bridge instances and force eager attention for numerical consistency.
- Parameters:
hf_model – The HuggingFace StableLM model instance
bridge_model – The TransformerBridge model (if available)
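A hedged sketch of what this setup amounts to; the attribute paths (model.layers[...].self_attn.rotary_emb, blocks[...].attn) and the eager-attention switch are assumptions about HF and bridge internals, not the method's verbatim body:

    def wire_rotary_for_testing(hf_model, bridge_model):
        # Assumed: force eager attention so numerics match the hooked path.
        hf_model.config._attn_implementation = "eager"
        for hf_layer, block in zip(hf_model.model.layers, bridge_model.blocks):
            # Assumed attribute names; share the HF rotary module by reference.
            block.attn.rotary_emb = hf_layer.self_attn.rotary_emb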
- setup_hook_compatibility(bridge: Any) → None¶
Inject hook points for QK LayerNorm on models with qk_layernorm=True.
StableLM v2 models (e.g., stablelm-2-12b) apply per-head LayerNorm to Q and K after projection but before rotary embedding. The native HF attention handles this internally, but we inject hooks so researchers can observe/intervene on the post-norm Q/K values.
- Adds to each attention bridge:
hook_q_layernorm: fires after q_layernorm(query_states)
hook_k_layernorm: fires after k_layernorm(key_states)
This runs during bridge __init__ via _setup_hook_compatibility(), after component setup but before hook registry finalization. The hook registry scanner skips _original_component subtrees, so we register hooks directly in bridge._hook_registry with canonical TL-style names.
- Parameters:
bridge – The TransformerBridge instance (fully initialized)
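As a usage illustration (assuming an initialized TransformerBridge named bridge, an input tokens tensor, and standard run_with_hooks semantics), the injected hook points can be captured like any other TransformerLens hook; the fully-qualified name is inferred from the canonical naming above:

    normed_q = {}

    def grab_q(tensor, hook):
        # Cache the per-head post-LayerNorm queries for inspection.
        normed_q[hook.name] = tensor.detach().clone()

    logits = bridge.run_with_hooks(
        tokens,
        fwd_hooks=[("blocks.0.attn.hook_q_layernorm", grab_q)],
    )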