transformer_lens.model_bridge.supported_architectures.stablelm module

StableLM architecture adapter.

class transformer_lens.model_bridge.supported_architectures.stablelm.StableLmArchitectureAdapter(cfg: Any)

Bases: ArchitectureAdapter

Architecture adapter for StableLM models.

StableLM uses a Llama-like architecture with separate Q/K/V projections and gated MLP, but differs in using standard LayerNorm (not RMSNorm) and partial rotary embeddings (25% of head dimensions by default).
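The partial-rotary arithmetic can be sketched as follows. This is an illustrative helper, not the adapter's code; the config field names (`hidden_size`, `num_attention_heads`, `partial_rotary_factor`) follow HF-style StableLM configs and should be treated as assumptions for any given checkpoint.

```python
# Illustrative sketch: how partial rotary embeddings split each
# attention head. Only the first `rotary_ndims` dimensions of a head
# receive rotary position embeddings; the rest pass through unchanged.

def rotary_split(hidden_size: int, num_heads: int, partial_rotary_factor: float):
    """Return (rotary_ndims, pass_through_ndims) per attention head."""
    head_dim = hidden_size // num_heads
    rotary_ndims = int(head_dim * partial_rotary_factor)
    return rotary_ndims, head_dim - rotary_ndims

# stablelm-3b-4e1t-like shapes (head_dim 80, 25% rotary) -- assumed sizes:
print(rotary_split(2560, 32, 0.25))  # -> (20, 60)
```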

Supports optional features:

  • Grouped Query Attention (num_key_value_heads != num_attention_heads)

  • QKV bias (use_qkv_bias=True on some models like stable-code-3b)

  • Parallel residual connections (use_parallel_residual=True)

  • Per-head QK LayerNorm (qk_layernorm=True)
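These feature flags can be read off an HF-style config object; the sketch below is illustrative only, using a `SimpleNamespace` stand-in for the real config. The field names match HF's StableLM config but are assumptions for any particular checkpoint.

```python
# Illustrative only: mapping the optional features above to
# HF StableLmConfig-style fields, with conservative defaults
# for fields that may be absent.
from types import SimpleNamespace

def optional_features(cfg) -> dict:
    return {
        "grouped_query_attention":
            getattr(cfg, "num_key_value_heads", cfg.num_attention_heads)
            != cfg.num_attention_heads,
        "qkv_bias": getattr(cfg, "use_qkv_bias", False),
        "parallel_residual": getattr(cfg, "use_parallel_residual", False),
        "qk_layernorm": getattr(cfg, "qk_layernorm", False),
    }

# A stable-code-3b-like config: QKV bias on, everything else default.
cfg = SimpleNamespace(num_attention_heads=32, use_qkv_bias=True)
print(optional_features(cfg))
```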

Optional Parameters (may not exist in state_dict):

  • blocks.{i}.attn.b_Q - Only present when use_qkv_bias=True

  • blocks.{i}.attn.b_K - Only present when use_qkv_bias=True

  • blocks.{i}.attn.b_V - Only present when use_qkv_bias=True

  • blocks.{i}.attn.b_O - Never present (no bias on the output projection)

  • blocks.{i}.mlp.b_in - Never present (no bias on MLP up_proj)

  • blocks.{i}.mlp.b_gate - Never present (no bias on MLP gate_proj)

  • blocks.{i}.mlp.b_out - Never present (no bias on MLP down_proj)
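Code that consumes such a state_dict should therefore look optional keys up defensively rather than assume they exist. A minimal sketch, using a plain dict in place of a real tensor state_dict (the key names follow the list above; the helper itself is hypothetical):

```python
# Hypothetical helper: gather whichever attention biases exist for one
# layer, tolerating the keys that are absent on this architecture.

def collect_attn_biases(state_dict: dict, layer: int) -> dict:
    """Return only the attention bias entries present for `layer`."""
    found = {}
    for name in ("b_Q", "b_K", "b_V", "b_O"):
        key = f"blocks.{layer}.attn.{name}"
        if key in state_dict:  # e.g. b_Q/b_K/b_V only if use_qkv_bias=True
            found[name] = state_dict[key]
    return found

# A model with QKV bias enabled but, as for StableLM, no b_O:
sd = {"blocks.0.attn.b_Q": 1, "blocks.0.attn.b_K": 2, "blocks.0.attn.b_V": 3}
print(sorted(collect_attn_biases(sd, 0)))  # -> ['b_K', 'b_Q', 'b_V']
```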

__init__(cfg: Any) None

Initialize the StableLM architecture adapter.

setup_component_testing(hf_model: Any, bridge_model: Any = None) None

Set up rotary embedding references for StableLM component testing.

StableLM uses RoPE (Rotary Position Embeddings) with partial rotation. We set the rotary_emb reference on all attention bridge instances and force eager attention for numerical consistency.

Parameters:
  • hf_model – The HuggingFace StableLM model instance

  • bridge_model – The TransformerBridge model (if available)

setup_hook_compatibility(bridge: Any) None

Inject hook points for QK LayerNorm on models with qk_layernorm=True.

StableLM v2 models (e.g., stablelm-2-12b) apply per-head LayerNorm to Q and K after projection but before rotary embedding. The native HF attention handles this internally, but we inject hooks so researchers can observe/intervene on the post-norm Q/K values.

Adds to each attention bridge:
  • hook_q_layernorm: fires after q_layernorm(query_states)

  • hook_k_layernorm: fires after k_layernorm(key_states)

This runs during bridge __init__ via _setup_hook_compatibility(), after component setup but before hook registry finalization. The hook registry scanner skips _original_component subtrees, so we register hooks directly in bridge._hook_registry with canonical TL-style names.

Parameters:

bridge – The TransformerBridge instance (fully initialized)
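The injection pattern described above can be sketched without the library. `SimpleHook` below is a stand-in for TransformerLens's HookPoint, and the fully qualified registry names are an assumption about the canonical TL-style naming; only the hook suffixes (`hook_q_layernorm`, `hook_k_layernorm`) come from this documentation.

```python
# Library-free sketch of hook injection: hooks are registered in a
# plain-dict registry under assumed canonical names, so callers can
# observe or modify the post-LayerNorm Q/K values.

class SimpleHook:
    """Minimal stand-in for a HookPoint: applies registered fns in order."""
    def __init__(self):
        self.fns = []

    def add_hook(self, fn):
        self.fns.append(fn)

    def __call__(self, value):
        for fn in self.fns:
            value = fn(value)
        return value

def inject_qk_hooks(registry: dict, layer: int) -> None:
    """Register q/k LayerNorm hooks for one layer (names are assumed)."""
    for which in ("q", "k"):
        registry[f"blocks.{layer}.attn.hook_{which}_layernorm"] = SimpleHook()

registry = {}
inject_qk_hooks(registry, 0)
print(sorted(registry))
# -> ['blocks.0.attn.hook_k_layernorm', 'blocks.0.attn.hook_q_layernorm']
```

An intervention then attaches to the registered hook and transforms the value as it flows through, mirroring how researchers would observe or edit the post-norm Q/K activations.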