transformer_lens.model_bridge.supported_architectures.olmo module

OLMo architecture adapter.

class transformer_lens.model_bridge.supported_architectures.olmo.OlmoArchitectureAdapter(cfg: Any)

Bases: ArchitectureAdapter

Architecture adapter for OLMo (v1) models.

OLMo v1 uses a pre-norm architecture with a custom non-learnable LayerNorm (fixed weight=1, bias=0), rotary position embeddings (RoPE), and gated MLP (SwiGLU). Key differences from later OLMo variants:

  • Pre-norm: LayerNorm is applied BEFORE attention and BEFORE MLP.

  • Non-learnable LayerNorm: Weight and bias are not trainable parameters. Delegating to HF’s native forward via NormalizationBridge handles this correctly.

  • No Q/K normalization in attention.

  • Optional QKV clipping (handled by HF’s native attention forward).
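The non-learnable LayerNorm is the most unusual of these differences. As an illustrative sketch (not the library's implementation), it is equivalent to applying layer normalization with no affine parameters, since a fixed weight of 1 and bias of 0 contribute nothing trainable:

```python
# Illustrative sketch (not the library's code): OLMo v1's non-learnable
# LayerNorm is equivalent to layer_norm with no affine parameters --
# fixed weight=1 and bias=0 reduce to plain normalization.
import torch
import torch.nn.functional as F


def olmo_v1_layernorm(x: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    # weight=None, bias=None normalizes without any learned scale/shift.
    return F.layer_norm(x, x.shape[-1:], weight=None, bias=None, eps=eps)


x = torch.randn(2, 4, 8)
out = olmo_v1_layernorm(x)
print(out.shape)  # torch.Size([2, 4, 8])
```

Because there are no weights to translate, delegating to HF's native forward (as NormalizationBridge does) is sufficient.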

Optional Parameters (may not exist in state_dict):

  • blocks.{i}.attn.b_Q - absent (OLMo v1 uses no bias on the query projection)

  • blocks.{i}.attn.b_K - absent (no bias on the key projection)

  • blocks.{i}.attn.b_V - absent (no bias on the value projection)

  • blocks.{i}.attn.b_O - absent (no bias on the output projection)

  • blocks.{i}.mlp.b_in - absent (no bias on the MLP up_proj)

  • blocks.{i}.mlp.b_gate - absent (no bias on the MLP gate_proj)

  • blocks.{i}.mlp.b_out - absent (no bias on the MLP down_proj)
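Code consuming a converted state_dict should therefore treat these keys as optional. A small sketch of such a check (`missing_optional_biases` is a hypothetical helper, not part of the library):

```python
# Hypothetical helper (name assumed, not from the library): report which
# of the optional bias keys an OLMo v1 state_dict actually lacks.
OPTIONAL_BIAS_SUFFIXES = [
    "attn.b_Q", "attn.b_K", "attn.b_V", "attn.b_O",
    "mlp.b_in", "mlp.b_gate", "mlp.b_out",
]


def missing_optional_biases(state_dict: dict, n_layers: int) -> list:
    missing = []
    for i in range(n_layers):
        for suffix in OPTIONAL_BIAS_SUFFIXES:
            key = f"blocks.{i}.{suffix}"
            if key not in state_dict:
                missing.append(key)
    return missing


# An OLMo v1 checkpoint omits all seven biases in every layer:
print(missing_optional_biases({}, n_layers=1))
```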

__init__(cfg: Any) → None

Initialize the OLMo architecture adapter.

prepare_model(hf_model: Any) → None

Patch OLMo’s in-place clamp_ to avoid backward hook conflicts.

OLMo v1 uses query_states.clamp_() when config.clip_qkv is set. In-place ops on tensors that pass through register_full_backward_hook trigger PyTorch’s “view modified inplace” error. This patch disables the in-place clamp branch during attention forward passes.

Note: clip_qkv clamping is skipped in the patched forward. In practice, typical clip_qkv values (100 or higher) mean the clamp rarely takes effect. If exact clamping is needed, add out-of-place clamp hooks on hook_q/hook_k/hook_v.
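The out-of-place alternative mentioned above can be sketched as follows. `make_clip_hook` is a hypothetical helper; the `(tensor, hook)` signature follows TransformerLens's hook convention, and `torch.clamp` (unlike `Tensor.clamp_()`) returns a new tensor, so it does not conflict with register_full_backward_hook:

```python
# Sketch of an out-of-place clamp hook for hook_q/hook_k/hook_v.
# make_clip_hook is a hypothetical name; torch.clamp is out-of-place,
# so it avoids the "view modified inplace" error that clamp_() triggers.
import torch


def make_clip_hook(clip_qkv: float):
    def clip_hook(tensor, hook=None):
        return torch.clamp(tensor, -clip_qkv, clip_qkv)
    return clip_hook


hook = make_clip_hook(clip_qkv=100.0)
q = torch.tensor([-200.0, 50.0, 150.0])
print(hook(q))  # values outside [-100, 100] are clipped
```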

setup_component_testing(hf_model: Any, bridge_model: Any = None) → None

Set up rotary embedding references for OLMo component testing.

OLMo uses RoPE (Rotary Position Embeddings). We set the rotary_emb reference on all attention bridge instances for component testing.

Parameters:
  • hf_model – The HuggingFace OLMo model instance

  • bridge_model – The TransformerBridge model (if available)
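What the shared rotary_emb reference has to provide is the cos/sin tables that RoPE applies to queries and keys. A minimal sketch of the standard rotate-half formulation (illustrative names, not the library's code):

```python
# Minimal RoPE sketch (standard rotate-half formulation; function names
# are illustrative, not the library's). The rotary embedding supplies
# position-dependent cos/sin tables applied to queries and keys.
import torch


def rotate_half(x: torch.Tensor) -> torch.Tensor:
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)


def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    return x * cos + rotate_half(x) * sin


d = 8  # head dimension
pos = torch.arange(4, dtype=torch.float)
inv_freq = 1.0 / (10000 ** (torch.arange(0, d, 2).float() / d))
freqs = torch.outer(pos, inv_freq)       # (seq, d/2)
emb = torch.cat((freqs, freqs), dim=-1)  # (seq, d)
cos, sin = emb.cos(), emb.sin()

q = torch.randn(4, d)
q_rot = apply_rope(q, cos, sin)
print(q_rot.shape)  # torch.Size([4, 8])
```

Since the rotation is norm-preserving and position 0 is left unrotated, pointing every attention bridge at the same rotary_emb guarantees component tests rotate activations exactly as the HF model does.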