transformer_lens.model_bridge.supported_architectures.olmo module

OLMo architecture adapter.

class transformer_lens.model_bridge.supported_architectures.olmo.OlmoArchitectureAdapter(cfg: Any)

Bases: ArchitectureAdapter

Architecture adapter for OLMo (v1) models.

OLMo v1 uses a pre-norm architecture with a custom non-learnable LayerNorm (fixed weight=1, bias=0), rotary position embeddings (RoPE), and gated MLP (SwiGLU). Key differences from later OLMo variants:

  • Pre-norm: LayerNorm is applied BEFORE attention and BEFORE MLP.

  • Non-learnable LayerNorm: Weight and bias are not trainable parameters. Delegating to HF’s native forward via NormalizationBridge handles this correctly.

  • No Q/K normalization in attention.

  • Optional QKV clipping (handled by HF’s native attention forward).
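The non-learnable LayerNorm is the most unusual of these differences. As an illustrative sketch (not the library's implementation), it is equivalent to applying layer normalization with no affine parameters, since a fixed weight of 1 and bias of 0 contribute nothing trainable:

```python
# Illustrative sketch (not the library's code): OLMo v1's non-learnable
# LayerNorm is equivalent to layer_norm with no affine parameters --
# fixed weight=1 and bias=0 reduce to plain normalization.
import torch
import torch.nn.functional as F


def olmo_v1_layernorm(x: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    # weight=None, bias=None normalizes without any learned scale/shift.
    return F.layer_norm(x, x.shape[-1:], weight=None, bias=None, eps=eps)


x = torch.randn(2, 4, 8)
out = olmo_v1_layernorm(x)
print(out.shape)  # torch.Size([2, 4, 8])
```

Because there are no weights to translate, delegating to HF's native forward (as NormalizationBridge does) is sufficient.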

Optional Parameters (may not exist in state_dict):

  • blocks.{i}.attn.b_Q - absent (OLMo v1 uses no bias on the query projection)

  • blocks.{i}.attn.b_K - absent (no bias on the key projection)

  • blocks.{i}.attn.b_V - absent (no bias on the value projection)

  • blocks.{i}.attn.b_O - absent (no bias on the output projection)

  • blocks.{i}.mlp.b_in - absent (no bias on the MLP up_proj)

  • blocks.{i}.mlp.b_gate - absent (no bias on the MLP gate_proj)

  • blocks.{i}.mlp.b_out - absent (no bias on the MLP down_proj)
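Code consuming a converted state_dict should therefore treat these keys as optional. A small sketch of such a check (`missing_optional_biases` is a hypothetical helper, not part of the library):

```python
# Hypothetical helper (name assumed, not from the library): report which
# of the optional bias keys an OLMo v1 state_dict actually lacks.
OPTIONAL_BIAS_SUFFIXES = [
    "attn.b_Q", "attn.b_K", "attn.b_V", "attn.b_O",
    "mlp.b_in", "mlp.b_gate", "mlp.b_out",
]


def missing_optional_biases(state_dict: dict, n_layers: int) -> list:
    missing = []
    for i in range(n_layers):
        for suffix in OPTIONAL_BIAS_SUFFIXES:
            key = f"blocks.{i}.{suffix}"
            if key not in state_dict:
                missing.append(key)
    return missing


# An OLMo v1 checkpoint omits all seven biases in every layer:
print(missing_optional_biases({}, n_layers=1))
```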

__init__(cfg: Any) → None

Initialize the OLMo architecture adapter.

prepare_model(hf_model: Any) → None

Patch OLMo’s in-place clamp_ to avoid backward hook conflicts.

OLMo v1 uses query_states.clamp_() when config.clip_qkv is set. In-place ops on tensors that pass through register_full_backward_hook trigger PyTorch’s “view modified inplace” error. This patch disables the in-place clamp branch during attention forward passes.

Note: clip_qkv clamping is skipped in the patched forward. In practice, typical clip_qkv values (100 or higher) mean the clamp rarely takes effect. If exact clamping is needed, add out-of-place clamp hooks on hook_q/hook_k/hook_v.
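The out-of-place alternative mentioned above can be sketched as follows. `make_clip_hook` is a hypothetical helper; the `(tensor, hook)` signature follows TransformerLens's hook convention, and `torch.clamp` (unlike `Tensor.clamp_()`) returns a new tensor, so it does not conflict with register_full_backward_hook:

```python
# Sketch of an out-of-place clamp hook for hook_q/hook_k/hook_v.
# make_clip_hook is a hypothetical name; torch.clamp is out-of-place,
# so it avoids the "view modified inplace" error that clamp_() triggers.
import torch


def make_clip_hook(clip_qkv: float):
    def clip_hook(tensor, hook=None):
        return torch.clamp(tensor, -clip_qkv, clip_qkv)
    return clip_hook


hook = make_clip_hook(clip_qkv=100.0)
q = torch.tensor([-200.0, 50.0, 150.0])
print(hook(q))  # values outside [-100, 100] are clipped
```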

setup_component_testing(hf_model: Any, bridge_model: Any = None) → None

Set up rotary embedding references for OLMo component testing.

OLMo uses RoPE (Rotary Position Embeddings). We set the rotary_emb reference on all attention bridge instances for component testing.

Parameters:
  • hf_model – The HuggingFace OLMo model instance

  • bridge_model – The TransformerBridge model (if available)
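What the shared rotary_emb reference has to provide is the cos/sin tables that RoPE applies to queries and keys. A minimal sketch of the standard rotate-half formulation (illustrative names, not the library's code):

```python
# Minimal RoPE sketch (standard rotate-half formulation; function names
# are illustrative, not the library's). The rotary embedding supplies
# position-dependent cos/sin tables applied to queries and keys.
import torch


def rotate_half(x: torch.Tensor) -> torch.Tensor:
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)


def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor) -> torch.Tensor:
    return x * cos + rotate_half(x) * sin


d = 8  # head dimension
pos = torch.arange(4, dtype=torch.float)
inv_freq = 1.0 / (10000 ** (torch.arange(0, d, 2).float() / d))
freqs = torch.outer(pos, inv_freq)       # (seq, d/2)
emb = torch.cat((freqs, freqs), dim=-1)  # (seq, d)
cos, sin = emb.cos(), emb.sin()

q = torch.randn(4, d)
q_rot = apply_rope(q, cos, sin)
print(q_rot.shape)  # torch.Size([4, 8])
```

Since the rotation is norm-preserving and position 0 is left unrotated, pointing every attention bridge at the same rotary_emb guarantees component tests rotate activations exactly as the HF model does.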