transformer_lens.model_bridge.supported_architectures.olmo module¶
OLMo architecture adapter.
- class transformer_lens.model_bridge.supported_architectures.olmo.OlmoArchitectureAdapter(cfg: Any)¶
Bases: ArchitectureAdapter
Architecture adapter for OLMo (v1) models.
OLMo v1 uses a pre-norm architecture with a custom non-learnable LayerNorm (fixed weight=1, bias=0), rotary position embeddings (RoPE), and gated MLP (SwiGLU). Key differences from later OLMo variants:
Pre-norm: LayerNorm is applied BEFORE attention and BEFORE MLP.
Non-learnable LayerNorm: Weight and bias are not trainable parameters. Delegating to HF’s native forward via NormalizationBridge handles this correctly.
No Q/K normalization in attention.
Optional QKV clipping (handled by HF’s native attention forward).
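The non-learnable LayerNorm can be sketched as a pure normalization with no trainable parameters; this is an illustrative standalone function (an assumption for clarity, not TransformerLens internals, where NormalizationBridge delegates to HF's native forward):

```python
import torch

# Sketch of OLMo v1's non-parametric LayerNorm: with weight fixed at 1 and
# bias fixed at 0, the layer reduces to plain mean/variance normalization.
def olmo_v1_layernorm(x: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, keepdim=True, unbiased=False)
    return (x - mean) / torch.sqrt(var + eps)
```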
Optional Parameters (may not exist in state_dict):¶
blocks.{i}.attn.b_Q - absent when the query projection has no bias
blocks.{i}.attn.b_K - absent when the key projection has no bias
blocks.{i}.attn.b_V - absent when the value projection has no bias
blocks.{i}.attn.b_O - absent when the output projection has no bias
blocks.{i}.mlp.b_in - absent when the MLP up_proj has no bias
blocks.{i}.mlp.b_gate - absent when the MLP gate_proj has no bias
blocks.{i}.mlp.b_out - absent when the MLP down_proj has no bias
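Since these keys may be missing, code that consumes the converted state dict should check for them rather than assume they exist. A hypothetical helper (the function name is illustrative, not part of the adapter's API) that lists which optional biases are absent:

```python
# Hypothetical helper: report which optional OLMo bias parameters are
# missing from a converted state dict. Key names follow the list above.
def missing_optional_biases(state_dict: dict, n_layers: int) -> list[str]:
    optional = [
        "attn.b_Q", "attn.b_K", "attn.b_V", "attn.b_O",
        "mlp.b_in", "mlp.b_gate", "mlp.b_out",
    ]
    return [
        f"blocks.{i}.{name}"
        for i in range(n_layers)
        for name in optional
        if f"blocks.{i}.{name}" not in state_dict
    ]
```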
- __init__(cfg: Any) None¶
Initialize the OLMo architecture adapter.
- prepare_model(hf_model: Any) None¶
Patch OLMo’s in-place clamp_ to avoid backward hook conflicts.
OLMo v1 uses query_states.clamp_() when config.clip_qkv is set. In-place ops on tensors that pass through register_full_backward_hook trigger PyTorch’s “view modified inplace” error. This patch disables the in-place clamp branch during attention forward passes.
Note: clip_qkv clamping is skipped in the patched forward. In practice clip_qkv values (typically 100+) rarely activate. If exact clamping is needed, add out-of-place clamp hooks on hook_q/hook_k/hook_v.
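The out-of-place alternative mentioned in the note can be sketched as a hook factory; `torch.clamp` returns a new tensor, so it avoids the in-place "view modified inplace" error. The `model.add_hook` usage below is a hypothetical illustration, assuming a hooked model exposing `hook_q`/`hook_k`/`hook_v` points:

```python
import torch

# Out-of-place replacement for OLMo's clip_qkv clamping, suitable for
# attaching to hook_q / hook_k / hook_v. Unlike clamp_(), torch.clamp
# allocates a new tensor and is safe under full backward hooks.
def make_clip_hook(clip_qkv: float):
    def clip_hook(tensor: torch.Tensor, hook=None) -> torch.Tensor:
        return torch.clamp(tensor, min=-clip_qkv, max=clip_qkv)
    return clip_hook

# Hypothetical usage (assumes `model` is a hooked TransformerBridge):
# for name in ("hook_q", "hook_k", "hook_v"):
#     model.add_hook(f"blocks.0.attn.{name}", make_clip_hook(clip_qkv=8.0))
```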
- setup_component_testing(hf_model: Any, bridge_model: Any = None) None¶
Set up rotary embedding references for OLMo component testing.
OLMo uses RoPE (Rotary Position Embeddings). We set the rotary_emb reference on all attention bridge instances for component testing.
- Parameters:
hf_model – The HuggingFace OLMo model instance
bridge_model – The TransformerBridge model (if available)