TransformerBridge Model Structure¶
This page describes the structure exposed by TransformerBridge, the canonical hook names to use, and the expected tensor shapes at each hook point.
Overview¶
TransformerBridge wraps a Hugging Face model behind a consistent TransformerLens interface. It relies on:
An ArchitectureAdapter that understands the HF module graph and provides a mapping to bridge components
Generalized components (Embedding, Attention, MLP, Normalization, Block) exposing uniform hook points
A light aliasing layer for backwards compatibility with legacy TransformerLens hook names
Construct a bridge from an HF model id:
from transformer_lens.model_bridge import TransformerBridge
bridge = TransformerBridge.boot_transformers("gpt2", device="cpu")
You can then call the familiar APIs: to_tokens, to_string, generate, run_with_hooks, run_with_cache.
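For example (a minimal sketch; keyword arguments such as max_new_tokens follow the usual TransformerLens conventions and may differ by version):

tokens = bridge.to_tokens("Hello world")        # (batch, pos) token ids
text = bridge.to_string(tokens[0])              # decode back to a string
logits, cache = bridge.run_with_cache(tokens)   # forward pass, caching every hook point
completion = bridge.generate("Hello", max_new_tokens=5)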
Top-Level Components¶
Typical decoder-only models expose these top-level components (names vary by architecture):
embed: token embedding
pos_embed (if applicable): positional embedding; some architectures instead use rotary embeddings inside attention
blocks: list-like container of transformer blocks
ln_final (if applicable): final normalization
unembed: output projection to vocabulary logits
Each blocks.{i} is a BlockBridge with subcomponents:
ln1: normalization before attention
attn: attention module
ln2: normalization before the MLP
mlp: MLP module
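Since blocks is list-like and subcomponents are attributes, the structure can be inspected directly; a minimal sketch:

block = bridge.blocks[0]  # a BlockBridge
print(block.ln1)
print(block.attn)
print(block.ln2)
print(block.mlp)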
Canonical Hook Names¶
Use these canonical (non-aliased) names when adding hooks or reading from the cache.
Embedding¶
embed.hook_in: token ids (batch, pos)
embed.hook_out: embeddings (batch, pos, d_model). Legacy alias: hook_embed
pos_embed.hook_in / pos_embed.hook_out: same shapes as above. Legacy alias: hook_pos_embed
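For example, reading the token embeddings out of a cached run:

logits, cache = bridge.run_with_cache("Hello world")
embeddings = cache["embed.hook_out"]  # (batch, pos, d_model)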
Residual Stream¶
blocks.{i}.hook_in: residual stream into the block (batch, pos, d_model). Legacy alias: blocks.{i}.hook_resid_pre
blocks.{i}.hook_out: residual stream out of the block (batch, pos, d_model). Legacy alias: blocks.{i}.hook_resid_post
blocks.{i}.attn.hook_out: residual stream after attention (batch, pos, d_model). Legacy alias: blocks.{i}.hook_resid_mid
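A sketch of intervening on the residual stream, assuming the TransformerLens-style run_with_hooks signature (fwd_hooks as a list of (hook_name, hook_fn) pairs, with hook functions receiving the activation and the hook point):

def double_resid(resid, hook):
    # resid: (batch, pos, d_model), residual stream entering block 3
    return resid * 2.0

logits = bridge.run_with_hooks(
    "Hello world",
    fwd_hooks=[("blocks.3.hook_in", double_resid)],
)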
Attention¶
blocks.{i}.attn.hook_in: (batch, pos, d_model). Legacy alias: blocks.{i}.hook_attn_in
blocks.{i}.attn.hook_out: (batch, pos, d_model). Legacy alias: blocks.{i}.hook_attn_out
blocks.{i}.attn.hook_hidden_states: primary output for caching (batch, pos, d_model). Legacy alias: blocks.{i}.attn.hook_result
blocks.{i}.attn.hook_attn_scores: raw attention scores before softmax (batch, n_heads, pos, pos)
blocks.{i}.attn.hook_pattern: attention pattern after softmax and NaN handling (n_heads, pos, pos). Legacy alias: blocks.{i}.attn.hook_attention_weights
When present, sub-projections:
blocks.{i}.attn.q/k/v/o.hook_in / .hook_out (commonly (batch, pos, d_model)). Legacy aliases: blocks.{i}.hook_q_input, blocks.{i}.hook_k_input, blocks.{i}.hook_v_input, blocks.{i}.hook_q, blocks.{i}.hook_k, blocks.{i}.hook_v
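For example, inspecting a single head's attention pattern from a cached run (note that hook_pattern carries no batch dimension):

logits, cache = bridge.run_with_cache("Hello world")
scores = cache["blocks.0.attn.hook_attn_scores"]  # (batch, n_heads, pos, pos)
pattern = cache["blocks.0.attn.hook_pattern"]     # (n_heads, pos, pos)
print(pattern[0])                                 # pattern for head 0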
Individual Q/K/V Hooks¶
All attention bridges provide access to individual Q, K, V activations through HookPointWrapper properties:
blocks.{i}.attn.q.hook_in / blocks.{i}.attn.q.hook_out: Q projection hooks (batch, pos, n_heads, d_head)
blocks.{i}.attn.k.hook_in / blocks.{i}.attn.k.hook_out: K projection hooks (batch, pos, n_heads, d_head)
blocks.{i}.attn.v.hook_in / blocks.{i}.attn.v.hook_out: V projection hooks (batch, pos, n_heads, d_head)
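A sketch of zero-ablating one head's value vectors through these hooks (head index 5 is an arbitrary illustrative choice; the standard TransformerLens hook signature is assumed):

def ablate_head_v(v, hook, head=5):
    # v: (batch, pos, n_heads, d_head)
    v[:, :, head, :] = 0.0
    return v

logits = bridge.run_with_hooks(
    "Hello world",
    fwd_hooks=[("blocks.0.attn.v.hook_out", ablate_head_v)],
)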
Joint QKV Attention (GPT-2 style)¶
For models using fused QKV projections (like GPT-2), the JointQKVAttentionBridge provides additional hooks:
blocks.{i}.attn.qkv.hook_in: input to the fused QKV projection (batch, pos, d_model)
blocks.{i}.attn.qkv.hook_out: output of the fused QKV projection (batch, pos, 3*d_model)
blocks.{i}.attn.qkv.q_hook_in: input to the Q projection (batch, pos, d_model)
blocks.{i}.attn.qkv.q_hook_out: output of the Q projection (batch, pos, n_heads, d_head)
blocks.{i}.attn.qkv.k_hook_in: input to the K projection (batch, pos, d_model)
blocks.{i}.attn.qkv.k_hook_out: output of the K projection (batch, pos, n_heads, d_head)
blocks.{i}.attn.qkv.v_hook_in: input to the V projection (batch, pos, d_model)
blocks.{i}.attn.qkv.v_hook_out: output of the V projection (batch, pos, n_heads, d_head)
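Assuming run_with_cache records these hook points like any other, the fused and per-projection shapes can be checked directly; a sketch for GPT-2:

logits, cache = bridge.run_with_cache("Hello world")
qkv = cache["blocks.0.attn.qkv.hook_out"]  # (batch, pos, 3*d_model)
q = cache["blocks.0.attn.qkv.q_hook_out"]  # (batch, pos, n_heads, d_head)
print(qkv.shape, q.shape)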
MLP¶
blocks.{i}.mlp.hook_in: (batch, pos, d_model). Legacy alias: blocks.{i}.hook_mlp_in
blocks.{i}.mlp.hook_pre: (batch, pos, d_mlp). Legacy alias: blocks.{i}.hook_mlp_in (via mlp.in.hook_out)
blocks.{i}.mlp.hook_out: (batch, pos, d_model). Legacy alias: blocks.{i}.hook_mlp_out
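For example, comparing the MLP's pre-activation and output widths:

logits, cache = bridge.run_with_cache("Hello world")
pre = cache["blocks.0.mlp.hook_pre"]  # (batch, pos, d_mlp)
out = cache["blocks.0.mlp.hook_out"]  # (batch, pos, d_model)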
Normalization¶
blocks.{i}.ln1.hook_in / .hook_out: (batch, pos, d_model). Legacy aliases for .hook_out: blocks.{i}.ln1.hook_normalized, blocks.{i}.ln1.hook_scale
Similarly for ln2. Legacy aliases for .hook_out: blocks.{i}.ln2.hook_normalized, blocks.{i}.ln2.hook_scale
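For example, reading the normalized input to the first block's attention:

logits, cache = bridge.run_with_cache("Hello world")
normed = cache["blocks.0.ln1.hook_out"]  # (batch, pos, d_model)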
Unembedding / Logits¶
unembed.hook_in: (batch, pos, d_model)
unembed.hook_out: (batch, pos, d_vocab)
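Since the unembedding produces the final logits, its hook output should match the returned logits; a small sanity-check sketch (assuming no post-unembed processing):

import torch

logits, cache = bridge.run_with_cache("Hello world")
assert torch.allclose(logits, cache["unembed.hook_out"], atol=1e-4)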
Shapes at a Glance¶
Residual stream and hidden states: (batch, pos, d_model)
Attention scores: (batch, n_heads, pos, pos)
Attention patterns: (n_heads, pos, pos) - after batch dimension removal
QKV projections: (batch, pos, n_heads, d_head)
MLP pre-activation: (batch, pos, d_mlp)
Embeddings: (batch, pos, d_model)
Unembedding logits: (batch, pos, d_vocab)
LayerNorm normalized / scale: (batch, pos, d_model)
These shapes are exercised in the multi-model shape test: tests/integration/test_hook_shape_compatibility.py.
Booting from Hugging Face¶
TransformerBridge.boot_transformers(model_id, ...):
Loads the HF config/model/tokenizer
Selects the appropriate ArchitectureAdapter
Maps HF config fields to TransformerLens config (e.g., d_model, n_heads, n_layers, d_mlp, d_vocab, n_ctx, …)
Constructs the bridge and registers all hook points
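After booting, the mapped configuration should be inspectable on the bridge; a sketch assuming a TransformerLens-style cfg attribute:

bridge = TransformerBridge.boot_transformers("gpt2", device="cpu")
print(bridge.cfg.d_model, bridge.cfg.n_heads, bridge.cfg.n_layers)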
Fused QKV Attention¶
Some architectures use a fused QKV projection (like GPT-2). The bridge’s JointQKVAttentionBridge provides access to individual Q, K, V activations through the QKVBridge submodule. This allows for:
Individual Q/K/V hooking: you can hook into blocks.{i}.attn.qkv.q_hook_out, k_hook_out, or v_hook_out to modify individual attention heads (a sketch follows below)
Attention pattern creation: the bridge automatically creates attention patterns from the attention scores and applies them through hook_pattern
Compatibility with legacy code: legacy hook names like blocks.{i}.hook_v are aliased to the appropriate QKV hooks
The canonical attention hooks (attn.hook_in/out, attn.hook_pattern, etc.) retain the shapes listed above, while the QKV-specific hooks provide access to the individual attention components.
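A sketch of the first point, scaling one head's queries in a fused-QKV model (head 0 and the 0.5 scale are arbitrary illustrative choices; the standard hook signature is assumed):

def scale_head_q(q, hook, head=0, scale=0.5):
    # q: (batch, pos, n_heads, d_head), output of the Q projection
    q[:, :, head, :] *= scale
    return q

logits = bridge.run_with_hooks(
    "Hello world",
    fwd_hooks=[("blocks.0.attn.qkv.q_hook_out", scale_head_q)],
)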
Aliases and Backwards Compatibility¶
A minimal alias layer exists to ease migration from older TransformerLens names (e.g., blocks.{i}.hook_resid_pre → blocks.{i}.hook_in). New code should prefer the canonical names documented here.
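Assuming the alias layer also resolves legacy keys in the cache, both names should return the same activation; a sketch:

import torch

logits, cache = bridge.run_with_cache("Hello world")
assert torch.equal(cache["blocks.0.hook_resid_pre"], cache["blocks.0.hook_in"])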
Example: Caching and Inspecting Hooks¶
prompt = "Hello world"
logits, cache = bridge.run_with_cache(prompt)
# List some attention-related hooks on the first block
for k in cache.keys():
if k.startswith("blocks.0.attn"):
print(k, cache[k].shape)
For larger examples and a multi-model shape check, see tests/integration/test_hook_shape_compatibility.py.