Migrating to TransformerLens 3

TransformerLens 3 introduces TransformerBridge, a new way of loading and instrumenting models that replaces HookedTransformer.from_pretrained as the recommended path for new code. Existing HookedTransformer code continues to run through a compatibility layer, but adopting the bridge unlocks broader architecture support and puts you on the supported path going forward.

This page explains the differences and gives side-by-side migration recipes for the most common patterns.

Why the change?

HookedTransformer was a single unified implementation that every supported architecture had to be mapped into. That was beautiful in theory — interpretability code written once worked everywhere — but in practice it meant that adding a new architecture required reimplementing its forward pass inside TransformerLens, and any divergence from the HuggingFace version was a latent source of bugs.

TransformerBridge flips the arrangement. Instead of reimplementing models, it keeps the native HuggingFace implementation and wraps it behind a consistent interface through an architecture adapter. The adapter knows how the HF module graph maps onto a small set of generalized components (embedding, attention, MLP, normalization, blocks) and registers uniform hook points over them. The result is the same familiar TransformerLens experience — hooks, caches, patching — but applied to the real HF model, and extended to 50+ architectures out of the box.

Loading a model

The loading API changes shape, but the mental model is the same: give it a HuggingFace model id, get back an object with the TransformerLens surface.

# Before — TransformerLens 2.x
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2", device="cpu")

# After — TransformerLens 3.x
from transformer_lens.model_bridge import TransformerBridge

bridge = TransformerBridge.boot_transformers("gpt2", device="cpu")

Parameters that carried over

device, dtype, and tokenizer all work the same way.

Parameters that moved

Weight-processing flags (fold_ln, center_writing_weights, center_unembed, fold_value_biases, refactor_factored_attn_matrices) no longer live on the load call. They moved to enable_compatibility_mode() — see the next section.

Parameters that were removed

n_devices, move_to_device, and first_n_layers are not part of boot_transformers. If you relied on any of these, file an issue describing your use case — the right pattern for multi-GPU loads under the bridge is still being worked out.

Parameters that are new

  • load_weights: bool = True — set to False to construct the bridge with just the config (useful for shape-checking without paying the weight-load cost).

  • trust_remote_code: bool = False — pass through to HuggingFace for models that ship custom modeling code.

  • hf_config_overrides: dict | None = None — override specific fields of the HF config before the model is constructed.

  • n_ctx: int | None = None — override the model’s context length. The bridge writes to whichever HF config field this architecture uses (n_positions / max_position_embeddings / etc.) so callers don’t need to know the field name. Warns if larger than the model’s default.

  • hf_model / model_class — advanced: pass in a pre-loaded HF model or a specific model class.

Weight processing is now opt-in

This is the biggest behavioral change. HookedTransformer.from_pretrained applied fold_ln, center_writing_weights, and center_unembed by default. The bridge does not apply any of these on load — the raw HF weights are preserved.

If your existing code depends on folded/centered weights (e.g. for direct logit attribution, or any analysis that reasons about activations in the post-processed coordinate system), call enable_compatibility_mode after booting:

# Before
model = HookedTransformer.from_pretrained("gpt2")  # fold_ln=True, center_*=True by default

# After
bridge = TransformerBridge.boot_transformers("gpt2")
bridge.enable_compatibility_mode()  # applies fold_ln, center_writing_weights, center_unembed, fold_value_biases

enable_compatibility_mode defaults to the same processing HookedTransformer used to do. You can opt out of individual steps, or disable all processing with no_processing=True:

bridge.enable_compatibility_mode(
    fold_ln=True,
    center_writing_weights=True,
    center_unembed=True,
    fold_value_biases=True,
    refactor_factored_attn_matrices=False,  # same default as before
)

If you want no processing at all — the bridge’s native default — you can skip enable_compatibility_mode entirely, or call it with no_processing=True if you still want the hook/component compatibility layer without the weight transforms.

Will my numbers match HookedTransformer?

Computing

Without enable_compatibility_mode

With it

Generated text, CE loss, argmax / top-k

Identical

Identical

Raw logits

Differ by per-row constant

Match

Logit lens, direct logit attribution

Differ

Match

KL divergence vs another model

Differ

Match

Residual-stream norms, cached hook_resid_*

Differ (grows with depth)

Match

Bottom-half analyses → call enable_compatibility_mode() after booting.

Dependency changes in 3.0

TransformerLens 3.0 raises its minimum supported transformers to 5.4.0 (previously 4.56). This is enforced automatically, fresh installs and pip install -U transformer_lens will pull in a compatible release with no action on your part.

If your code calls transformers directly alongside TransformerLens (e.g. manual AutoModel.from_pretrained calls in notebooks, or a downstream library that imports both), the v4 → v5 jump may surface breaking changes outside TransformerLens’s surface area. See HuggingFace’s Transformers v5 release notes for what changed there.

A few v5-driven internal adjustments worth knowing about:

  • Gemma embedding scaling. Transformers v5 changed how Gemma applies embedding scaling; enable_compatibility_mode() compensates so legacy HookedTransformer numerics are preserved.

  • MPT block unpack arity. MptBlock returns a 2-tuple on v5 vs a 3-tuple on v4; the bridge adapts.

  • Qwen3.5. Requires a v5 release exposing Qwen3_5ForCausalLM; see Special Cases.

Hook names

The canonical hook names on the bridge use a uniform hook_in / hook_out convention. The old TransformerLens names are preserved through an alias layer, so existing code keeps working without changes:

# Both of these return the same tensor on a bridge
cache["blocks.0.hook_resid_pre"]  # legacy alias — still works
cache["blocks.0.hook_in"]          # canonical name — preferred for new code

For the full mapping of legacy → canonical names and the expected tensor shape at each hook point, see the Model Structure page.

Hook semantic notes

Two semantic differences inside enable_compatibility_mode() worth knowing if you are porting activation-patching, DLA, or attribution-patching code:

  • blocks.{i}.hook_mlp_in fires pre-ln2 (matching legacy HookedTransformer). Use bridge.set_use_hook_mlp_in(True) to enable it — setting cfg.use_hook_mlp_in = True directly is honored when blocks share the bridge’s cfg, but the setter is the supported entry point. The pre-ln2 placement means cached values from one run can be patched into another and re-flow through ln2 mlp consistently across the bridge and HookedTransformer.

  • hook_q_input / hook_k_input / hook_v_input / hook_attn_in also fire pre-ln1 in compat mode. On the per-head LN application that follows, the bridge routes through the raw HF norm rather than the NormalizationBridge wrapper, so ln1’s sub-hooks (hook_in, hook_normalized, hook_scale) do not fire once per head the way legacy LayerNormPre would. Q/K/V projections downstream still match legacy numerically; only the intermediate LN sub-hook firing is suppressed.

Post-norm architectures (OLMo 2, BERT-style encoders) and MLA blocks (DeepSeek V2/V3/R1) do not participate in the pre-ln1 capture — MLABlockBridge does not expose those aliases, and post-norm models would read the post-attention residual instead of the block input.

Additionally, HookedRootModule has been moved to its own module. Prefer from transformer_lens import HookedRootModule. The legacy from transformer_lens.hook_points import HookedRootModule still works in 3.x, but emits a DeprecationWarning. This import path will be removed in 4.0.

APIs that are unchanged

These work identically on TransformerBridge and need no migration:

  • to_tokens, to_string

  • generate

  • run_with_hooks, run_with_cache — including batched-list inputs (parity fixed in 3.x)

  • __call__ / forward — accepts both 1D [seq] and 2D [batch, seq] token tensors

  • cfg.* — the bridge exposes a .cfg with the same fields (n_layers, n_heads, d_model, d_vocab, n_ctx, …)

  • W_Q, W_K, W_V, W_O, b_Q, b_K, b_V, b_O — attention weights are exposed with the same [n_heads, d_model, d_head] shape conventions

If your code only touches these APIs, the migration is genuinely just the loading call and (optionally) enable_compatibility_mode.

BERT Next Sentence Prediction

BertNextSentencePrediction is not ported to TransformerBridge. Keep using HookedEncoder + BertNextSentencePrediction for NSP workflows. The bridge’s BERT adapter does load NSP HuggingFace checkpoints (it rewires the unembed to cls.seq_relationship), but the high-level NSP API – sentence-pair tokenization, [CLS] pooling, “sequential”/”not sequential” decoding — is not exposed. If this is feature is something you’d like added to TransformerBridge, please file an issue.

New in 3.x: streaming generation

Both HookedTransformer and TransformerBridge now expose generate_stream, which yields tokens progressively instead of returning the full completion at once:

for chunk in bridge.generate_stream("The quick brown fox", max_new_tokens=50):
    print(chunk, end="", flush=True)

Same sampling kwargs as generate (temperature, top_k, top_p, do_sample, etc.).

Model name aliases are deprecated

HookedTransformer.from_pretrained accepted a lot of short aliases ("llama-7b-hf", "gpt-neo-125M", etc.) that mapped to specific HuggingFace paths. The bridge accepts the official HuggingFace names directly, and emits a deprecation warning when you pass a legacy alias. The aliases will be removed in the next major version.

# Legacy (deprecated, still works with a warning)
TransformerBridge.boot_transformers("gpt2")

# Preferred
TransformerBridge.boot_transformers("openai-community/gpt2")

Check the TransformerBridge Models page for the canonical model ids.

Full before-and-after example

A typical HookedTransformer notebook setup:

from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained(
    "gpt2",
    device="cuda",
    dtype=torch.float32,
)

logits, cache = model.run_with_cache("The quick brown fox")
resid_pre = cache["blocks.0.hook_resid_pre"]
pattern = cache["blocks.0.attn.hook_pattern"]

The bridge equivalent:

from transformer_lens.model_bridge import TransformerBridge

bridge = TransformerBridge.boot_transformers(
    "openai-community/gpt2",
    device="cuda",
    dtype=torch.float32,
)
bridge.enable_compatibility_mode()  # match HookedTransformer's default weight processing

logits, cache = bridge.run_with_cache("The quick brown fox")
resid_pre = cache["blocks.0.hook_in"]           # or "blocks.0.hook_resid_pre" via alias
pattern = cache["blocks.0.attn.hook_pattern"]

The cache, hook, and config APIs are the same. The only lines that had to change are the import, the load call, and — if you want the old weight-processing behavior — one extra call to enable_compatibility_mode.

When to stay on HookedTransformer

If your code runs unchanged on TransformerLens 3 via the compatibility layer and you don’t need architectures beyond what HookedTransformer already supported, there is no hard deadline to migrate. But new architectures, weight-processing controls, and hook refinements are landing on the bridge side — new work should target the bridge, and migrating existing projects is the long-term supported path.