TransformerBridge Compatibility Mode¶
TransformerBridge.boot_transformers(...) returns a bridge whose default numerics match HuggingFace — raw weights, no folding, no centering. Calling bridge.enable_compatibility_mode() afterwards puts the bridge into HookedTransformer-equivalent numerics — weights folded, centered, and the legacy hook aliases registered.
Most research code that was written against HookedTransformer.from_pretrained(...) assumes compatibility mode. Most new code that needs HF-faithful logits does not.
When to enable it¶
Use case |
Compatibility mode? |
Why |
|---|---|---|
Logit lens / direct logit attribution |
Yes |
These analyses reason in the post-fold-LN coordinate system; raw HF weights produce different (wrong) attributions. |
Residual-stream norm analysis |
Yes |
Centered weights give the residual a meaningful zero. |
Circuit analysis using HT-style hook names ( |
Yes |
Legacy aliases register only after compat mode. |
Logit parity against HuggingFace |
No |
Folding changes weights; logits will not match HF. |
Generation / inference vs HF baseline |
No |
Same reason. |
Verifying a new adapter’s forward pass |
No (initially) |
Use |
What each flag does¶
bridge.enable_compatibility_mode(
disable_warnings: bool = False,
no_processing: bool = False,
fold_ln: bool = True,
center_writing_weights: bool = True,
center_unembed: bool = True,
fold_value_biases: bool = True,
refactor_factored_attn_matrices: bool = False,
)
Flag |
Default |
Effect |
|---|---|---|
|
|
If |
|
|
Folds LayerNorm scale + bias into the subsequent linear weights so the LayerNorm modules become pure normalization. Changes weights; mathematically equivalent. |
|
|
Subtracts the mean from each “writing” weight ( |
|
|
Subtracts the mean from the unembedding matrix. Logits become mean-zero — affects logit-lens output but not argmax. |
|
|
Folds attention value biases into the output bias. Same numerics, fewer parameters. |
|
|
Refactors |
|
|
Suppresses warnings emitted by legacy component aliases when accessed. |
After processing, the bridge also:
Re-initializes the hook registry.
Calls
_setup_hook_compatibility()on every component (installs HT-style hook conversions like reshapinghook_zfrom[batch, seq, d_model]to[batch, seq, n_heads, d_head]).Registers HT-style hook aliases recursively across blocks.
compatibility_mode is then True on the bridge and on every component, so subsequent operations behave as if the bridge were loaded by HookedTransformer.from_pretrained().
Hook semantic parity¶
After enable_compatibility_mode(), these HT hook names fire on the pre-norm residual (matching HookedTransformer semantics):
blocks.{i}.attn.hook_q_input,hook_k_input,hook_v_inputblocks.{i}.hook_attn_inblocks.{i}.hook_mlp_in(gated oncfg.use_hook_mlp_in; toggle viabridge.set_use_hook_mlp_in(True))
Carve-outs (issue #1317):
Post-norm architectures (OLMo 2, BERT-style) read the post-attention residual instead, because the norm semantically lives elsewhere in the block.
MLA blocks (DeepSeek V2 / V3 / R1) do not expose the split-qkv aliases — MLA’s compressed K/V doesn’t have a clean split.
An adapter author for a new post-norm or MLA-style architecture must handle these carve-outs in setup_hook_compatibility. The Gemma1/Gemma2 adapters are exemplars of when not to override setup_hook_compatibility — GemmaTextScaledWordEmbedding already scales internally, so any added hook_conversion would double-scale embed.hook_out.
The four-quadrant test matrix¶
The integration conftest at tests/integration/model_bridge/conftest.py provides four bridge variants for every test model:
Variant |
|
|
Tests… |
|---|---|---|---|
|
off |
n/a |
HF-faithful numerics |
|
on |
|
HT-equivalent numerics |
|
on |
|
Hook aliases without weight processing — used to bisect numerical bugs |
(HT side) |
n/a |
n/a |
Reference HookedTransformer with/without weight processing |
New integration tests should use the variant that matches the property they’re testing. Tests of HF parity → gpt2_bridge. Tests of HT-API behaviour → gpt2_bridge_compat. Tests of hook semantics regardless of weights → gpt2_bridge_compat_no_processing.
Cost¶
enable_compatibility_mode() mutates the bridge’s weights in-place. It is:
One-shot: calling it twice re-runs the centering subtractions. Don’t.
Not reversible from within the bridge — re-boot for raw weights.
_setup_hook_compatibilityis idempotent; onlyprocess_weightsmutates weights.
See also¶
Creating Architecture Adapters in contributing.md — adapter contract and the four-place registration; adapter authors override
setup_hook_compatibilityonly when an architecture has the post-norm / MLA carve-outs mentioned above.Debugging Numerical Divergence — uses
no_processing=Trueas a key bisection tool.Migrating to TransformerLens 3 — when porting HT code, you almost always want
enable_compatibility_mode().