transformer_lens.model_bridge.bridge module¶
Bridge module for connecting different model architectures.
This module provides the bridge components that wrap remote model components and provide a consistent interface for accessing their weights and performing operations.
- class transformer_lens.model_bridge.bridge.TransformerBridge(model: Module, adapter: ArchitectureAdapter, tokenizer: Any)¶
Bases: Module
Bridge between HuggingFace and TransformerLens models.
This class provides a standardized interface to access components of a transformer model, regardless of the underlying architecture. It uses an architecture adapter to map between the TransformerLens and HuggingFace model structures.
- property OV¶
OV circuit. On hybrids, returns attn layers only (with warning). See OV_for_attn_layers().
- OV_for_attn_layers() Tuple[List[int], FactoredMatrix]¶
OV circuit for attention layers only. Returns (layer_indices, FactoredMatrix).
- property QK¶
QK circuit. On hybrids, returns attn layers only (with warning). See QK_for_attn_layers().
- QK_for_attn_layers() Tuple[List[int], FactoredMatrix]¶
QK circuit for attention layers only. Returns (layer_indices, FactoredMatrix).
- property W_E: Tensor¶
Token embedding matrix (d_vocab, d_model).
- property W_K: Tensor¶
Stack the key weights across all layers.
- property W_O: Tensor¶
Stack the attn output weights across all layers.
- property W_Q: Tensor¶
Stack the query weights across all layers.
- property W_U: Tensor¶
Unembedding matrix (d_model, d_vocab). Maps residual stream to logits.
- property W_V: Tensor¶
Stack the value weights across all layers.
- property W_gate: Tensor | None¶
Stack the MLP gate weights across all layers (gated MLPs only).
- property W_in: Tensor¶
Stack the MLP input weights across all layers.
- property W_out: Tensor¶
Stack the MLP output weights across all layers.
- __init__(model: Module, adapter: ArchitectureAdapter, tokenizer: Any)¶
Initialize the bridge.
- Parameters:
model – The model to bridge (must be a PyTorch nn.Module or PreTrainedModel)
adapter – The architecture adapter to use
tokenizer – The tokenizer to use (required)
- accumulated_bias(layer: int, mlp_input: bool = False, include_mlp_biases: bool = True) Tensor¶
Sum of the variant and MLP output biases written to the residual stream up to layer.
Includes all layer types (attn, SSM, linear-attn). Set mlp_input=True to also include the variant bias of the target layer itself.
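The accumulation rule can be sketched with toy numbers (scalars stand in for d_model bias vectors; the values and the standalone function are illustrative, not the bridge's implementation):

```python
# Toy sketch of accumulated_bias semantics. Scalar "biases" stand in for
# d_model-sized vectors; the per-layer values are made up for illustration.
attn_biases = [1, 2, 3]     # variant (attn/SSM/linear-attn) output biases
mlp_biases = [10, 20, 30]   # MLP output biases

def accumulated_bias(layer, mlp_input=False, include_mlp_biases=True):
    # Biases written to the residual stream by all blocks before `layer`...
    total = sum(attn_biases[:layer])
    if include_mlp_biases:
        total += sum(mlp_biases[:layer])
    # ...plus the target layer's own variant bias when mlp_input=True.
    if mlp_input:
        total += attn_biases[layer]
    return total

accumulated_bias(2)                  # 1 + 2 + 10 + 20 == 33
accumulated_bias(2, mlp_input=True)  # 33 + 3 == 36
```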
- add_hook(name: str | Callable[[str], bool], hook_fn, dir='fwd', is_permanent=False)¶
Add a hook to a specific component or to all components matching a filter.
- Parameters:
name – Either a string hook point name (e.g. "blocks.0.attn.hook_q") or a callable filter (str) -> bool that is applied to every hook point name; the hook is added to each point where the filter returns True.
hook_fn – The hook function (activation, hook) -> activation | None.
dir – Hook direction, "fwd" or "bwd".
is_permanent – If True the hook survives reset_hooks() calls.
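The callable-filter form can be sketched in plain Python (the hook names below are illustrative; real names come from the bridge's hook_dict):

```python
# Minimal sketch of how a callable name filter selects hook points.
# These hook names are illustrative stand-ins for entries in model.hook_dict.
hook_names = [
    "blocks.0.attn.hook_q",
    "blocks.0.attn.hook_k",
    "blocks.1.attn.hook_q",
    "blocks.1.mlp.hook_pre",
]

# A filter (str) -> bool, applied to every hook point name.
is_query_hook = lambda name: name.endswith("attn.hook_q")

# The hook would be added to each point where the filter returns True.
selected = [name for name in hook_names if is_query_hook(name)]
# selected == ["blocks.0.attn.hook_q", "blocks.1.attn.hook_q"]
```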
- all_composition_scores(mode: str) CompositionScores¶
Composition scores for all attention head pairs. Returns CompositionScores.
See https://transformer-circuits.pub/2021/framework/index.html. On hybrid models, only attention layers are included; layer_indices maps tensor position i to the original layer number.
- property all_head_labels: list[str]¶
Human-readable labels for all attention heads, e.g. [‘L0H0’, ‘L0H1’, …].
- property attn_head_labels: list[str]¶
Head labels for attention layers only — matches all_composition_scores() dims.
- property b_K: Tensor¶
Stack the key biases across all layers.
- property b_O: Tensor¶
Stack the attn output biases across all layers.
- property b_Q: Tensor¶
Stack the query biases across all layers.
- property b_U: Tensor¶
Unembedding bias (d_vocab).
- property b_V: Tensor¶
Stack the value biases across all layers.
- property b_in: Tensor¶
Stack the MLP input biases across all layers.
- property b_out: Tensor¶
Stack the MLP output biases across all layers.
- block_hooks(layer_idx: int) List[str]¶
Sorted hook names available on block layer_idx (block-relative paths).
- block_submodules(layer_idx: int) List[str]¶
Return bridged submodule names on block layer_idx.
- blocks_with(submodule: str) List[Tuple[int, GeneralizedComponent]]¶
Return (index, block) pairs for blocks with the named bridged submodule.
Checks _modules (not hasattr) so HF-internal attrs don’t match. Use instead of assuming blocks[0] is representative on hybrid models.
- static boot_transformers(model_name: str, hf_config_overrides: dict | None = None, device: str | device | None = None, dtype: dtype = torch.float32, tokenizer: PreTrainedTokenizerBase | None = None, load_weights: bool = True, trust_remote_code: bool = False, model_class: Any | None = None, hf_model: Any | None = None) TransformerBridge¶
Boot a model from HuggingFace.
- Parameters:
model_name – The name of the model to load.
hf_config_overrides – Optional overrides applied to the HuggingFace config before model load.
device – The device to use. If None, will be determined automatically.
dtype – The dtype to use for the model.
tokenizer – Optional pre-initialized tokenizer to use; if not provided one will be created.
load_weights – If False, load model without weights (on meta device) for config inspection only.
model_class – Optional HuggingFace model class to use instead of the default auto-detected class. When the class name matches a key in SUPPORTED_ARCHITECTURES, the corresponding adapter is selected automatically (e.g., BertForNextSentencePrediction).
hf_model – Optional pre-loaded HuggingFace model to use instead of loading one. Useful for models loaded with custom configurations (e.g., quantization via BitsAndBytesConfig). When provided, load_weights is ignored.
- Returns:
The bridge to the loaded model.
- static check_model_support(model_id: str) dict¶
Check if a model is supported and get detailed support info.
This function provides detailed information about a model’s compatibility with TransformerLens, including architecture type and verification status.
- Parameters:
model_id – The HuggingFace model ID to check (e.g., “gpt2”)
- Returns:
Dictionary with support information:
is_supported: bool – Whether the model is supported
architecture_id: str | None – The architecture type if supported
verified: bool – Whether the model has been verified to work
suggestion: str | None – Suggested alternative if not supported
- Return type:
dict
Example
>>> from transformer_lens.model_bridge.sources.transformers import check_model_support
>>> info = check_model_support("openai-community/gpt2")
>>> info["is_supported"]
True
- clear_hook_registry() None¶
Clear the hook registry and force re-initialization.
- composition_layer_indices() List[int]¶
Original layer indices for attention layers (maps composition score positions).
- cpu() TransformerBridge¶
Move model to CPU.
- Returns:
Self for chaining
- cuda(device: int | device | None = None) TransformerBridge¶
Move model to CUDA.
- Parameters:
device – CUDA device
- Returns:
Self for chaining
- enable_compatibility_mode(disable_warnings: bool = False, no_processing: bool = False, fold_ln: bool = True, center_writing_weights: bool = True, center_unembed: bool = True, fold_value_biases: bool = True, refactor_factored_attn_matrices: bool = False) None¶
Enable compatibility mode for the bridge.
This sets up the bridge to work with legacy TransformerLens components/hooks. It will also disable warnings about the usage of legacy components/hooks if specified.
- Parameters:
disable_warnings – Whether to disable warnings about legacy components/hooks
no_processing – Whether to disable ALL pre-processing steps of the model. If True, overrides fold_ln, center_writing_weights, and center_unembed to False.
fold_ln – Whether to fold layer norm weights into the subsequent linear layers. Default: True. Ignored if no_processing=True.
center_writing_weights – Whether to center the writing weights (W_out in attention and MLPs). Default: True. Ignored if no_processing=True.
center_unembed – Whether to center the unembedding matrix. Default: True. Ignored if no_processing=True.
fold_value_biases – Whether to fold value biases into output bias. Default: True. Ignored if no_processing=True.
refactor_factored_attn_matrices – Whether to refactor factored attention matrices. Default: False. Ignored if no_processing=True.
- forward(input: str | List[str] | Tensor, return_type: str | None = 'logits', loss_per_token: bool = False, prepend_bos: bool | None = None, padding_side: str | None = None, attention_mask: Tensor | None = None, start_at_layer: int | None = None, stop_at_layer: int | None = None, pixel_values: Tensor | None = None, input_values: Tensor | None = None, **kwargs) Any¶
Forward pass through the model.
- Parameters:
input – Input to the model
return_type – Type of output to return (‘logits’, ‘loss’, ‘both’, ‘predictions’, None)
loss_per_token – Whether to return loss per token
prepend_bos – Whether to prepend BOS token
padding_side – Which side to pad on
start_at_layer – Not implemented in TransformerBridge. The bridge delegates to HuggingFace’s model.forward() which owns the layer iteration loop, making start_at_layer infeasible without monkey-patching HF internals (fragile across HF versions) or exception-based layer skipping (corrupts model state). Raises NotImplementedError if a non-None value is passed.
stop_at_layer – Layer to stop forward pass at
pixel_values – Optional image tensor for multimodal models (e.g., LLaVA, Gemma3). The tensor is passed directly to the underlying HuggingFace model. Only valid when cfg.is_multimodal is True.
input_values – Optional audio waveform tensor for audio models (e.g., HuBERT). The tensor is passed directly to the underlying HuggingFace model. Only valid when cfg.is_audio_model is True.
**kwargs – Additional arguments passed to model
- Returns:
Model output based on return_type
- generate(input: str | List[str] | Tensor = '', max_new_tokens: int = 10, stop_at_eos: bool = True, eos_token_id: int | None = None, do_sample: bool = True, top_k: int | None = None, top_p: float | None = None, temperature: float = 1.0, freq_penalty: float = 0.0, repetition_penalty: float = 1.0, use_past_kv_cache: bool = True, prepend_bos: bool | None = None, padding_side: str | None = None, return_type: str | None = 'input', verbose: bool = True, output_logits: bool = False, pixel_values: Tensor | None = None, **multimodal_kwargs) str | list[str] | Tensor | Any¶
Sample tokens from the model.
Sample tokens from the model until the model outputs eos_token or max_new_tokens is reached. This implementation is based on HookedTransformer.generate() to ensure consistent behavior.
- Parameters:
input – Text string, list of strings, or tensor of tokens
max_new_tokens – Maximum number of tokens to generate
stop_at_eos – If True, stop generating tokens when the model outputs eos_token
eos_token_id – The token ID to use for end of sentence
do_sample – If True, sample from the model’s output distribution. Otherwise, use greedy search
top_k – Number of tokens to sample from. If None, sample from all tokens
top_p – Probability mass to sample from. If 1.0, sample from all tokens
temperature – Temperature for sampling. Higher values will make the model more random
freq_penalty – Frequency penalty for sampling - how much to penalise previous tokens
repetition_penalty – HuggingFace-style repetition penalty. Values > 1.0 discourage repetition by dividing positive logits and multiplying negative logits for previously seen tokens. Default 1.0 (no penalty).
use_past_kv_cache – If True, use KV caching for faster generation
prepend_bos – Accepted for API compatibility but not applied during generation. The HF model expects tokens in its native format (tokenizer defaults). Overriding BOS can silently degrade generation quality.
padding_side – Accepted for API compatibility but not applied during generation. The generation loop always extends tokens to the right, so overriding initial padding_side creates inconsistent token layout.
return_type – The type of output to return - ‘input’, ‘str’, or ‘tokens’
verbose – Not used in Bridge (kept for API compatibility)
output_logits – If True, return a ModelOutput with sequences and logits tuple
pixel_values – Optional image tensor for multimodal models. Only passed on the first generation step (the vision encoder processes the image once, then embeddings are part of the token sequence for subsequent steps).
- Returns:
Generated sequence as string, list of strings, or tensor depending on input type and return_type. If output_logits=True, returns a ModelOutput-like object with ‘sequences’ and ‘logits’ attributes.
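The repetition_penalty rule described above can be sketched directly (a minimal illustration of the formula, not the bridge's sampling loop; token IDs and logit values are made up):

```python
# Sketch of the HuggingFace-style repetition penalty: for tokens already
# seen, positive logits are divided by the penalty and negative logits are
# multiplied by it. Both push the logit down when penalty > 1.
def apply_repetition_penalty(logits, seen_token_ids, penalty):
    out = list(logits)
    for t in seen_token_ids:
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out

logits = [2.0, -1.0, 0.5]  # illustrative per-token logits
penalized = apply_repetition_penalty(logits, seen_token_ids={0, 1}, penalty=2.0)
# penalized == [1.0, -2.0, 0.5]; token 2 was never seen, so it is untouched
```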
- get_hook_point(hook_name: str) HookPoint | None¶
Get a hook point by name from the bridge’s hook system.
- get_params()¶
Access to model parameters in the format expected by SVDInterpreter.
For missing weights, returns zero tensors of appropriate shape instead of raising exceptions. This ensures compatibility across different model architectures.
- Returns:
Dictionary of parameter tensors with TransformerLens naming convention
- Return type:
dict
- Raises:
ValueError – If configuration is inconsistent (e.g., cfg.n_layers != len(blocks))
- get_token_position(single_token: str | int, input: str | Tensor, mode='first', prepend_bos: bool | None = None, padding_side: Literal['left', 'right'] | None = None)¶
Get the position of a single_token in a string or sequence of tokens.
Raises an error if the token is not present.
- Parameters:
single_token (Union[str, int]) – The token to search for. Can be a token index, or a string (but the string must correspond to a single token).
input (Union[str, torch.Tensor]) – The sequence to search in. Can be a string or a rank 1 tensor of tokens or a rank 2 tensor of tokens with a dummy batch dimension.
mode (str, optional) – If there are multiple matches, which match to return. Supports “first” or “last”. Defaults to “first”.
prepend_bos (bool, optional) – Whether to prepend the BOS token to the input (only applies when input is a string). Defaults to None, using the bridge’s default.
padding_side (Union[Literal["left", "right"], None], optional) – Specifies which side to pad when tokenizing multiple strings of different lengths.
- hf_generate(input: str | list[str] | Tensor = '', max_new_tokens: int = 10, stop_at_eos: bool = True, eos_token_id: int | None = None, do_sample: bool = True, top_k: int | None = None, top_p: float | None = None, temperature: float = 1.0, use_past_kv_cache: bool = True, return_type: str | None = 'input', pixel_values: Tensor | None = None, **generation_kwargs) str | list[str] | Tensor | Any¶
Generate text using the underlying HuggingFace model with full HF API support.
This method provides direct access to HuggingFace’s generation API, forwarding all generation parameters (including output_scores, output_logits, output_attentions, output_hidden_states) directly to the underlying HF model. Use this when you need full HuggingFace generation features not supported by the standard generate() method.
For standard generation compatible with HookedTransformer, use generate() instead.
- Parameters:
input – Text string, list of strings, or tensor of tokens
max_new_tokens – Maximum number of tokens to generate
stop_at_eos – If True, stop generating tokens when the model outputs eos_token
eos_token_id – The token ID to use for end of sentence
do_sample – If True, sample from the model’s output distribution
top_k – Number of tokens to sample from
top_p – Probability mass to sample from
temperature – Temperature for sampling
use_past_kv_cache – If True, use KV caching for faster generation
return_type – The type of output to return - ‘input’, ‘str’, or ‘tokens’
**generation_kwargs – Additional HuggingFace generation parameters, including:
output_scores – Return generation scores
output_logits – Return generation logits
output_attentions – Return attention weights
output_hidden_states – Return hidden states
return_dict_in_generate – Return a ModelOutput object
and any other HF generation parameters.
- Returns:
Generated sequence as string, list of strings, tensor, or HF ModelOutput depending on input type, return_type, and generation_kwargs.
Example:
# Get full HF ModelOutput with logits and attentions
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("tiny-stories-1M")
result = model.hf_generate(
    "Hello world",
    max_new_tokens=5,
    output_logits=True,
    output_attentions=True,
    return_dict_in_generate=True,
)
print(result.sequences)   # Generated tokens
print(result.logits)      # Logits for each generation step
print(result.attentions)  # Attention weights
- hook_aliases: Dict[str, str | List[str]] = {'hook_embed': ['embed_ln.hook_out', 'embed.hook_out'], 'hook_pos_embed': ['pos_embed.hook_out', 'rotary_emb.hook_out'], 'hook_unembed': 'unembed.hook_out'}¶
- property hook_dict: dict[str, HookPoint]¶
Get all HookPoint objects in the model for compatibility with TransformerLens.
- hooks(fwd_hooks=[], bwd_hooks=[], reset_hooks_end=True, clear_contexts=False)¶
Context manager for temporarily adding hooks.
- Parameters:
fwd_hooks – List of (hook_name, hook_fn) tuples for forward hooks
bwd_hooks – List of (hook_name, hook_fn) tuples for backward hooks
reset_hooks_end – If True, removes hooks when context exits
clear_contexts – Unused (for compatibility with HookedTransformer)
Example
with model.hooks(fwd_hooks=[("hook_embed", my_hook)]):
    output = model("Hello world")
- layer_types() List[str]¶
Per-block type labels, e.g. [“attn+mlp”, “ssm+mlp”, …]. Deterministic order.
- static list_supported_models(architecture: str | None = None, verified_only: bool = False) list[str]¶
List all models supported by TransformerLens.
This function provides convenient access to the model registry API for discovering which HuggingFace models can be loaded.
- Parameters:
architecture – Filter by architecture ID (e.g., “GPT2LMHeadModel”). If None, returns all supported models.
verified_only – If True, only return models that have been verified to work with TransformerLens.
- Returns:
List of model IDs (e.g., [“gpt2”, “gpt2-medium”, …])
Example
>>> from transformer_lens.model_bridge.sources.transformers import list_supported_models
>>> models = list_supported_models()
>>> gpt2_models = list_supported_models(architecture="GPT2LMHeadModel")
- load_state_dict(state_dict, strict=True, assign=False)¶
Load state dict into the model, handling both clean keys and original keys with _original_component references.
- Parameters:
state_dict – Dictionary containing a whole state of the module
strict – Whether to strictly enforce that the keys in state_dict match the keys returned by this module’s state_dict() function
assign – Whether to assign items in the state dictionary to their corresponding keys in the module instead of copying them
- Returns:
NamedTuple with missing_keys and unexpected_keys fields
- loss_fn(logits: Tensor, tokens: Tensor, attention_mask: Tensor | None = None, per_token: bool = False) Tensor¶
Calculate cross-entropy loss.
Uses the same formula as HookedTransformer (log_softmax + gather) to ensure numerically identical results when logits match.
- Parameters:
logits – Model logits
tokens – Target tokens
attention_mask – Optional attention mask for padding
per_token – Whether to return per-token loss
- Returns:
Loss tensor
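The log_softmax + gather formula can be sketched in pure Python (toy logits and a tiny vocabulary; the real method operates on tensors and handles the attention mask):

```python
import math

# Sketch of next-token cross-entropy via log_softmax + gather.
# The loss at position i uses logits[i] to predict tokens[i + 1].
def loss_fn(logits, tokens, per_token=False):
    losses = []
    for i in range(len(tokens) - 1):
        row = logits[i]
        log_z = math.log(sum(math.exp(x) for x in row))
        log_probs = [x - log_z for x in row]          # log_softmax over vocab
        losses.append(-log_probs[tokens[i + 1]])      # gather the target token
    return losses if per_token else sum(losses) / len(losses)

logits = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]  # [pos, d_vocab], illustrative
tokens = [0, 0, 1]
loss = loss_fn(logits, tokens)  # mean negative log-prob of the correct token
```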
- mps() TransformerBridge¶
Move model to MPS.
- Returns:
Self for chaining
- named_parameters(prefix: str = '', recurse: bool = True, remove_duplicate: bool = True) Iterator[tuple[str, Parameter]]¶
Returns named parameters following standard PyTorch semantics.
This method delegates to the underlying HuggingFace model’s named_parameters(). For TransformerLens-style generator, use tl_named_parameters() instead.
- Parameters:
prefix – Prefix to prepend to all parameter names
recurse – If True, yields parameters of this module and all submodules
remove_duplicate – If True, removes duplicate parameters
- Returns:
Iterator of (name, parameter) tuples
- property original_model: Module¶
Get the original model.
- parameters(recurse: bool = True) Iterator[Parameter]¶
Returns parameters following standard PyTorch semantics.
This method delegates to the underlying HuggingFace model’s parameters(). For TransformerLens-style parameter generator, use tl_parameters() instead.
- Parameters:
recurse – If True, yields parameters of this module and all submodules
- Returns:
Iterator of nn.Parameter objects
- prepare_multimodal_inputs(text: str | List[str], images: Any | None = None) Dict[str, Tensor]¶
Prepare multimodal inputs using the model’s processor.
Converts text and images into model-ready tensors (input_ids, pixel_values, attention_mask, etc.) using the HuggingFace processor loaded during boot().
- Parameters:
text – Text prompt(s), typically containing image placeholder tokens (e.g., “<image>” for LLaVA).
images – PIL Image or list of PIL Images to process. Pass None for text-only inputs on a multimodal model.
- Returns:
Dictionary with ‘input_ids’, ‘pixel_values’, ‘attention_mask’, etc. All tensors are moved to the model’s device.
- Raises:
ValueError – If model is not multimodal or processor is not available.
- process_weights(verbose: bool = False, fold_ln: bool = True, center_writing_weights: bool = True, center_unembed: bool = True, fold_value_biases: bool = True, refactor_factored_attn_matrices: bool = False) None¶
Process weights directly using ProcessWeights and architecture adapter.
This method applies weight processing transformations to improve model interpretability without requiring a reference HookedTransformer model. Works with all architectures supported by TransformerBridge, including GPT-OSS and other new models.
- Parameters:
verbose – If True, print detailed progress messages. Default: False
fold_ln – Fold LayerNorm weights/biases into subsequent layers. Default: True
center_writing_weights – Center weights that write to residual stream. Default: True
center_unembed – Center unembedding weights (translation invariant). Default: True
fold_value_biases – Fold value biases into output bias. Default: True
refactor_factored_attn_matrices – Experimental QK/OV factorization. Default: False
- real_components: Dict[str, tuple]¶
- reset_hooks(clear_contexts=True)¶
Remove all hooks from the model.
- run_with_cache(input: str | List[str] | Tensor, return_cache_object: Literal[True] = True, remove_batch_dim: bool = False, **kwargs) Tuple[Any, ActivationCache]¶
- run_with_cache(input: str | List[str] | Tensor, return_cache_object: Literal[False], remove_batch_dim: bool = False, **kwargs) Tuple[Any, Dict[str, Tensor]]
Run the model and cache all activations.
- Parameters:
input – Input to the model
return_cache_object – Whether to return an ActivationCache object
remove_batch_dim – Whether to remove the batch dimension
names_filter – Filter for which activations to cache (str, list of str, or callable)
stop_at_layer – Layer to stop forward pass at (uses StopAtLayerException; cleans up KV cache on stop)
**kwargs – Additional arguments
- Returns:
Tuple of (output, cache)
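The caching pattern behind this method can be sketched in plain Python (a toy stand-in for the forward pass; the hook names and activations are illustrative, not the bridge's internals):

```python
# Minimal sketch of activation caching: a forward hook stores each
# activation in a dict keyed by hook name, subject to names_filter.
cache = {}

def caching_hook(activation, hook_name):
    cache[hook_name] = activation
    return activation

names_filter = lambda name: name.endswith("hook_out")

# Stand-in for the bridge firing hooks during a forward pass:
activations = {"embed.hook_out": [1, 2], "blocks.0.attn.hook_q": [3, 4]}
for name, act in activations.items():
    if names_filter(name):
        caching_hook(act, name)

# cache == {"embed.hook_out": [1, 2]}
```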
- run_with_hooks(input: str | List[str] | Tensor, fwd_hooks: List[Tuple[str | Callable, Callable]] = [], bwd_hooks: List[Tuple[str | Callable, Callable]] = [], reset_hooks_end: bool = True, clear_contexts: bool = False, return_type: str | None = 'logits', names_filter: str | List[str] | Callable[[str], bool] | None = None, stop_at_layer: int | None = None, remove_batch_dim: bool = False, **kwargs) Any¶
Run the model with specified forward and backward hooks.
- Parameters:
input – Input to the model
fwd_hooks – Forward hooks to apply
bwd_hooks – Backward hooks to apply
reset_hooks_end – Whether to reset hooks at the end
clear_contexts – Whether to clear hook contexts
return_type – What to return (“logits”, “loss”, etc.)
names_filter – Filter for hook names (not used directly, for compatibility)
stop_at_layer – Layer to stop at (uses StopAtLayerException; cleans up KV cache on stop)
remove_batch_dim – Whether to remove batch dimension from hook inputs (only works for batch_size==1)
**kwargs – Additional arguments
- Returns:
Model output
- set_use_attn_in(use_attn_in: bool)¶
Toggle a single 4D residual copy feeding all three Q/K/V projections.
Mutually exclusive with use_split_qkv_input — set that flag off first if it’s on. When on, hook_attn_in fires at [batch, pos, n_heads, d_model], enabling coarse-grained interventions on the residual-stream copy shared across Q/K/V.
- set_use_attn_result(use_attn_result: bool)¶
Toggle whether to explicitly calculate and expose the result for each attention head.
Useful for interpretability but can easily burn through GPU memory.
- set_use_split_qkv_input(use_split_qkv_input: bool)¶
Toggle independent residual copies for Q/K/V so each path can be patched alone.
Mutually exclusive with use_attn_in — set that flag off first if it’s on.
- stack_params_for(submodule: str, attr_path: str, reshape_fn: Callable | None = None) Tuple[List[int], Tensor]¶
Stack a parameter across matching blocks only. Returns (layer_indices, tensor).
Use for hybrid models where not all blocks have the submodule.
- state_dict(destination=None, prefix='', keep_vars=False)¶
Get state dict with TransformerLens format keys.
Converts HuggingFace format keys to TransformerLens format and filters out _original_component references and nested HuggingFace components.
This returns a clean state dict with only bridge component paths converted to TL format, excluding nested HF components (like c_fc, c_proj, c_attn) that exist inside original_component modules.
- Parameters:
destination – Optional dict to store state dict in
prefix – Optional prefix to add to all keys
keep_vars – Whether to keep variables as Variables instead of tensors
- Returns:
Dict containing the state dict with TransformerLens format keys
- tl_named_parameters() Iterator[tuple[str, Tensor]]¶
Returns iterator of TransformerLens-style named parameters.
This provides the same parameters as tl_parameters() but as an iterator for consistency with PyTorch’s named_parameters() API pattern.
- Returns:
Iterator of (name, tensor) tuples with TransformerLens naming conventions
Example
>>> bridge = TransformerBridge.boot_transformers("gpt2")
>>> for name, param in bridge.tl_named_parameters():
...     if "attn.W_Q" in name:
...         print(f"{name}: {param.shape}")
blocks.0.attn.W_Q: torch.Size([12, 768, 64])
...
- tl_parameters() dict[str, Tensor]¶
Returns TransformerLens-style parameter dictionary.
Parameter names follow TransformerLens conventions (e.g., ‘blocks.0.attn.W_Q’) and may include processed weights (non-leaf tensors). This format is expected by SVDInterpreter among other analysis tools.
- Returns:
Dictionary mapping TransformerLens parameter names to tensors
Example
>>> bridge = TransformerBridge.boot_transformers("gpt2")
>>> tl_params = bridge.tl_parameters()
>>> W_Q = tl_params["blocks.0.attn.W_Q"]  # Shape: [n_heads, d_model, d_head]
- to(*args, **kwargs) TransformerBridge¶
Move model to device and/or change dtype.
- Parameters:
args – Positional arguments for nn.Module.to
kwargs – Keyword arguments for nn.Module.to
print_details – Whether to print details about device/dtype changes (default: True)
- Returns:
Self for chaining
- to_single_str_token(int_token: int) str¶
Get the string corresponding to a single token ID.
- Parameters:
int_token – The token ID
- Returns:
The token string
- to_single_token(string: str) int¶
Map a string that makes up a single token to the id for that token.
- Parameters:
string – The string to convert
- Returns:
Token ID
- Raises:
AssertionError – If string is not a single token
- to_str_tokens(input: str | Tensor | ndarray | List, prepend_bos: bool | None = None, padding_side: str | None = None) List[str] | List[List[str]]¶
Map text or tokens to a list of tokens as strings.
- Parameters:
input – The input to convert
prepend_bos – Whether to prepend BOS token
padding_side – Which side to pad on
- Returns:
List of token strings
- to_string(tokens: List[int] | Tensor | ndarray) str | List[str]¶
Convert tokens to string(s).
- Parameters:
tokens – Tokens to convert
- Returns:
Decoded string(s)
- to_tokens(input: str | List[str], prepend_bos: bool | None = None, padding_side: str | None = None, move_to_device: bool = True, truncate: bool = True) Tensor¶
Convert a string (or list of strings) to a tensor of tokens.
- Parameters:
input – The input to tokenize
prepend_bos – Whether to prepend the BOS token
padding_side – Which side to pad on
move_to_device – Whether to move to model device
truncate – Whether to truncate to model context length
- Returns:
Token tensor of shape [batch, pos]
- tokens_to_residual_directions(tokens: str | int | Tensor) Tensor¶
Map tokens to their unembedding vectors (residual stream directions).
Returns the columns of W_U corresponding to the given tokens — i.e. the directions in the residual stream that the model dots with to produce the logit for each token.
WARNING: If you use this without folding in LayerNorm (compatibility mode), the results will be misleading because LN weights change the unembed map.
- Parameters:
tokens – A single token (str, int, or scalar tensor), a 1-D tensor of token IDs, or a 2-D batch of token IDs.
- Returns:
Tensor of unembedding vectors with shape matching the input token shape plus a trailing d_model dimension.
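The column-selection rule can be sketched with a tiny W_U (illustrative shapes, plain lists instead of tensors):

```python
# Sketch: the residual direction for token t is column t of W_U.
# Tiny illustrative shapes: d_model = 2, d_vocab = 3.
W_U = [[1, 2, 3],
       [4, 5, 6]]  # [d_model, d_vocab]

def token_residual_direction(token_id):
    # Select column token_id of W_U -> a vector of shape [d_model].
    return [row[token_id] for row in W_U]

token_residual_direction(1)  # [2, 5]
```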
- training: bool¶
- transformer_lens.model_bridge.bridge.build_alias_to_canonical_map(hook_dict, prefix='')¶
Build a mapping from alias hook names to their canonical names.
- Parameters:
hook_dict – Dictionary mapping hook names to HookPoint objects
prefix – Prefix for nested keys
- Returns:
Dictionary mapping alias names to canonical names
Example
If hook_dict contains:
"blocks.0.hook_q" -> HookPoint(name="blocks.0.attn.q.hook_out")
Returns:
{"blocks.0.hook_q": "blocks.0.attn.q.hook_out"}