transformer_lens.model_bridge.generalized_components.block module¶
Block bridge component.
This module contains the bridge component for transformer blocks.
- class transformer_lens.model_bridge.generalized_components.block.BlockBridge(name: str, config: Any | None = None, submodules: Dict[str, GeneralizedComponent] | None = None, hook_alias_overrides: Dict[str, str] | None = None)¶
Bases: GeneralizedComponent
Bridge component for transformer blocks.
This component provides standardized input/output hooks and monkey-patches HuggingFace blocks to insert hooks at positions matching HookedTransformer.
- __init__(name: str, config: Any | None = None, submodules: Dict[str, GeneralizedComponent] | None = None, hook_alias_overrides: Dict[str, str] | None = None)¶
Initialize the block bridge.
- Parameters:
name – The name of the component in the model
config – Optional configuration (unused for BlockBridge)
submodules – Dictionary of submodules to register
hook_alias_overrides – Optional dictionary to override default hook aliases. For example, {"hook_attn_out": "ln1_post.hook_out"} will make hook_attn_out point to ln1_post.hook_out instead of the default attn.hook_out.
- forward(*args: Any, **kwargs: Any) Any¶
Forward pass through the block bridge.
- Parameters:
*args – Input arguments
**kwargs – Input keyword arguments
- Returns:
The output from the original component
- Raises:
StopAtLayerException – If stop_at_layer is set and this block should stop execution
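The stop-at-layer behavior can be illustrated with a minimal, self-contained sketch: a block raises an exception carrying the residual stream, and the model's run loop catches it to return early. The names BlockSketch, ModelSketch, and the exception's payload are assumptions for illustration, not the actual transformer_lens implementation.

```python
# Minimal sketch of the stop-at-layer pattern. BlockSketch, ModelSketch, and
# the exception payload are hypothetical stand-ins, not transformer_lens code.

class StopAtLayerException(Exception):
    """Carries the residual stream out of the forward pass early."""
    def __init__(self, activations):
        super().__init__("stopped early")
        self.activations = activations

class BlockSketch:
    def __init__(self, layer_idx, stop_at_layer=None):
        self.layer_idx = layer_idx
        self.stop_at_layer = stop_at_layer

    def forward(self, resid):
        if self.stop_at_layer is not None and self.layer_idx >= self.stop_at_layer:
            # Raise before running this block, so the caller receives the
            # residual stream as it stood *entering* this layer.
            raise StopAtLayerException(resid)
        return resid + 1  # stand-in for the block's attention + MLP update

class ModelSketch:
    def __init__(self, n_layers, stop_at_layer=None):
        self.blocks = [BlockSketch(i, stop_at_layer) for i in range(n_layers)]

    def run(self, resid):
        try:
            for block in self.blocks:
                resid = block.forward(resid)
        except StopAtLayerException as stop:
            return stop.activations  # early exit with partial activations
        return resid

full = ModelSketch(4).run(0)                      # all 4 blocks run -> 4
partial = ModelSketch(4, stop_at_layer=2).run(0)  # only blocks 0 and 1 run -> 2
```

Raising an exception (rather than threading a flag through every return value) lets the early exit cross arbitrarily deep call stacks inside wrapped HuggingFace modules.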
- hook_aliases: Dict[str, str | List[str]] = {'hook_attn_in': 'attn.hook_attn_in', 'hook_attn_out': 'attn.hook_out', 'hook_k_input': 'attn.hook_k_input', 'hook_mlp_in': 'mlp.hook_in', 'hook_mlp_out': 'mlp.hook_out', 'hook_q_input': 'attn.hook_q_input', 'hook_resid_mid': 'ln2.hook_in', 'hook_resid_post': 'hook_out', 'hook_resid_pre': 'hook_in', 'hook_v_input': 'attn.hook_v_input'}¶
- is_list_item: bool = True¶
- real_components: Dict[str, tuple]¶
- training: bool¶
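The hook_aliases mapping above pairs a HookedTransformer-style name with a dotted path into the block's submodules. A hedged sketch of how such a dotted path might resolve (the toy Module class and resolve_alias helper are illustrations, not the real transformer_lens code):

```python
# Hypothetical sketch of dotted hook-alias resolution; Module and
# resolve_alias are illustrative stand-ins, not transformer_lens internals.

class Module:
    def __init__(self, name, **children):
        self.name = name
        for key, child in children.items():
            setattr(self, key, child)

def resolve_alias(root, path):
    """Follow a dotted alias path like 'attn.hook_out' from a block."""
    obj = root
    for part in path.split("."):
        obj = getattr(obj, part)
    return obj

hook_aliases = {
    "hook_attn_out": "attn.hook_out",
    "hook_resid_pre": "hook_in",
}

block = Module(
    "block0",
    hook_in=Module("block0.hook_in"),
    attn=Module("block0.attn", hook_out=Module("block0.attn.hook_out")),
)

target = resolve_alias(block, hook_aliases["hook_attn_out"])
# target.name == "block0.attn.hook_out"
```

An override such as {"hook_attn_out": "ln1_post.hook_out"} would simply replace the entry in the mapping before resolution, which is the effect of the hook_alias_overrides parameter described above.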
- class transformer_lens.model_bridge.generalized_components.block.MLABlockBridge(name: str, config: Any | None = None, submodules: Dict[str, GeneralizedComponent] | None = None, hook_alias_overrides: Dict[str, str] | None = None)¶
Bases: BlockBridge
Block wrapping Multi-Head Latent Attention (DeepSeek V2/V3/R1).
MLA has no standalone q/k/v projections: Q flows through the compressed q_a_proj → q_a_layernorm → q_b_proj path, and K/V share a joint kv_a_proj_with_mqa entry point. There is no single HookPoint that represents "input that becomes Q/K/V", so the block-level hook_q_input/hook_k_input/hook_v_input aliases do not apply. The type-level distinction means a reader of the adapter sees MLABlockBridge and knows those hooks are absent.
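A small sketch of how downstream code can exploit this type-level distinction: the class bodies below are stand-ins for the real bridge classes, and qkv_input_hooks is a hypothetical helper, not part of the transformer_lens API.

```python
# Sketch of using the type-level distinction. The class bodies are stand-ins
# for the real bridges; qkv_input_hooks is a hypothetical helper.

class BlockBridge:
    hook_aliases = {
        "hook_q_input": "attn.hook_q_input",
        "hook_k_input": "attn.hook_k_input",
        "hook_v_input": "attn.hook_v_input",
    }

class MLABlockBridge(BlockBridge):
    # Q flows through q_a_proj -> q_a_layernorm -> q_b_proj, and K/V share
    # kv_a_proj_with_mqa, so no single "input that becomes Q/K/V" exists.
    hook_aliases = {}  # q/k/v input aliases deliberately absent

def qkv_input_hooks(block):
    """Return only the q/k/v input aliases this block actually supports."""
    wanted = ("hook_q_input", "hook_k_input", "hook_v_input")
    return {name: path for name, path in block.hook_aliases.items()
            if name in wanted}

standard = qkv_input_hooks(BlockBridge())   # all three aliases present
mla = qkv_input_hooks(MLABlockBridge())     # empty: hooks are absent
```

Checking the subclass (or its alias table) up front avoids silently registering hooks that would never fire on an MLA model.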
- class transformer_lens.model_bridge.generalized_components.block.ParallelBlockBridge(name: str, config: Any | None = None, submodules: Dict[str, GeneralizedComponent] | None = None, hook_alias_overrides: Dict[str, str] | None = None)¶
Bases: BlockBridge
Block where attn and MLP both read the pre-attention residual.
For GPT-J, NeoX, Pythia, Phi, Cohere, CodeGen, and some Falcon variants, output = resid_pre + attn_out + mlp_out; no distinct post-attention residual exists. This matches the legacy HookedTransformer, which omits hook_resid_mid when cfg.parallel_attn_mlp=True. The type-level distinction means a reader of the adapter sees ParallelBlockBridge and knows the hook is absent.
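The difference in residual wiring can be made concrete with a toy scalar sketch; attn and mlp below are arbitrary stand-in functions, not real sublayers, and the two block functions are illustrative, not the bridge implementation.

```python
# Toy scalar sketch of sequential vs parallel residual wiring. attn and mlp
# are arbitrary stand-ins, not real sublayers.

def attn(x):
    return 2 * x

def mlp(x):
    return x + 10

def sequential_block(resid_pre):
    resid_mid = resid_pre + attn(resid_pre)   # hook_resid_mid exists here
    resid_post = resid_mid + mlp(resid_mid)
    return resid_post

def parallel_block(resid_pre):
    # attn and mlp both read resid_pre; there is no resid_mid to hook.
    return resid_pre + attn(resid_pre) + mlp(resid_pre)

seq = sequential_block(1)   # resid_mid = 3, then 3 + mlp(3) = 16
par = parallel_block(1)     # 1 + attn(1) + mlp(1) = 1 + 2 + 11 = 14
```

In the sequential form the MLP sees the attention-updated residual, so an intermediate hook_resid_mid is well defined; in the parallel form both sublayers read resid_pre and the intermediate value never exists.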