transformer_lens.HookedEncoder#
Hooked Encoder.
Contains a BERT-style model. This is kept separate from transformer_lens.HookedTransformer
because its architecture differs significantly from that of GPT-style transformers.
- class transformer_lens.HookedEncoder.HookedEncoder(cfg, tokenizer=None, move_to_device=True, **kwargs)#
Bases:
HookedRootModule
This class implements a BERT-style encoder using the components in ./components.py, with HookPoints on every interesting activation. It inherits from HookedRootModule.
Limitations:
- The model does not include dropout layers, which may lead to inconsistent results when training or fine-tuning.
Like HookedTransformer, it can have a pretrained Transformer’s weights loaded via .from_pretrained. There are a few features you might know from HookedTransformer which are not yet supported:
- There is no preprocessing (e.g. LayerNorm folding) when loading a pretrained model
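A minimal usage sketch (the checkpoint name "bert-base-cased" is an assumption; any BertForMaskedLM checkpoint supported by from_pretrained should work the same way). The later examples on this page reuse the bert model loaded here:
>>> from transformer_lens import HookedEncoder
>>> bert = HookedEncoder.from_pretrained("bert-base-cased")
>>> # Masked language modelling on a prompt containing a [MASK] token
>>> logits = bert("The capital of France is [MASK].", return_type="logits")
>>> logits.shape  # (batch, position, d_vocab)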
- property OV: FactoredMatrix#
Returns a FactoredMatrix object with the product of the O and V matrices for each layer and head.
- property QK: FactoredMatrix#
Returns a FactoredMatrix object with the product of the Q and K matrices for each layer and head. Useful for visualizing attention patterns.
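A short sketch of inspecting one head's QK circuit (indexing a FactoredMatrix and materialising it via .AB are assumptions about the FactoredMatrix API; bert is the model loaded in the example above):
>>> qk = bert.QK  # FactoredMatrix with shape (n_layers, n_heads, d_model, d_model)
>>> head_qk = qk[0, 3]  # QK circuit of layer 0, head 3
>>> dense = head_qk.AB  # materialise the low-rank product as a (d_model, d_model) tensor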
- property W_E: Float[Tensor, 'd_vocab d_model']#
Convenience to get the embedding matrix
- property W_E_pos: Float[Tensor, 'd_vocab+n_ctx d_model']#
Concatenated W_E and W_pos. Used as a full (overcomplete) basis of the input space, useful for full QK and full OV circuits.
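As a sketch, the "full QK" circuit of a single head can be formed by sandwiching that head's QK FactoredMatrix between W_E_pos (the layer/head indices here are arbitrary, and multiplying a tensor with a FactoredMatrix is assumed to return another FactoredMatrix):
>>> full_qk = bert.W_E_pos @ bert.QK[0, 3] @ bert.W_E_pos.T
>>> full_qk.shape  # (d_vocab + n_ctx, d_vocab + n_ctx), kept in factored form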
- property W_K: Float[Tensor, 'n_layers n_heads d_model d_head']#
Stacks the key weights across all layers
- property W_O: Float[Tensor, 'n_layers n_heads d_head d_model']#
Stacks the attn output weights across all layers
- property W_Q: Float[Tensor, 'n_layers n_heads d_model d_head']#
Stacks the query weights across all layers
- property W_U: Float[Tensor, 'd_model d_vocab']#
Convenience to get the unembedding matrix (ie the linear map from the final residual stream to the output logits)
- property W_V: Float[Tensor, 'n_layers n_heads d_model d_head']#
Stacks the value weights across all layers
- property W_in: Float[Tensor, 'n_layers d_model d_mlp']#
Stacks the MLP input weights across all layers
- property W_out: Float[Tensor, 'n_layers d_mlp d_model']#
Stacks the MLP output weights across all layers
- property W_pos: Float[Tensor, 'n_ctx d_model']#
Convenience function to get the positional embedding. Only works on models with absolute positional embeddings!
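The stacked weight properties are convenient for vectorised analysis across all layers and heads at once; a quick shape sketch, continuing with the bert model from the first example:
>>> bert.W_Q.shape    # (n_layers, n_heads, d_model, d_head)
>>> bert.W_out.shape  # (n_layers, d_mlp, d_model)
>>> bert.W_pos.shape  # (n_ctx, d_model)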
- all_head_labels() List[str] #
Returns a list of strings with the format “L{l}H{h}”, where l is the layer index and h is the head index.
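For example (the list has n_layers * n_heads entries):
>>> bert.all_head_labels()[:3]
['L0H0', 'L0H1', 'L0H2']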
- property b_K: Float[Tensor, 'n_layers n_heads d_head']#
Stacks the key biases across all layers
- property b_O: Float[Tensor, 'n_layers d_model']#
Stacks the attn output biases across all layers
- property b_Q: Float[Tensor, 'n_layers n_heads d_head']#
Stacks the query biases across all layers
- property b_U: Float[Tensor, 'd_vocab']#
Convenience to get the unembedding bias
- property b_V: Float[Tensor, 'n_layers n_heads d_head']#
Stacks the value biases across all layers
- property b_in: Float[Tensor, 'n_layers d_mlp']#
Stacks the MLP input biases across all layers
- property b_out: Float[Tensor, 'n_layers d_model']#
Stacks the MLP output biases across all layers
- cpu()#
Move all model parameters and buffers to the CPU.
Note
This method modifies the module in-place.
- Returns:
self
- Return type:
Module
- cuda()#
Move all model parameters and buffers to the GPU.
This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on GPU while being optimized.
Note
This method modifies the module in-place.
- Parameters:
device (int, optional) – if specified, all parameters will be copied to that device
- Returns:
self
- Return type:
Module
- encoder_output(tokens: Int[Tensor, 'batch pos'], token_type_ids: Optional[Int[Tensor, 'batch pos']] = None, one_zero_attention_mask: Optional[Int[Tensor, 'batch pos']] = None) Float[Tensor, 'batch pos d_model'] #
Processes input through the encoder layers and returns the resulting residual stream.
- Parameters:
tokens – Input tokens as integers with shape (batch, position)
token_type_ids – Optional binary ids indicating segment membership. Shape (batch_size, sequence_length). For example, with input “[CLS] Sentence A [SEP] Sentence B [SEP]”, token_type_ids would be [0, 0, …, 0, 1, …, 1, 1] where 0 marks tokens from sentence A and 1 marks tokens from sentence B.
one_zero_attention_mask – Optional binary mask of shape (batch_size, sequence_length) where 1 indicates tokens to attend to and 0 indicates tokens to ignore. Used primarily for handling padding in batched inputs.
- Returns:
Final residual stream tensor of shape (batch, position, d_model)
- Return type:
resid
- Raises:
AssertionError – If using string input without a tokenizer
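A sketch of extracting the final residual stream for a batch (this assumes to_tokens returns the token ids, token type ids and attention mask in that order, as in the example under to_tokens below):
>>> tokens, token_type_ids, attention_mask = bert.to_tokens(["The cat sat on the mat.", "Hello!"])
>>> resid = bert.encoder_output(tokens, token_type_ids, one_zero_attention_mask=attention_mask)
>>> resid.shape  # (batch, position, d_model)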
- forward(input: Union[str, List[str], Int[Tensor, 'batch pos']], return_type: Union[Literal['logits'], Literal['predictions']], token_type_ids: Optional[Int[Tensor, 'batch pos']] = None, one_zero_attention_mask: Optional[Int[Tensor, 'batch pos']] = None) Union[Float[Tensor, 'batch pos d_vocab'], str, List[str]] #
- forward(input: Union[str, List[str], Int[Tensor, 'batch pos']], return_type: Literal[None], token_type_ids: Optional[Int[Tensor, 'batch pos']] = None, one_zero_attention_mask: Optional[Int[Tensor, 'batch pos']] = None) Optional[Union[Float[Tensor, 'batch pos d_vocab'], str, List[str]]]
Forward pass through the HookedEncoder. Performs Masked Language Modelling on the given input.
- Parameters:
input – The input to process. Can be one of:
- str: A single text string
- List[str]: A list of text strings
- torch.Tensor: Input tokens as integers with shape (batch, position)
return_type – Optional[str]: The type of output to return. Can be one of:
- None: Return nothing, don’t calculate logits
- ‘logits’: Return the logits tensor
- ‘predictions’: Return human-readable predictions
token_type_ids – Optional[torch.Tensor]: Binary ids indicating whether a token belongs to sequence A or B. For example, for two sentences: “[CLS] Sentence A [SEP] Sentence B [SEP]”, token_type_ids would be [0, 0, …, 0, 1, …, 1, 1]. 0 represents tokens from Sentence A, 1 from Sentence B. If not provided, BERT assumes a single sequence input. This parameter gets inferred from the tokenizer if input is a string or list of strings. Shape is (batch_size, sequence_length).
one_zero_attention_mask – Optional[torch.Tensor]: A binary mask which indicates which tokens should be attended to (1) and which should be ignored (0). Primarily used for padding variable-length sentences in a batch. For instance, in a batch with sentences of differing lengths, shorter sentences are padded with 0s on the right. If not provided, the model assumes all tokens should be attended to. This parameter gets inferred from the tokenizer if input is a string or list of strings. Shape is (batch_size, sequence_length).
- Returns:
Depending on return_type:
- None: Returns None if return_type is None
- torch.Tensor: Returns logits of shape (batch_size, sequence_length, d_vocab) if return_type is ‘logits’ (the default when return_type is not explicitly provided)
- str or List[str]: Returns the predicted words for the masked tokens if return_type is ‘predictions’. Returns a list of strings if the input is a list of strings, otherwise a single string.
- Return type:
Optional[torch.Tensor]
- Raises:
AssertionError – If using string input without a tokenizer
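A minimal sketch of the different return types (the prompt and the use of a [MASK] token are assumptions about standard BERT MLM inputs):
>>> logits = bert("The capital of France is [MASK].")  # return_type defaults to 'logits'
>>> preds = bert("The capital of France is [MASK].", return_type="predictions")
>>> # preds is a human-readable rendering of the model's fill(s) for the [MASK] position(s)
>>> nothing = bert("The capital of France is [MASK].", return_type=None)  # returns None, logits not computed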
- classmethod from_pretrained(model_name: str, checkpoint_index: Optional[int] = None, checkpoint_value: Optional[int] = None, hf_model=None, device: Optional[str] = None, tokenizer=None, move_to_device=True, dtype=torch.float32, **from_pretrained_kwargs) HookedEncoder #
Loads pretrained weights from Hugging Face. Currently supports loading weights from a HuggingFace BertForMaskedLM. Unlike HookedTransformer, this does not yet do any preprocessing on the model.
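For example (a sketch; "bert-base-cased" is assumed to be among the supported BertForMaskedLM checkpoints, and the device/dtype keyword arguments mirror the signature above):
>>> import torch
>>> bert = HookedEncoder.from_pretrained("bert-base-cased", device="cuda", dtype=torch.float16)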
- mps()#
Move all model parameters and buffers to the MPS (Apple Silicon) device.
- run_with_cache(*model_args, return_cache_object: Literal[True] = True, **kwargs) Tuple[Float[Tensor, 'batch pos d_vocab'], ActivationCache] #
- run_with_cache(*model_args, return_cache_object: Literal[False], **kwargs) Tuple[Float[Tensor, 'batch pos d_vocab'], Dict[str, Tensor]]
Wrapper around run_with_cache in HookedRootModule. If return_cache_object is True, this will return an ActivationCache object, with a number of useful HookedTransformer-specific methods; otherwise it will return a dictionary of activations as in HookedRootModule. This function was copied directly from HookedTransformer.
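A sketch of caching activations on a single prompt (the hook name below is an assumption; the exact names available can be checked via the keys of the returned cache):
>>> logits, cache = bert.run_with_cache("The capital of France is [MASK].")
>>> pattern = cache["blocks.0.attn.hook_pattern"]  # assumed hook name for layer 0's attention pattern
>>> pattern.shape  # (batch, n_heads, query_pos, key_pos)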
- to(device_or_dtype: Union[device, str, dtype], print_details: bool = True)#
Move and/or cast the parameters and buffers.
This can be called as
- to(device=None, dtype=None, non_blocking=False)
- to(dtype, non_blocking=False)
- to(tensor, non_blocking=False)
- to(memory_format=torch.channels_last)
Its signature is similar to torch.Tensor.to(), but only accepts floating point or complex dtypes. In addition, this method will only cast the floating point or complex parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.
See below for examples.
Note
This method modifies the module in-place.
- Parameters:
device (torch.device) – the desired device of the parameters and buffers in this module
dtype (torch.dtype) – the desired floating point or complex dtype of the parameters and buffers in this module
tensor (torch.Tensor) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module
memory_format (torch.memory_format) – the desired memory format for 4D parameters and buffers in this module (keyword only argument)
- Returns:
self
- Return type:
Module
Examples:
>>> # xdoctest: +IGNORE_WANT("non-deterministic")
>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> # xdoctest: +REQUIRES(env:TORCH_DOCTEST_CUDA1)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)
>>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)
- to_tokens(input: Union[str, List[str]], move_to_device: bool = True, truncate: bool = True) Tuple[Int[Tensor, 'batch pos'], Int[Tensor, 'batch pos'], Int[Tensor, 'batch pos']] #
Converts a string to a tensor of tokens. Taken mostly from the HookedTransformer implementation, but does not support default padding sides or prepend_bos.
- Parameters:
input (Union[str, List[str]]) – The input to tokenize.
move_to_device (bool) – Whether to move the output tensor of tokens to the device the model lives on. Defaults to True.
truncate (bool) – If the output tokens are too long, whether to truncate them to the model’s max context window. Does nothing for shorter inputs. Defaults to True.
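A sketch of the call (assuming the three returned tensors are the token ids, token type ids and attention mask, in that order):
>>> tokens, token_type_ids, attention_mask = bert.to_tokens(["Hello, world!", "A second, longer sentence."])
>>> tokens.shape  # (batch, position), padded to the longest sequence in the batch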