transformer_lens.HookedEncoderDecoder#

Hooked EncoderDecoder

Contains a T5-style model. This is kept separate from transformer_lens.HookedTransformer because its architecture differs significantly from, e.g., GPT-style transformers.

class transformer_lens.HookedEncoderDecoder.HookedEncoderDecoder(cfg, tokenizer=None, move_to_device=True, **kwargs)#

Bases: HookedRootModule

This class implements a T5 encoder-decoder using the components in ./components.py, with HookPoints on every interesting activation. It inherits from HookedRootModule.

Limitations: The model does not include dropout layers, which may lead to inconsistent results when training or fine-tuning.

Like HookedTransformer, it can have a pretrained Transformer's weights loaded via .from_pretrained (a short usage sketch follows the list below). There are a few features you might know from HookedTransformer which are not yet supported:
  • There is no preprocessing (e.g. LayerNorm folding) when loading a pretrained model

  • Some tokenization conveniences from HookedTransformer, such as default padding sides and prepend_bos, are not supported (see to_tokens and generate below)
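
Example (a minimal quick-start sketch; "t5-small" is assumed to be a supported model name, and the translation prompt assumes a T5 checkpoint trained on that task):

from transformer_lens import HookedEncoderDecoder

model = HookedEncoderDecoder.from_pretrained("t5-small")

# String input is tokenized internally; decoder_input defaults to a batch of
# start tokens, as described under forward() below.
logits = model("translate English to German: Hello, how are you?")
print(logits.shape)  # [batch, decoder_pos, d_vocab]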

property OV: FactoredMatrix#

Returns a FactoredMatrix object with the product of the O and V matrices for each layer and head.

property QK: FactoredMatrix#

Returns a FactoredMatrix object with the product of the Q and K matrices for each layer and head. Useful for visualizing attention patterns.
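
Example (a shape-check sketch; indexing and the .AB attribute are standard FactoredMatrix operations, assuming a model loaded as above):

qk = model.QK            # FactoredMatrix, [n_layers, n_heads, d_model, d_model]
print(qk.shape)
head_qk = qk[0, 0]       # QK circuit of layer 0, head 0
print(head_qk.AB.shape)  # densified low-rank product, [d_model, d_model]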

property W_E: Float[Tensor, 'd_vocab d_model']#

Convenience to get the embedding matrix

property W_K: Float[Tensor, 'n_layers n_heads d_model d_head']#

Stacks the key weights across all layers

property W_O: Float[Tensor, 'n_layers n_heads d_head d_model']#

Stacks the attn output weights across all layers

property W_Q: Float[Tensor, 'n_layers n_heads d_model d_head']#

Stacks the query weights across all layers

property W_U: Float[Tensor, 'd_model d_vocab']#

Convenience to get the unembedding matrix (i.e. the linear map from the final residual stream to the output logits)

property W_V: Float[Tensor, 'n_layers n_heads d_model d_head']#

Stacks the value weights across all layers

property W_in: Float[Tensor, 'n_layers d_model d_mlp']#

Stacks the MLP input weights across all layers

property W_out: Float[Tensor, 'n_layers d_mlp d_model']#

Stacks the MLP output weights across all layers

property W_pos: None#

Convenience function to get the positional embedding. Only works on models with absolute positional embeddings!

all_head_labels() List[str]#

Returns a list of strings with the format “L{l}H{h}”, where l is the layer index and h is the head index.
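
Example (a shape-check sketch for the stacked parameter conveniences; the shapes follow the type annotations above, and the labels follow the format documented for all_head_labels):

print(model.W_Q.shape)   # [n_layers, n_heads, d_model, d_head]
print(model.W_in.shape)  # [n_layers, d_model, d_mlp]
print(model.all_head_labels()[:3])  # e.g. ['L0H0', 'L0H1', 'L0H2']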

property b_K: Float[Tensor, 'n_layers n_heads d_head']#

Stacks the key biases across all layers

property b_O: Float[Tensor, 'n_layers d_model']#

Stacks the attn output biases across all layers

property b_Q: Float[Tensor, 'n_layers n_heads d_head']#

Stacks the query biases across all layers

property b_U: Float[Tensor, 'd_vocab']#

Convenience to get the unembedding bias

property b_V: Float[Tensor, 'n_layers n_heads d_head']#

Stacks the value biases across all layers

property b_in: Float[Tensor, 'n_layers d_mlp']#

Stacks the MLP input biases across all layers

property b_out: Float[Tensor, 'n_layers d_model']#

Stacks the MLP output biases across all layers

cpu()#

Move all model parameters and buffers to the CPU.

Note

This method modifies the module in-place.

Returns:

self

Return type:

Module

cuda()#

Move all model parameters and buffers to the GPU.

This also makes associated parameters and buffers different objects. So it should be called before constructing the optimizer if the module will live on GPU while being optimized.

Note

This method modifies the module in-place.

Parameters:

device (int, optional) – if specified, all parameters will be copied to that device

Returns:

self

Return type:

Module

forward(input: Union[str, List[str], Int[Tensor, 'batch pos']], decoder_input: Optional[Int[Tensor, 'batch decoder_pos']] = None, return_type: Literal['logits'] = 'logits', one_zero_attention_mask: Optional[Int[Tensor, 'batch pos']] = None) Float[Tensor, 'batch pos d_vocab']#
forward(input: Union[str, List[str], Int[Tensor, 'batch pos']], decoder_input: Optional[Int[Tensor, 'batch decoder_pos']] = None, return_type: Literal[None] = None, one_zero_attention_mask: Optional[Int[Tensor, 'batch pos']] = None) Optional[Float[Tensor, 'batch pos d_vocab']]

Forward pass of the T5 model.

Parameters:
  • input – Input to be processed. Can be one of: a single string (str), a batch of strings (List[str]), or a batch of token IDs (Int[torch.Tensor, "batch pos"]).

  • decoder_input – Tensor of shape (batch, decoder_pos) containing the decoder input sequence. If None and input is of type str or List[str], the decoder input starts with a batch of beginning-of-sequence (BOS) tokens.

  • return_type – Specifies the model output type: "logits" returns the logits tensor; None returns nothing.

  • one_zero_attention_mask – A binary mask which indicates which tokens should be attended to (1) and which should be ignored (0). Primarily used for padding variable-length sentences in a batch. For instance, in a batch with sentences of differing lengths, shorter sentences are padded with 0s on the right. If not provided, the model assumes all tokens should be attended to. This parameter gets inferred from the tokenizer if input is a string or list of strings. Shape is (batch_size, sequence_length).

Returns:

If return_type="logits", returns a logits tensor of shape (batch, decoder_pos, vocab_size). If return_type=None, returns None.

Return type:

Optional[Float[torch.Tensor, “batch decoder_pos d_vocab”]]
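
Example (a hedged sketch of the token-level interface; the prompt and target are illustrative, and decoder_input simply reuses tokenized target text for a shape check rather than applying any decoder-start-token shifting convention):

input_tokens, attention_mask = model.to_tokens(
    ["translate English to German: The house is wonderful."]
)
decoder_tokens, _ = model.to_tokens(["Das Haus ist wunderbar."])

logits = model(
    input_tokens,
    decoder_input=decoder_tokens,
    one_zero_attention_mask=attention_mask,
)
print(logits.shape)  # [batch, decoder_pos, d_vocab]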

classmethod from_pretrained(model_name: str, checkpoint_index: Optional[int] = None, checkpoint_value: Optional[int] = None, hf_model=None, device: Optional[str] = None, tokenizer=None, move_to_device=True, dtype=torch.float32, **from_pretrained_kwargs) HookedEncoderDecoder#

Loads in the pretrained weights from Hugging Face. Currently supports loading weights for T5-style models. Unlike HookedTransformer, this does not yet do any preprocessing on the model.
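
Example (a loading sketch; the model name, device string, and dtype are illustrative assumptions rather than a statement of exactly which checkpoints are supported):

import torch
from transformer_lens import HookedEncoderDecoder

model = HookedEncoderDecoder.from_pretrained(
    "t5-small",
    device="cpu",          # or "cuda" / "mps"
    dtype=torch.float32,
)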

generate(input: Union[str, Int[Tensor, 'batch pos']] = '', one_zero_attention_mask: Optional[Int[Tensor, 'batch pos']] = None, max_new_tokens: int = 10, stop_at_eos: bool = True, eos_token_id: Optional[int] = None, do_sample: bool = True, top_k: Optional[int] = None, top_p: Optional[float] = None, temperature: float = 1.0, freq_penalty: float = 0.0, return_type: Optional[str] = 'input', verbose: bool = True) Union[Int[Tensor, 'batch new_tokens'], str]#

Sample tokens from the T5 encoder-decoder model.

Sample tokens from the model until the model outputs eos_token or max_new_tokens is reached. This function is primarily taken from HookedTransformer but adjusted for the HookedEncoderDecoder architecture. It does not support key-value caching, default padding sides, or prepend_bos.

To avoid fiddling with ragged tensors, if we input a batch of text and some sequences finish (by producing an EOS token), we keep running the model on the entire batch, but throw away the output for finished sequences and just keep padding them with EOS tokens.

This supports entering a single string, but not a list of strings - if the strings don’t tokenize to exactly the same length, this gets messy. If that functionality is needed, convert them to a batch of tokens and input that instead.

Parameters:
  • input (Union[str, Int[torch.Tensor, "batch pos"]]) – Either a batch of tokens ([batch, pos]) or a text string (this will be converted to a batch of tokens with batch size 1).

  • max_new_tokens (int) – Maximum number of tokens to generate.

  • stop_at_eos (bool) – If True, stop generating tokens when the model outputs eos_token.

  • eos_token_id (Optional[Union[int, Sequence]]) – The token ID to use for end of sentence. If None, use the tokenizer’s eos_token_id - required if using stop_at_eos. It’s also possible to provide a list of token IDs (not just the eos_token_id), in which case the generation will stop when any of them are output (useful e.g. for stable_lm).

  • do_sample (bool) – If True, sample from the model’s output distribution. Otherwise, use greedy search (take the max logit each time).

  • top_k (int) – Number of tokens to sample from. If None, sample from all tokens.

  • top_p (float) – Probability mass to sample from. If 1.0, sample from all tokens. If <1.0, we take the top tokens with cumulative probability >= top_p.

  • temperature (float) – Temperature for sampling. Higher values will make the model more random (limit of temp -> 0 is just taking the top token, limit of temp -> inf is sampling from a uniform distribution).

  • freq_penalty (float) – Frequency penalty for sampling - how much to penalise previous tokens. Higher values will make the model more random.

  • return_type (Optional[str]) – The type of the output to return - either a string (str), a tensor of tokens (tensor) or whatever the format of the input was (input).

  • verbose (bool) – If True, show tqdm progress bars for generation.

Returns:

The generated sequence of new tokens, of shape [batch, new_tokens] (by default the return has the same type as the input).

Return type:

outputs (torch.Tensor)
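
Example (a generation sketch; the prompt assumes a translation-style T5 checkpoint, and the keyword arguments mirror the parameters documented above):

output = model.generate(
    "translate English to German: How old are you?",
    max_new_tokens=20,
    do_sample=False,   # greedy decoding
    verbose=False,
)
print(output)  # return_type defaults to "input", so a string prompt returns a string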

mps()#

Move all model parameters and buffers to the MPS device.

run_with_cache(*model_args, return_cache_object: Literal[True] = True, **kwargs) Tuple[Float[Tensor, 'batch pos d_vocab'], ActivationCache]#
run_with_cache(*model_args, return_cache_object: Literal[False], **kwargs) Tuple[Float[Tensor, 'batch pos d_vocab'], Dict[str, Tensor]]

Wrapper around run_with_cache in HookedRootModule. If return_cache_object is True, this will return an ActivationCache object, with a number of useful HookedTransformer-specific methods; otherwise it will return a dictionary of activations as in HookedRootModule. This function was copied directly from HookedTransformer.
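
Example (a caching sketch; the particular hook names printed depend on the model's HookPoints and are not assumed here):

logits, cache = model.run_with_cache("translate English to German: Hello!")
print(type(cache).__name__)      # ActivationCache
print(list(cache.keys())[:5])    # first few hook names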

to(device_or_dtype: Union[device, str, dtype], print_details: bool = True)#

Move and/or cast the parameters and buffers.

This can be called as

to(device=None, dtype=None, non_blocking=False)
to(dtype, non_blocking=False)
to(tensor, non_blocking=False)
to(memory_format=torch.channels_last)

Its signature is similar to torch.Tensor.to(), but only accepts floating point or complex dtypes. In addition, this method will only cast the floating point or complex parameters and buffers to dtype (if given). The integral parameters and buffers will be moved to device, if that is given, but with dtypes unchanged. When non_blocking is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices.

See below for examples.

Note

This method modifies the module in-place.

Parameters:
  • device (torch.device) – the desired device of the parameters and buffers in this module

  • dtype (torch.dtype) – the desired floating point or complex dtype of the parameters and buffers in this module

  • tensor (torch.Tensor) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module

  • memory_format (torch.memory_format) – the desired memory format for 4D parameters and buffers in this module (keyword only argument)

Returns:

self

Return type:

Module

Examples:

>>> # xdoctest: +IGNORE_WANT("non-deterministic")
>>> linear = nn.Linear(2, 2)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]])
>>> linear.to(torch.double)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1913, -0.3420],
        [-0.5113, -0.2325]], dtype=torch.float64)
>>> # xdoctest: +REQUIRES(env:TORCH_DOCTEST_CUDA1)
>>> gpu1 = torch.device("cuda:1")
>>> linear.to(gpu1, dtype=torch.half, non_blocking=True)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1')
>>> cpu = torch.device("cpu")
>>> linear.to(cpu)
Linear(in_features=2, out_features=2, bias=True)
>>> linear.weight
Parameter containing:
tensor([[ 0.1914, -0.3420],
        [-0.5112, -0.2324]], dtype=torch.float16)

>>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble)
>>> linear.weight
Parameter containing:
tensor([[ 0.3741+0.j,  0.2382+0.j],
        [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128)
>>> linear(torch.ones(3, 2, dtype=torch.cdouble))
tensor([[0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j],
        [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128)

to_tokens(input: Union[str, List[str]], move_to_device: bool = True, truncate: bool = True) Tuple[Int[Tensor, 'batch pos'], Int[Tensor, 'batch pos']]#

Converts a string (or list of strings) to a tensor of tokens and an attention mask. Taken mostly from the HookedTransformer implementation, but does not support default padding sides or prepend_bos.

Parameters:
  • input (Union[str, List[str]]) – The input to tokenize.

  • move_to_device (bool) – Whether to move the output tensors to the device the model lives on. Defaults to True.

  • truncate (bool) – If the output tokens are too long, whether to truncate the output tokens to the model’s max context window. Does nothing for shorter inputs. Defaults to True.
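
Example (a tokenization sketch; the shorter sentence is padded, and its attention-mask entries are 0 over the padding positions, as described for one_zero_attention_mask above):

tokens, attention_mask = model.to_tokens(
    ["A somewhat longer input sentence for the encoder.", "A short one."]
)
print(tokens.shape, attention_mask.shape)  # both [batch, pos]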