transformer_lens.config.TransformerLensConfig module

TransformerLens Configuration.

Module with a dataclass for storing the configuration of a transformer_lens.model_bridge.TransformerBridge model.

class transformer_lens.config.TransformerLensConfig.TransformerLensConfig(d_model: int, d_head: int, n_layers: int, n_ctx: int, n_heads: int = -1, d_mlp: int | None = None, d_vocab: int = -1, device: str | None = None, use_attn_result: bool = False, use_split_qkv_input: bool = False, default_prepend_bos: bool = True, positional_embedding_type: str = 'standard', n_key_value_heads: int | None = None, attn_only: bool = False, gated_mlp: bool = False, uses_rms_norm: bool = False, eps: float = 1e-05, layer_norm_folding: bool = False, act_fn: str = 'relu', normalization_type: str | None = 'LN', num_experts: int | None = None, experts_per_token: int | None = None, final_rms: bool = False, dtype: dtype = torch.float32)

Bases: object

Configuration class for TransformerLens bridge components.

This class contains only the configuration parameters that are actually used by the system. It serves as a minimal base configuration.

Parameters:
  • d_model (int) – The dimensionality of the embeddings.

  • d_head (int) – The dimensionality of each attention head.

  • n_layers (int) – The number of transformer blocks.

  • n_ctx (int) – The maximum sequence length.

  • n_heads (int) – The number of attention heads. Defaults to -1; if left unset, it is derived as d_model // d_head.

  • d_mlp (int, optional) – The dimensionality of the hidden layer of the feed-forward MLP.

  • d_vocab (int) – The size of the vocabulary. Defaults to -1, which means not set.

  • device (str, optional) – The device to use for the model. Defaults to ‘cuda’ if available, else ‘cpu’.

  • use_attn_result (bool) – Whether to explicitly calculate the amount each head adds to the residual stream.

  • use_split_qkv_input (bool) – Whether to explicitly calculate the input of each head separately.

  • default_prepend_bos (bool) – Whether to prepend the BOS (beginning-of-sequence) token by default when tokenizing.

  • positional_embedding_type (str) – The type of positional embedding used. Defaults to 'standard'.

  • n_key_value_heads (int, optional) – The number of key/value heads for grouped-query attention (GQA), i.e. the number of groups of query heads that share the same key and value matrices.
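
The field defaults above can be illustrated with a minimal, self-contained sketch. `MiniConfig` below is a hypothetical stand-in (not the real class) that mirrors the documented behavior of `n_heads`: left at its -1 sentinel, it is derived as `d_model // d_head`. The real dataclass may implement this differently.

```python
from dataclasses import dataclass


@dataclass
class MiniConfig:
    """Hypothetical sketch of the core architecture fields."""

    d_model: int
    d_head: int
    n_layers: int
    n_ctx: int
    n_heads: int = -1  # -1 is a "not set" sentinel, as in the docs above

    def __post_init__(self) -> None:
        # Assumed behavior: derive the head count when it was not given.
        if self.n_heads == -1:
            self.n_heads = self.d_model // self.d_head


cfg = MiniConfig(d_model=768, d_head=64, n_layers=12, n_ctx=1024)
print(cfg.n_heads)  # 768 // 64 = 12
```
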

act_fn: str = 'relu'
attn_only: bool = False
d_head: int
d_mlp: int | None = None
d_model: int
d_vocab: int = -1
default_prepend_bos: bool = True
device: str | None = None
dtype: dtype = torch.float32
eps: float = 1e-05
experts_per_token: int | None = None
final_rms: bool = False
classmethod from_dict(config_dict: Dict[str, Any])

Instantiates a TransformerLensConfig from a Python dictionary of parameters. Only includes fields that are defined in the TransformerLensConfig dataclass.
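The "only includes fields defined in the dataclass" behavior can be sketched with the standard `dataclasses.fields` helper. `ConfigSketch` is a hypothetical stand-in, not the real class; the actual implementation may differ, but the filtering pattern is the one the docstring describes.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass
class ConfigSketch:
    """Hypothetical stand-in for TransformerLensConfig."""

    d_model: int
    d_head: int
    n_layers: int = 2

    @classmethod
    def from_dict(cls, config_dict: Dict[str, Any]) -> "ConfigSketch":
        # Keep only keys that correspond to declared dataclass fields,
        # silently dropping anything else (e.g. extra HF config entries).
        valid = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in config_dict.items() if k in valid})


cfg = ConfigSketch.from_dict({"d_model": 512, "d_head": 64, "unknown_key": 1})
print(cfg)  # the unknown key is dropped; n_layers keeps its default
```
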

gated_mlp: bool = False
layer_norm_folding: bool = False
n_ctx: int
n_heads: int = -1
n_key_value_heads: int | None = None
n_layers: int
normalization_type: str | None = 'LN'
num_experts: int | None = None
positional_embedding_type: str = 'standard'
to_dict() Dict[str, Any]

Convert the config to a dictionary.

classmethod unwrap(config: Dict | TransformerLensConfig) TransformerLensConfig

Convenience method that accepts a config passed either as a dictionary or as an existing TransformerLensConfig and returns a TransformerLensConfig, avoiding duplicated conversion logic in components that accept both forms.
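
Taken together, `to_dict` and `unwrap` support a dict-or-config calling convention. The sketch below uses a hypothetical `ConfigSketch` dataclass (not the real class) and assumes `unwrap` passes an existing config through unchanged while building one from a dict otherwise; the real method's handling of unknown dict keys is not specified here.

```python
from dataclasses import asdict, dataclass, fields
from typing import Any, Dict, Union


@dataclass
class ConfigSketch:
    """Hypothetical stand-in for TransformerLensConfig."""

    d_model: int
    d_head: int
    n_layers: int = 2

    def to_dict(self) -> Dict[str, Any]:
        # Plain-dict view of all declared fields.
        return asdict(self)

    @classmethod
    def unwrap(cls, config: Union[Dict[str, Any], "ConfigSketch"]) -> "ConfigSketch":
        # Already a config: return it as-is.
        if isinstance(config, cls):
            return config
        # Otherwise treat it as a dict, keeping only declared fields
        # (an assumption mirroring from_dict above).
        valid = {f.name for f in fields(cls)}
        return cls(**{k: v for k, v in config.items() if k in valid})


cfg = ConfigSketch(d_model=256, d_head=32)
same = ConfigSketch.unwrap(cfg)            # passes through unchanged
rebuilt = ConfigSketch.unwrap(cfg.to_dict())  # round-trips via a dict
```

A component can then call `unwrap` once at its entry point and work with a typed config internally, regardless of how the caller supplied it.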

use_attn_result: bool = False
use_split_qkv_input: bool = False
uses_rms_norm: bool = False