transformer_lens.utilities.activation_functions#
Activation Functions.
Utilities for interacting with all supported activation functions.
- class transformer_lens.utilities.activation_functions.XIELU(alpha_p_init: float = 0.8, alpha_n_init: float = 0.8, beta_init: float = 0.5, eps: float = -1e-06)#
Bases: Module
Trainable xIELU activation function.
See https://arxiv.org/abs/2411.13010
Matches HuggingFace’s XIELUActivation parameterization: alpha_p and alpha_n are stored in softplus-inverse space, and beta is a non-trainable buffer.
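Storing alpha_p and alpha_n in softplus-inverse space means the raw trainable parameters are unconstrained reals, while applying softplus at forward time guarantees the effective values are strictly positive. A minimal pure-Python sketch of that parameterization (not TransformerLens source code; `softplus_inverse` is an illustrative helper):

```python
import math

def softplus(x: float) -> float:
    # softplus(x) = log(1 + e^x); output is always positive
    return math.log1p(math.exp(x))

def softplus_inverse(y: float) -> float:
    # Inverse of softplus for y > 0: x = log(e^y - 1)
    return math.log(math.expm1(y))

# The raw stored parameter may be any real number; softplus of it
# recovers a strictly positive alpha at forward time.
alpha_p_init = 0.8                    # constructor default from the signature above
raw = softplus_inverse(alpha_p_init)  # value actually stored as the parameter
recovered = softplus(raw)             # what forward() would use (~0.8)
```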
- forward(input: Float[Tensor, 'batch pos d_mlp']) Float[Tensor, 'batch pos d_mlp']#
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- transformer_lens.utilities.activation_functions.gelu_fast(input: Float[Tensor, 'batch pos d_mlp']) Float[Tensor, 'batch pos d_mlp']#
- transformer_lens.utilities.activation_functions.gelu_new(input: Float[Tensor, 'batch pos d_mlp']) Float[Tensor, 'batch pos d_mlp']#
- transformer_lens.utilities.activation_functions.gelu_pytorch_tanh(input: Tensor) Tensor#
Tanh approximation of the GELU activation function, used in some older models.
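The tanh-based GELU approximations above (including `gelu_pytorch_tanh`) follow the well-known formula 0.5·x·(1 + tanh(√(2/π)·(x + 0.044715·x³))), which closely tracks the exact erf-based GELU. A self-contained sketch comparing the two (plain floats here; the library functions operate on tensors):

```python
import math

def gelu_exact(x: float) -> float:
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    # tanh approximation used by the approximate GELU variants
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

# The approximation stays within ~1e-3 of the exact GELU
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    assert abs(gelu_exact(x) - gelu_tanh(x)) < 1e-3
```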
- transformer_lens.utilities.activation_functions.solu(input: Float[Tensor, 'batch pos d_mlp']) Float[Tensor, 'batch pos d_mlp']#
SoLU activation function as described by https://transformer-circuits.pub/2022/solu/index.html.
The LayerNorm that follows SoLU is implemented by the MLP class.
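SoLU is the elementwise product of the input with its own softmax, taken over the d_mlp axis. A pure-Python sketch of that definition on a single activation vector (the subsequent LayerNorm is omitted, since it lives in the MLP class):

```python
import math

def solu(x: list[float]) -> list[float]:
    # SoLU: x * softmax(x), with softmax over the d_mlp dimension
    m = max(x)                            # subtract max for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [v * e / total for v, e in zip(x, exps)]

out = solu([1.0, 2.0, 3.0])  # largest inputs are amplified relative to the rest
```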
- transformer_lens.utilities.activation_functions.xielu(input: Float[Tensor, 'batch pos d_mlp']) Float[Tensor, 'batch pos d_mlp']#
Fixed-parameter xIELU activation function as described by https://arxiv.org/abs/2411.13010.
Original code: https://github.com/rubber-duck-debug/xielu
Uses default parameter values. For trainable parameters, use the XIELU class.
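The piecewise shape of xIELU can be sketched roughly as follows. The branch expressions here are an assumption based on the paper and HuggingFace's XIELUActivation, not copied from TransformerLens source; the constants reuse the XIELU constructor defaults shown above:

```python
import math

# Assumed effective parameter values (XIELU constructor defaults)
ALPHA_P = 0.8
ALPHA_N = 0.8
BETA = 0.5

def xielu_sketch(x: float) -> float:
    if x > 0:
        # Positive branch: quadratic plus linear term
        return ALPHA_P * x * x + BETA * x
    # Negative branch: ELU-like, bounded via expm1
    return ALPHA_N * (math.expm1(x) - x) + BETA * x

# Both branches meet at 0 with value 0 and slope BETA, so the
# activation is continuous and differentiable at the origin.
```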