transformer_lens.utilities.activation_functions#
Activation Functions.
Utilities for interacting with all supported activation functions.
- class transformer_lens.utilities.activation_functions.XIELU(alpha_p_init: float = 0.8, alpha_n_init: float = 0.8, beta_init: float = 0.5, eps: float = -1e-06)#
Bases: Module
Trainable xIELU activation function.
See https://arxiv.org/abs/2411.13010
Matches HuggingFace’s XIELUActivation parameterization: alpha_p and alpha_n are stored in softplus-inverse space, and beta is a non-trainable buffer.
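Storing alpha_p and alpha_n in softplus-inverse space means the raw trainable parameters are unconstrained reals, while applying softplus at forward time guarantees the effective values are strictly positive. A minimal pure-Python sketch of that parameterization (not TransformerLens source code; `softplus_inverse` is an illustrative helper):

```python
import math

def softplus(x: float) -> float:
    # softplus(x) = log(1 + e^x); output is always positive
    return math.log1p(math.exp(x))

def softplus_inverse(y: float) -> float:
    # Inverse of softplus for y > 0: x = log(e^y - 1)
    return math.log(math.expm1(y))

# The raw stored parameter may be any real number; softplus of it
# recovers a strictly positive alpha at forward time.
alpha_p_init = 0.8                    # constructor default from the signature above
raw = softplus_inverse(alpha_p_init)  # value actually stored as the parameter
recovered = softplus(raw)             # what forward() would use (~0.8)
```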
- forward(input: Float[Tensor, 'batch pos d_mlp']) Float[Tensor, 'batch pos d_mlp']#
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- transformer_lens.utilities.activation_functions.gelu_fast(input: Float[Tensor, 'batch pos d_mlp']) Float[Tensor, 'batch pos d_mlp']#
- transformer_lens.utilities.activation_functions.gelu_new(input: Float[Tensor, 'batch pos d_mlp']) Float[Tensor, 'batch pos d_mlp']#
- transformer_lens.utilities.activation_functions.gelu_pytorch_tanh(input: Tensor) Tensor#
Tanh approximation of the GELU activation function, used in some older models.
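The tanh-based GELU approximations above (including `gelu_pytorch_tanh`) follow the well-known formula 0.5·x·(1 + tanh(√(2/π)·(x + 0.044715·x³))), which closely tracks the exact erf-based GELU. A self-contained sketch comparing the two (plain floats here; the library functions operate on tensors):

```python
import math

def gelu_exact(x: float) -> float:
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    # tanh approximation used by the approximate GELU variants
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

# The approximation stays within ~1e-3 of the exact GELU
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    assert abs(gelu_exact(x) - gelu_tanh(x)) < 1e-3
```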
- transformer_lens.utilities.activation_functions.solu(input: Float[Tensor, 'batch pos d_mlp']) Float[Tensor, 'batch pos d_mlp']#
SoLU activation function as described by https://transformer-circuits.pub/2022/solu/index.html.
The LayerNorm that follows SoLU is implemented by the MLP class.
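SoLU is the elementwise product of the input with its own softmax, taken over the d_mlp axis. A pure-Python sketch of that definition on a single activation vector (the subsequent LayerNorm is omitted, since it lives in the MLP class):

```python
import math

def solu(x: list[float]) -> list[float]:
    # SoLU: x * softmax(x), with softmax over the d_mlp dimension
    m = max(x)                            # subtract max for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [v * e / total for v, e in zip(x, exps)]

out = solu([1.0, 2.0, 3.0])  # largest inputs are amplified relative to the rest
```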
- transformer_lens.utilities.activation_functions.xielu(input: Float[Tensor, 'batch pos d_mlp']) Float[Tensor, 'batch pos d_mlp']#
Fixed-parameter xIELU activation function as described by https://arxiv.org/abs/2411.13010.
Original code: https://github.com/rubber-duck-debug/xielu
Uses default parameter values. For trainable parameters, use the XIELU class.
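The piecewise shape of xIELU can be sketched roughly as follows. The branch expressions here are an assumption based on the paper and HuggingFace's XIELUActivation, not copied from TransformerLens source; the constants reuse the XIELU constructor defaults shown above:

```python
import math

# Assumed effective parameter values (XIELU constructor defaults)
ALPHA_P = 0.8
ALPHA_N = 0.8
BETA = 0.5

def xielu_sketch(x: float) -> float:
    if x > 0:
        # Positive branch: quadratic plus linear term
        return ALPHA_P * x * x + BETA * x
    # Negative branch: ELU-like, bounded via expm1
    return ALPHA_N * (math.expm1(x) - x) + BETA * x

# Both branches meet at 0 with value 0 and slope BETA, so the
# activation is continuous and differentiable at the origin.
```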