transformer_lens.model_bridge.generalized_components.alibi_utils module
Shared ALiBi (Attention with Linear Biases) utility functions.
Used by Bloom and Falcon ALiBi attention bridges to generate positional bias tensors.
- transformer_lens.model_bridge.generalized_components.alibi_utils.build_alibi_slopes(num_heads: int, device: device) Tensor
Compute ALiBi per-head slope values.
For a power-of-2 head count n, the slopes form a geometric series: 2^(-8/n), 2^(-16/n), … For non-power-of-2 counts, the series is built for the largest power of 2 below n, and the remaining slopes are interleaved from a finer geometric series computed as if there were twice as many heads. Matches the HuggingFace implementation.
- Parameters:
num_heads – Number of attention heads.
device – Device for the output tensor.
- Returns:
Slopes tensor of shape [num_heads].
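The slope recipe above can be sketched as follows. This is a hedged reimplementation based on the well-known HuggingFace ALiBi formula, not the bridge's actual source; the function name build_alibi_slopes_sketch is illustrative.

```python
import math

import torch


def build_alibi_slopes_sketch(num_heads: int, device=None) -> torch.Tensor:
    """Sketch of per-head ALiBi slopes (mirrors the HF Bloom/Falcon formula)."""
    # Largest power of 2 not exceeding num_heads.
    closest_pow2 = 2 ** math.floor(math.log2(num_heads))
    # Ratio of the geometric series: 2^(-8/n), so slopes are
    # 2^(-8/n), 2^(-16/n), ... for the first closest_pow2 heads.
    base = 2.0 ** (-(2.0 ** -(math.log2(closest_pow2) - 3)))
    slopes = [base ** (i + 1) for i in range(closest_pow2)]
    if closest_pow2 != num_heads:
        # Extra slopes come from the finer series for 2 * closest_pow2 heads,
        # taking every other term (odd powers) to interleave between the
        # existing slopes.
        extra_base = 2.0 ** (-(2.0 ** -(math.log2(2 * closest_pow2) - 3)))
        num_extra = num_heads - closest_pow2
        slopes += [extra_base ** (2 * i + 1) for i in range(num_extra)]
    return torch.tensor(slopes, dtype=torch.float32, device=device)
```

For num_heads=8, base is 2^(-8/8) = 0.5, so the slopes are 0.5, 0.25, …, 0.5^8; for num_heads=12, four extra slopes from the 16-head series are appended.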
- transformer_lens.model_bridge.generalized_components.alibi_utils.build_alibi_tensor(attention_mask: Tensor, num_heads: int, dtype: dtype) Tensor
Build ALiBi positional bias tensor.
Computes per-head linear biases from token positions, matching HuggingFace’s ALiBi implementation used in Bloom and Falcon models.
- Parameters:
attention_mask – Binary mask of shape [batch_size, seq_length].
num_heads – Number of attention heads.
dtype – Output dtype.
- Returns:
ALiBi tensor of shape [batch_size, num_heads, 1, seq_length].
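The bias construction can be sketched as below: each token's position index (computed from the mask so that padding contributes nothing) is scaled by the per-head slope and broadcast to the documented [batch_size, num_heads, 1, seq_length] shape. This is a hedged sketch following HuggingFace's ALiBi recipe, not the bridge's actual source; the inline slope computation assumes a power-of-2 head count for brevity.

```python
import math

import torch


def build_alibi_tensor_sketch(
    attention_mask: torch.Tensor, num_heads: int, dtype: torch.dtype
) -> torch.Tensor:
    """Sketch of the ALiBi bias tensor (power-of-2 num_heads assumed)."""
    # Per-head slopes: 2^(-8/n), 2^(-16/n), ... (see build_alibi_slopes).
    base = 2.0 ** (-(2.0 ** -(math.log2(num_heads) - 3)))
    slopes = torch.tensor(
        [base ** (i + 1) for i in range(num_heads)],
        dtype=torch.float32,
        device=attention_mask.device,
    )
    # Position of each real token within its sequence; padding positions are
    # zeroed out by multiplying with the mask.
    positions = (attention_mask.cumsum(dim=-1) - 1) * attention_mask
    # Broadcast [1, H, 1, 1] * [B, 1, 1, S] -> [B, H, 1, S].
    alibi = slopes[None, :, None, None] * positions[:, None, None, :]
    return alibi.to(dtype)
```

Note that the stock HuggingFace helper reshapes its result to [batch_size * num_heads, 1, seq_length]; the 4D shape here follows this module's documented return shape.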