transformer_lens.model_bridge.generalized_components.alibi_utils module

Shared ALiBi (Attention with Linear Biases) utility functions.

Used by Bloom and Falcon ALiBi attention bridges to generate positional bias tensors.

transformer_lens.model_bridge.generalized_components.alibi_utils.build_alibi_slopes(num_heads: int, device: device) → Tensor

Compute ALiBi per-head slope values.

For a power-of-2 head count n, the slopes form a geometric sequence 2^(-8/n), 2^(-16/n), …, 2^(-8). For non-power-of-2 counts, the remaining slopes are drawn from every other term of a finer geometric series (as if the head count were the next power of 2) and appended. Matches the HuggingFace implementation.

Parameters:
  • num_heads – Number of attention heads.

  • device – Device for the output tensor.

Returns:

Slopes tensor of shape [num_heads].
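The slope computation described above can be sketched as follows. This follows the HuggingFace-style recipe the docstring references, but is an illustrative reimplementation, not the module's actual source:

```python
import math

import torch


def build_alibi_slopes(num_heads: int, device=None) -> torch.Tensor:
    # Largest power of 2 not exceeding num_heads.
    closest_power_of_2 = 2 ** math.floor(math.log2(num_heads))
    # Ratio of the geometric series: 2^(-8 / closest_power_of_2).
    base = 2.0 ** (-(2.0 ** -(math.log2(closest_power_of_2) - 3)))
    slopes = base ** torch.arange(
        1, 1 + closest_power_of_2, device=device, dtype=torch.float32
    )
    if closest_power_of_2 != num_heads:
        # Take every other term of the finer series for the next power of 2.
        extra_base = 2.0 ** (-(2.0 ** -(math.log2(2 * closest_power_of_2) - 3)))
        num_remaining = num_heads - closest_power_of_2
        extra = extra_base ** torch.arange(
            1, 1 + 2 * num_remaining, 2, device=device, dtype=torch.float32
        )
        slopes = torch.cat([slopes, extra])
    return slopes
```

For num_heads=8 this yields 2^-1, 2^-2, …, 2^-8, i.e. the geometric sequence 2^(-8/n), 2^(-16/n), … from the description above.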

transformer_lens.model_bridge.generalized_components.alibi_utils.build_alibi_tensor(attention_mask: Tensor, num_heads: int, dtype: dtype) → Tensor

Build ALiBi positional bias tensor.

Computes per-head linear biases from token positions, matching HuggingFace’s ALiBi implementation used in Bloom and Falcon models.

Parameters:
  • attention_mask – Binary mask of shape [batch_size, seq_length].

  • num_heads – Number of attention heads.

  • dtype – Output dtype.

Returns:

ALiBi tensor of shape [batch_size, num_heads, 1, seq_length].
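A sketch of how the bias tensor is assembled from the per-head slopes and the mask, in the HuggingFace style the docstring references. The `_alibi_slopes` helper is a compact stand-in for `build_alibi_slopes` above, included so the example is self-contained:

```python
import math

import torch


def _alibi_slopes(num_heads: int, device=None) -> torch.Tensor:
    # Compact per-head slope recipe; see build_alibi_slopes above.
    n = 2 ** math.floor(math.log2(num_heads))
    base = 2.0 ** (-(2.0 ** -(math.log2(n) - 3)))
    slopes = base ** torch.arange(1, 1 + n, device=device, dtype=torch.float32)
    if n != num_heads:
        extra_base = 2.0 ** (-(2.0 ** -(math.log2(2 * n) - 3)))
        extra = extra_base ** torch.arange(
            1, 1 + 2 * (num_heads - n), 2, device=device, dtype=torch.float32
        )
        slopes = torch.cat([slopes, extra])
    return slopes


def build_alibi_tensor(
    attention_mask: torch.Tensor, num_heads: int, dtype: torch.dtype
) -> torch.Tensor:
    batch_size, seq_length = attention_mask.shape
    slopes = _alibi_slopes(num_heads, attention_mask.device)
    # Position index of each attended token; padding positions stay 0.
    positions = ((attention_mask.cumsum(dim=-1) - 1) * attention_mask)[:, None, :]
    # Broadcast slopes over positions: [batch_size, num_heads, seq_length].
    alibi = slopes[None, :, None] * positions
    return alibi.reshape(batch_size, num_heads, 1, seq_length).to(dtype)
```

The singleton query dimension lets the `[batch_size, num_heads, 1, seq_length]` bias broadcast over every query position when it is added to the attention scores.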