transformer_lens.model_bridge.generalized_components.alibi_joint_qkv_attention module¶
ALiBi joint QKV attention bridge component.
Handles models that use ALiBi (Attention with Linear Biases) with fused QKV projections. Splits fused QKV, reimplements attention with ALiBi bias and hooks at each stage.
- class transformer_lens.model_bridge.generalized_components.alibi_joint_qkv_attention.ALiBiJointQKVAttentionBridge(name: str, config: Any, split_qkv_matrix: Any = None, submodules: Dict[str, GeneralizedComponent] | None = None, **kwargs: Any)¶
Bases: JointQKVAttentionBridge
Attention bridge for models using ALiBi position encoding with fused QKV.
Splits fused QKV, reimplements attention with ALiBi bias fused into scores, and fires hooks at each stage (hook_q, hook_k, hook_v, hook_attn_scores, hook_pattern). ALiBi bias is added to raw attention scores before scaling.
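The attention reimplementation described above can be sketched as follows. This is a minimal illustration, not the bridge's actual implementation: the helper names (`alibi_slopes`, `alibi_attention`) are hypothetical, the slope formula assumes a power-of-two head count, and — per the description above — the ALiBi bias is added to the raw scores before scaling. Comments mark where the named hooks would fire.

```python
import math
import torch

def alibi_slopes(n_heads: int) -> torch.Tensor:
    # Geometric slope sequence; assumes n_heads is a power of two
    # (the general case interpolates between power-of-two sequences).
    start = 2 ** (-8 / n_heads)
    return torch.tensor([start ** (i + 1) for i in range(n_heads)])

def alibi_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    # q, k, v: [batch, n_heads, seq, d_head]  (hook_q / hook_k / hook_v would fire on these)
    batch, n_heads, seq, d_head = q.shape
    scores = q @ k.transpose(-1, -2)                     # [batch, n_heads, seq, seq]
    # ALiBi bias: -slope * (query_pos - key_pos), zero above the diagonal.
    pos = torch.arange(seq)
    dist = (pos[:, None] - pos[None, :]).clamp(min=0).float()
    scores = scores - alibi_slopes(n_heads)[:, None, None] * dist
    scores = scores / math.sqrt(d_head)                  # hook_attn_scores would fire here
    causal_mask = torch.triu(torch.ones(seq, seq, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(causal_mask, float("-inf"))
    pattern = scores.softmax(dim=-1)                     # hook_pattern would fire here
    return pattern @ v
```

Because the bias depends only on relative distance, ALiBi models need no positional embeddings; the bias itself encodes position.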
- forward(*args: Any, **kwargs: Any) Any¶
Forward pass: split QKV, apply ALiBi, fire hooks.
- get_random_inputs(batch_size: int = 2, seq_len: int = 8, device: device | None = None, dtype: dtype | None = None) Dict[str, Any]¶
Generate test inputs including ALiBi tensor and attention mask.
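The exact keys and shapes returned by get_random_inputs are not documented here, but test inputs of this kind might be assembled as in the following sketch. The function name and the dictionary keys (`hidden_states`, `alibi`, `attention_mask`) are assumptions for illustration only.

```python
import torch

def make_alibi_test_inputs(batch_size: int = 2, seq_len: int = 8,
                           n_heads: int = 8, d_model: int = 64) -> dict:
    # Hypothetical test-input generator; key names and shapes are assumptions.
    slopes = torch.tensor([2 ** (-8 / n_heads * (i + 1)) for i in range(n_heads)])
    pos = torch.arange(seq_len)
    dist = (pos[:, None] - pos[None, :]).clamp(min=0).float()
    alibi = -slopes[:, None, None] * dist                      # [n_heads, seq, seq]
    return {
        "hidden_states": torch.randn(batch_size, seq_len, d_model),
        "alibi": alibi,
        "attention_mask": torch.ones(batch_size, seq_len, dtype=torch.long),
    }
```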