transformer_lens.model_bridge.supported_architectures.gpt2 module¶
GPT2 architecture adapter.
- class transformer_lens.model_bridge.supported_architectures.gpt2.GPT2ArchitectureAdapter(cfg: Any)¶
Bases: ArchitectureAdapter
Architecture adapter for GPT2 models.
Optional Parameters (may not exist in state_dict):¶
GPT-2 models HAVE biases on ALL linear layers:
✓ blocks.{i}.attn.b_Q - Has bias (from combined c_attn.bias)
✓ blocks.{i}.attn.b_K - Has bias (from combined c_attn.bias)
✓ blocks.{i}.attn.b_V - Has bias (from combined c_attn.bias)
✓ blocks.{i}.attn.b_O - Has bias (c_proj.bias)
✓ blocks.{i}.mlp.b_in - Has bias (c_fc.bias)
✓ blocks.{i}.mlp.b_out - Has bias (c_proj.bias)
✓ blocks.{i}.ln1.b - LayerNorm has bias
✓ blocks.{i}.ln2.b - LayerNorm has bias
✓ ln_final.b - LayerNorm has bias
No optional parameters - all biases exist in GPT-2.
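As a minimal sketch of the combined-bias split described above: HuggingFace GPT-2 stores a single `c_attn.bias` of shape `[3 * d_model]` that holds the Q, K, and V biases back to back, while TransformerLens exposes each as `[n_heads, d_head]`. The sizes below are illustrative (real GPT-2 small uses `d_model=768`, `n_heads=12`), and numpy stands in for torch.

```python
import numpy as np

# Illustrative sizes; GPT-2 small would use d_model=768, n_heads=12.
d_model, n_heads = 8, 2
d_head = d_model // n_heads

# HuggingFace stores one combined bias for Q, K, V: shape [3 * d_model].
c_attn_bias = np.arange(3 * d_model, dtype=np.float32)

# Split into the three per-projection biases, then reshape each to
# TransformerLens's [n_heads, d_head] layout (b_Q, b_K, b_V).
b_q, b_k, b_v = np.split(c_attn_bias, 3)
b_q = b_q.reshape(n_heads, d_head)
b_k = b_k.reshape(n_heads, d_head)
b_v = b_v.reshape(n_heads, d_head)

print(b_q.shape)  # (2, 4)
```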
- __init__(cfg: Any) None¶
Initialize the GPT2 architecture adapter.
- class transformer_lens.model_bridge.supported_architectures.gpt2.QKVSplitRearrangeConversion(qkv_index: int, rearrange_pattern: str, **axes_lengths)¶
Bases: BaseTensorConversion
Custom conversion that splits a QKV tensor and then rearranges it.
Handles two input formats:
- Combined QKV tensor (from HuggingFace): one dimension is ~3x the other. Splits into Q/K/V parts, then rearranges to TL format.
- Already-split tensor (from bridge state dict): nn.Linear format [n_heads*d_head, d_model]. Rearranges directly to TL format.
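The "one dimension is ~3x the other" rule above can be sketched as a small shape heuristic. `looks_combined` is a hypothetical helper (not part of the adapter's API) and numpy stands in for torch:

```python
import numpy as np

d_model, n_heads = 8, 2
d_head = d_model // n_heads

def looks_combined(w: np.ndarray) -> bool:
    """Heuristic from the docstring: in a combined HuggingFace QKV
    tensor, one dimension is 3x the other."""
    rows, cols = w.shape
    return rows == 3 * cols or cols == 3 * rows

combined = np.zeros((d_model, 3 * d_model))            # HF Conv1D c_attn.weight
already_split = np.zeros((n_heads * d_head, d_model))  # nn.Linear format

print(looks_combined(combined))       # True
print(looks_combined(already_split))  # False
```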
- __init__(qkv_index: int, rearrange_pattern: str, **axes_lengths)¶
Initialize the conversion.
- Parameters:
qkv_index – Index of Q (0), K (1), or V (2) in the QKV tensor
rearrange_pattern – Einops pattern for rearrangement (Conv1D format)
**axes_lengths – Additional axes lengths for einops
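To illustrate the Conv1D-format rearrangement the pattern argument describes: HuggingFace's Conv1D stores attention weights as `[d_model, n_heads * d_head]` (transposed relative to nn.Linear), and an einops pattern like `"m (n h) -> n m h"` with `n=n_heads` maps this to TransformerLens's `[n_heads, d_model, d_head]`. The sketch below performs the equivalent reshape/transpose in numpy with illustrative sizes; the pattern string itself is an assumption for this example:

```python
import numpy as np

# Illustrative sizes (not GPT-2's real dimensions).
d_model, n_heads, d_head = 8, 2, 4

# HF Conv1D weight: [d_model, n_heads * d_head].
w = np.random.randn(d_model, n_heads * d_head)

# Equivalent of einops rearrange "m (n h) -> n m h" with n=n_heads:
# first unflatten the head dimension, then move heads to the front,
# giving TransformerLens's [n_heads, d_model, d_head].
w_tl = w.reshape(d_model, n_heads, d_head).transpose(1, 0, 2)

print(w_tl.shape)  # (2, 8, 4)
```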
- handle_conversion(input_value: Tensor, *full_context) Tensor¶
Split QKV tensor and rearrange the selected part.
- revert(input_value: Tensor, *full_context) Tensor¶
Revert from TL format [n_heads, d_model, d_head] to nn.Linear format.
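A minimal sketch of this revert direction, using numpy with illustrative sizes: the TL layout `[n_heads, d_model, d_head]` is mapped back to the nn.Linear weight layout `[out_features, in_features] = [n_heads * d_head, d_model]` by swapping the last two axes and flattening the heads into the output dimension:

```python
import numpy as np

# Illustrative sizes (not GPT-2's real dimensions).
n_heads, d_model, d_head = 2, 8, 4

# TransformerLens stores the weight as [n_heads, d_model, d_head].
w_tl = np.random.randn(n_heads, d_model, d_head)

# Inverse of the split-and-rearrange: move d_head next to n_heads,
# then flatten heads into the output dimension to get nn.Linear's
# [n_heads * d_head, d_model].
w_linear = w_tl.transpose(0, 2, 1).reshape(n_heads * d_head, d_model)

print(w_linear.shape)  # (8, 8)
```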