transformer_lens.model_bridge.supported_architectures.gemma2 module¶
Gemma2 architecture adapter.
- class transformer_lens.model_bridge.supported_architectures.gemma2.Gemma2ArchitectureAdapter(cfg: Any)¶
Bases: ArchitectureAdapter
Architecture adapter for Gemma2 models.
- __init__(cfg: Any) None¶
Initialize the Gemma2 architecture adapter.
- setup_component_testing(hf_model: Any, bridge_model: Any = None) None¶
Set up rotary embedding references and attention implementation for Gemma-2 component testing.
Gemma-2 uses RoPE (Rotary Position Embeddings). We set the rotary_emb reference on all attention bridge instances for component testing.
We also force the HF model to use “eager” attention to match the bridge’s implementation. The bridge uses “eager” to support output_attentions for hooks, while HF defaults to “sdpa”. These produce mathematically equivalent results but with small numerical differences due to different implementations.
- Parameters:
hf_model – The HuggingFace Gemma-2 model instance
bridge_model – The TransformerBridge model (if available, set rotary_emb on actual instances)
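The attention-implementation switch described above can be sketched as follows. This is an illustrative stand-in, not the adapter's actual code: the `force_eager_attention` helper is hypothetical, and a `SimpleNamespace` plays the role of a loaded HF Gemma-2 model so only the relevant `config` attribute is involved.

```python
from types import SimpleNamespace

def force_eager_attention(hf_model):
    """Hypothetical helper: switch an HF-style config to "eager"
    attention so attention weights can be materialized for hooks.
    HF Transformers tracks the choice on config._attn_implementation."""
    hf_model.config._attn_implementation = "eager"
    return hf_model

# Stand-in for a loaded Gemma-2 model; HF defaults this field to "sdpa".
model = SimpleNamespace(config=SimpleNamespace(_attn_implementation="sdpa"))
force_eager_attention(model)
print(model.config._attn_implementation)  # -> eager
```

Both implementations compute the same attention mathematically; only small floating-point differences are expected between "eager" and "sdpa".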
- setup_hook_compatibility(bridge: Any) None¶
Set up hook compatibility for Gemma2 models.

Gemma2 scales embeddings by sqrt(d_model). The weights are pre-scaled via preprocess_weights(), but we still need to apply the scaling conversion to the hook output for proper hook functionality (so user modifications are correctly scaled/unscaled).
- Parameters:
bridge – The TransformerBridge instance
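The embedding-scaling conversion described above can be sketched in miniature. This is a minimal illustration of the arithmetic only, assuming the sqrt(d_model) factor stated in the docstring; `scale_embedding_hook` is a hypothetical name, not the actual TransformerBridge hook API, and a plain list stands in for the embedding tensor.

```python
import math

def scale_embedding_hook(embedding, d_model):
    """Hypothetical sketch: Gemma-2 multiplies token embeddings by
    sqrt(d_model), so a compatibility hook must apply the same factor
    to the embed output before user hook modifications are observed."""
    factor = math.sqrt(d_model)
    return [x * factor for x in embedding]

# Toy 1-D "embedding" with d_model = 4: every value is scaled by 2.0.
emb = [0.5, 1.0, -1.5, 2.0]
print(scale_embedding_hook(emb, d_model=4))  # [1.0, 2.0, -3.0, 4.0]
```

In the real adapter the weights are already pre-scaled via preprocess_weights(); the hook-side conversion exists so that values a user reads or writes at the embedding hook are in the correctly scaled frame.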