transformer_lens.model_bridge.supported_architectures.gemma2 module

Gemma2 architecture adapter.

class transformer_lens.model_bridge.supported_architectures.gemma2.Gemma2ArchitectureAdapter(cfg: Any)

Bases: ArchitectureAdapter

Architecture adapter for Gemma2 models.

__init__(cfg: Any) → None

Initialize the Gemma2 architecture adapter.

setup_component_testing(hf_model: Any, bridge_model: Any = None) → None

Set up rotary embedding references and attention implementation for Gemma-2 component testing.

Gemma-2 uses RoPE (Rotary Position Embeddings). We set the rotary_emb reference on all attention bridge instances for component testing.

We also force the HF model to use "eager" attention to match the bridge's implementation. The bridge uses "eager" to support output_attentions for hooks, while HF defaults to "sdpa". These produce mathematically equivalent results, with small numerical differences due to the differing implementations.

Parameters:
  • hf_model – The HuggingFace Gemma-2 model instance

  • bridge_model – The TransformerBridge model (if available, set rotary_emb on actual instances)

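The setup described above can be sketched as follows. This is a minimal, hypothetical illustration using duck-typed stand-in objects; the attribute names (`config._attn_implementation`, `model.rotary_emb`, `blocks[i].attn.rotary_emb`) mirror common HuggingFace/TransformerLens conventions but are assumptions, not the adapter's actual code.

```python
from types import SimpleNamespace

def setup_component_testing_sketch(hf_model, bridge_model=None):
    """Sketch: force eager attention and share rotary embeddings."""
    # Force "eager" attention so attention weights are materialized
    # and can be exposed via output_attentions for hooks.
    hf_model.config._attn_implementation = "eager"
    # Share the HF rotary embedding module with each attention bridge,
    # if a bridge model is available.
    if bridge_model is not None:
        rotary = hf_model.model.rotary_emb
        for block in bridge_model.blocks:
            block.attn.rotary_emb = rotary

# Toy stand-ins to exercise the sketch:
hf = SimpleNamespace(
    config=SimpleNamespace(_attn_implementation="sdpa"),
    model=SimpleNamespace(rotary_emb="ROPE"),
)
bridge = SimpleNamespace(
    blocks=[SimpleNamespace(attn=SimpleNamespace(rotary_emb=None))]
)
setup_component_testing_sketch(hf, bridge)
print(hf.config._attn_implementation)    # → eager
print(bridge.blocks[0].attn.rotary_emb)  # → ROPE
```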
setup_hook_compatibility(bridge: Any) → None

Setup hook compatibility for Gemma2 models.

Gemma2 scales embeddings by sqrt(d_model). The weights are pre-scaled via preprocess_weights(), but we still need to apply the scaling conversion to the hook output for proper hook functionality (so user modifications are correctly scaled/unscaled).

Parameters:
  • bridge – The TransformerBridge instance
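The scaling conversion can be illustrated with a minimal sketch: unscale the hook value by sqrt(d_model) before the user's hook runs, then rescale afterwards, so user modifications operate on unscaled embeddings. The wrapper function and its name are hypothetical, not the bridge's actual implementation.

```python
import math

def make_embed_scaling_wrapper(d_model, user_hook):
    """Hypothetical sketch: wrap a user hook so it sees unscaled
    embeddings even though Gemma2 scales them by sqrt(d_model)."""
    scale = math.sqrt(d_model)

    def wrapped(scaled_embed):
        # Unscale so the user hook operates on raw embedding values...
        modified = user_hook(scaled_embed / scale)
        # ...then reapply the sqrt(d_model) scaling Gemma2 expects.
        return modified * scale

    return wrapped

# Example: a user hook that doubles the embedding.
wrapper = make_embed_scaling_wrapper(d_model=4, user_hook=lambda x: x * 2)
# scale = 2, so 8.0 is unscaled to 4.0, doubled to 8.0, rescaled to 16.0
print(wrapper(8.0))  # → 16.0
```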