transformer_lens.model_bridge.supported_architectures.gemma3 module¶
Gemma3 architecture adapter.
- class transformer_lens.model_bridge.supported_architectures.gemma3.Gemma3ArchitectureAdapter(cfg: Any)¶
Bases: ArchitectureAdapter
Architecture adapter for Gemma3 models.
- __init__(cfg: Any) → None¶
Initialize the Gemma3 architecture adapter.
- setup_component_testing(hf_model: Any, bridge_model: Any = None) → None¶
Set up rotary embedding references and native autograd for Gemma-3 component testing.
Gemma-3 uses dual RoPE (global + local). We set local RoPE (used by 85% of layers) on all attention bridge instances for component testing.
We also enable use_native_layernorm_autograd on all normalization bridges to ensure they delegate to HuggingFace’s exact implementation instead of using manual computation.
Additionally, we force the HF model to use “eager” attention to match the bridge’s implementation. The bridge uses “eager” to support output_attentions for hooks, while HF defaults to “sdpa”. These produce mathematically equivalent results but with small numerical differences due to different implementations.
Note: Layers 5, 11, 17, 23 use global RoPE but will use local in component tests. This is an acceptable tradeoff given the shared-instance constraint.
- Parameters:
hf_model – The HuggingFace Gemma-3 model instance
bridge_model – The TransformerBridge model (if available, set rotary_emb on actual instances)
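The eager-vs-sdpa equivalence mentioned above can be illustrated in plain PyTorch. This is a standalone sketch, not the adapter's code: the tensor shapes and names are invented for the demonstration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy tensors with shape (batch, heads, seq, head_dim).
q = torch.randn(1, 2, 4, 8)
k = torch.randn(1, 2, 4, 8)
v = torch.randn(1, 2, 4, 8)

# "Eager" attention: explicit softmax(Q K^T / sqrt(d)) V. Because the
# attention weights are materialized, they can be exposed to hooks
# (this is why the bridge forces eager to support output_attentions).
scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
weights = scores.softmax(dim=-1)
eager_out = weights @ v

# "sdpa" attention: fused kernel; faster, but no weights returned.
sdpa_out = F.scaled_dot_product_attention(q, k, v)

# Mathematically equivalent, up to small floating-point differences.
print(torch.allclose(eager_out, sdpa_out, atol=1e-6))
```

The two paths compute the same function; only kernel-level floating-point ordering differs, which is why component tests compare with a tolerance rather than exact equality.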
- setup_hook_compatibility(bridge: Any) → None¶
Setup hook compatibility for Gemma3 models.
Unlike Gemma1/Gemma2, Gemma3 uses Gemma3TextScaledWordEmbedding which scales embeddings by sqrt(d_model) INSIDE the embedding layer’s forward(). Therefore we do NOT need a hook_conversion — the embed.hook_out already captures the scaled output. Adding a conversion would double-scale.
(Gemma1/Gemma2 scale in GemmaModel.forward() AFTER the embedding layer, so their adapters correctly use EmbeddingScaleConversion to match HookedTransformer's scaled embedding output.)
- Parameters:
bridge – The TransformerBridge instance
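The scale-inside-forward behavior described above can be sketched with a stand-in module (not the actual Gemma3TextScaledWordEmbedding class; the names here are illustrative): when scaling happens inside the embedding's forward(), a forward hook on its output already sees scaled values, so an extra conversion would double-scale.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
d_model, vocab = 16, 32

class ScaledEmbedding(nn.Embedding):
    """Stand-in for an embedding that scales by sqrt(d_model) INSIDE forward()."""
    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        return super().forward(ids) * math.sqrt(self.embedding_dim)

embed = ScaledEmbedding(vocab, d_model)

# Capture the embedding's output, analogous to embed.hook_out.
captured = {}
embed.register_forward_hook(lambda mod, args, out: captured.update(out=out))

ids = torch.tensor([[1, 2, 3]])
out = embed(ids)

# Calling the parent class's forward directly bypasses the subclass
# scaling (and hooks), giving the raw table lookup for comparison.
raw = nn.Embedding.forward(embed, ids)

# The hook already saw the scaled output; no conversion is needed.
print(torch.allclose(captured["out"], raw * math.sqrt(d_model)))
```

By contrast, a model that scales after the embedding layer (as Gemma1/Gemma2 do in GemmaModel.forward()) would leave the hook seeing `raw`, which is why those adapters apply a scale conversion and this one must not.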