transformer_lens.model_bridge.supported_architectures.t5gemma module

T5Gemma architecture adapter.

T5GemmaForConditionalGeneration is an encoder-decoder model combining:

- Gemma-style RoPE, GQA, gated MLP, and RMSNorm with offset (+1.0)
- Encoder-decoder cross-attention in the decoder stack
- Nested config: encoder/decoder dims live in cfg.encoder / cfg.decoder
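The nested config layout can be illustrated with a small sketch. It uses types.SimpleNamespace as a stand-in for the real config object; the field names hidden_size, num_attention_heads, and num_key_value_heads and the helper stack_dims are assumptions made for illustration, not part of the adapter's API.

```python
from types import SimpleNamespace

# Stand-in for a nested encoder-decoder config (field names are assumed).
cfg = SimpleNamespace(
    encoder=SimpleNamespace(
        hidden_size=8, num_attention_heads=4, num_key_value_heads=2
    ),
    decoder=SimpleNamespace(
        hidden_size=16, num_attention_heads=8, num_key_value_heads=2
    ),
)


def stack_dims(cfg, stack):
    """Read per-stack dimensions from cfg.encoder or cfg.decoder (hypothetical helper)."""
    sub = getattr(cfg, stack)
    return {
        "d_model": sub.hidden_size,
        "n_heads": sub.num_attention_heads,
        "n_kv_heads": sub.num_key_value_heads,
        # With GQA, several query heads share one key/value head.
        "queries_per_kv": sub.num_attention_heads // sub.num_key_value_heads,
    }


encoder_dims = stack_dims(cfg, "encoder")
decoder_dims = stack_dims(cfg, "decoder")
```

Reading each stack separately avoids assuming that the encoder and decoder share the same width or head counts.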

Key differences from plain T5:

- Uses model.encoder.layers / model.decoder.layers (not .block)
- No relative position bias; uses RoPE instead
- All norms are Gemma-style (weight + 1.0)
- lm_head is T5GemmaLMHead wrapping out_proj (no .weight at the top level)
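The "weight + 1.0" norm convention listed above can be sketched in plain Python. This is an illustration over lists of floats rather than the library's tensor code; the function name gemma_rms_norm and the eps default are assumptions.

```python
import math


def gemma_rms_norm(x, weight, eps=1e-6):
    """RMS-normalize x, then scale each element by (1.0 + weight)."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * (1.0 + w) for v, w in zip(x, weight)]


# A stored weight of zero corresponds to an effective scale of 1.0.
unit_scaled = gemma_rms_norm([3.0, 4.0], [0.0, 0.0])

# A stored weight of 1.0 corresponds to an effective scale of 2.0.
double_scaled = gemma_rms_norm([3.0, 4.0], [1.0, 1.0])
```

Because the stored parameter is not the effective scale, any code that reads or folds norm weights needs to add the offset first.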

class transformer_lens.model_bridge.supported_architectures.t5gemma.T5GemmaArchitectureAdapter(cfg: Any)

Bases: ArchitectureAdapter

Architecture adapter for T5GemmaForConditionalGeneration.

- Encoder: BlockBridge over model.encoder.layers (Gemma-style, no cross-attn)
- Decoder: T5GemmaDecoderBlockBridge over model.decoder.layers (adds cross-attn hooks)

setup_component_testing(hf_model: Any, bridge_model: Any = None) → None

Set up rotary embedding references for T5Gemma component testing.

Both the encoder and decoder carry their own rotary_emb. We set the reference on all PositionEmbeddingsAttentionBridge instances so that component-level forward calls can compute RoPE correctly.
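The idea of handing a rotary embedding reference to every attention bridge can be sketched with stand-in classes. Everything here is hypothetical: StubAttentionBridge, StubBlock, set_rotary_emb, and attach_rotary are illustrative names, not the library's classes or methods, and pairing each stack with its own rotary_emb is an assumption based on the description above.

```python
class StubAttentionBridge:
    """Stand-in for an attention bridge that needs a rotary embedding."""

    def __init__(self):
        self.rotary_emb = None

    def set_rotary_emb(self, rotary_emb):
        # Store the reference so a standalone forward call could compute RoPE.
        self.rotary_emb = rotary_emb


class StubBlock:
    """Stand-in for one encoder or decoder layer."""

    def __init__(self):
        self.attn = StubAttentionBridge()


def attach_rotary(blocks, rotary_emb):
    """Give every block's attention bridge the same rotary embedding reference."""
    for block in blocks:
        block.attn.set_rotary_emb(rotary_emb)


encoder_blocks = [StubBlock() for _ in range(2)]
decoder_blocks = [StubBlock() for _ in range(3)]

# Placeholder objects standing in for each stack's rotary embedding module.
encoder_rotary = object()
decoder_rotary = object()

attach_rotary(encoder_blocks, encoder_rotary)
attach_rotary(decoder_blocks, decoder_rotary)
```

Passing a reference rather than a copy keeps every bridge in a stack pointing at the same rotary module.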