transformer_lens.model_bridge.supported_architectures.t5gemma module
T5Gemma architecture adapter.
T5GemmaForConditionalGeneration is an encoder-decoder model combining:
- Gemma-style RoPE, GQA, gated MLP, and RMSNorm with offset (+1.0)
- Encoder-decoder cross-attention in the decoder stack
- Nested config: encoder/decoder dims live in cfg.encoder / cfg.decoder
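To make the "RMSNorm with offset (+1.0)" item concrete, here is a minimal pure-Python sketch of a norm that scales by (1.0 + weight) rather than by weight alone. The function name `gemma_rms_norm` and the `eps` default are assumptions made for illustration; this is not the adapter's or Hugging Face's implementation.

```python
import math


def gemma_rms_norm(x, weight, eps=1e-6):
    """Divide x by its root mean square, then scale by (1.0 + weight).

    Illustrative only: the name and the eps default are assumptions.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    # The +1.0 offset means a stored weight of 0.0 acts as the identity scale.
    return [v * inv_rms * (1.0 + w) for v, w in zip(x, weight)]


x = [1.0, -2.0, 3.0, -4.0]
y_zero = gemma_rms_norm(x, [0.0] * len(x))  # zero weight: plain RMS normalization
y_one = gemma_rms_norm(x, [1.0] * len(x))   # weight 1.0: every element doubled
```

Under this convention a zero-initialized weight leaves the normalized activations unscaled, so code that reads the raw weight has to add the offset before interpreting it as a scale.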
Key differences from plain T5:
- Uses model.encoder.layers / model.decoder.layers (not .block)
- No relative position bias; uses RoPE instead
- All norms are Gemma-style (weight + 1.0)
- lm_head is T5GemmaLMHead wrapping out_proj (no .weight at the top level)
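The attribute layout described in the list above can be sketched with a stand-in object. The `resolve` helper and the `fake` object are hypothetical and only mimic the paths named in this module's description; no real model is loaded, and the placeholder strings stand in for layers and tensors.

```python
from functools import reduce
from types import SimpleNamespace


def resolve(root, dotted_path):
    """Follow a dotted attribute path such as 'model.encoder.layers'."""
    return reduce(getattr, dotted_path.split("."), root)


# Hypothetical stand-in that mirrors only the attribute layout, not real modules.
fake = SimpleNamespace(
    model=SimpleNamespace(
        encoder=SimpleNamespace(layers=["enc0", "enc1"]),
        decoder=SimpleNamespace(layers=["dec0", "dec1"]),
    ),
    lm_head=SimpleNamespace(out_proj=SimpleNamespace(weight="W_U")),
)

encoder_layers = resolve(fake, "model.encoder.layers")      # not model.encoder.block
unembed_weight = resolve(fake, "lm_head.out_proj.weight")   # one level below lm_head
has_top_level_weight = hasattr(fake.lm_head, "weight")      # False in this layout
```

Code written against plain T5 that looks up `.block` or `lm_head.weight` would raise `AttributeError` on this layout, which is presumably why the adapter maps the paths explicitly.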
- class transformer_lens.model_bridge.supported_architectures.t5gemma.T5GemmaArchitectureAdapter(cfg: Any)
Bases: ArchitectureAdapter
Architecture adapter for T5GemmaForConditionalGeneration.
Encoder: BlockBridge over model.encoder.layers (Gemma-style, no cross-attn)
Decoder: T5GemmaDecoderBlockBridge over model.decoder.layers (adds cross-attn hooks)
- setup_component_testing(hf_model: Any, bridge_model: Any = None) → None
Set up rotary embedding references for T5Gemma component testing.
Both the encoder and decoder carry their own rotary_emb. We set the reference on all PositionEmbeddingsAttentionBridge instances so that component-level forward calls can compute RoPE correctly.
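The idea of handing each attention bridge a rotary embedding reference can be sketched with stand-ins. Everything here is hypothetical: `FakeAttentionBridge`, its `set_rotary_emb` method, the `attn` attribute, and the `attach_rotary` helper are illustrative names, and pairing each stack with its own `rotary_emb` is an assumption rather than a description of what the adapter does internally.

```python
from types import SimpleNamespace


class FakeAttentionBridge:
    """Hypothetical stand-in for PositionEmbeddingsAttentionBridge."""

    def __init__(self):
        self.rotary_emb = None

    def set_rotary_emb(self, rotary_emb):
        # Keep a reference so a standalone forward call could compute RoPE.
        self.rotary_emb = rotary_emb


def attach_rotary(blocks, rotary_emb):
    """Give the attention bridge of every block a reference to rotary_emb."""
    for block in blocks:
        block.attn.set_rotary_emb(rotary_emb)


# Placeholder objects standing in for the encoder and decoder rotary modules.
enc_rotary, dec_rotary = object(), object()
encoder_blocks = [SimpleNamespace(attn=FakeAttentionBridge()) for _ in range(2)]
decoder_blocks = [SimpleNamespace(attn=FakeAttentionBridge()) for _ in range(2)]

attach_rotary(encoder_blocks, enc_rotary)
attach_rotary(decoder_blocks, dec_rotary)
```

Without such a reference, an attention component tested in isolation has no source for its position embeddings, since RoPE is computed outside the individual attention layer.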