transformer_lens.model_bridge.supported_architectures.deepseek_v2 module

DeepSeek V2 architecture adapter.

Supports DeepSeek-V2, DeepSeek-V2-Lite, and DeepSeek-Coder-V2 models (all use DeepseekV2ForCausalLM).

Key features:

- Multi-Head Latent Attention (MLA): Q and KV are compressed via LoRA-style low-rank projections. DeepSeek-V2-Lite sets q_lora_rank=None, which skips Q compression and uses a direct q_proj instead; MLAAttentionBridge.forward handles both paths automatically.
- Mixture of Experts (MoE) with shared experts on most layers.
- Dense MLP (instead of MoE) on the first N layers, where N = first_k_dense_replace.
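The per-layer choice between a dense MLP and an MoE block can be sketched as below. This is a minimal sketch that assumes the rule used in DeepSeek's reference modeling code:

- Layers below first_k_dense_replace stay dense.
- Later layers use MoE every moe_layer_freq layers, provided routed experts are configured.

Some names here are assumptions. The helpers layer_uses_moe and layer_plan are hypothetical. moe_layer_freq and n_routed_experts are assumed config fields not mentioned on this page. The default values are placeholders and are not read from any checkpoint.

```python
from typing import List, Optional


def layer_uses_moe(
    layer_idx: int,
    first_k_dense_replace: int,
    moe_layer_freq: int = 1,             # assumed config field
    n_routed_experts: Optional[int] = 64,  # assumed config field, placeholder value
) -> bool:
    """Hypothetical helper: True if this decoder layer would use an MoE block."""
    if n_routed_experts is None:
        # No routed experts configured: every layer keeps a dense MLP.
        return False
    if layer_idx < first_k_dense_replace:
        # The first `first_k_dense_replace` layers always keep a dense MLP.
        return False
    # Remaining layers use MoE every `moe_layer_freq` layers (assumed rule).
    return layer_idx % moe_layer_freq == 0


def layer_plan(num_layers: int, first_k_dense_replace: int) -> List[str]:
    """Hypothetical helper: label each layer 'dense' or 'moe'."""
    return [
        "moe" if layer_uses_moe(i, first_k_dense_replace) else "dense"
        for i in range(num_layers)
    ]


print(layer_plan(num_layers=6, first_k_dense_replace=1))
# ['dense', 'moe', 'moe', 'moe', 'moe', 'moe']
```

With the default moe_layer_freq of 1, "most layers" in the list above amounts to every layer from index first_k_dense_replace onward.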

class transformer_lens.model_bridge.supported_architectures.deepseek_v2.DeepSeekV2ArchitectureAdapter(cfg: Any)

Bases: ArchitectureAdapter

Architecture adapter for DeepSeek V2 / V2-Lite / Coder-V2 models.

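Uses RMSNorm, MLA with compressed Q/KV projections (or a direct Q projection when q_lora_rank is None), partial RoPE (rotary embeddings applied to only part of each query/key head), MoE on most layers (a dense MLP on the first first_k_dense_replace layers), and no bias terms.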
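The two query paths and the partial-RoPE split can be sketched in plain Python. This is an illustration under stated assumptions, not the adapter's or MLAAttentionBridge's implementation. The assumptions are:

- The weight names q_proj, q_a_proj and q_b_proj follow the Hugging Face DeepseekV2 attention module as best understood.
- The RMSNorm here has no learned scale.
- nope_dim and rope_dim stand in for the per-head non-rotary and rotary widths, assumed to correspond to qk_nope_head_dim and qk_rope_head_dim in the HF config.

The KV path is omitted.

```python
import math
from typing import List, Optional, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major: out_features x in_features


def matvec(w: Matrix, x: Vector) -> Vector:
    """y = W x for a bias-free linear layer (the adapter notes no biases)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def rms_norm(x: Vector, eps: float = 1e-6) -> Vector:
    """RMSNorm without a learned scale (simplification for this sketch)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


def compute_query(
    x: Vector,
    q_lora_rank: Optional[int],
    q_proj: Optional[Matrix] = None,    # assumed name, direct path
    q_a_proj: Optional[Matrix] = None,  # assumed name, down-projection
    q_b_proj: Optional[Matrix] = None,  # assumed name, up-projection
) -> Vector:
    """Select the query path based on q_lora_rank.

    q_lora_rank is None -> direct projection: q = q_proj(x)
    otherwise           -> compressed path:   q = q_b_proj(norm(q_a_proj(x)))
    """
    if q_lora_rank is None:
        if q_proj is None:
            raise ValueError("q_lora_rank is None, so q_proj is required")
        return matvec(q_proj, x)
    if q_a_proj is None or q_b_proj is None:
        raise ValueError("q_lora_rank is set, so q_a_proj and q_b_proj are required")
    latent = rms_norm(matvec(q_a_proj, x))  # low-rank latent of width q_lora_rank
    if len(latent) != q_lora_rank:
        raise ValueError("q_a_proj output width must equal q_lora_rank")
    return matvec(q_b_proj, latent)


def split_heads_partial_rope(
    q: Vector, num_heads: int, nope_dim: int, rope_dim: int
) -> List[Tuple[Vector, Vector]]:
    """Split each head into (non-rotary part, rotary part).

    Under partial RoPE only the last `rope_dim` features of each head
    would be rotated; the first `nope_dim` features pass through unchanged.
    """
    head_dim = nope_dim + rope_dim
    if len(q) != num_heads * head_dim:
        raise ValueError("q length must equal num_heads * (nope_dim + rope_dim)")
    heads = [q[h * head_dim:(h + 1) * head_dim] for h in range(num_heads)]
    return [(head[:nope_dim], head[nope_dim:]) for head in heads]


# Direct path (DeepSeek-V2-Lite style, q_lora_rank=None) with toy weights.
x = [1.0, 2.0]
q_direct = compute_query(x, None, q_proj=[[1, 0], [0, 1], [1, 1], [1, -1]])
print(split_heads_partial_rope(q_direct, num_heads=2, nope_dim=1, rope_dim=1))
# [([1.0], [2.0]), ([3.0], [-1.0])]
```

Keeping the rotary slice at the end of each head lets this sketch treat the two parts independently. Rotation would touch only head[nope_dim:].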

setup_component_testing(hf_model: Any, bridge_model: Any = None) → None

Set up rotary embedding references for component testing.