transformer_lens.model_bridge.supported_architectures.deepseek_v2 module
DeepSeek V2 architecture adapter.
Supports DeepSeek-V2, DeepSeek-V2-Lite, and DeepSeek-Coder-V2 models (all use DeepseekV2ForCausalLM).
Key features:
- Multi-Head Latent Attention (MLA): Q and KV are compressed via LoRA-style projections. DeepSeek-V2-Lite sets q_lora_rank=None, which skips Q compression and uses a direct q_proj instead; MLAAttentionBridge.forward handles both paths automatically.
- Mixture of Experts (MoE) with shared experts on most layers.
- Dense MLP on the first first_k_dense_replace layers.
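The two config-driven branches described in the feature list can be sketched in plain Python. This is a minimal illustration and not the adapter's actual code: `SketchConfig`, `q_projection_path`, and `layer_kind` are hypothetical names introduced here, and the real model config may apply further conditions (for example a MoE layer frequency) that this sketch ignores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchConfig:
    """Hypothetical stand-in for the config fields named in the docstring."""
    q_lora_rank: Optional[int]   # None means no Q compression (as in V2-Lite)
    first_k_dense_replace: int   # number of leading layers that use a dense MLP
    num_hidden_layers: int


def q_projection_path(cfg: SketchConfig) -> str:
    """Return which Q path applies: compressed (LoRA-style) or direct q_proj."""
    return "direct" if cfg.q_lora_rank is None else "compressed"


def layer_kind(cfg: SketchConfig, layer_idx: int) -> str:
    """Assumed rule: the first `first_k_dense_replace` layers are dense, the rest MoE."""
    return "dense" if layer_idx < cfg.first_k_dense_replace else "moe"


# Illustrative values only, not taken from any released checkpoint.
lite_like = SketchConfig(q_lora_rank=None, first_k_dense_replace=1, num_hidden_layers=4)
kinds = [layer_kind(lite_like, i) for i in range(lite_like.num_hidden_layers)]
```

With these illustrative values, `q_projection_path(lite_like)` selects the direct path, and `kinds` marks only layer 0 as dense.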
- class transformer_lens.model_bridge.supported_architectures.deepseek_v2.DeepSeekV2ArchitectureAdapter(cfg: Any)
Bases: ArchitectureAdapter
Architecture adapter for DeepSeek V2 / V2-Lite / Coder-V2 models.
Uses RMSNorm, MLA with compressed Q/KV projections (or direct Q projection when q_lora_rank is None), partial RoPE, MoE on most layers (dense MLP on first few), and no biases.
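To make "partial RoPE" concrete, the sketch below rotates only a trailing slice of one head vector and passes the rest through untouched. It is a simplified, dependency-free illustration under stated assumptions: `apply_partial_rope` is a hypothetical helper, the rotary slice is assumed to sit at the end of the head, and the rotate-half pairing convention is assumed. The actual model may order or pair dimensions differently.

```python
import math
from typing import List


def apply_partial_rope(head: List[float], rope_dim: int, position: int,
                       base: float = 10000.0) -> List[float]:
    """Rotate only the last `rope_dim` entries of one head vector; keep the rest as-is."""
    if rope_dim % 2 != 0 or rope_dim > len(head):
        raise ValueError("rope_dim must be even and no larger than the head size")
    split = len(head) - rope_dim
    nope, rope = head[:split], head[split:]  # non-rotary part, rotary part
    half = rope_dim // 2
    rotated = [0.0] * rope_dim
    for i in range(half):
        # One rotation angle per (i, i + half) pair; the frequency decays with i.
        theta = position * base ** (-2.0 * i / rope_dim)
        c, s = math.cos(theta), math.sin(theta)
        x1, x2 = rope[i], rope[i + half]
        rotated[i] = x1 * c - x2 * s
        rotated[i + half] = x2 * c + x1 * s
    return nope + rotated


# Head size 6 with a rotary slice of 4: the first two entries are left unchanged.
example = apply_partial_rope([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], rope_dim=4, position=3)
```

Because each pair is rotated in its own plane, the vector norm is preserved, and position 0 leaves the input unchanged.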
- setup_component_testing(hf_model: Any, bridge_model: Any = None) → None
Set up rotary embedding references for component testing.