transformer_lens.model_bridge.supported_architectures.deepseek_v3 module¶
DeepSeek V3 architecture adapter.
Supports DeepSeek V3 and DeepSeek-R1 models (both use DeepseekV3ForCausalLM).

Key features:

- Multi-Head Latent Attention (MLA): Q and KV compressed via LoRA-style projections
- Mixture of Experts (MoE) with shared experts on most layers
- Dense MLP on the first first_k_dense_replace layers
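The MLA compression scheme above can be sketched in a few lines. This is a minimal illustration, not the adapter's actual implementation: all dimensions and weight names (W_dq, W_uq, W_dkv, …) are hypothetical, and RoPE and head splitting are omitted. The point is that queries and keys/values pass through small low-rank latents, and only the KV latent needs to be cached.

```python
import numpy as np

# Hypothetical toy sizes for illustration only (not real DeepSeek V3 config values).
d_model, q_lora_rank, kv_lora_rank = 64, 16, 8
n_heads, d_head = 4, 16

rng = np.random.default_rng(0)
x = rng.standard_normal((3, d_model))  # (seq_len, d_model)

# LoRA-style down/up projections for Q and a shared latent for K/V:
W_dq = rng.standard_normal((d_model, q_lora_rank))            # Q down-projection
W_uq = rng.standard_normal((q_lora_rank, n_heads * d_head))   # Q up-projection
W_dkv = rng.standard_normal((d_model, kv_lora_rank))          # shared KV down-projection
W_uk = rng.standard_normal((kv_lora_rank, n_heads * d_head))  # K up-projection
W_uv = rng.standard_normal((kv_lora_rank, n_heads * d_head))  # V up-projection

q = (x @ W_dq) @ W_uq     # queries reconstructed from a rank-16 latent
kv_latent = x @ W_dkv     # only this rank-8 latent is cached during generation
k = kv_latent @ W_uk
v = kv_latent @ W_uv

print(q.shape, kv_latent.shape, k.shape, v.shape)
```

Caching `kv_latent` instead of full K and V is what makes MLA's KV cache much smaller than standard multi-head attention's.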
- class transformer_lens.model_bridge.supported_architectures.deepseek_v3.DeepSeekV3ArchitectureAdapter(cfg: Any)¶
Bases: ArchitectureAdapter

Architecture adapter for DeepSeek V3 / R1 models.
Uses RMSNorm, MLA with compressed Q/KV projections, partial RoPE, MoE on most layers (dense MLP on first few), and no biases.
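The dense-then-MoE layer layout can be made concrete with a small sketch. The helper below is hypothetical (the adapter does not expose such a function); it only illustrates the rule that the first `first_k_dense_replace` layers use a dense MLP while the remaining layers use MoE.

```python
def layer_kinds(n_layers: int, first_k_dense_replace: int) -> list:
    """Illustrative only: first `first_k_dense_replace` layers are dense, rest are MoE."""
    return ["dense" if i < first_k_dense_replace else "moe" for i in range(n_layers)]

# e.g. a 6-layer model with first_k_dense_replace=3:
print(layer_kinds(6, 3))  # ['dense', 'dense', 'dense', 'moe', 'moe', 'moe']
```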
- setup_component_testing(hf_model: Any, bridge_model: Any = None) None¶
Set up rotary embedding references for component testing.