transformer_lens.model_bridge.supported_architectures.deepseek_v3 module¶
DeepSeek V3 architecture adapter.
Supports DeepSeek V3 and DeepSeek-R1 models (both use DeepseekV3ForCausalLM).

Key features:

- Multi-Head Latent Attention (MLA): Q and KV compressed via LoRA-style projections
- Mixture of Experts (MoE) with shared experts on most layers
- Dense MLP on the first first_k_dense_replace layers
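The MLA compression scheme above can be sketched in a few lines. This is a minimal illustration, not the adapter's actual implementation: all dimensions and weight names (W_dq, W_uq, W_dkv, …) are hypothetical, and RoPE and head splitting are omitted. The point is that queries and keys/values pass through small low-rank latents, and only the KV latent needs to be cached.

```python
import numpy as np

# Hypothetical toy sizes for illustration only (not real DeepSeek V3 config values).
d_model, q_lora_rank, kv_lora_rank = 64, 16, 8
n_heads, d_head = 4, 16

rng = np.random.default_rng(0)
x = rng.standard_normal((3, d_model))  # (seq_len, d_model)

# LoRA-style down/up projections for Q and a shared latent for K/V:
W_dq = rng.standard_normal((d_model, q_lora_rank))            # Q down-projection
W_uq = rng.standard_normal((q_lora_rank, n_heads * d_head))   # Q up-projection
W_dkv = rng.standard_normal((d_model, kv_lora_rank))          # shared KV down-projection
W_uk = rng.standard_normal((kv_lora_rank, n_heads * d_head))  # K up-projection
W_uv = rng.standard_normal((kv_lora_rank, n_heads * d_head))  # V up-projection

q = (x @ W_dq) @ W_uq     # queries reconstructed from a rank-16 latent
kv_latent = x @ W_dkv     # only this rank-8 latent is cached during generation
k = kv_latent @ W_uk
v = kv_latent @ W_uv

print(q.shape, kv_latent.shape, k.shape, v.shape)
```

Caching `kv_latent` instead of full K and V is what makes MLA's KV cache much smaller than standard multi-head attention's.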
- class transformer_lens.model_bridge.supported_architectures.deepseek_v3.DeepSeekV3ArchitectureAdapter(cfg: Any)¶
Bases: ArchitectureAdapter

Architecture adapter for DeepSeek V3 / R1 models.
Uses RMSNorm, MLA with compressed Q/KV projections, partial RoPE, MoE on most layers (dense MLP on first few), and no biases.
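The dense-then-MoE layer layout can be made concrete with a small sketch. The helper below is hypothetical (the adapter does not expose such a function); it only illustrates the rule that the first `first_k_dense_replace` layers use a dense MLP while the remaining layers use MoE.

```python
def layer_kinds(n_layers: int, first_k_dense_replace: int) -> list:
    """Illustrative only: first `first_k_dense_replace` layers are dense, rest are MoE."""
    return ["dense" if i < first_k_dense_replace else "moe" for i in range(n_layers)]

# e.g. a 6-layer model with first_k_dense_replace=3:
print(layer_kinds(6, 3))  # ['dense', 'dense', 'dense', 'moe', 'moe', 'moe']
```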
- setup_component_testing(hf_model: Any, bridge_model: Any = None) None¶
Set up rotary embedding references for component testing.