transformer_lens.pretrained.weight_conversions.openai

Weight conversion for OpenAI GPT-OSS models.

GPT-OSS has a unique MoE architecture:

- GptOssExperts stores all expert weights in merged tensors (not individual modules)
- gate_up_proj: (num_experts, hidden_size, 2*expert_dim) with interleaved gate/up columns
- down_proj: (num_experts, expert_dim, hidden_size)
- Router (GptOssTopKRouter) uses weight + bias
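The interleaved layout above can be unpacked per expert with simple strided slicing. The following is a minimal sketch (not the library's implementation), using NumPy with toy dimensions and assuming even-indexed columns hold the gate projection and odd-indexed columns hold the up projection:

```python
import numpy as np

# Toy dimensions for illustration only (not the real GPT-OSS sizes).
num_experts, hidden_size, expert_dim = 2, 4, 3

# Merged tensor as described above:
# (num_experts, hidden_size, 2*expert_dim), gate/up columns interleaved.
gate_up_proj = np.arange(
    num_experts * hidden_size * 2 * expert_dim, dtype=np.float32
).reshape(num_experts, hidden_size, 2 * expert_dim)

# De-interleave (assumption: even columns = gate, odd columns = up).
W_gate = gate_up_proj[..., 0::2]  # (num_experts, hidden_size, expert_dim)
W_up = gate_up_proj[..., 1::2]    # (num_experts, hidden_size, expert_dim)

# down_proj needs no unpacking; it is already (num_experts, expert_dim, hidden_size).
assert W_gate.shape == (num_experts, hidden_size, expert_dim)
assert W_up.shape == (num_experts, hidden_size, expert_dim)
```

Because all experts live in one merged tensor, the slicing is vectorized across the expert axis rather than looping over per-expert modules.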

transformer_lens.pretrained.weight_conversions.openai.convert_gpt_oss_weights(gpt_oss, cfg: HookedTransformerConfig)