transformer_lens.model_bridge.supported_architectures.internlm2 module¶
InternLM2 architecture adapter.
- class transformer_lens.model_bridge.supported_architectures.internlm2.InternLM2ArchitectureAdapter(cfg: Any)¶
Bases: ArchitectureAdapter

Architecture adapter for InternLM2 models.
InternLM2 uses remote code (trust_remote_code=True) and differs from Llama in:

- Fused interleaved GQA wqkv weight (not the standard [Q|K|V] split)
- Non-standard module names: tok_embeddings, output, attention, feed_forward, wqkv/wo, w1(gate)/w3(up)/w2(down), attention_norm, ffn_norm
- Per-layer rotary_emb (no model-level shared instance)
supports_fold_ln=False: fold_ln is done manually in preprocess_weights because the bridge state dict has the fused qkv key, not split q/k/v keys, so fold_layer_norm’s extract_attention_tensors_for_folding would silently skip attn.
Optional parameters (may not exist in state_dict):

- blocks.{i}.attn.b_Q / b_K / b_V / b_O — config.bias=False on shipped models
- blocks.{i}.mlp.b_gate / b_in / b_out — MLP always uses bias=False
- blocks.{i}.ln1.b / ln2.b / ln_final.b — RMSNorm has no bias
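To make the "fused interleaved" layout concrete, here is a minimal sketch of de-interleaving a GQA wqkv weight into separate Q, K, and V matrices. The function name and the exact grouping convention ([q_0..q_{r-1}, k, v] per KV group, with r = n_heads // n_kv_heads) are illustrative assumptions, not the adapter's actual code:

```python
import torch


def split_fused_wqkv(wqkv: torch.Tensor, n_heads: int, n_kv_heads: int, head_dim: int):
    """De-interleave a fused GQA wqkv weight into separate Q, K, V matrices.

    Hypothetical sketch: assumes a [(n_heads + 2*n_kv_heads)*head_dim, d_model]
    weight grouped per KV head as [q_0..q_{r-1}, k, v], r = n_heads // n_kv_heads.
    """
    d_model = wqkv.shape[1]
    q_per_kv = n_heads // n_kv_heads
    # View as (kv_groups, q_per_kv + 2, head_dim, d_model), then slice per role.
    w = wqkv.view(n_kv_heads, q_per_kv + 2, head_dim, d_model)
    w_q = w[:, :q_per_kv].reshape(n_heads * head_dim, d_model)
    w_k = w[:, -2].reshape(n_kv_heads * head_dim, d_model)
    w_v = w[:, -1].reshape(n_kv_heads * head_dim, d_model)
    return w_q, w_k, w_v
```

Once the fused weight is split this way, the resulting keys can be rearranged like any standard Llama-style Q/K/V projection.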
- prepare_loading(model_name: str, model_kwargs: dict) → None¶
Patch transformers v5 incompatibilities before from_pretrained runs.
- preprocess_weights(state_dict: dict[str, Tensor]) → dict[str, Tensor]¶
Fold layer norms into QKV and MLP weights.
Standard fold_ln can’t reach the split Q/K/V weights when wqkv is fused in the bridge state dict, so we extract and fold here, then write split keys that RearrangeTensorConversion can consume. MLP projections (w1/w2/w3) are separate linears, so they fold normally. Mirrors phi3.py’s preprocess_weights, adapted for InternLM2’s layout.
- setup_component_testing(hf_model: Any, bridge_model: Any = None) → None¶
Inject per-layer rotary embedding for component testing.