transformer_lens.model_bridge.supported_architectures.internlm2 module¶
InternLM2 architecture adapter.
- class transformer_lens.model_bridge.supported_architectures.internlm2.InternLM2ArchitectureAdapter(cfg: Any)¶
Bases: ArchitectureAdapter

Architecture adapter for InternLM2 models.
InternLM2 uses remote code (trust_remote_code=True) and differs from Llama in:

- Fused interleaved GQA wqkv weight (not the standard [Q|K|V] split)
- Non-standard module names: tok_embeddings, output, attention, feed_forward, wqkv/wo, w1(gate)/w3(up)/w2(down), attention_norm, ffn_norm
- Per-layer rotary_emb (no model-level shared instance)
supports_fold_ln=False: fold_ln is done manually in preprocess_weights because the bridge state dict has the fused qkv key, not split q/k/v keys, so fold_layer_norm’s extract_attention_tensors_for_folding would silently skip attn.
Optional parameters (may not exist in state_dict):

- blocks.{i}.attn.b_Q / b_K / b_V / b_O — config.bias=False on shipped models
- blocks.{i}.mlp.b_gate / b_in / b_out — MLP always uses bias=False
- blocks.{i}.ln1.b / ln2.b / ln_final.b — RMSNorm has no bias
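To make the "fused interleaved" layout concrete, here is a minimal sketch of de-interleaving a GQA wqkv weight into separate Q, K, and V matrices. The function name and the exact grouping convention ([q_0..q_{r-1}, k, v] per KV group, with r = n_heads // n_kv_heads) are illustrative assumptions, not the adapter's actual code:

```python
import torch


def split_fused_wqkv(wqkv: torch.Tensor, n_heads: int, n_kv_heads: int, head_dim: int):
    """De-interleave a fused GQA wqkv weight into separate Q, K, V matrices.

    Hypothetical sketch: assumes a [(n_heads + 2*n_kv_heads)*head_dim, d_model]
    weight grouped per KV head as [q_0..q_{r-1}, k, v], r = n_heads // n_kv_heads.
    """
    d_model = wqkv.shape[1]
    q_per_kv = n_heads // n_kv_heads
    # View as (kv_groups, q_per_kv + 2, head_dim, d_model), then slice per role.
    w = wqkv.view(n_kv_heads, q_per_kv + 2, head_dim, d_model)
    w_q = w[:, :q_per_kv].reshape(n_heads * head_dim, d_model)
    w_k = w[:, -2].reshape(n_kv_heads * head_dim, d_model)
    w_v = w[:, -1].reshape(n_kv_heads * head_dim, d_model)
    return w_q, w_k, w_v
```

Once the fused weight is split this way, the resulting keys can be rearranged like any standard Llama-style Q/K/V projection.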
- prepare_loading(model_name: str, model_kwargs: dict) → None¶
Patch transformers v5 incompatibilities before from_pretrained runs.
- preprocess_weights(state_dict: dict[str, Tensor]) → dict[str, Tensor]¶
Fold layer norms into QKV and MLP weights.
Standard fold_ln can’t reach the split Q/K/V weights when wqkv is fused in the bridge state dict, so we extract and fold here, then write split keys that RearrangeTensorConversion can consume. MLP projections (w1/w2/w3) are separate linears, so they fold normally. Mirrors phi3.py’s preprocess_weights, adapted for InternLM2’s layout.
- setup_component_testing(hf_model: Any, bridge_model: Any = None) → None¶
Inject per-layer rotary embedding for component testing.