transformer_lens.utilities.architectures module¶
Centralized architecture classification for TransformerLens.
Single source of truth for architecture type detection. Used by the bridge loading pipeline, benchmarks, and verification tools.
- transformer_lens.utilities.architectures.classify_architecture(architecture: str) → str¶
Classify an architecture string into a model type.
Returns one of: “seq2seq”, “masked_lm”, “multimodal”, “audio”, “causal_lm”
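A minimal sketch of what string-based classification like this can look like. The keyword checks below are illustrative assumptions, not the library's actual lookup tables; only the five return labels come from the documentation above.

```python
def classify_architecture(architecture: str) -> str:
    """Map an HF architecture class name to one of the five coarse
    model types. Keyword lists here are assumptions for illustration."""
    name = architecture.lower()
    if "llava" in name:                       # vision-language families
        return "multimodal"
    if "hubert" in name or "wav2vec2" in name:  # audio encoders
        return "audio"
    if "formaskedlm" in name:                 # BERT-style heads
        return "masked_lm"
    if "t5" in name or "bart" in name:        # encoder-decoder families
        return "seq2seq"
    return "causal_lm"                        # default bucket
```

In practice the default bucket matters: most decoder-only architectures (GPT-2, Llama, Mistral, ...) fall through to "causal_lm" rather than being enumerated explicitly.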
- transformer_lens.utilities.architectures.classify_model_config(config) → str¶
Classify a model by its HF config.
Checks config.is_encoder_decoder first, then falls back to the architectures list. Returns one of: “seq2seq”, “masked_lm”, “multimodal”, “audio”, “causal_lm”
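The documented ordering (flag first, architecture list second) can be sketched as follows. The keyword checks are assumptions; the stand-in config objects exist only for the demo.

```python
from types import SimpleNamespace

def classify_model_config(config) -> str:
    """Sketch of the documented order: the is_encoder_decoder flag
    wins outright, then the architectures list is scanned. The
    keyword checks are illustrative assumptions."""
    if getattr(config, "is_encoder_decoder", False):
        return "seq2seq"
    for arch in getattr(config, "architectures", None) or []:
        low = arch.lower()
        if "formaskedlm" in low:
            return "masked_lm"
        if "hubert" in low or "wav2vec2" in low:
            return "audio"
    return "causal_lm"

# Stand-in configs mimicking the relevant HF config attributes.
t5_like = SimpleNamespace(is_encoder_decoder=True, architectures=None)
bert_like = SimpleNamespace(is_encoder_decoder=False,
                            architectures=["BertForMaskedLM"])
```

Checking the boolean flag first means a seq2seq model is classified correctly even when its architectures list is missing or empty.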
- transformer_lens.utilities.architectures.classify_model_name(model_name: str, trust_remote_code: bool = False, token: str | None = None) → str¶
Classify a model by its HuggingFace model name.
Loads the config once and classifies from it. If token is None, reads HF_TOKEN from the environment automatically. Returns one of: “seq2seq”, “masked_lm”, “multimodal”, “audio”, “causal_lm”
- transformer_lens.utilities.architectures.get_architectures_for_config(config) → list[str]¶
Extract architecture strings from an HF config object.
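A plausible sketch of this extraction, assuming the function tolerates configs where the architectures attribute is missing or None (common for hand-built or minimal configs); the exact edge-case handling is not documented above.

```python
from types import SimpleNamespace

def get_architectures_for_config(config) -> list[str]:
    """Return config.architectures as a list, treating a missing or
    None attribute as an empty list. Sketch under stated assumptions."""
    archs = getattr(config, "architectures", None)
    return list(archs) if archs else []

# Demo configs: one populated, one with no architectures attribute.
gpt2_like = SimpleNamespace(architectures=["GPT2LMHeadModel"])
bare = SimpleNamespace()
```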
- transformer_lens.utilities.architectures.is_audio_model(model_name: str, trust_remote_code: bool = False, token: str | None = None) → bool¶
Check if a model is an audio encoder model (HuBERT, wav2vec2).
- transformer_lens.utilities.architectures.is_encoder_decoder_model(model_name: str, trust_remote_code: bool = False, token: str | None = None) → bool¶
Check if a model is an encoder-decoder architecture (T5, BART, etc.).
- transformer_lens.utilities.architectures.is_masked_lm_model(model_name: str, trust_remote_code: bool = False, token: str | None = None) → bool¶
Check if a model is a masked language model (BERT-style).
- transformer_lens.utilities.architectures.is_multimodal_model(model_name: str, trust_remote_code: bool = False, token: str | None = None) → bool¶
Check if a model is a multimodal vision-language model (LLaVA, Gemma3).