transformer_lens.utilities.architectures module¶
Centralized architecture classification for TransformerLens.
Single source of truth for architecture type detection. Used by the bridge loading pipeline, benchmarks, and verification tools.
- transformer_lens.utilities.architectures.classify_architecture(architecture: str) → str¶
Classify an architecture string into a model type.
Returns one of: “seq2seq”, “masked_lm”, “multimodal”, “audio”, “causal_lm”
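A minimal sketch of what string-based classification like this can look like. The keyword checks below are illustrative assumptions, not the library's actual lookup tables; only the five return labels come from the documentation above.

```python
def classify_architecture(architecture: str) -> str:
    """Map an HF architecture class name to one of the five coarse
    model types. Keyword lists here are assumptions for illustration."""
    name = architecture.lower()
    if "llava" in name:                       # vision-language families
        return "multimodal"
    if "hubert" in name or "wav2vec2" in name:  # audio encoders
        return "audio"
    if "formaskedlm" in name:                 # BERT-style heads
        return "masked_lm"
    if "t5" in name or "bart" in name:        # encoder-decoder families
        return "seq2seq"
    return "causal_lm"                        # default bucket
```

In practice the default bucket matters: most decoder-only architectures (GPT-2, Llama, Mistral, ...) fall through to "causal_lm" rather than being enumerated explicitly.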
- transformer_lens.utilities.architectures.classify_model_config(config) → str¶
Classify a model by its HF config.
Checks config.is_encoder_decoder first, then falls back to the architectures list. Returns one of: “seq2seq”, “masked_lm”, “multimodal”, “audio”, “causal_lm”
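The documented ordering (flag first, architecture list second) can be sketched as follows. The keyword checks are assumptions; the stand-in config objects exist only for the demo.

```python
from types import SimpleNamespace

def classify_model_config(config) -> str:
    """Sketch of the documented order: the is_encoder_decoder flag
    wins outright, then the architectures list is scanned. The
    keyword checks are illustrative assumptions."""
    if getattr(config, "is_encoder_decoder", False):
        return "seq2seq"
    for arch in getattr(config, "architectures", None) or []:
        low = arch.lower()
        if "formaskedlm" in low:
            return "masked_lm"
        if "hubert" in low or "wav2vec2" in low:
            return "audio"
    return "causal_lm"

# Stand-in configs mimicking the relevant HF config attributes.
t5_like = SimpleNamespace(is_encoder_decoder=True, architectures=None)
bert_like = SimpleNamespace(is_encoder_decoder=False,
                            architectures=["BertForMaskedLM"])
```

Checking the boolean flag first means a seq2seq model is classified correctly even when its architectures list is missing or empty.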
- transformer_lens.utilities.architectures.classify_model_name(model_name: str, trust_remote_code: bool = False, token: str | None = None) → str¶
Classify a model by its HuggingFace model name.
Loads the config once and classifies from it. If token is None, reads HF_TOKEN from the environment automatically. Returns one of: “seq2seq”, “masked_lm”, “multimodal”, “audio”, “causal_lm”
- transformer_lens.utilities.architectures.get_architectures_for_config(config) → list[str]¶
Extract architecture strings from an HF config object.
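A plausible sketch of this extraction, assuming the function tolerates configs where the architectures attribute is missing or None (common for hand-built or minimal configs); the exact edge-case handling is not documented above.

```python
from types import SimpleNamespace

def get_architectures_for_config(config) -> list[str]:
    """Return config.architectures as a list, treating a missing or
    None attribute as an empty list. Sketch under stated assumptions."""
    archs = getattr(config, "architectures", None)
    return list(archs) if archs else []

# Demo configs: one populated, one with no architectures attribute.
gpt2_like = SimpleNamespace(architectures=["GPT2LMHeadModel"])
bare = SimpleNamespace()
```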
- transformer_lens.utilities.architectures.is_audio_model(model_name: str, trust_remote_code: bool = False, token: str | None = None) → bool¶
Check if a model is an audio encoder model (HuBERT, wav2vec2).
- transformer_lens.utilities.architectures.is_encoder_decoder_model(model_name: str, trust_remote_code: bool = False, token: str | None = None) → bool¶
Check if a model is an encoder-decoder architecture (T5, BART, etc.).
- transformer_lens.utilities.architectures.is_masked_lm_model(model_name: str, trust_remote_code: bool = False, token: str | None = None) → bool¶
Check if a model is a masked language model (BERT-style).
- transformer_lens.utilities.architectures.is_multimodal_model(model_name: str, trust_remote_code: bool = False, token: str | None = None) → bool¶
Check if a model is a multimodal vision-language model (LLaVA, Gemma3).