transformer_lens.tools.model_registry.verify_models module

Batch model verification tool for the TransformerLens model registry.

Iterates through supported models, estimates memory requirements, runs benchmarks phase-by-phase, and updates the registry with status, phase scores, and notes.

Usage:

python -m transformer_lens.tools.model_registry.verify_models [options]

Examples

# Dry run to see what would be tested
python -m transformer_lens.tools.model_registry.verify_models --dry-run

# Verify top 10 models per architecture on CPU
python -m transformer_lens.tools.model_registry.verify_models --device cpu

# Verify only GPT2 models, limit to 3
python -m transformer_lens.tools.model_registry.verify_models --architectures GPT2LMHeadModel --limit 3

# Resume from a previous interrupted run
python -m transformer_lens.tools.model_registry.verify_models --resume

# Re-verify already-tested models for a specific architecture
python -m transformer_lens.tools.model_registry.verify_models --reverify --architectures Olmo2ForCausalLM

class transformer_lens.tools.model_registry.verify_models.ModelCandidate(model_id: str, architecture_id: str, estimated_params: int | None = None, estimated_memory_gb: float | None = None)

Bases: object

A model selected for verification.

architecture_id: str
estimated_memory_gb: float | None = None
estimated_params: int | None = None
model_id: str
class transformer_lens.tools.model_registry.verify_models.VerificationProgress(tested: list[str] = <factory>, skipped: list[str] = <factory>, failed: list[str] = <factory>, verified: list[str] = <factory>, start_time: str | None = None)

Bases: object

Tracks progress across a verification run.

failed: list[str]
classmethod from_dict(data: dict) → VerificationProgress
skipped: list[str]
start_time: str | None = None
tested: list[str]
to_dict() → dict
verified: list[str]
transformer_lens.tools.model_registry.verify_models.estimate_benchmark_memory_gb(n_params: int, dtype: str = 'float32', phases: list[int] | None = None, use_hf_reference: bool = True) → float

Estimate peak memory needed for benchmark suite.

Phases run sequentially, so peak memory is the maximum of any single phase, not the sum. The multiplier represents how many model copies exist at peak:

  • Phase 1 (HF ref on): HF ref + Bridge → 2.0x peak

  • Phase 1 (HF ref off): Bridge only → 1.0x peak

  • Phase 2: Bridge + HookedTransformer (separate copy) → 2.0x model + overhead

  • Phase 3: Same as Phase 2 (processed versions) → 2.0x model + overhead

  • Phase 4: Bridge + GPT-2 scorer (~500MB) → ~1.0x model + 0.5 GB

Parameters:
  • n_params – Number of model parameters

  • dtype – Data type for memory calculation

  • phases – Which phases will be run (None = all phases)

  • use_hf_reference – Whether Phase 1 loads an HF reference alongside the Bridge. Mirrors the --no-hf-reference CLI flag.

Returns:

Estimated peak memory in GB
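The max-over-phases arithmetic described above can be sketched as follows. The multipliers come from the phase table; the function name and the per-phase overhead constants here are illustrative, not the module's exact implementation:

```python
DTYPE_BYTES = {"float32": 4, "float16": 2, "bfloat16": 2}

def peak_gb_sketch(n_params, dtype="float32", phases=None, use_hf_reference=True):
    # Model-copy multiplier at peak for each phase (from the table above)
    phase_mult = {
        1: 2.0 if use_hf_reference else 1.0,  # HF ref + Bridge, or Bridge only
        2: 2.0,  # Bridge + HookedTransformer
        3: 2.0,  # Same as Phase 2, processed versions
        4: 1.0,  # Bridge + GPT-2 scorer
    }
    # Phase 4 loads a ~500 MB scorer on top of the model
    phase_extra_gb = {1: 0.0, 2: 0.0, 3: 0.0, 4: 0.5}
    phases = phases or [1, 2, 3, 4]
    model_gb = n_params * DTYPE_BYTES[dtype] / 1e9
    # Phases run sequentially: peak is the max over phases, not the sum
    return max(phase_mult[p] * model_gb + phase_extra_gb[p] for p in phases)
```

For a GPT-2-sized model (~124M params, float32, one copy ≈ 0.5 GB), the 2.0x phases dominate unless Phase 4's fixed 0.5 GB scorer overhead edges past them.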

transformer_lens.tools.model_registry.verify_models.estimate_model_params(model_id: str) → int

Estimate parameter count using AutoConfig (lightweight, no model download).

Fetches only the config JSON (a few kilobytes) and computes n_params from the model dimensions, using the same formula as HookedTransformerConfig.__post_init__.

Parameters:

model_id – HuggingFace model ID

Returns:

Estimated number of parameters
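The exact formula lives in HookedTransformerConfig.__post_init__; as a rough stand-in, the standard decoder-only approximation from layer count and hidden size looks like this (an approximation for intuition, not the module's formula):

```python
def approx_param_count(n_layers: int, d_model: int) -> int:
    # Rough non-embedding count for a decoder-only transformer:
    # ~4 * d_model^2 per layer for attention (Q, K, V, O projections)
    # plus ~8 * d_model^2 for the MLP (assuming d_mlp = 4 * d_model).
    return 12 * n_layers * d_model * d_model
```

GPT-2 small's dimensions (12 layers, d_model=768) give roughly 85M non-embedding parameters; embeddings bring the full model to ~124M, so the estimate is coarse but adequate for memory gating.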

Raises:

Exception – If config cannot be fetched or parsed

transformer_lens.tools.model_registry.verify_models.get_available_memory_gb(device: str) → float

Detect available memory on the target device.

Parameters:

device – “cpu” or “cuda”

Returns:

Available memory in GB
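For the CPU case, available memory can be read with only the standard library on POSIX systems; a sketch (the real helper may use psutil, and torch.cuda.mem_get_info for the "cuda" case, instead):

```python
import os

def available_cpu_memory_gb() -> float:
    # POSIX-only sketch: free physical pages times page size.
    # SC_AVPHYS_PAGES is not available on all platforms (e.g. macOS).
    page_size = os.sysconf("SC_PAGE_SIZE")
    free_pages = os.sysconf("SC_AVPHYS_PAGES")
    return page_size * free_pages / 1e9
```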

transformer_lens.tools.model_registry.verify_models.main() → None

CLI entry point for batch model verification.

transformer_lens.tools.model_registry.verify_models.select_models_for_verification(per_arch: int = 10, architectures: list[str] | None = None, limit: int | None = None, resume_progress: VerificationProgress | None = None, retry_failed: bool = False, reverify: bool = False) → list[ModelCandidate]

Select models for verification from the registry.

Loads supported_models.json (already sorted by downloads). Takes the top N unverified models per architecture.

Parameters:
  • per_arch – Maximum models to verify per architecture

  • architectures – Filter to specific architectures (None = all)

  • limit – Total model cap (None = no cap)

  • resume_progress – If resuming, skip already-tested models

  • retry_failed – If True, include previously failed models for re-testing

  • reverify – If True, ignore previous status and re-test all matching models

Returns:

List of ModelCandidate objects to verify
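The top-N-per-architecture selection over a download-sorted list can be sketched as below. The function and the dict keys are illustrative; only the behavior (per-architecture cap, architecture filter, global limit, skip already-tested) follows the parameters documented above:

```python
def select_candidates_sketch(models, per_arch=10, architectures=None,
                             limit=None, already_tested=frozenset()):
    # `models` is assumed pre-sorted by downloads, as supported_models.json is,
    # so taking entries in order yields the top N untested per architecture.
    counts = {}
    selected = []
    for m in models:
        arch = m["architecture"]
        if architectures is not None and arch not in architectures:
            continue  # architecture filter
        if m["model_id"] in already_tested:
            continue  # skip models from a resumed run
        if counts.get(arch, 0) >= per_arch:
            continue  # per-architecture cap reached
        selected.append(m["model_id"])
        counts[arch] = counts.get(arch, 0) + 1
        if limit is not None and len(selected) >= limit:
            break  # global cap
    return selected

models = [
    {"model_id": "gpt2", "architecture": "GPT2LMHeadModel"},
    {"model_id": "gpt2-medium", "architecture": "GPT2LMHeadModel"},
    {"model_id": "allenai/OLMo-2-1124-7B", "architecture": "Olmo2ForCausalLM"},
]
top_one_per_arch = select_candidates_sketch(models, per_arch=1)
```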

transformer_lens.tools.model_registry.verify_models.verify_models(candidates: list[ModelCandidate], device: str = 'cpu', max_memory_gb: float | None = None, dtype: str = 'float32', use_hf_reference: bool = True, use_ht_reference: bool = True, phases: list[int] | None = None, quiet: bool = False, progress: VerificationProgress | None = None) → VerificationProgress

Run verification benchmarks on a list of model candidates.

Parameters:
  • candidates – Models to verify

  • device – Device for benchmarks

  • max_memory_gb – Memory limit (auto-detected if None)

  • dtype – Dtype for memory estimation

  • use_hf_reference – Whether to compare against HuggingFace model

  • use_ht_reference – Whether to compare against HookedTransformer

  • phases – Which benchmark phases to run (default: [1, 2, 3, 4])

  • quiet – Suppress verbose output

  • progress – Existing progress for resume

Returns:

VerificationProgress with results
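One behavior the max_memory_gb parameter implies, skipping candidates whose estimated footprint exceeds the budget, can be sketched as follows (a guess at the gating logic, not the module's actual code; the dict keys mirror the ModelCandidate fields):

```python
def gate_by_memory(candidates, max_memory_gb):
    # Split candidates into those that fit the memory budget and those skipped.
    # Candidates with no estimate are optimistically allowed through.
    runnable, skipped = [], []
    for c in candidates:
        est = c.get("estimated_memory_gb")
        if est is not None and est > max_memory_gb:
            skipped.append(c["model_id"])
        else:
            runnable.append(c["model_id"])
    return runnable, skipped

runnable, skipped = gate_by_memory(
    [{"model_id": "gpt2", "estimated_memory_gb": 1.0},
     {"model_id": "big-model", "estimated_memory_gb": 64.0},
     {"model_id": "unknown", "estimated_memory_gb": None}],
    max_memory_gb=16.0,
)
```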