transformer_lens.benchmarks package

Module contents

Benchmark utilities for TransformerBridge testing.

This module provides reusable benchmark functions for comparing TransformerBridge with HuggingFace models and HookedTransformer implementations.

class transformer_lens.benchmarks.BenchmarkResult(name: str, severity: BenchmarkSeverity, message: str, details: Dict[str, Any] | None = None, passed: bool = True, phase: int | None = None)

Bases: object

Result of a benchmark test.

details: Dict[str, Any] | None = None
message: str
name: str
passed: bool = True
phase: int | None = None
print_immediate() None

Print this result immediately to console.

severity: BenchmarkSeverity
class transformer_lens.benchmarks.BenchmarkSeverity(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

Severity levels for benchmark results.

DANGER = 'danger'
ERROR = 'error'
INFO = 'info'
SKIPPED = 'skipped'
WARNING = 'warning'
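
The two types above can be sketched as follows. This is a simplified stand-in mirroring the documented fields, not the library's actual definitions, showing how results might be constructed and then filtered by pass status or severity:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, Optional

class BenchmarkSeverity(Enum):
    INFO = "info"
    WARNING = "warning"
    ERROR = "error"
    DANGER = "danger"
    SKIPPED = "skipped"

@dataclass
class BenchmarkResult:
    name: str
    severity: BenchmarkSeverity
    message: str
    details: Optional[Dict[str, Any]] = None
    passed: bool = True
    phase: Optional[int] = None

results = [
    BenchmarkResult("forward_pass", BenchmarkSeverity.INFO, "logits match", phase=1),
    BenchmarkResult("backward_hooks", BenchmarkSeverity.ERROR, "gradient mismatch",
                    passed=False, phase=3),
]
# Collect only the failing results for a summary report.
failures = [r for r in results if not r.passed]
print([f.name for f in failures])  # prints "['backward_hooks']"
```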
class transformer_lens.benchmarks.PhaseReferenceData(hf_logits: Tensor | None = None, hf_loss: float | None = None, test_text: str | None = None)

Bases: object

Float32 reference data from Phase 1 for Phase 3 equivalence comparison.

hf_logits: Tensor | None = None
hf_loss: float | None = None
test_text: str | None = None
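
The handoff this dataclass enables might look like the sketch below, with a stand-in mirroring the documented fields (the real `hf_logits` is a Tensor; the actual benchmark functions consume these values via their `reference_logits`/`reference_loss` parameters):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PhaseReferenceData:
    hf_logits: Optional[list] = None  # a Tensor in the real class
    hf_loss: Optional[float] = None
    test_text: Optional[str] = None

# Phase 1: record float32 HuggingFace outputs once.
ref = PhaseReferenceData(hf_logits=[0.1, 0.9], hf_loss=3.21,
                         test_text="The quick brown fox")

# Phase 3: reuse them instead of keeping the HF model in memory, e.g.
# benchmark_loss_equivalence(bridge, ref.test_text, reference_loss=ref.hf_loss)
print(ref.test_text)  # the same prompt is reused across phases
```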
transformer_lens.benchmarks.benchmark_activation_cache(bridge: TransformerBridge, test_text: str, reference_model: HookedTransformer | None = None, tolerance: float = 0.001) BenchmarkResult

Benchmark activation cache values against reference model.

Parameters:
  • bridge – TransformerBridge model to test

  • test_text – Input text for testing

  • reference_model – Optional HookedTransformer reference model

  • tolerance – Tolerance for activation comparison

Returns:

BenchmarkResult with cache value comparison details

transformer_lens.benchmarks.benchmark_activation_cache_structure(bridge: TransformerBridge, test_text: str, reference_model: HookedTransformer | None = None, prepend_bos: bool | None = None) BenchmarkResult

Benchmark activation cache for structural correctness (keys, shapes).

This checks:
  • Cache returns expected keys
  • Cache tensor shapes are compatible
  • run_with_cache works correctly

Parameters:
  • bridge – TransformerBridge model to test

  • test_text – Input text for testing

  • reference_model – Optional HookedTransformer for comparison

  • prepend_bos – Whether to prepend BOS token. If None, uses model default.

Returns:

BenchmarkResult with structural validation details

transformer_lens.benchmarks.benchmark_backward_hooks(bridge: TransformerBridge, test_text: str, reference_model: HookedTransformer | None = None, abs_tolerance: float = 0.2, rel_tolerance: float = 0.0003) BenchmarkResult

Benchmark all backward hooks for gradient matching.

Parameters:
  • bridge – TransformerBridge model to test

  • test_text – Input text for testing

  • reference_model – Optional HookedTransformer reference model

  • abs_tolerance – Absolute tolerance for gradient comparison

  • rel_tolerance – Relative tolerance for gradient comparison

Returns:

BenchmarkResult with backward hook comparison details

transformer_lens.benchmarks.benchmark_backward_hooks_structure(bridge: TransformerBridge, test_text: str, reference_model: HookedTransformer | None = None, prepend_bos: bool | None = None) BenchmarkResult

Benchmark backward hooks for structural correctness (existence, firing, shapes).

This checks:
  • All reference backward hooks exist in bridge
  • Hooks can be registered
  • Hooks fire during backward pass
  • Gradient tensor shapes are compatible

Parameters:
  • bridge – TransformerBridge model to test

  • test_text – Input text for testing

  • reference_model – Optional HookedTransformer for comparison

  • prepend_bos – Whether to prepend BOS token. If None, uses model default.

Returns:

BenchmarkResult with structural validation details

transformer_lens.benchmarks.benchmark_critical_backward_hooks(bridge: TransformerBridge, test_text: str, reference_model: HookedTransformer | None = None, abs_tolerance: float = 0.2, rel_tolerance: float = 0.0003) BenchmarkResult

Benchmark critical backward hooks for gradient matching.

Parameters:
  • bridge – TransformerBridge model to test

  • test_text – Input text for testing

  • reference_model – Optional HookedTransformer reference model

  • abs_tolerance – Absolute tolerance for gradient comparison

  • rel_tolerance – Relative tolerance for gradient comparison

Returns:

BenchmarkResult with critical backward hook comparison details

transformer_lens.benchmarks.benchmark_critical_forward_hooks(bridge: TransformerBridge, test_text: str, reference_model: HookedTransformer | None = None, tolerance: float = 0.02) BenchmarkResult

Benchmark critical forward hooks commonly used in interpretability research.

Parameters:
  • bridge – TransformerBridge model to test

  • test_text – Input text for testing

  • reference_model – Optional HookedTransformer reference model

  • tolerance – Tolerance for activation comparison

Returns:

BenchmarkResult with critical hook comparison details

transformer_lens.benchmarks.benchmark_forward_hooks(bridge: TransformerBridge, test_text: str, reference_model: HookedTransformer | None = None, tolerance: float = 0.5, prepend_bos: bool | None = None) BenchmarkResult

Benchmark all forward hooks for activation matching.

Parameters:
  • bridge – TransformerBridge model to test

  • test_text – Input text for testing

  • reference_model – Optional HookedTransformer for comparison

  • tolerance – Tolerance for activation matching (fraction of mismatches allowed)

  • prepend_bos – Whether to prepend BOS token. If None, uses model default.

Returns:

BenchmarkResult with hook activation comparison details

transformer_lens.benchmarks.benchmark_forward_hooks_structure(bridge: TransformerBridge, test_text: str, reference_model: HookedTransformer | None = None, prepend_bos: bool | None = None) BenchmarkResult

Benchmark forward hooks for structural correctness (existence, firing, shapes).

This checks:
  • All reference hooks exist in bridge
  • Hooks can be registered
  • Hooks fire during forward pass
  • Hook tensor shapes are compatible

Parameters:
  • bridge – TransformerBridge model to test

  • test_text – Input text for testing

  • reference_model – Optional HookedTransformer for comparison

  • prepend_bos – Whether to prepend BOS token. If None, uses model default.

Returns:

BenchmarkResult with structural validation details

transformer_lens.benchmarks.benchmark_forward_pass(bridge: TransformerBridge, test_input: str | Tensor, reference_model: HookedTransformer | Module | None = None, reference_logits: Tensor | None = None, atol: float = 0.001, rtol: float = 0.03) BenchmarkResult

Benchmark forward pass between TransformerBridge and reference model.

Parameters:
  • bridge – TransformerBridge model to test

  • test_input – Input text string or audio waveform tensor for testing

  • reference_model – Optional reference model (HookedTransformer or HF model)

  • reference_logits – Optional pre-computed reference logits/hidden states tensor (e.g., saved from a prior HF forward pass to avoid needing both models in memory)

  • atol – Absolute tolerance for comparison

  • rtol – Relative tolerance for comparison

Returns:

BenchmarkResult with comparison details
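
The atol/rtol pair above combines in the usual torch.allclose style: values a and b pass when |a − b| ≤ atol + rtol·|b|. A plain-Python sketch of that rule, illustrative of how the defaults interact rather than the library's exact comparison code:

```python
def within_tolerance(a: float, b: float, atol: float = 0.001, rtol: float = 0.03) -> bool:
    """torch.allclose-style check: |a - b| <= atol + rtol * |b|."""
    return abs(a - b) <= atol + rtol * abs(b)

# A large logit tolerates a proportionally larger absolute difference...
print(within_tolerance(10.2, 10.0))  # prints "True" (0.2 <= 0.001 + 0.3)
# ...while near zero, only atol matters.
print(within_tolerance(0.01, 0.0))   # prints "False" (0.01 > 0.001)
```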

transformer_lens.benchmarks.benchmark_gated_hooks_fire(bridge: TransformerBridge, test_text: str = 'The quick brown fox', prepend_bos: bool | None = None) BenchmarkResult

Verify each cfg-gated attention hook fires when its flag is enabled.

Hooks like hook_result, hook_q_input, hook_attn_in exist unconditionally on the attention bridge but are only populated when the corresponding config flag is set (keeping default-path cost at zero). This benchmark toggles each flag in turn, runs a short forward, and asserts at least one layer’s matching hook actually captured an activation.

use_attn_in and use_split_qkv_input are mutually exclusive, so each flag runs in its own forward pass. Plain AttentionBridge (non-PEA/JPEA) adapters raise NotImplementedError from the setter; this is recorded as skipped rather than failed, since the applicability gate is intentional.
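
The toggle-and-check loop described above might be sketched like this. `FakeBridge` is a hypothetical stand-in for the adapter (not the real TransformerBridge), and the supported-flag set is illustrative; only the skipped-vs-failed pattern mirrors the docstring:

```python
class FakeBridge:
    """Stand-in whose setter rejects unsupported flags, like plain AttentionBridge."""
    SUPPORTED = {"use_attn_result", "use_split_qkv_input"}

    def set_flag(self, name: str, value: bool) -> None:
        if name not in self.SUPPORTED:
            raise NotImplementedError(name)

bridge = FakeBridge()
outcomes = {}
for flag in ["use_attn_result", "use_split_qkv_input", "use_attn_in"]:
    try:
        bridge.set_flag(flag, True)
        # ...run a short forward here and assert the gated hook captured a value...
        outcomes[flag] = "passed"
    except NotImplementedError:
        # Applicability gate is intentional: record skipped, not failed.
        outcomes[flag] = "skipped"
print(outcomes)
```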

transformer_lens.benchmarks.benchmark_generation(bridge: TransformerBridge, test_text: str, max_new_tokens: int = 10, reference_model: HookedTransformer | None = None) BenchmarkResult

Benchmark basic text generation.

Parameters:
  • bridge – TransformerBridge model to test

  • test_text – Input text for generation

  • max_new_tokens – Number of tokens to generate

  • reference_model – Optional HookedTransformer reference model (not used)

Returns:

BenchmarkResult with generation details

transformer_lens.benchmarks.benchmark_generation_with_kv_cache(bridge: TransformerBridge, test_text: str, max_new_tokens: int = 10, reference_model: HookedTransformer | None = None) BenchmarkResult

Benchmark text generation with KV caching enabled.

This ensures that the KV cache is properly passed through attention layers during generation, and that the cache update logic works correctly.

Parameters:
  • bridge – TransformerBridge model to test

  • test_text – Input text for generation

  • max_new_tokens – Number of tokens to generate

  • reference_model – Optional HookedTransformer reference model (not used)

Returns:

BenchmarkResult with generation details

transformer_lens.benchmarks.benchmark_gradient_computation(bridge: TransformerBridge, test_text: str, reference_model: HookedTransformer | None = None, atol: float = 0.001) BenchmarkResult

Benchmark basic gradient computation.

Parameters:
  • bridge – TransformerBridge model to test

  • test_text – Input text for testing

  • reference_model – Optional HookedTransformer reference model

  • atol – Absolute tolerance for gradient comparison

Returns:

BenchmarkResult with gradient computation comparison details

transformer_lens.benchmarks.benchmark_hook_functionality(bridge: TransformerBridge, test_text: str, reference_model: HookedTransformer | None = None, atol: float = 0.002) BenchmarkResult

Benchmark hook system functionality through ablation effects.

Parameters:
  • bridge – TransformerBridge model to test

  • test_text – Input text for testing

  • reference_model – Optional HookedTransformer reference model

  • atol – Absolute tolerance for effect comparison

Returns:

BenchmarkResult with hook functionality comparison details

transformer_lens.benchmarks.benchmark_hook_registry(bridge: TransformerBridge, reference_model: HookedTransformer | None = None) BenchmarkResult

Benchmark hook registry completeness.

Parameters:
  • bridge – TransformerBridge model to test

  • reference_model – Optional HookedTransformer reference model

Returns:

BenchmarkResult with registry comparison details

transformer_lens.benchmarks.benchmark_logits_equivalence(bridge: TransformerBridge, test_text: str, reference_model: HookedTransformer | None = None, reference_logits: Tensor | None = None, atol: float = 0.03, rtol: float = 0.03) BenchmarkResult

Benchmark logits output between TransformerBridge and HookedTransformer.

Note: Uses relaxed tolerance (3e-2) as forward pass implementations differ slightly, leading to accumulated numerical precision differences.

Parameters:
  • bridge – TransformerBridge model to test

  • test_text – Input text for testing

  • reference_model – Optional HookedTransformer reference model

  • reference_logits – Optional pre-computed reference logits tensor (e.g., from Phase 1)

  • atol – Absolute tolerance for comparison

  • rtol – Relative tolerance for comparison

Returns:

BenchmarkResult with comparison details

transformer_lens.benchmarks.benchmark_loss_equivalence(bridge: TransformerBridge, test_text: str, reference_model: HookedTransformer | None = None, reference_loss: float | None = None, atol: float = 0.001) BenchmarkResult

Benchmark loss computation between TransformerBridge and HookedTransformer.

Parameters:
  • bridge – TransformerBridge model to test

  • test_text – Input text for testing

  • reference_model – Optional HookedTransformer reference model

  • reference_loss – Optional pre-computed reference loss value (e.g., from Phase 1)

  • atol – Absolute tolerance for comparison

Returns:

BenchmarkResult with comparison details

transformer_lens.benchmarks.benchmark_multiple_generation_calls(bridge: TransformerBridge, test_prompts: list, max_new_tokens: int = 5, reference_model: HookedTransformer | None = None) BenchmarkResult

Benchmark multiple generation calls to ensure KV cache handling is robust.

Parameters:
  • bridge – TransformerBridge model to test

  • test_prompts – List of input prompts for generation

  • max_new_tokens – Number of tokens to generate per prompt

  • reference_model – Optional HookedTransformer reference model (not used)

Returns:

BenchmarkResult with multiple generation details

transformer_lens.benchmarks.benchmark_run_with_cache(bridge: TransformerBridge, test_text: str, reference_model: HookedTransformer | None = None) BenchmarkResult

Benchmark run_with_cache functionality.

Parameters:
  • bridge – TransformerBridge model to test

  • test_text – Input text for testing

  • reference_model – Optional HookedTransformer reference model

Returns:

BenchmarkResult with cache functionality details

transformer_lens.benchmarks.benchmark_text_quality(bridge: TransformerBridge, test_text: str, max_new_tokens: int = 50, scoring_model_name: str = 'gpt2', pass_threshold: float = 85.0, device: str = 'cpu', scoring_model: PreTrainedModel | None = None, scoring_tokenizer: PreTrainedTokenizerBase | None = None) BenchmarkResult

Benchmark text generation quality using continuation-only perplexity scoring.

Generates text from multiple diverse prompts, scores each continuation using GPT-2 perplexity (prompt tokens masked), applies a repetition penalty, and returns the averaged score.

Parameters:
  • bridge – TransformerBridge model to test.

  • test_text – Primary input prompt (additional diverse prompts are also used).

  • max_new_tokens – Number of tokens to generate per prompt.

  • scoring_model_name – HuggingFace model to use as scorer.

  • pass_threshold – Minimum average score to pass (default 85.0).

  • device – Device for the scoring model.

  • scoring_model – Optional pre-loaded scoring model. When provided alongside scoring_tokenizer, skips loading and avoids cleanup (caller owns lifecycle).

  • scoring_tokenizer – Optional pre-loaded tokenizer for the scoring model.

Returns:

BenchmarkResult with quality score details.
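
"Continuation-only" scoring means prompt tokens are masked out of the loss: only the generated tokens' negative log-likelihoods are averaged before exponentiating. A minimal sketch with toy per-token NLL values (the real benchmark obtains these from the GPT-2 scorer, and additionally applies a repetition penalty not shown here):

```python
import math

def continuation_perplexity(nlls: list, prompt_len: int) -> float:
    """Perplexity over continuation tokens only; prompt positions are masked out."""
    continuation = nlls[prompt_len:]
    return math.exp(sum(continuation) / len(continuation))

# Per-token NLLs for a prompt (first 3 tokens) + continuation (last 4 tokens):
nlls = [5.0, 4.2, 3.9, 2.0, 2.5, 1.8, 2.1]
# Mean continuation NLL is 2.1, so perplexity is exp(2.1):
print(round(continuation_perplexity(nlls, prompt_len=3), 2))  # prints "8.17"
```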

transformer_lens.benchmarks.benchmark_weight_modification(bridge: TransformerBridge, test_text: str, reference_model: HookedTransformer | None = None) BenchmarkResult

Benchmark that weight modifications propagate correctly.

Parameters:
  • bridge – TransformerBridge model to test

  • test_text – Input text for testing

  • reference_model – Optional HookedTransformer reference model (not used)

Returns:

BenchmarkResult with weight modification verification details

transformer_lens.benchmarks.benchmark_weight_processing(bridge: TransformerBridge, test_text: str, reference_model: HookedTransformer | None = None) BenchmarkResult

Benchmark weight processing (folding, centering) application.

Parameters:
  • bridge – TransformerBridge model to test

  • test_text – Input text for testing

  • reference_model – Optional HookedTransformer reference model

Returns:

BenchmarkResult with weight processing verification details

transformer_lens.benchmarks.benchmark_weight_sharing(bridge: TransformerBridge, test_text: str, reference_model: HookedTransformer | None = None, atol: float = 0.001) BenchmarkResult

Benchmark weight sharing and modification effects.

Parameters:
  • bridge – TransformerBridge model to test

  • test_text – Input text for testing

  • reference_model – Optional HookedTransformer reference model

  • atol – Absolute tolerance for effect comparison

Returns:

BenchmarkResult with weight sharing verification details

transformer_lens.benchmarks.run_benchmark_suite(model_name: str, device: str = 'cpu', dtype: dtype = torch.float32, test_text: str | None = None, use_hf_reference: bool = True, use_ht_reference: bool = True, enable_compatibility_mode: bool = True, verbose: bool = True, track_memory: bool = False, test_weight_processing_individually: bool = False, phases: list[int] | None = None, trust_remote_code: bool = False, scoring_model: PreTrainedModel | None = None, scoring_tokenizer: PreTrainedTokenizerBase | None = None) List[BenchmarkResult]

Run comprehensive benchmark suite for TransformerBridge.

This function implements an optimized multi-phase approach to minimize model reloading:
  • Phase 1: HF + Bridge (unprocessed) - Compare against raw HuggingFace model
  • Phase 2: Bridge (unprocessed) + HT (unprocessed) - Compare unprocessed models
  • Phase 3: Bridge (processed) + HT (processed) - Full compatibility mode testing
  • Phase 4: Text Quality - Perplexity-based legibility scoring via GPT-2
  • Phase 5: Individual Weight Processing Flags (optional)
  • Phase 6: Combined Weight Processing Flags (optional)

When test_weight_processing_individually=True, Phases 5 & 6 run after Phase 3, testing each weight processing flag individually and in combinations.

Parameters:
  • model_name – Name of the model to benchmark (e.g., “gpt2”)

  • device – Device to run on (“cpu” or “cuda”)

  • dtype – Precision for model loading (default: torch.float32). Use torch.bfloat16 to halve memory for larger models. Phase 2/3 comparisons automatically upcast to float32 for precision.

  • test_text – Optional test text (default: standard test prompt)

  • use_hf_reference – Whether to compare against HuggingFace model

  • use_ht_reference – Whether to compare against HookedTransformer

  • enable_compatibility_mode – Whether to enable compatibility mode on bridge

  • verbose – Whether to print results to console

  • track_memory – Whether to track and report memory usage (requires psutil)

  • test_weight_processing_individually – Whether to run granular weight processing tests that check each processing flag individually (default: False)

  • phases – Optional list of phase numbers to run (e.g., [1, 2, 3]). If None, runs all phases.

  • trust_remote_code – Whether to trust remote code for custom architectures.

  • scoring_model – Optional pre-loaded GPT-2 scoring model for Phase 4. When provided with scoring_tokenizer, avoids reloading for each model in batch.

  • scoring_tokenizer – Optional pre-loaded tokenizer for Phase 4 scoring model.

Returns:

List of BenchmarkResult objects
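
A typical post-processing step is grouping the returned list by phase and checking for failures. A sketch using stand-in result objects (the real call would be something like `results = run_benchmark_suite("gpt2")`, which loads models and is omitted here):

```python
from collections import defaultdict
from types import SimpleNamespace

# Stand-ins for the BenchmarkResult objects returned by run_benchmark_suite.
results = [
    SimpleNamespace(name="forward_pass", phase=1, passed=True),
    SimpleNamespace(name="logits_equivalence", phase=3, passed=True),
    SimpleNamespace(name="backward_hooks", phase=3, passed=False),
]

by_phase = defaultdict(list)
for r in results:
    by_phase[r.phase].append(r)

for phase in sorted(by_phase):
    failed = [r.name for r in by_phase[phase] if not r.passed]
    print(f"Phase {phase}: {len(by_phase[phase])} results, failed: {failed}")
```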