transformer_lens.benchmarks.audio module

Audio benchmarks for TransformerBridge.

Tests that audio encoder models (HuBERT, wav2vec2, etc.) correctly handle audio waveform inputs through forward() and run_with_cache(), and produce stable representations.

transformer_lens.benchmarks.audio.benchmark_audio_cache(bridge: TransformerBridge, test_audio: Tensor) BenchmarkResult

Benchmark run_with_cache() for audio models.

Verifies that critical audio-specific hooks fire and produce valid tensors.

Parameters:
  • bridge – TransformerBridge model to test

  • test_audio – Audio waveform tensor [batch, num_samples]
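A minimal sketch of the kind of check this benchmark performs: given a cache mapping hook names to tensors, verify that expected hooks fired and produced finite, non-empty tensors. The hook names and the `check_cache` helper are illustrative assumptions, not the exact names or internals used by TransformerBridge.

```python
import torch

def check_cache(cache: dict, expected_hooks: list) -> bool:
    # Every expected hook must be present, non-empty, and free of NaN/inf.
    for name in expected_hooks:
        if name not in cache:
            return False
        t = cache[name]
        if t.numel() == 0 or not torch.isfinite(t).all():
            return False
    return True

# Illustrative cache with plausible (assumed) hook names and shapes.
cache = {
    "blocks.0.hook_resid_post": torch.randn(1, 49, 768),
    "audio_feature_extractor.hook_out": torch.randn(1, 49, 512),
}
result = check_cache(cache, list(cache.keys()))
print(result)  # True
```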

transformer_lens.benchmarks.audio.benchmark_audio_ctc_decode(bridge: TransformerBridge) BenchmarkResult

Benchmark CTC decoding for HubertForCTC models.

Loads a small sample from librispeech_asr_dummy, decodes it via greedy CTC, and reports the decoded text. Skipped for bare encoder models (which lack a CTC head) and for tiny-random models.

Parameters:
  • bridge – TransformerBridge model to test
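Greedy CTC decoding itself is simple: take the argmax token per frame, collapse consecutive repeats, and drop blanks. The sketch below shows that logic on toy logits; `blank_id=0` and the toy vocabulary are assumptions for illustration, and in practice HubertForCTC decoding goes through its processor's tokenizer.

```python
import torch

def greedy_ctc_decode(logits: torch.Tensor, blank_id: int = 0) -> list:
    # logits: [num_frames, vocab_size]
    ids = logits.argmax(dim=-1).tolist()
    collapsed = []
    prev = None
    for i in ids:
        # Keep a token only when it differs from the previous frame
        # and is not the CTC blank symbol.
        if i != prev and i != blank_id:
            collapsed.append(i)
        prev = i
    return collapsed

# Frames predicting: blank, 5, 5, blank, 3 -> decoded [5, 3]
logits = torch.full((5, 8), -10.0)
for frame, tok in enumerate([0, 5, 5, 0, 3]):
    logits[frame, tok] = 10.0
decoded = greedy_ctc_decode(logits)
print(decoded)  # [5, 3]
```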

transformer_lens.benchmarks.audio.benchmark_audio_feature_extractor(bridge: TransformerBridge, test_audio: Tensor) BenchmarkResult

Verify CNN feature extractor hook outputs.

Checks that the audio_feature_extractor.hook_out hook produces tensors with the correct shape and non-degenerate values.

Parameters:
  • bridge – TransformerBridge model to test

  • test_audio – Audio waveform tensor [batch, num_samples]
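A sketch of what a "non-degenerate" check might look like: finite values plus non-trivial variance, since a constant or NaN-filled output would indicate a broken hook. The `min_std` threshold and the `[batch, frames, conv_channels]` shape are illustrative assumptions.

```python
import torch

def is_non_degenerate(t: torch.Tensor, min_std: float = 1e-6) -> bool:
    # Finite everywhere and not (near-)constant.
    return bool(torch.isfinite(t).all()) and t.float().std().item() > min_std

features = torch.randn(1, 49, 512)  # assumed [batch, frames, conv_channels]
ok = is_non_degenerate(features)
degenerate = is_non_degenerate(torch.zeros(1, 49, 512))
print(ok)          # True
print(degenerate)  # False
```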

transformer_lens.benchmarks.audio.benchmark_audio_forward(bridge: TransformerBridge, test_audio: Tensor, reference_model: Module | None = None) BenchmarkResult

Benchmark forward pass with audio input.

Compares bridge output against HF native model on the same waveform. For bare encoder models, compares last_hidden_state. For CTC models, compares logits.

Parameters:
  • bridge – TransformerBridge model to test

  • test_audio – Audio waveform tensor [batch, num_samples]

  • reference_model – Optional HF reference model for comparison
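The bridge-vs-HF comparison amounts to measuring how far two output tensors diverge. A minimal sketch, assuming a max-absolute-difference criterion (the actual benchmark's metric and tolerance may differ):

```python
import torch

def compare_outputs(a: torch.Tensor, b: torch.Tensor, atol: float = 1e-4):
    # Report the largest elementwise deviation and whether it is within
    # tolerance; atol here is an illustrative assumption.
    max_diff = (a - b).abs().max().item()
    return max_diff, max_diff <= atol

bridge_out = torch.randn(1, 49, 768)          # stands in for bridge output
hf_out = bridge_out + 1e-6 * torch.randn_like(bridge_out)  # near-identical
max_diff, ok = compare_outputs(bridge_out, hf_out)
print(ok)  # True
```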

transformer_lens.benchmarks.audio.benchmark_audio_representation_stability(bridge: TransformerBridge, test_audio: Tensor) BenchmarkResult

Benchmark representation stability under small input perturbations.

Verifies that the model produces stable representations: similar audio inputs should produce similar hidden states. Skipped for tiny-random models, whose random weights do not produce stable representations.

Parameters:
  • bridge – TransformerBridge model to test

  • test_audio – Audio waveform tensor [batch, num_samples]
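The stability criterion can be sketched as: encode a waveform and a slightly perturbed copy, then check that the representations are nearly identical (e.g., high cosine similarity). Here a fixed random linear projection stands in for the encoder; the real benchmark runs the bridge model, and the noise scale and similarity threshold are assumptions.

```python
import torch

torch.manual_seed(0)
encoder = torch.nn.Linear(16000, 768)  # toy stand-in for the audio encoder

audio = torch.randn(1, 16000)                      # [batch, num_samples]
perturbed = audio + 1e-4 * torch.randn_like(audio)  # small perturbation

with torch.no_grad():
    h1, h2 = encoder(audio), encoder(perturbed)
cos = torch.nn.functional.cosine_similarity(h1, h2, dim=-1).item()
print(cos > 0.99)  # True
```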

transformer_lens.benchmarks.audio.run_audio_benchmarks(bridge: TransformerBridge, test_audio: Tensor | None = None, verbose: bool = True) List[BenchmarkResult]

Run all audio benchmarks.

Parameters:
  • bridge – TransformerBridge model to test

  • test_audio – Optional audio waveform tensor. If None, generates synthetic audio.

  • verbose – Whether to print progress

Returns:

List of BenchmarkResult objects
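When test_audio is None, a synthetic waveform is generated. A plausible sketch of such a waveform, shaped [batch, num_samples]: one second of a 440 Hz sine at a 16 kHz sample rate (the frequency and sample rate are illustrative assumptions, not necessarily what the benchmark uses).

```python
import math
import torch

sample_rate = 16000
# One second of samples, normalized to [0, 1) seconds.
t = torch.arange(sample_rate, dtype=torch.float32) / sample_rate
# 440 Hz sine, with a leading batch dimension: [1, 16000]
test_audio = torch.sin(2 * math.pi * 440.0 * t).unsqueeze(0)
print(test_audio.shape)  # torch.Size([1, 16000])
```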