transformer_lens.benchmarks.audio module
Audio benchmarks for TransformerBridge.
Tests that audio encoder models (HuBERT, wav2vec2, etc.) correctly handle audio waveform inputs through forward() and run_with_cache(), and that they produce stable representations.
- transformer_lens.benchmarks.audio.benchmark_audio_cache(bridge: TransformerBridge, test_audio: Tensor) → BenchmarkResult
Benchmark run_with_cache() for audio models.
Verifies that critical audio-specific hooks fire and produce valid tensors.
- Parameters:
bridge – TransformerBridge model to test
test_audio – Audio waveform tensor [batch, num_samples]
- transformer_lens.benchmarks.audio.benchmark_audio_ctc_decode(bridge: TransformerBridge) → BenchmarkResult
Benchmark CTC decoding for HubertForCTC models.
Loads a small sample from librispeech_asr_dummy, decodes via greedy CTC, and reports the decoded text. Skipped for bare encoder models (no CTC head) and tiny-random models.
- Parameters:
bridge – TransformerBridge model to test
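The greedy CTC step mentioned above can be illustrated in isolation. The following is a minimal sketch, not the benchmark's own code: the function name `greedy_ctc_decode`, the three-symbol vocabulary, and the choice of `blank_id = 0` are assumptions made for the example, and plain Python lists stand in for logit tensors.

```python
from itertools import groupby
from typing import Dict, List, Sequence


def greedy_ctc_decode(
    frame_logits: Sequence[Sequence[float]],
    id_to_char: Dict[int, str],
    blank_id: int = 0,
) -> str:
    """Greedy CTC: argmax per frame, merge repeats, then drop blanks."""
    # Pick the highest-scoring token id for every frame.
    best_ids: List[int] = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_logits
    ]
    # Merge consecutive duplicates BEFORE removing blanks, so that a blank
    # between two identical ids keeps them as two separate characters.
    collapsed = [token_id for token_id, _ in groupby(best_ids)]
    return "".join(id_to_char[i] for i in collapsed if i != blank_id)


# Hypothetical vocabulary: 0 = blank, 1 = "a", 2 = "b".
vocab = {0: "", 1: "a", 2: "b"}
logits = [
    [0.1, 0.8, 0.1],  # a
    [0.1, 0.7, 0.2],  # a (repeat, merged with the previous frame)
    [0.9, 0.0, 0.1],  # blank
    [0.2, 0.6, 0.2],  # a (new character, separated by the blank)
    [0.1, 0.2, 0.7],  # b
]
decoded = greedy_ctc_decode(logits, vocab)
```

Here the per-frame ids are 1, 1, 0, 1, 2, which collapse to 1, 0, 1, 2 and decode to "aab". A real CTC model would take the blank id and vocabulary from its tokenizer rather than hard-coding them.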
- transformer_lens.benchmarks.audio.benchmark_audio_feature_extractor(bridge: TransformerBridge, test_audio: Tensor) → BenchmarkResult
Verify CNN feature extractor hook outputs.
Checks that audio_feature_extractor.hook_out produces tensors with the correct shape and non-degenerate values.
- Parameters:
bridge – TransformerBridge model to test
test_audio – Audio waveform tensor [batch, num_samples]
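The page does not spell out what "correct shape" means for this hook. One plausible check is the number of time frames the convolutional stack should emit for a given waveform length. The sketch below assumes the kernel sizes and strides of the base wav2vec2 and HuBERT configurations in Hugging Face Transformers and assumes unpadded convolutions; the helper name `expected_num_frames` is hypothetical, and other checkpoints may use different values.

```python
from typing import Sequence

# Assumed values: conv_kernel and conv_stride of the base wav2vec2 / HuBERT
# configurations. Read them from the model config for a real checkpoint.
CONV_KERNEL = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDE = (5, 2, 2, 2, 2, 2, 2)


def expected_num_frames(
    num_samples: int,
    kernels: Sequence[int] = CONV_KERNEL,
    strides: Sequence[int] = CONV_STRIDE,
) -> int:
    """Output length of a stack of unpadded 1-D convolutions."""
    length = num_samples
    for kernel, stride in zip(kernels, strides):
        # Unpadded convolution: floor((L - k) / s) + 1
        length = (length - kernel) // stride + 1
    return length


frames_for_one_second = expected_num_frames(16000)
frames_for_min_input = expected_num_frames(400)
```

Under these assumptions, 16000 samples yield 49 frames and 400 samples yield a single frame. Whether the hook output places frames before or after the channel dimension is not stated here, so a shape check would need to confirm the layout first.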
- transformer_lens.benchmarks.audio.benchmark_audio_forward(bridge: TransformerBridge, test_audio: Tensor, reference_model: Module | None = None) → BenchmarkResult
Benchmark forward pass with audio input.
Compares bridge output against HF native model on the same waveform. For bare encoder models, compares last_hidden_state. For CTC models, compares logits.
- Parameters:
bridge – TransformerBridge model to test
test_audio – Audio waveform tensor [batch, num_samples]
reference_model – Optional HF reference model for comparison
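A comparison of this kind usually reduces to an element-wise tolerance check between two outputs. The sketch below shows one common form, a maximum absolute difference against a threshold. The metric, the `atol` value of 1e-4, and the helper names are assumptions for illustration; the page does not state which metric or tolerance the benchmark uses. Flat Python lists stand in for flattened tensors.

```python
from typing import Sequence


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two flat outputs."""
    if len(a) != len(b):
        raise ValueError("outputs must have the same number of elements")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def outputs_match(
    bridge_out: Sequence[float],
    reference_out: Sequence[float],
    atol: float = 1e-4,  # assumed tolerance, not taken from the benchmark
) -> bool:
    return max_abs_diff(bridge_out, reference_out) <= atol


# Made-up values standing in for flattened hidden states or logits.
bridge_values = [0.12, -0.50, 0.33]
close = outputs_match(bridge_values, [0.12, -0.50, 0.33002])
far = outputs_match(bridge_values, [0.12, -0.40, 0.33])
```

With these values, `close` is true because the largest gap is about 2e-5, and `far` is false because one element differs by 0.1.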
- transformer_lens.benchmarks.audio.benchmark_audio_representation_stability(bridge: TransformerBridge, test_audio: Tensor) → BenchmarkResult
Benchmark representation stability under small input perturbations.
Verifies that the model produces stable representations: similar audio inputs should produce similar hidden states. Skipped for tiny-random models (random weights won’t produce stable representations).
- Parameters:
bridge – TransformerBridge model to test
test_audio – Audio waveform tensor [batch, num_samples]
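The idea of stability under small perturbations can be sketched as: encode a waveform, encode a slightly noisy copy, and compare the two representations. The example below uses cosine similarity and a toy windowed-mean function in place of a real model. The similarity metric, the noise scale of 1e-3, and `toy_encoder` are all assumptions for illustration and are not taken from the benchmark.

```python
import math
import random
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def toy_encoder(waveform: Sequence[float], window: int = 4) -> List[float]:
    """Hypothetical stand-in for a model: means of non-overlapping windows."""
    return [
        sum(waveform[i : i + window]) / window
        for i in range(0, len(waveform) - window + 1, window)
    ]


rng = random.Random(0)  # fixed seed so the perturbation is reproducible
clean = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
noisy = [x + rng.gauss(0.0, 1e-3) for x in clean]

similarity = cosine_similarity(toy_encoder(clean), toy_encoder(noisy))
```

A stable encoder keeps `similarity` close to 1 for such small noise. With a real model, the same comparison would be applied to hidden states, and the acceptable threshold is a judgment call.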
- transformer_lens.benchmarks.audio.run_audio_benchmarks(bridge: TransformerBridge, test_audio: Tensor | None = None, verbose: bool = True) → List[BenchmarkResult]
Run all audio benchmarks.
- Parameters:
bridge – TransformerBridge model to test
test_audio – Optional audio waveform tensor. If None, generates synthetic audio.
verbose – Whether to print progress
- Returns:
List of BenchmarkResult objects
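When a specific waveform is preferred over the generated default, a [batch, num_samples] input can be built by hand. The sketch below produces a batch of sine waves as nested lists. The 16 kHz sample rate, 440 Hz tone, amplitude, and the helper name `make_sine_batch` are assumptions; the page does not describe how the built-in synthetic audio is generated.

```python
import math
from typing import List


def make_sine_batch(
    batch_size: int = 1,
    num_samples: int = 16000,
    sample_rate: int = 16000,    # assumed; typical for HuBERT and wav2vec2
    frequency_hz: float = 440.0,
    amplitude: float = 0.5,
) -> List[List[float]]:
    """Return a [batch, num_samples] nested list holding one sine tone."""
    one_waveform = [
        amplitude * math.sin(2 * math.pi * frequency_hz * n / sample_rate)
        for n in range(num_samples)
    ]
    # Copy the waveform per batch row so rows can be modified independently.
    return [list(one_waveform) for _ in range(batch_size)]


batch = make_sine_batch(batch_size=2, num_samples=8000)
```

The nested list can be converted with `torch.tensor(batch)` and passed as `test_audio`.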