transformer_lens.benchmarks.component_outputs module

Comprehensive component benchmarking utility for TransformerBridge.

This module provides utilities to benchmark all standard components in a TransformerBridge model against their HuggingFace equivalents, ensuring output parity.

class transformer_lens.benchmarks.component_outputs.BenchmarkReport(model_name: str, total_components: int, passed_components: int, failed_components: int, component_results: ~typing.List[~transformer_lens.benchmarks.component_outputs.ComponentTestResult] = <factory>)

Bases: object

Complete benchmark report for all components.

component_results: List[ComponentTestResult]
failed_components: int
get_component_type_summary() Dict[str, Dict[str, int]]

Get a summary of results grouped by component type.

Returns:

Dictionary mapping component types to their pass/fail counts
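The grouping this method describes can be sketched as follows. This is an illustrative stand-in, not the library's actual code: the `Result` dataclass and `component_type_summary` helper are hypothetical stand-ins for `ComponentTestResult` and the method's internals.

```python
# Hypothetical sketch of the grouping behind get_component_type_summary(),
# using a minimal stand-in for ComponentTestResult.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Result:  # illustrative stand-in for ComponentTestResult
    component_type: str
    passed: bool


def component_type_summary(results: List[Result]) -> Dict[str, Dict[str, int]]:
    """Map each component type to its pass/fail counts."""
    summary: Dict[str, Dict[str, int]] = {}
    for r in results:
        counts = summary.setdefault(r.component_type, {"passed": 0, "failed": 0})
        counts["passed" if r.passed else "failed"] += 1
    return summary


results = [Result("attention", True), Result("attention", False), Result("mlp", True)]
print(component_type_summary(results))
# {'attention': {'passed': 1, 'failed': 1}, 'mlp': {'passed': 1, 'failed': 0}}
```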

get_failure_by_severity() Dict[str, List[ComponentTestResult]]

Group failures by severity level.

Returns:

Dictionary mapping severity levels to lists of failed components

model_name: str
property pass_rate: float

Calculate the pass rate as a percentage.
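A minimal sketch of how such a pass-rate percentage might be computed from the report's counters (the zero-total guard is an assumption about edge-case behavior, not confirmed by the source):

```python
# Hypothetical pass-rate computation from BenchmarkReport-style counters.
def pass_rate(passed_components: int, total_components: int) -> float:
    """Pass rate as a percentage; returns 0.0 when nothing was tested (assumed)."""
    if total_components == 0:
        return 0.0
    return 100.0 * passed_components / total_components


print(pass_rate(45, 50))  # 90.0
```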

passed_components: int
print_detailed_analysis() None

Print detailed analysis of benchmark results.

print_summary(verbose: bool = False) None

Print a summary of the benchmark results.

Parameters:

verbose – If True, print details for all components. If False, only print failures.

total_components: int
class transformer_lens.benchmarks.component_outputs.ComponentBenchmarker(bridge_model: Module, hf_model: Module, adapter: ArchitectureAdapter, cfg: TransformerBridgeConfig, atol: float = 0.0001, rtol: float = 0.0001)

Bases: object

Benchmarking utility for testing TransformerBridge components against HuggingFace.

__init__(bridge_model: Module, hf_model: Module, adapter: ArchitectureAdapter, cfg: TransformerBridgeConfig, atol: float = 0.0001, rtol: float = 0.0001)

Initialize the component benchmarker.

Parameters:
  • bridge_model – The TransformerBridge model

  • hf_model – The HuggingFace model

  • adapter – The architecture adapter for mapping components

  • cfg – The model configuration

  • atol – Absolute tolerance for comparing outputs

  • rtol – Relative tolerance for comparing outputs
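The atol/rtol pair suggests an elementwise allclose-style criterion (as in `torch.allclose`) applied per component, alongside the `max_diff`/`mean_diff` statistics recorded on `ComponentTestResult`. The helper below is an illustrative sketch of that comparison on flattened outputs, not the benchmarker's actual implementation:

```python
# Sketch of an atol/rtol output comparison mirroring torch.allclose semantics:
# an element passes when |a - b| <= atol + rtol * |b|.
from typing import List, Tuple


def compare_outputs(
    bridge_out: List[float],
    hf_out: List[float],
    atol: float = 1e-4,
    rtol: float = 1e-4,
) -> Tuple[bool, float, float]:
    """Return (passed, max_diff, mean_diff) for two flattened outputs."""
    diffs = [abs(a - b) for a, b in zip(bridge_out, hf_out)]
    max_diff = max(diffs)
    mean_diff = sum(diffs) / len(diffs)
    passed = all(d <= atol + rtol * abs(b) for d, b in zip(diffs, hf_out))
    return passed, max_diff, mean_diff


print(compare_outputs([1.0, 2.0], [1.0, 2.00005]))
```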

benchmark_all_components(test_inputs: Dict[str, Tensor] | None = None, skip_components: List[str] | None = None) BenchmarkReport

Benchmark all components in the model.

Parameters:
  • test_inputs – Optional dictionary of pre-generated test inputs. If None, will generate default inputs.

  • skip_components – Optional list of component paths to skip

Returns:

BenchmarkReport with results for all tested components

class transformer_lens.benchmarks.component_outputs.ComponentTestResult(component_path: str, component_type: str, passed: bool, max_diff: float, mean_diff: float, output_shape: Tuple[int, ...], error_message: str | None = None, percentile_diffs: Dict[str, float] | None = None)

Bases: object

Result of testing a single component.

component_path: str
component_type: str
error_message: str | None = None
get_failure_severity() str

Categorize the severity of a failure.

Returns:

Severity level: one of “critical”, “high”, “medium”, “low”, or “pass”

Return type:

str
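One plausible way to bucket a failure by severity is thresholding on `max_diff`. The thresholds below are illustrative assumptions only; the library's actual cutoffs are not documented here:

```python
# Hypothetical severity bucketing by max_diff; thresholds are illustrative
# assumptions, not the values the library actually uses.
def failure_severity(passed: bool, max_diff: float) -> str:
    if passed:
        return "pass"
    if max_diff > 1.0:
        return "critical"
    if max_diff > 1e-1:
        return "high"
    if max_diff > 1e-2:
        return "medium"
    return "low"


print(failure_severity(False, 0.5))  # high
```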

max_diff: float
mean_diff: float
output_shape: Tuple[int, ...]
passed: bool
percentile_diffs: Dict[str, float] | None = None
transformer_lens.benchmarks.component_outputs.benchmark_model(model_name: str, device: str = 'cpu', atol: float = 0.0001, rtol: float = 0.0001, skip_components: List[str] | None = None, verbose: bool = False) BenchmarkReport

Load a model by name and benchmark all of its components against the HuggingFace reference.

Parameters:
  • model_name – Name of the HuggingFace model to benchmark

  • device – Device to run on

  • atol – Absolute tolerance for comparisons

  • rtol – Relative tolerance for comparisons

  • skip_components – Optional list of component paths to skip

  • verbose – If True, print detailed results for all components

Returns:

BenchmarkReport with results for all components