transformer_lens.utilities.multi_gpu module

Multi-GPU utilities.

Utilities for managing multiple GPU devices and distributing model layers across them.

transformer_lens.utilities.multi_gpu.AvailableDeviceMemory

This type is passed around between different CUDA memory operations. The first entry of each tuple will be the device index. The second entry will be how much memory is currently available.

alias of list[tuple[int, int]]
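For illustration, a value of this type might look like the following (the device indices and byte counts are made up):

```python
# Each entry pairs a CUDA device index with its currently free memory in bytes.
available: list[tuple[int, int]] = [
    (0, 8_589_934_592),  # device 0: 8 GiB free
    (1, 2_147_483_648),  # device 1: 2 GiB free
]
```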

transformer_lens.utilities.multi_gpu.calculate_available_device_cuda_memory(i: int) int

Calculates how much memory is currently available on the device at the given index.

Parameters:

i (int) – The index of the CUDA device to query

Returns:

How much memory is available on the device

Return type:

int
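The query can be sketched in terms of an interface like `torch.cuda.mem_get_info`, which returns `(free, total)` byte counts for a device. Here a hypothetical stand-in stub replaces the real call so the sketch runs without a GPU:

```python
def mem_get_info(i: int) -> tuple[int, int]:
    # Hypothetical stand-in for torch.cuda.mem_get_info(i);
    # returns (free_bytes, total_bytes) for device i.
    fake_devices = {0: (8 * 1024**3, 16 * 1024**3), 1: (2 * 1024**3, 16 * 1024**3)}
    return fake_devices[i]

def calculate_available_memory(i: int) -> int:
    # Free memory on device i at this moment, in bytes.
    free, _total = mem_get_info(i)
    return free
```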

transformer_lens.utilities.multi_gpu.determine_available_memory_for_available_devices(max_devices: int) list[tuple[int, int]]

Gets all available CUDA devices with their current memory calculated.

Parameters:

max_devices (int) – The maximum number of devices to consider

Returns:

The list of all available devices with memory precalculated

Return type:

AvailableDeviceMemory
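The enumeration step can be sketched as pairing each considered device index with its free memory, assuming a hypothetical callable `free_bytes(i)` that reports free memory per device:

```python
def available_memory_for_devices(max_devices, device_count, free_bytes):
    # Pair each considered device index with its free memory, producing an
    # AvailableDeviceMemory-shaped list of (index, free_bytes) tuples.
    limit = min(max_devices, device_count)
    return [(i, free_bytes(i)) for i in range(limit)]
```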

transformer_lens.utilities.multi_gpu.get_best_available_cuda_device(max_devices: int | None = None) device

Gets the CUDA device with the most available memory.

Raises:

EnvironmentError – Raised if no CUDA devices are available

Returns:

The specific device that should be used

Return type:

torch.device
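The selection and error path can be sketched over a precomputed AvailableDeviceMemory list (this sketch returns a device string where the real function returns a `torch.device`):

```python
def best_cuda_device(memory: list[tuple[int, int]]) -> str:
    # Pick the device index with the most free memory; fail if none exist.
    if not memory:
        raise EnvironmentError("No available CUDA devices found")
    best_index, _free = max(memory, key=lambda entry: entry[1])
    return f"cuda:{best_index}"
```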

transformer_lens.utilities.multi_gpu.get_best_available_device(cfg: Any) device

Gets the best available device based on the passed-in configuration.

Parameters:

cfg – The HookedTransformerConfig object containing device configuration

Returns:

The best available device

Return type:

torch.device
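The configuration-driven branch can be sketched with a minimal stand-in for HookedTransformerConfig (the `Config` dataclass below is an assumption; only a `device` field is used), again returning device strings rather than `torch.device` objects:

```python
from dataclasses import dataclass

@dataclass
class Config:
    # Minimal stand-in for HookedTransformerConfig; only `device` is used here.
    device: str

def best_available_device(cfg: Config, memory: list[tuple[int, int]]) -> str:
    # Honour a non-CUDA configured device; otherwise pick the CUDA device
    # with the most free memory from the precomputed list.
    if str(cfg.device).startswith("cuda") and memory:
        best_index, _free = max(memory, key=lambda entry: entry[1])
        return f"cuda:{best_index}"
    return cfg.device
```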

transformer_lens.utilities.multi_gpu.get_device_for_block_index(index: int, cfg: Any, device: str | device | None = None)

Determine the device for a given layer index based on the model configuration.

This function assists in distributing model layers across multiple devices. The distribution is based on the configuration’s number of layers (cfg.n_layers) and devices (cfg.n_devices).

Parameters:
  • index (int) – Model layer index.

  • cfg – Model and device configuration.

  • device (Optional[Union[torch.device, str]], optional) – Initial device used for determining the target device. If not provided, the function uses the device specified in the configuration (cfg.device).

Returns:

The device for the specified layer index.

Return type:

torch.device

Deprecated:

This function does not account for several factors needed for multi-GPU support. Use get_best_available_device instead to properly run models on multiple devices. This function will be removed in 3.0.
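The layer-to-device mapping can be sketched as an even, contiguous split of layers across devices using ceiling division. This is an assumed scheme; the deprecated helper's exact rounding may differ:

```python
def device_index_for_layer(index: int, n_layers: int, n_devices: int) -> int:
    # Split n_layers evenly across n_devices (ceiling division); layer `index`
    # lands on whichever device holds its contiguous chunk.
    layers_per_device = -(-n_layers // n_devices)
    return index // layers_per_device
```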

transformer_lens.utilities.multi_gpu.sort_devices_based_on_available_memory(devices: list[tuple[int, int]]) list[tuple[int, int]]

Sorts the available devices so that those with the most available memory come first.

Parameters:

devices (AvailableDeviceMemory) – All available devices with memory calculated

Returns:

The input list of devices sorted by available memory, descending

Return type:

AvailableDeviceMemory
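The sort reduces to ordering the (device_index, free_bytes) pairs by their second element, a minimal sketch:

```python
def sort_by_available_memory(devices: list[tuple[int, int]]) -> list[tuple[int, int]]:
    # Sort (device_index, free_bytes) pairs so the most free memory comes first.
    return sorted(devices, key=lambda entry: entry[1], reverse=True)
```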