transformer_lens.utilities.devices

Devices.

Utilities to get the correct device, and assist in distributing model layers across multiple devices.

transformer_lens.utilities.devices.AvailableDeviceMemory

This type is passed around between different CUDA memory operations. The first entry of each tuple will be the device index. The second entry will be how much memory is currently available.

alias of list[tuple[int, int]]
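
For illustration, a value of this type might look like the following (the device indices and byte counts here are made up):

    # Each tuple pairs a CUDA device index with its currently available memory
    memory_info: list[tuple[int, int]] = [
        (0, 12_000_000_000),  # cuda:0, roughly 12 GB free
        (1, 8_000_000_000),   # cuda:1, roughly 8 GB free
    ]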

transformer_lens.utilities.devices.calculate_available_device_cuda_memory(i: int) → int

Calculates how much memory is available at this moment for the device at the indicated index

Parameters:

i (int) – The index of the CUDA device to query

Returns:

How much memory is available on the device

Return type:

int
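
A minimal sketch of how such a calculation might be implemented, assuming it queries torch.cuda.mem_get_info (the library's actual implementation may differ):

    import torch

    def available_cuda_memory_sketch(i: int) -> int:
        # mem_get_info returns (free_bytes, total_bytes) for the given device
        free_bytes, _total_bytes = torch.cuda.mem_get_info(i)
        return free_bytes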

transformer_lens.utilities.devices.determine_available_memory_for_available_devices(max_devices: int) → list[tuple[int, int]]

Gets all available CUDA devices with their currently available memory calculated

Parameters:

max_devices (int) – The maximum number of devices to consider

Returns:

The list of all available devices with memory precalculated

Return type:

AvailableDeviceMemory
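
A usage sketch (the max_devices value and printed output are illustrative, and the snippet assumes the memory values are reported in bytes):

    from transformer_lens.utilities.devices import (
        determine_available_memory_for_available_devices,
    )

    # Collect (device_index, available_memory) pairs for up to 4 devices
    devices = determine_available_memory_for_available_devices(max_devices=4)
    for index, available in devices:
        print(f"cuda:{index} has {available / 1e9:.1f} GB available")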

transformer_lens.utilities.devices.get_best_available_cuda_device(max_devices: Optional[int] = None) → device

Gets whichever CUDA device currently has the most available memory

Parameters:

max_devices (Optional[int]) – The maximum number of devices to consider

Raises:

EnvironmentError – Raised if no CUDA devices are available

Returns:

The specific device that should be used

Return type:

torch.device
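
A usage sketch with a CPU fallback for machines without CUDA:

    import torch
    from transformer_lens.utilities.devices import get_best_available_cuda_device

    try:
        device = get_best_available_cuda_device(max_devices=2)
    except EnvironmentError:
        # Raised when no CUDA devices are available
        device = torch.device("cpu")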

transformer_lens.utilities.devices.get_best_available_device(cfg: HookedTransformerConfig) → device

Gets the best available device based on the passed-in model configuration

Parameters:

cfg (HookedTransformerConfig) – The model configuration, whose device settings determine the best available device

Returns:

The best available device

Return type:

torch.device
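
A usage sketch, assuming an already-loaded HookedTransformer whose config is passed in:

    from transformer_lens import HookedTransformer
    from transformer_lens.utilities.devices import get_best_available_device

    model = HookedTransformer.from_pretrained("gpt2")
    device = get_best_available_device(model.cfg)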

transformer_lens.utilities.devices.get_device_for_block_index(index: int, cfg: HookedTransformerConfig, device: Optional[Union[device, str]] = None)

Determine the device for a given layer index based on the model configuration.

This function assists in distributing model layers across multiple devices. The distribution is based on the configuration’s number of layers (cfg.n_layers) and devices (cfg.n_devices).

Parameters:
  • index (int) – Model layer index.

  • cfg (HookedTransformerConfig) – Model and device configuration.

  • device (Optional[Union[torch.device, str]], optional) – Initial device used for determining the target device. If not provided, the function uses the device specified in the configuration (cfg.device).

Returns:

The device for the specified layer index.

Return type:

torch.device

Deprecated:

This function does not account for several factors needed for multi-GPU support. Use get_best_available_device instead to properly run models on multiple devices. It will be removed in 3.0.
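
For intuition, a hedged sketch of the even layer split described above (the library's actual rounding and device-offset handling may differ):

    import torch

    def device_for_block_sketch(index: int, n_layers: int, n_devices: int) -> torch.device:
        # With n_layers=12 and n_devices=3, layers 0-3 map to cuda:0,
        # layers 4-7 to cuda:1, and layers 8-11 to cuda:2.
        layers_per_device = n_layers // n_devices
        return torch.device("cuda", index // layers_per_device)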

transformer_lens.utilities.devices.move_to_and_update_config(model: Union[HookedTransformer, HookedEncoder, HookedEncoderDecoder], device_or_dtype: Union[device, str, dtype], print_details=True)

Wrapper around to() that also updates model.cfg to reflect the new device or dtype.
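
A usage sketch; since nn.Module.to moves parameters in place, the call can be used for its side effects:

    from transformer_lens import HookedTransformer
    from transformer_lens.utilities.devices import move_to_and_update_config

    model = HookedTransformer.from_pretrained("gpt2")
    # Moves the model to CUDA and keeps model.cfg in sync with the new device
    move_to_and_update_config(model, "cuda", print_details=False)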

transformer_lens.utilities.devices.sort_devices_based_on_available_memory(devices: list[tuple[int, int]]) → list[tuple[int, int]]

Sorts the available devices, returning those with the most available memory first

Parameters:

devices (AvailableDeviceMemory) – All available devices with memory calculated

Returns:

The same list of devices, sorted so that those with the most available memory come first

Return type:

AvailableDeviceMemory
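
A minimal sketch of the sort, assuming the (device_index, available_memory) tuple layout described for AvailableDeviceMemory:

    def sort_devices_sketch(devices: list[tuple[int, int]]) -> list[tuple[int, int]]:
        # Sort in descending order of available memory (the second tuple entry)
        return sorted(devices, key=lambda entry: entry[1], reverse=True)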