transformer_lens.utilities.multi_gpu module

Multi-GPU utilities.

Utilities for managing multiple GPU devices and distributing model layers across them.

transformer_lens.utilities.multi_gpu.AvailableDeviceMemory

This type is passed between different CUDA memory operations. The first entry of each tuple is the device index; the second is the amount of memory currently available on that device.

alias of list[tuple[int, int]]
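
For illustration, a value of this type for two GPUs might look like the following (the memory unit is assumed here to be bytes; the actual unit depends on how the measuring function reports it):

    # Hypothetical example: device 0 with ~8 GiB free, device 1 with ~12 GiB free.
    available: list[tuple[int, int]] = [(0, 8_589_934_592), (1, 12_884_901_888)]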

transformer_lens.utilities.multi_gpu.calculate_available_device_cuda_memory(i: int) → int

Calculates how much memory is available at this moment for the device at the indicated index

Parameters:

i (int) – The index of the CUDA device to inspect

Returns:

How much memory is currently available on that device

Return type:

int
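
A minimal sketch of this kind of calculation, assuming it is based on torch.cuda.mem_get_info (the actual implementation may differ):

    import torch

    def available_cuda_memory_sketch(i: int) -> int:
        # mem_get_info returns (free_bytes, total_bytes) for the given device.
        free, _total = torch.cuda.mem_get_info(i)
        return free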

transformer_lens.utilities.multi_gpu.count_unique_devices(hf_model: Any) → int

Count the number of unique devices across a dispatched HF model’s hf_device_map.

Returns 1 if the model has no hf_device_map (single-device load).
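
A minimal sketch of the documented behaviour (hf_device_map maps module names to device identifiers such as 0, 1, or "cpu"):

    def count_unique_devices_sketch(hf_model) -> int:
        device_map = getattr(hf_model, "hf_device_map", None)
        if not device_map:
            return 1  # single-device load
        return len(set(device_map.values()))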

transformer_lens.utilities.multi_gpu.determine_available_memory_for_available_devices(max_devices: int) → list[tuple[int, int]]

Gets all available CUDA devices with their currently available memory calculated

Parameters:

max_devices (int) – The maximum number of devices to include

Returns:

The list of all available devices with memory precalculated

Return type:

AvailableDeviceMemory
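
A sketch of how this list might be assembled from the helpers above (assuming torch.cuda.device_count bounds the visible devices; details may differ from the actual implementation):

    import torch

    def available_devices_with_memory_sketch(max_devices: int) -> list[tuple[int, int]]:
        n = min(max_devices, torch.cuda.device_count())
        # Pair each visible device index with its currently available memory.
        return [(i, calculate_available_device_cuda_memory(i)) for i in range(n)]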

transformer_lens.utilities.multi_gpu.find_embedding_device(hf_model: Any) → device | None

Return the device that input tokens should be placed on for a dispatched HF model.

When a model is loaded with device_map, accelerate populates hf_device_map and inserts pre/post-forward hooks that route activations. Input tensors must land on the device of whichever module first consumes them — the input embedding. Returns None for single-device models (no hf_device_map set).

Resolves via hf_model.get_input_embeddings() rather than dict insertion order to cover encoder-decoder / multimodal / audio architectures where the first entry in hf_device_map is not the text-token embedding (e.g. the vision tower on LLaVA).
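
A hypothetical usage sketch (hf_model and input_ids are placeholders, not part of this module):

    # Move input tokens to wherever the input embedding lives before forward.
    emb_device = find_embedding_device(hf_model)
    if emb_device is not None:
        input_ids = input_ids.to(emb_device)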

transformer_lens.utilities.multi_gpu.get_best_available_cuda_device(max_devices: int | None = None) → device

Gets whichever CUDA device currently has the most available memory

Parameters:

max_devices (Optional[int]) – The maximum number of devices to consider

Raises:

EnvironmentError – Raised if no CUDA devices are available

Returns:

The specific device that should be used

Return type:

torch.device
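
A hypothetical usage sketch:

    import torch

    # Place a new tensor on the CUDA device with the most free memory.
    device = get_best_available_cuda_device()
    x = torch.zeros(1024, device=device)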

transformer_lens.utilities.multi_gpu.get_best_available_device(cfg: Any) → device

Gets the best available device based on the passed-in configuration

Parameters:

cfg – The HookedTransformerConfig object containing device configuration

Returns:

The best available device

Return type:

torch.device
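
A hypothetical usage sketch (model and activations are placeholders; a HookedTransformer's config is assumed to be reachable as model.cfg):

    # Pick a device from the model's config and move activations onto it.
    device = get_best_available_device(model.cfg)
    activations = activations.to(device)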

transformer_lens.utilities.multi_gpu.get_device_for_block_index(index: int, cfg: Any, device: str | device | None = None)

Determine the device for a given layer index based on the model configuration.

This function assists in distributing model layers across multiple devices. The distribution is based on the configuration’s number of layers (cfg.n_layers) and devices (cfg.n_devices).

Parameters:
  • index (int) – Model layer index.

  • cfg – Model and device configuration.

  • device (Optional[Union[torch.device, str]], optional) – Initial device used for determining the target device. If not provided, the function uses the device specified in the configuration (cfg.device).

Returns:

The device for the specified layer index.

Return type:

torch.device

Deprecated:

This function does not take several factors needed for multi-GPU support into account. Use get_best_available_device instead to properly run models on multiple devices. This function will be removed in 3.0.
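
For reference, a sketch of the even-split scheme the description implies (the historical implementation may differ in its details):

    import torch

    def device_for_block_sketch(index: int, cfg, device=None) -> torch.device:
        device = torch.device(device if device is not None else cfg.device)
        if device.type == "cpu" or cfg.n_devices == 1:
            return device
        # Assign consecutive layers to each device in equal-sized chunks,
        # clamping so the last device absorbs any remainder.
        layers_per_device = max(1, cfg.n_layers // cfg.n_devices)
        return torch.device(device.type, min(cfg.n_devices - 1, index // layers_per_device))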

transformer_lens.utilities.multi_gpu.resolve_device_map(n_devices: int | None, device_map: str | Dict[str, str | int] | None, device: str | device | None, max_memory: Dict[str | int, str] | None = None) → Tuple[str | Dict[str, str | int] | None, Dict[str | int, str] | None]

Resolve n_devices / device_map / device into HF from_pretrained kwargs.

Returns (device_map, max_memory) tuple ready to pass into model_kwargs.

Semantics:
  • Explicit device_map wins; it’s validated and passed through unchanged (user-provided max_memory is passed through too).

  • n_devices=None or 1: returns (None, None) — single-device path.

  • n_devices > 1: returns ("balanced", {0: "auto", ..., n-1: "auto"}). "balanced" is accelerate’s string directive for balanced layer dispatch; the max_memory dict caps visibility to exactly n_devices GPUs.
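
A hypothetical call illustrating the documented multi-GPU path:

    device_map, max_memory = resolve_device_map(
        n_devices=2, device_map=None, device=None
    )
    # device_map == "balanced"
    # max_memory == {0: "auto", 1: "auto"}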

transformer_lens.utilities.multi_gpu.sort_devices_based_on_available_memory(devices: list[tuple[int, int]]) → list[tuple[int, int]]

Sorts the available devices so that those with the most available memory come first

Parameters:

devices (AvailableDeviceMemory) – All available devices with memory calculated

Returns:

The same list of devices, sorted in descending order of available memory

Return type:

AvailableDeviceMemory
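
A minimal sketch of the documented sort (each tuple is (device_index, available_memory)):

    def sort_devices_sketch(devices: list[tuple[int, int]]) -> list[tuple[int, int]]:
        # Sort by the memory entry (second element), largest first.
        return sorted(devices, key=lambda d: d[1], reverse=True)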