transformer_lens.utilities.multi_gpu module¶
Multi-GPU utilities.
Utilities for managing multiple GPU devices and distributing model layers across them.
- transformer_lens.utilities.multi_gpu.AvailableDeviceMemory¶
This type is passed around between different CUDA memory operations. The first entry of each tuple will be the device index. The second entry will be how much memory is currently available.
alias of
list[tuple[int, int]]
- transformer_lens.utilities.multi_gpu.calculate_available_device_cuda_memory(i: int) int¶
Calculates how much memory is available at this moment for the device at the indicated index
- Parameters:
i (int) – The index we are looking at
- Returns:
How much memory is currently available
- Return type:
int
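A minimal sketch of what this query might look like, assuming the underlying primitive is `torch.cuda.mem_get_info` (which returns a `(free_bytes, total_bytes)` pair); a fabricated stub stands in for it so the example runs without a GPU:

```python
# Hedged sketch: querying free memory for one device. mem_get_info_stub is
# a fabricated stand-in for torch.cuda.mem_get_info, which returns a
# (free_bytes, total_bytes) pair for the given device index.
def mem_get_info_stub(i: int) -> tuple[int, int]:
    return (6_000_000_000, 8_000_000_000)  # fabricated numbers

def available_cuda_memory(i: int) -> int:
    free, _total = mem_get_info_stub(i)
    return free

print(available_cuda_memory(0))  # 6000000000
```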
- transformer_lens.utilities.multi_gpu.count_unique_devices(hf_model: Any) int¶
Count the number of unique devices across a dispatched HF model’s hf_device_map. Returns 1 if the model has no hf_device_map (single-device load).
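The counting logic can be sketched against an hf_device_map-like dict; the module names and device placements below are fabricated for illustration:

```python
# Hedged sketch of counting unique devices in an hf_device_map-like dict.
# All module names and placements are fabricated.
device_map = {
    "model.embed_tokens": 0,
    "model.layers.0": 0,
    "model.layers.1": 1,
    "lm_head": 1,
}
# A model without an hf_device_map counts as a single-device load.
n_devices = len(set(device_map.values())) if device_map else 1
print(n_devices)  # 2
```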
- transformer_lens.utilities.multi_gpu.determine_available_memory_for_available_devices(max_devices: int) list[tuple[int, int]]¶
Gets all available CUDA devices with their current memory calculated
- Returns:
The list of all available devices with memory precalculated
- Return type:
AvailableDeviceMemory
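A sketch of building the AvailableDeviceMemory list for the first `max_devices` GPUs, with a fabricated per-device free-memory stub in place of a real CUDA query:

```python
# Hedged sketch: pairing each device index with its free memory. The stub
# fabricates per-device free bytes in place of a real CUDA query.
def free_memory_stub(i: int) -> int:
    return [4_000_000_000, 7_000_000_000, 1_000_000_000][i]

def available_memory_for_devices(max_devices: int) -> list[tuple[int, int]]:
    return [(i, free_memory_stub(i)) for i in range(max_devices)]

print(available_memory_for_devices(2))  # [(0, 4000000000), (1, 7000000000)]
```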
- transformer_lens.utilities.multi_gpu.find_embedding_device(hf_model: Any) device | None¶
Return the device that input tokens should be placed on for a dispatched HF model.
When a model is loaded with device_map, accelerate populates hf_device_map and inserts pre/post-forward hooks that route activations. Input tensors must land on the device of whichever module first consumes them, i.e. the input embedding. Returns None for single-device models (no hf_device_map set).
Resolves via hf_model.get_input_embeddings() rather than dict insertion order to cover encoder-decoder, multimodal, and audio architectures where the first entry in hf_device_map is not the text-token embedding (e.g. the vision tower on LLaVA).
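The LLaVA-style pitfall can be illustrated with a fabricated hf_device_map: the first mapped module may be the vision tower, while tokens must go where the text embedding lives:

```python
# Hedged illustration of why insertion order is unreliable on a multimodal
# model. All module names and placements are fabricated.
hf_device_map = {
    "vision_tower": "cuda:0",        # first entry: not where tokens go
    "model.embed_tokens": "cuda:1",  # input embedding: where tokens go
}
first_mapped = next(iter(hf_device_map.values()))
embedding_device = hf_device_map["model.embed_tokens"]
print(first_mapped, embedding_device)  # cuda:0 cuda:1
```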
- transformer_lens.utilities.multi_gpu.get_best_available_cuda_device(max_devices: int | None = None) device¶
Gets whichever CUDA device currently has the most available memory
- Raises:
EnvironmentError – Raised if no CUDA devices are available
- Returns:
The specific device that should be used
- Return type:
torch.device
- transformer_lens.utilities.multi_gpu.get_best_available_device(cfg: Any) device¶
Gets the best available device based on the passed-in configuration
- Parameters:
cfg – The HookedTransformerConfig object containing device configuration
- Returns:
The best available device
- Return type:
torch.device
- transformer_lens.utilities.multi_gpu.get_device_for_block_index(index: int, cfg: Any, device: str | device | None = None)¶
Determine the device for a given layer index based on the model configuration.
This function assists in distributing model layers across multiple devices. The distribution is based on the configuration’s number of layers (cfg.n_layers) and devices (cfg.n_devices).
- Parameters:
index (int) – Model layer index.
cfg – Model and device configuration.
device (Optional[Union[torch.device, str]], optional) – Initial device used for determining the target device. If not provided, the function uses the device specified in the configuration (cfg.device).
- Returns:
The device for the specified layer index.
- Return type:
torch.device
- Deprecated:
This function does not account for several factors needed for multi-GPU support. Use get_best_available_device instead to properly run models on multiple devices. This function will be removed in 3.0.
- transformer_lens.utilities.multi_gpu.resolve_device_map(n_devices: int | None, device_map: str | Dict[str, str | int] | None, device: str | device | None, max_memory: Dict[str | int, str] | None = None) Tuple[str | Dict[str, str | int] | None, Dict[str | int, str] | None]¶
Resolve n_devices / device_map / device into HF from_pretrained kwargs. Returns a (device_map, max_memory) tuple ready to pass into model_kwargs.
- Semantics:
An explicit device_map wins; it is validated and passed through unchanged (a user-provided max_memory is passed through too). n_devices=None or 1: returns (None, None), the single-device path. n_devices > 1: returns ("balanced", {0: "auto", ..., n-1: "auto"}). "balanced" is accelerate’s string directive for balanced layer dispatch; the max_memory dict caps visibility to exactly n_devices GPUs.
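The documented semantics (not the library source) can be sketched as a small dispatch function; `resolve_sketch` is a hypothetical name for illustration:

```python
# Hedged sketch of the documented semantics only: None/1 is the
# single-device path; n_devices > 1 yields the "balanced" directive plus a
# max_memory dict covering exactly n_devices GPUs.
def resolve_sketch(n_devices):
    if n_devices is None or n_devices == 1:
        return (None, None)
    return ("balanced", {i: "auto" for i in range(n_devices)})

print(resolve_sketch(1))  # (None, None)
print(resolve_sketch(3))  # ('balanced', {0: 'auto', 1: 'auto', 2: 'auto'})
```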
- transformer_lens.utilities.multi_gpu.sort_devices_based_on_available_memory(devices: list[tuple[int, int]]) list[tuple[int, int]]¶
Sorts all available devices so that those with the most available memory come first
- Parameters:
devices (AvailableDeviceMemory) – All available devices with memory calculated
- Returns:
The same list of devices, sorted with the most available memory first
- Return type:
AvailableDeviceMemory
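The ranking can be sketched directly on (device_index, free_bytes) tuples; the byte counts below are fabricated:

```python
# Hedged sketch: rank (device_index, free_bytes) tuples so the device with
# the most free memory comes first. Numbers are fabricated.
devices = [(0, 2_000_000_000), (1, 8_000_000_000), (2, 500_000_000)]
ranked = sorted(devices, key=lambda d: d[1], reverse=True)
print(ranked)  # [(1, 8000000000), (0, 2000000000), (2, 500000000)]
```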