
fix(testing): check torch.cuda.is_available() before get_device_capability#45697

Closed
PHclaw wants to merge 1 commit into huggingface:main from PHclaw:fix/cuda-headless-crash

Conversation

@PHclaw

@PHclaw PHclaw commented Apr 29, 2026

Summary

Fixes #45341

get_device_properties() raises a RuntimeError on machines where CUDA is installed (torch.version.cuda is not None) but no physical GPU is attached. This happens on cloud instances such as Lightning AI Studio, where the CUDA runtime is present but no GPU is available.

Bug

# Line 3207-3210
if IS_CUDA_SYSTEM or IS_ROCM_SYSTEM:
    import torch
    major, minor = torch.cuda.get_device_capability()  # Crashes if no GPU!

IS_CUDA_SYSTEM is set to True when torch.version.cuda is not None, but that only means torch was built with the CUDA runtime, not that a GPU is actually present.
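The distinction can be demonstrated without a GPU. A minimal sketch, using a hypothetical stub in place of the real torch module (the stub's classes and the (8, 0) capability value are illustrative assumptions, not transformers or PyTorch code):

```python
# Hypothetical stand-ins for torch.version and torch.cuda, to show why
# torch.version.cuda being set does not imply a GPU is attached.
class _CudaModule:
    def __init__(self, gpu_count):
        self._gpu_count = gpu_count

    def is_available(self):
        # True only when at least one physical GPU is attached.
        return self._gpu_count > 0

    def get_device_capability(self):
        # Mirrors torch's behavior of raising when no GPU is present.
        if not self.is_available():
            raise RuntimeError("No CUDA GPUs are available")
        return (8, 0)  # illustrative capability value


class _TorchStub:
    def __init__(self, cuda_runtime, gpu_count):
        class _Version:
            cuda = cuda_runtime

        self.version = _Version()
        self.cuda = _CudaModule(gpu_count)


# Headless cloud instance: CUDA runtime installed, zero GPUs attached.
torch = _TorchStub(cuda_runtime="12.1", gpu_count=0)

IS_CUDA_SYSTEM = torch.version.cuda is not None
print(IS_CUDA_SYSTEM)             # True: the old check passes...
print(torch.cuda.is_available())  # False: ...but get_device_capability() would raise
```

With this setup, the original condition is satisfied while the capability query raises, which is exactly the reported failure mode.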

Fix

  • Add torch.cuda.is_available() check to the condition on line 3207
  • Remove the redundant import torch (torch is already imported at module level for IS_CUDA_SYSTEM/IS_ROCM_SYSTEM checks)
if (IS_CUDA_SYSTEM or IS_ROCM_SYSTEM) and torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability()

This gracefully falls through to the else branch when no GPU is available, returning (torch_device, None, None) instead of crashing.

@Rocketknight1
Member

PR already open at #45351



Development

Successfully merging this pull request may close these issues.

A little bug in testing_utils.py
