
fix(testing_utils): guard get_device_capability() with torch.cuda.is_available()#45427

Closed
Aftabbs wants to merge 1 commit into huggingface:main from Aftabbs:fix/testing-utils-cuda-available-check

Conversation

@Aftabbs

@Aftabbs Aftabbs commented Apr 14, 2026

What does this PR do?

Fixes #45341.

get_device_properties() in testing_utils.py calls torch.cuda.get_device_capability() whenever IS_CUDA_SYSTEM or IS_ROCM_SYSTEM is True. This raises a RuntimeError in environments where the CUDA toolkit is installed (so torch.version.cuda is not None) but no physical GPU is attached (e.g., Lightning AI Studio CPU-only instances, or CI runners with CUDA drivers but no GPU).

Root cause: IS_CUDA_SYSTEM reflects whether the CUDA toolkit is present, not whether a CUDA-capable device is available at runtime. torch.cuda.get_device_capability() requires an actual device.

Fix: add "and torch.cuda.is_available()" to the condition so that get_device_capability() is only called when a CUDA/ROCm device is actually present. When CUDA is installed but no device is available, the function falls through to the generic else branch and returns (torch_device, None, None).

-    if IS_CUDA_SYSTEM or IS_ROCM_SYSTEM:
+    if (IS_CUDA_SYSTEM or IS_ROCM_SYSTEM) and torch.cuda.is_available():
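The guarded control flow can be sketched in isolation as follows. This is a minimal, torch-free mock, not the actual testing_utils.py code: IS_CUDA_SYSTEM, IS_ROCM_SYSTEM, and cuda_is_available are passed in as plain booleans standing in for the real module-level flags and torch.cuda.is_available(), and get_device_capability is a stub for torch.cuda.get_device_capability().

```python
# Sketch of the patched branch in get_device_properties().
# All torch calls are replaced with stand-ins so the logic runs standalone:
# - IS_CUDA_SYSTEM / IS_ROCM_SYSTEM mirror the toolkit-presence flags
# - cuda_is_available mirrors torch.cuda.is_available()
# - get_device_capability mirrors torch.cuda.get_device_capability()

def get_device_properties(torch_device, IS_CUDA_SYSTEM, IS_ROCM_SYSTEM,
                          cuda_is_available,
                          get_device_capability=lambda: (8, 0)):
    """Return (device, major, minor), or (device, None, None) when no GPU is attached."""
    if (IS_CUDA_SYSTEM or IS_ROCM_SYSTEM) and cuda_is_available:
        # Safe: a device is actually present, so querying its capability won't raise.
        major, minor = get_device_capability()
        return (torch_device, major, minor)
    # CUDA toolkit installed but no GPU attached: fall through to the generic branch.
    return (torch_device, None, None)

# Toolkit present, no GPU attached: returns the fallback instead of raising.
print(get_device_properties("cuda", True, False, False))  # ('cuda', None, None)
# GPU present: capability is queried.
print(get_device_properties("cuda", True, False, True))   # ('cuda', 8, 0)
```

Without the "and cuda_is_available" clause, the first call above would reach get_device_capability(), which is exactly the path that raises RuntimeError on driver-only systems.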

Before submitting

  • This PR fixes a bug (non-breaking change that fixes an issue)
  • This PR is a new feature (non-breaking change that adds functionality)
  • This PR is a breaking change (fix or feature that would cause existing functionality not to work as expected)
  • This PR adds tests that prove my fix is effective or that my feature works: N/A — the crash only occurs on systems with CUDA installed but no GPU, which is not a typical CI environment; the fix is a one-line guard matching existing patterns elsewhere in the file (e.g. line 995).

torch.cuda.get_device_capability() raises RuntimeError when CUDA
is installed (IS_CUDA_SYSTEM=True) but no physical GPU is present
(torch.cuda.is_available()=False). This happens on cloud environments
like Lightning AI Studio that have CUDA drivers but no attached GPU.

Add torch.cuda.is_available() to the condition so the function falls
through to the generic else-branch (returning (torch_device, None, None))
when the CUDA/ROCm system flag is set but no device is actually available.

Fixes huggingface#45341
@Rocketknight1
Member

There was already an open PR doing basically the same thing! Please check for other PRs first before sending your agent to fix issues



Development

Successfully merging this pull request may close these issues.

A little bug in testing_utils.py
