
fix(testing_utils): guard get_device_capability with torch.cuda.is_available()#45351

Open
RudrenduPaul wants to merge 6 commits into huggingface:main from RudrenduPaul:fix/testing-utils-cuda-available-check

Conversation

@RudrenduPaul
Contributor

What does this PR do?

Fixes a crash in get_device_properties() in testing_utils.py when CUDA is installed on the system but no GPU device is present (e.g., a CPU-only cloud studio with CUDA libraries installed).

The function called torch.cuda.get_device_capability() immediately after checking IS_CUDA_SYSTEM (which is True whenever torch.version.cuda is not None), without first verifying that an actual GPU is available. On CUDA-installed but GPU-less systems, get_device_capability() raises an error.
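The mismatch can be sketched with a stand-in object instead of the real torch, so it runs anywhere. Here IS_CUDA_SYSTEM mirrors the real check (torch.version.cuda is not None), which is True for any CUDA-enabled torch build, even on a machine with no GPU attached; the FakeTorch class and its values are hypothetical.

```python
class FakeTorch:
    """Stand-in for torch on a CUDA-build, GPU-less machine (hypothetical)."""
    class version:
        cuda = "12.1"  # CUDA toolkit version baked into the wheel

    class cuda:
        @staticmethod
        def is_available():
            return False  # no physical device present

# Mirrors the real IS_CUDA_SYSTEM definition in testing_utils.py
IS_CUDA_SYSTEM = FakeTorch.version.cuda is not None

# True + False together is exactly the state that crashed the old code:
print(IS_CUDA_SYSTEM, FakeTorch.cuda.is_available())  # True False
```

On such a machine the old code would proceed past the IS_CUDA_SYSTEM check and call get_device_capability(), which raises because no device exists.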

Fixes #45341

Changes

  • src/transformers/testing_utils.py: Add if not torch.cuda.is_available(): return (torch_device, None, None) guard inside the IS_CUDA_SYSTEM or IS_ROCM_SYSTEM branch of get_device_properties(), before the get_device_capability() call.
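A runnable sketch of the guarded branch. The real function lives in src/transformers/testing_utils.py; here torch is passed in as a stub and the signature is simplified for illustration, so the parameter names are assumptions.

```python
def get_device_properties(torch, is_cuda_system, is_rocm_system, torch_device="cpu"):
    """Simplified stand-in for the patched function in testing_utils.py."""
    if is_cuda_system or is_rocm_system:
        if not torch.cuda.is_available():  # the guard this PR adds
            return (torch_device, None, None)
        major, minor = torch.cuda.get_device_capability()
        return ("rocm" if is_rocm_system else "cuda", major, minor)
    return (torch_device, None, None)

class NoGPUTorch:
    """CUDA build, no device: is_available() is False, capability would raise."""
    class cuda:
        @staticmethod
        def is_available():
            return False

        @staticmethod
        def get_device_capability():
            raise RuntimeError("No CUDA GPUs are available")

# Previously this call crashed; with the guard it degrades gracefully:
print(get_device_properties(NoGPUTorch, True, False))  # ('cpu', None, None)
```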

Tests

This is a fix to the test infrastructure itself (testing_utils.py). The change prevents a crash that occurs in environments where IS_CUDA_SYSTEM=True but no physical GPU is present (e.g., running pytest on a CPU-only Lightning AI studio).

No new tests were added because the existing test suite runs in environments where torch.cuda.is_available() is True — the crash scenario only reproduces on CUDA-installed, no-GPU systems.

Note: This PR was developed with AI assistance (Claude Code). I have reviewed every line and understand the change. This is not a duplicate of any existing open PR: I checked open PRs for issue 45341 in the body and ran keyword searches for get_device_capability + is_available.

@Rocketknight1
Member

cc @remi-or to this as well as #45341, feel free to merge this if you're happy with it!

@RudrenduPaul
Contributor Author

Hi @remi-or — the run_tests CircleCI check is showing a failure. Investigating whether this is related to this PR or a pre-existing issue on main.

The change in this PR is a two-line guard: adding a torch.cuda.is_available() check before the get_device_capability() call in testing_utils.py. It should only affect test utilities when CUDA is installed but no GPU is present; it shouldn't affect the processors tests at all.

Happy to look into the CircleCI logs more closely if you can confirm this is expected to be investigated before merge. Thanks!

@remi-or (Collaborator)
LGTM! The processors failure is unrelated.

@MHRDYN7
Contributor

MHRDYN7 commented Apr 13, 2026

@remi-or @RudrenduPaul the current fix is neat, but doesn't it mean that if both CUDA and XPU are installed and there is no GPU, the XPU case will be ignored? (To be fair, this was already the case before, due to the if/elif pattern.)

@remi-or
Collaborator

remi-or commented Apr 14, 2026

Ok, then what about this @MHRDYN7 @RudrenduPaul

    if IS_CUDA_SYSTEM or IS_ROCM_SYSTEM:
        import torch

        if torch.cuda.is_available():
            major, minor = torch.cuda.get_device_capability()
            if IS_ROCM_SYSTEM:
                return ("rocm", major, minor)
            else:
                return ("cuda", major, minor)
    if IS_XPU_SYSTEM:
        import torch

        if torch.xpu.is_available():
            ...

That way we escape the CUDA / ROCm block if CUDA is not available, can enter the XPU block afterwards, and exit it for the same reason.
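The fall-through behaviour of this separate-if pattern can be sketched with the system flags and device availability stubbed as booleans, so no torch is needed; the return values are placeholders, not real capability numbers.

```python
def get_device_properties(cuda_system, xpu_system, cuda_available, xpu_available,
                          torch_device="cpu"):
    """Stubbed sketch of the separate-if pattern (flags replace real detection)."""
    if cuda_system:
        if cuda_available:
            return ("cuda", 9, 0)   # placeholder capability values
    if xpu_system:                  # reached even when cuda_system is True
        if xpu_available:
            return ("xpu", 1, None)
    return (torch_device, None, None)

# CUDA toolkit installed but no GPU, XPU present with a device:
# the old elif chain returned early; the new pattern falls through to XPU.
print(get_device_properties(True, True, False, True))  # ('xpu', 1, None)
```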

@remi-or remi-or self-requested a review April 14, 2026 00:46
RudrenduPaul and others added 2 commits April 13, 2026 17:49
…blocks

Change elif chain to separate if blocks so that when CUDA is installed
but no GPU is available, the code falls through to check XPU (and then NPU).
Per @remi-or's suggestion in review.

Built by Rudrendu Paul, developed with Claude Code
Review comment on src/transformers/testing_utils.py:

        gen_mask = 0x000000FF00000000
        gen = (arch & gen_mask) >> 32
        return ("xpu", gen, None)
    if IS_NPU_SYSTEM:

@remi-or (Collaborator)
Can we add a TODO so that after torch 2.5.1 we also use if hasattr(torch, 'npu') and torch.npu.is_available() there, to stay consistent? Thanks!

@RudrenduPaul
Contributor Author

Thanks @remi-or @MHRDYN7 — that refactored structure looks great. It cleanly handles the case where both CUDA and XPU are installed but neither has a device available, and it keeps the early-import guard intact. I'll implement that pattern and push an update.

I'll also dig into the tests_torch / run_tests failure to confirm whether it's related to this change or a pre-existing flake on main.

@RudrenduPaul
Contributor Author

Implemented @remi-or's refactored structure — the elif chain has been replaced with separate if blocks so CUDA/ROCm and XPU paths are fully independent:

if IS_CUDA_SYSTEM or IS_ROCM_SYSTEM:
    import torch
    if torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability()
        if IS_ROCM_SYSTEM:
            return ("rocm", major, minor)
        else:
            return ("cuda", major, minor)
if IS_XPU_SYSTEM:
    import torch
    if torch.xpu.is_available():
        arch = torch.xpu.get_device_capability()["architecture"]
        ...
        return ("xpu", gen, None)
if IS_NPU_SYSTEM:
    return ("npu", None, None)
return (torch_device, None, None)

This handles the case @MHRDYN7 raised — if both CUDA and XPU are installed but neither has a device available, the code now falls through cleanly to check XPU (and then NPU) rather than returning early.
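A quick check of the XPU generation extraction shown above: the mask selects the fifth byte (bits 32-39) of the architecture word and shifts it down 32 bits. The architecture value here is synthetic, chosen so the generation byte is 0x2A; real values come from torch.xpu.get_device_capability().

```python
# Same mask and shift as in the snippet above, applied to a synthetic value.
gen_mask = 0x000000FF00000000
arch = 0x0000002A12345678  # synthetic architecture word, generation byte = 0x2A
gen = (arch & gen_mask) >> 32
print(hex(gen))  # 0x2a
```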

@remi-or
Collaborator

remi-or commented Apr 22, 2026

Hey @RudrenduPaul , can you add the TODO I requested please? That way we can close this. Thanks!


Development

Successfully merging this pull request may close these issues.

A little bug in testing_utils.py

4 participants