fix(testing_utils): guard get_device_capability() with torch.cuda.is_available()#45427
Closed
Aftabbs wants to merge 1 commit into huggingface:main from
Conversation
torch.cuda.get_device_capability() raises RuntimeError when CUDA is installed (IS_CUDA_SYSTEM=True) but no physical GPU is present (torch.cuda.is_available()=False). This happens on cloud environments like Lightning AI Studio that have CUDA drivers but no attached GPU. Add torch.cuda.is_available() to the condition so the function falls through to the generic else-branch (returning (torch_device, None, None)) when the CUDA/ROCm system flag is set but no device is actually available. Fixes huggingface#45341
Member
There was already an open PR doing basically the same thing! Please check for other PRs first before sending your agent to fix issues.
What does this PR do?
Fixes #45341.
`get_device_properties()` in `testing_utils.py` calls `torch.cuda.get_device_capability()` whenever `IS_CUDA_SYSTEM or IS_ROCM_SYSTEM` is `True`. This raises a `RuntimeError` on environments where CUDA drivers are installed (so `torch.version.cuda is not None`) but no physical GPU is attached (e.g., Lightning AI Studio CPU-only instances, CI runners with CUDA drivers but no GPU).

**Root cause:** `IS_CUDA_SYSTEM` reflects whether the CUDA toolkit is present, not whether a CUDA-capable device is available at runtime. `torch.cuda.get_device_capability()` requires an actual device.

**Fix:** Add `and torch.cuda.is_available()` to the condition so that `get_device_capability()` is only called when a CUDA/ROCm device is actually present. When CUDA is installed but no device is available, the function falls through to the generic `else` branch and returns `(torch_device, None, None)`.

Before submitting
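The guard described above can be modeled without torch or a GPU by injecting the runtime checks as parameters. This is a sketch of the logic only: the flag names mirror the PR description, but the helper signature is illustrative and not the actual `testing_utils.py` API.

```python
def get_device_properties(is_cuda_system, is_rocm_system, cuda_available,
                          get_capability, torch_device="cuda"):
    """Model of the fixed branch: query device capability only when the
    CUDA/ROCm system flag is set AND a device is actually available."""
    if (is_cuda_system or is_rocm_system) and cuda_available:
        major, minor = get_capability()
        return (torch_device, major, minor)
    # CUDA toolkit installed but no GPU attached falls through here,
    # instead of raising RuntimeError from get_device_capability().
    return (torch_device, None, None)

# Lightning-AI-style environment: CUDA drivers present, no GPU attached.
print(get_device_properties(True, False, False, lambda: (8, 0)))
# → ('cuda', None, None)

# A GPU is actually present, so the capability query runs.
print(get_device_properties(True, False, True, lambda: (8, 0)))
# → ('cuda', 8, 0)
```

Before the fix, the condition checked only the system flags, so the first call above would have invoked `get_capability()` and crashed on driver-only machines.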