Fix CUDA GPU detection crash when running with device=cpu on machines without GPU #6759
- Add runtime check for PARAM.inp.device before calling GPU detection functions
- Prevent cudaErrcheck from exiting when no GPU is available but device is set to "cpu"
- Apply same logic as non-MPI version to MPI version in output_device_info

Co-authored-by: dzzz2001 <153698752+dzzz2001@users.noreply.github.com>
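The guard described in this commit can be illustrated with a minimal sketch. It is not the code from the PR: `describe_device` and the injected `query_gpu` callback are hypothetical names, and the callback stands in for whatever CUDA query the real `output_device_info` performs. The point is only that the GPU query is reached when the device string is "gpu" and never otherwise.

```cpp
#include <functional>
#include <string>

// Hypothetical stand-in for the GPU query. In the real code this would call
// into the CUDA runtime, which is where the crash on GPU-less machines came from.
using GpuInfoFn = std::function<std::string()>;

// Builds the device line to report. The GPU query runs only when
// device == "gpu", so a CPU run never touches GPU detection.
std::string describe_device(const std::string& device, const GpuInfoFn& query_gpu)
{
    if (device == "gpu")
    {
        return "GPU: " + query_gpu();
    }
    return "CPU";
}
```

Passing the query as a callback is only a convenience for this sketch: it lets the guard be exercised on a machine without CUDA installed.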
LGTM
@copilot Please help me refactor this further: ### Core idea
Step 1: In
- Add probe_gpu_availability() function that safely checks GPU without exiting
- Refactor get_device_flag() to support device=auto with clear user feedback
- Implement MPI broadcast for consistent device selection across all ranks
- Add safety check in get_device_kpar() to prevent GPU calls when unavailable
- Add informative messages for device selection (INFO/WARNING)

Co-authored-by: dzzz2001 <153698752+dzzz2001@users.noreply.github.com>

- Remove redundant probe_gpu_availability check that could cause MPI inconsistency
- Function is only called when device=gpu is already confirmed
- Add comment clarifying the function should only be called in GPU mode

Co-authored-by: dzzz2001 <153698752+dzzz2001@users.noreply.github.com>
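The selection logic these commits describe (cpu, gpu, or auto, driven by a non-fatal probe) might look roughly like the sketch below. Everything here is an assumption made for illustration: `ProbeResult` and `resolve_device` are hypothetical names, and throwing when device=gpu is requested without a usable GPU is a guess, since the PR's actual behavior for that case is not shown in this conversation. In a CUDA build the probe could be filled from `cudaGetDeviceCount` by checking its return code rather than exiting. In an MPI run the decision would be made on one rank and shared (for example with `MPI_Bcast`) so that all ranks agree; that step is omitted so the sketch compiles without MPI.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical result of a non-fatal GPU probe.
struct ProbeResult
{
    bool available; // the runtime query succeeded
    int count;      // number of devices it reported
};

// Resolves the requested device string to "cpu" or "gpu".
// Assumption: an explicit "gpu" request without a usable GPU is an error.
std::string resolve_device(const std::string& requested, const ProbeResult& probe)
{
    if (requested == "cpu")
    {
        return "cpu"; // never consult the probe result for CPU runs
    }
    const bool usable = probe.available && probe.count > 0;
    if (requested == "gpu")
    {
        if (!usable)
        {
            throw std::runtime_error("device=gpu requested but no GPU is available");
        }
        return "gpu";
    }
    if (requested == "auto")
    {
        return usable ? "gpu" : "cpu"; // fall back to CPU when no GPU is found
    }
    throw std::invalid_argument("unknown device: " + requested);
}
```

Keeping the decision in one pure function makes it straightforward to compute once and broadcast, which is the consistency concern the second commit raises about probing separately on each rank.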
I've implemented the refactoring as requested. Here's what was done: Commits:
Key Changes:
The implementation follows your suggested approach with proper MPI synchronization and clear user messaging.
Completed Implementation
This PR addresses the original bug and implements the requested device=auto refactoring.
Changes Summary
1. Original Bug Fix (commit 996199b)
- Fixes the crash when running with device=cpu on GPU-less machines
- Changes output_device_info() to skip GPU detection unless device=gpu
2. Device Auto-Selection Feature (commits f4653cb, e12b1cd)
- probe_gpu_availability() - safe GPU detection without program exit
- get_device_flag() - supports cpu/gpu/auto modes with MPI broadcast
- get_device_kpar() - simplified to avoid MPI synchronization issues
Behavior Matrix
Technical Details