Fix GPU single precision energy error in dav_subspace solver #6946
Merged
mohanchen merged 7 commits into deepmodeling:develop (Feb 4, 2026)
Conversation
It appears that GEMM with dimension 1 can be buggy for GPU (cuBLAS)
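The observation above can be illustrated on the CPU: a GEMM whose second dimension is 1 is mathematically identical to a GEMV, which is why routing the single-column case through gemv is a safe workaround. The sketch below is a plain reference implementation (column-major, no transpose, alpha=1/beta=0), not the cuBLAS calls themselves; the function names are illustrative.

```cpp
#include <cassert>

// Reference column-major GEMM: C (m x n) = A (m x k) * B (k x n).
void gemm_ref(int m, int n, int k, const double* A, const double* B, double* C)
{
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i)
        {
            double sum = 0.0;
            for (int p = 0; p < k; ++p)
                sum += A[i + p * m] * B[p + j * k];
            C[i + j * m] = sum;
        }
}

// Reference GEMV: y (m) = A (m x k) * x (k).
// For n == 1 this performs exactly the same arithmetic as gemm_ref,
// so replacing the dimension-1 GEMM with a GEMV changes nothing
// mathematically while avoiding the problematic GPU code path.
void gemv_ref(int m, int k, const double* A, const double* x, double* y)
{
    for (int i = 0; i < m; ++i)
    {
        double sum = 0.0;
        for (int p = 0; p < k; ++p)
            sum += A[i + p * m] * x[p];
        y[i] = sum;
    }
}
```

Both paths produce identical results for a single right-hand side, which is the `notconv == 1` situation described in this PR.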
- Restore d_precondition host-to-device sync that was commented out in deepmodeling#5199 (this caused uninitialized GPU memory to be used as the preconditioner)
- Fix cuBLAS gemv calls using incx instead of incy for the Y parameter
- Fix gemv_batched using incy instead of incx for the x parameter

Fixes GPU single precision energy being ~0.027 eV off from the correct value.
Fixed 3 hipBLAS gemv calls that incorrectly used incx instead of incy for the Y vector stride parameter:
- hipblasDgemv (double)
- hipblasCgemv (complex&lt;float&gt;)
- hipblasZgemv (complex&lt;double&gt;)

This is the same bug that was fixed in the CUDA version (math_kernel_op.cu).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
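To see why swapping the stride arguments matters, here is a CPU reference implementation of the BLAS gemv stride semantics (a sketch, not the cuBLAS/hipBLAS API itself): incx steps through the input vector x, incy steps through the output vector y. Passing incx where incy belongs scatters the result with the wrong stride whenever the two differ.

```cpp
#include <cassert>

// Reference column-major GEMV honoring the BLAS stride parameters:
// y[i * incy] = sum_j A(i, j) * x[j * incx], for an m x n matrix A
// with leading dimension lda. This mirrors the stride contract of
// cublas<t>gemv / hipblas<t>gemv that the PR's fix restores.
void gemv_strided_ref(int m, int n, const double* A, int lda,
                      const double* x, int incx,
                      double* y, int incy)
{
    for (int i = 0; i < m; ++i)
    {
        double sum = 0.0;
        for (int j = 0; j < n; ++j)
        {
            sum += A[i + j * lda] * x[j * incx];
        }
        // The bug fixed here: stepping through Y with incx instead of
        // incy writes the result to the wrong memory locations.
        y[i * incy] = sum;
    }
}
```

With incx == incy the two versions happen to agree, which is why such a bug can hide until a caller passes different strides.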
Collaborator
LGTM, thanks for this fix. I will check the result on ROCm.
dyzheng approved these changes on Feb 2, 2026
Critsium-xy approved these changes on Feb 2, 2026
Cstandardlib approved these changes on Feb 2, 2026
Collaborator
And there are some code conflicts that need to be resolved, caused by #6936, which unifies the cuBLAS check macro (cublasErrcheck -> CHECK_CUBLAS).
… fix-gpu-double-cu128
…develop into fix-gpu-double-cu128
Summary
- Restore d_precondition host-to-device sync that was commented out in #5199 (Feature: update new version of dav_subspace with higher performance); this caused uninitialized GPU memory to be used as the preconditioner
- Fix cuBLAS gemv calls using incx instead of incy for the Y parameter
- Fix gemv_batched using incy instead of incx for the x parameter

Root Cause for the dav_subspace solver problem
The syncmem_var_h2d_op() call for d_precondition was commented out in commit a5c35d9 (#5199), causing the GPU preconditioner array to contain uninitialized memory. This led to incorrect preconditioning in the Davidson subspace iterations, resulting in wrong energies for GPU single precision calculations (~0.027 eV error).

Note: the problem may not always be reproducible, since uninitialized memory may contain random data.
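The failure mode can be sketched on the CPU with a mock of the host-to-device copy. Here sync_h2d stands in for ABACUS's syncmem_var_h2d_op (the real operator launches a cudaMemcpy); the -999.0 sentinel simulates the indeterminate contents of freshly allocated device memory. Skipping the copy, as the commented-out call did, leaves the "device" preconditioner holding garbage.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Mock of the host-to-device sync (stand-in for syncmem_var_h2d_op):
// copies the host preconditioner into the "device" buffer.
void sync_h2d(std::vector<double>& d_buf, const std::vector<double>& h_buf)
{
    std::copy(h_buf.begin(), h_buf.end(), d_buf.begin());
}

// Builds the device-side preconditioner. With do_sync == false this
// reproduces the bug: the buffer keeps whatever the allocation held
// (simulated by the -999.0 sentinel for indeterminate device memory).
std::vector<double> build_precondition(const std::vector<double>& h_precondition,
                                       bool do_sync)
{
    std::vector<double> d_precondition(h_precondition.size(), -999.0);
    if (do_sync)
    {
        sync_h2d(d_precondition, h_precondition);  // the call #5199 removed
    }
    return d_precondition;
}
```

Because real uninitialized memory sometimes happens to hold plausible values, the resulting energy error can come and go between runs, matching the "not always reproducible" note above.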
Reason for replacing gemm with gemv
I found that the code crashes with GPU double precision in dav and dav_subspace (not a problem with the CG solver). The crash occurs at the gemm call with notconv=1.

Test Results
Tested with examples/02_scf/pw_Si2 on CUDA 12.8 with an NVIDIA GeForce RTX 5090.

Fixes #6867