Skip to content

Randomized solver (both cuda and hip)#83

Merged
pelesh merged 9 commits intodevelopfrom
rand-gmres-dev2
Dec 2, 2023
Merged

Randomized solver (both cuda and hip)#83
pelesh merged 9 commits intodevelopfrom
rand-gmres-dev2

Conversation

@kswirydo
Copy link
Copy Markdown
Collaborator

@kswirydo kswirydo commented Nov 28, 2023

Working randomized GMRES with preconditioner. You can make it into FGMRES by setting flexible to 1.

Caveat: the cuda version uses ancient version of triangular solve cusparseDcsrsv2 (I am using ILU0 preconditioner that needs triangular solve); despite my best efforts, I was not able to get the more modern version, cusparseSpSV to generate correct results. Hence, the code generates correct results... but uses older version of triangular solve so it compiles with tones of warnings on deprecated CUDA functions. I followed this example from cuda samples to no avail. See previous commits.

What has been added:

  • Two randomized classes (for fast Walsh Hadamard tranform based sketching and for Count-Sketch based sketching); cuda and hip kerneles for both,
  • Randomized solver that deploys Theta (sketching).
  • A cuda and hip-based examples.

To test, run runResolveOneMatrix with matrix of your choice (I tested with vas_stokes_1M from SuiteSparse).

Also fixes #84

Copy link
Copy Markdown
Collaborator

@pelesh pelesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great work! My main suggestion is to have functionality tests for CUDA and HIP backends implemented and added to this PR. There are a few minor issues I also mentioned in this PR.

Comment thread resolve/GramSchmidt.cpp Outdated
Comment thread resolve/CMakeLists.txt
Comment thread resolve/CMakeLists.txt Outdated
Comment thread resolve/LinSolverDirectRocSparseILU0.cpp Outdated
Comment thread resolve/LinSolverIterativeFGMRES.hpp Outdated
Comment thread resolve/cuda/cudaKernels.cu
Comment thread resolve/hip/hipKernels.hip
Comment thread resolve/hip/hipKernels.hip
Comment thread runResolveOneMatrix Outdated
Comment thread profile.sysinfo.txt Outdated
@pelesh
Copy link
Copy Markdown
Collaborator

pelesh commented Nov 28, 2023

Ascent build fails with following error message:

Building CUDA object resolve/cuda/CMakeFiles/resolve_backend_cuda.dir/MemoryUtils.cu.o
/gpfs/wolf/proj-shared/csc359/ci/489235/resolve/cuda/cudaKernels.cu(157): error: no instance of overloaded function "atomicAdd" matches the argument list
            argument types are: (double *, double)
1 error detected in the compilation of "/gpfs/wolf/proj-shared/csc359/ci/489235/resolve/cuda/cudaKernels.cu".
make[2]: *** [resolve/cuda/CMakeFiles/resolve_backend_cuda.dir/build.make:76: resolve/cuda/CMakeFiles/resolve_backend_cuda.dir/cudaKernels.cu.o] Error 1

@kswirydo
Copy link
Copy Markdown
Collaborator Author

You need cuda architecture 7.0 (or 70) to build this. Othewise atomicAdd is defined only for floats but not for doubles. Default is 5.2 somehow.

@cameronrutherford
Copy link
Copy Markdown
Collaborator

I don't know why PNNL tests aren't running, but now ORNL tests are passing @ryandanehy

@kswirydo
Copy link
Copy Markdown
Collaborator Author

My lastest commit has a working version of cuda 11+ compatible SpSV triangular solve in ILU0 class. I also applied Slaven's dependency comments to 2 files.

Copy link
Copy Markdown
Collaborator

@cameronrutherford cameronrutherford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good!

Comment thread examples/r_randGMRES.cpp
Comment thread examples/CMakeLists.txt
Comment thread resolve/CMakeLists.txt
Comment thread resolve/LinSolverDirectCuSparseILU0.cpp
Comment thread resolve/LinSolverDirectCuSparseILU0.hpp
Comment thread resolve/LinSolverIterativeFGMRES.hpp
Comment thread resolve/LinSolverIterativeRandFGMRES.hpp
Comment thread resolve/RandSketchingManager.hpp
Comment thread resolve/cuda/cudaKernels.cu
Comment thread runResolveOneMatrix Outdated
@ryandanehy
Copy link
Copy Markdown
Collaborator

@kswirydo to get this to run PNNL tests the pr conflicts need to be fixed (probally best to do it through rebase). Then PNNL tests should start running.

Implementations for AMD and NVIDIA GPUs
@pelesh
Copy link
Copy Markdown
Collaborator

pelesh commented Nov 29, 2023

@kswirydo to get this to run PNNL tests the pr conflicts need to be fixed (probally best to do it through rebase). Then PNNL tests should start running.

It seems all we needed was to rebase to develop, not sure why.

Comment thread resolve/LinSolverDirectRocSparseILU0.hpp Outdated
Comment thread resolve/LinSolverDirectRocSparseILU0.cpp Outdated
Co-authored-by: pelesh <peless@ornl.gov>
Copy link
Copy Markdown
Collaborator

@pelesh pelesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remaining items before this can be merged:

  • Add functionality tests for randomized (F)GMRES on AMD and NVIDIA GPUs
  • Use STL math on the host (this will take care of a big chunk of warnings)
  • Fix other warnings
  • Remove roctx.h header from LinSolverDirectRocSparseILU0.hpp.

Almost there ;)

Co-authored-by: pelesh <peless@ornl.gov>
@pelesh
Copy link
Copy Markdown
Collaborator

pelesh commented Nov 30, 2023

I fixed our little mishap with rebasing. All is needed to do is to:

  • Generate test matrices inside functionality tests. This is lesser evil than keeping 170MB test matrix in a repo.
  • Add again randomized solvers tests to CMake build and testing.

@cameronrutherford
Copy link
Copy Markdown
Collaborator

I fixed our little mishap with rebasing. All is needed to do is to:

  • Generate test matrices inside functionality tests. This is lesser evil than keeping 170MB test matrix in a repo.
  • Add again randomized solvers tests to CMake build and testing.

GitHub shouldn't allow files that big. pre-commit can also help detect when these large files are added, and custom thresholds can be configured cc @ryandanehy

Copy link
Copy Markdown
Collaborator

@pelesh pelesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All review issues have been addressed. Merge once CI tests pass.

@pelesh pelesh self-assigned this Dec 2, 2023
@pelesh pelesh merged commit 4c009e6 into develop Dec 2, 2023
@pelesh pelesh deleted the rand-gmres-dev2 branch December 11, 2023 16:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CUDA capability not set on Ascent pipeline

4 participants