GPU implementation of sparse equality constraint Jacobian #48
kswirydo wants to merge 5 commits into olcf-hackathon-2026-dev
Replace the PETSc-based equality constraint Jacobian computation in the PBPOLRAJAHIOPSPARSE model with direct GPU kernels using RAJA, eliminating the D2H-compute-H2D round trip. The sparsity pattern is now computed on the host during setup, and the values are computed entirely on the device.

Key changes:
- Add ComputeEqJacValuesGPU_PBPOLRAJAHIOPSPARSE in new gpu.cpp/hpp files
- Add device arrays for flat-array indices (bus eqjacsp_selfidx, line eqjacsp_idx/eqjacsp_diag_idx/isdcline, gen xpdevidx/xpsetidx)
- Fix nnz counting bugs (missing gen/load entries, off-by-one in the line loop) and populate flat-array indices during model setup
- Replace PETSc MatGetRow extraction in the sparsity and values phases
- Handle parallel lines by sharing off-diagonal positions with atomicAdd
- Use the pre-computed nnz in get_sparse_blocks_info instead of a PETSc query
- Add a correctness test (test_eqjac_compare) and a performance benchmark (test_eqjac_perf)

Made-with: Cursor
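The flat-index scheme described above (sparsity pattern built on the host at setup, values filled on the device) can be illustrated with a simplified, host-only sketch. Everything here is hypothetical and not taken from the PR: Branch, Pattern, build_pattern, and compute_values are illustrative names, each branch contributes a single scalar y to its two off-diagonal entries, and the real model has separate P/Q rows and angle/voltage columns. The point it shows is that parallel lines resolve to the same slot, so a parallel (device) version of the value loop has to accumulate with an atomic add.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical, simplified branch: connects buses `from` and `to` and adds a
// single scalar `y` to each of its two off-diagonal Jacobian entries.
struct Branch {
  int from;
  int to;
  double y;
};

// COO sparsity pattern plus the per-branch flat indices into it.
struct Pattern {
  std::vector<int> rows, cols;      // one entry per distinct (row, col)
  std::vector<int> ft_idx, tf_idx;  // slots of (from, to) and (to, from)
};

// Host-side setup: assign one slot per distinct (row, col) pair, so parallel
// branches between the same buses resolve to the same slot.
Pattern build_pattern(const std::vector<Branch>& branches) {
  Pattern p;
  std::map<std::pair<int, int>, int> slot;  // (row, col) -> flat index
  auto get_slot = [&](int r, int c) {
    auto it = slot.find({r, c});
    if (it != slot.end()) return it->second;  // reuse existing slot
    int k = static_cast<int>(p.rows.size());
    slot.emplace(std::make_pair(r, c), k);
    p.rows.push_back(r);
    p.cols.push_back(c);
    return k;
  };
  for (const Branch& br : branches) {
    p.ft_idx.push_back(get_slot(br.from, br.to));
    p.tf_idx.push_back(get_slot(br.to, br.from));
  }
  return p;
}

// Value phase: each branch adds into its precomputed slots. Run serially here;
// in a device kernel, parallel branches would race on shared slots, so the GPU
// version would need an atomic add (e.g. RAJA::atomicAdd) instead of +=.
void compute_values(const std::vector<Branch>& branches, const Pattern& p,
                    std::vector<double>& vals) {
  assert(p.ft_idx.size() == branches.size());
  vals.assign(p.rows.size(), 0.0);
  for (std::size_t i = 0; i < branches.size(); ++i) {
    vals[p.ft_idx[i]] += branches[i].y;
    vals[p.tf_idx[i]] += branches[i].y;
  }
}
```

For example, two parallel branches between buses 0 and 1 with y = 1.0 and y = 2.0 share slots 0 and 1, and both of those slots end up holding 3.0.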
Integrates the inequality constraint Jacobian GPU port (#40) with the equality constraint Jacobian GPU port. Both GPU kernels now coexist in pbpolrajahiopsparse_gpu.cpp.

Made-with: Cursor
All Frontier tests pass!
pelesh left a comment
Since this is fully functional and the target branch is not develop, I suggest we merge this. We will need one more PR for refactoring and improved testing before the hackathon branch is ready to be merged into develop.
Thank you, @kswirydo !
This is currently failing the … I agree that the need for refactoring is not a blocker.
nkoukpaizan left a comment
I found that the issue is that the equality Jacobian sparsity pattern is not sorted. The other tests pass because they compare indices and values without regard to order, but the HiOp test fails because the solvers expect a sorted CSR pattern.
This also made me realize that for the inequality Jacobian, we are still copying the sparsity pattern from the PETSc matrix.
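One possible fix, sketched here under assumptions (sort_coo_pattern and remap_indices are hypothetical helpers, not code from the PR): sort the host-side pattern once by (row, col) during setup, then push every precomputed flat index through the resulting old-to-new map. The device value kernels would then write straight into sorted positions without further changes. This assumes each (row, col) pair appears only once, as it does when parallel lines share a slot.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Sort a COO sparsity pattern in place by (row, col) and return old_to_new,
// where old_to_new[k] is the new position of the entry that was at position k.
std::vector<int> sort_coo_pattern(std::vector<int>& rows, std::vector<int>& cols) {
  const int nnz = static_cast<int>(rows.size());
  std::vector<int> perm(nnz);  // after sorting: perm[new_pos] = old_pos
  std::iota(perm.begin(), perm.end(), 0);
  std::sort(perm.begin(), perm.end(), [&](int a, int b) {
    return rows[a] != rows[b] ? rows[a] < rows[b] : cols[a] < cols[b];
  });
  std::vector<int> sorted_rows(nnz), sorted_cols(nnz), old_to_new(nnz);
  for (int k = 0; k < nnz; ++k) {
    sorted_rows[k] = rows[perm[k]];
    sorted_cols[k] = cols[perm[k]];
    old_to_new[perm[k]] = k;
  }
  rows.swap(sorted_rows);
  cols.swap(sorted_cols);
  return old_to_new;
}

// Rewrite precomputed flat indices (e.g. per-bus or per-line slot arrays)
// so they point into the sorted pattern. Done once on the host at setup.
void remap_indices(std::vector<int>& idx, const std::vector<int>& old_to_new) {
  for (int& k : idx) {
    assert(k >= 0 && k < static_cast<int>(old_to_new.size()));
    k = old_to_new[k];
  }
}
```

For rows {1, 0, 1, 0} and cols {0, 1, 2, 0}, the sorted pattern is rows {0, 0, 1, 1} and cols {0, 1, 0, 2}, and an index list {0, 2} is remapped to {2, 3}.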
Good catch! As far as I remember, HiOp expects sorted COO (which is not very different from sorted CSR).
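To illustrate why the two formats are close: once a COO pattern is sorted by (row, col), CSR can reuse the same column and value arrays and only compresses the row indices into row offsets. A minimal sketch (coo_rows_to_csr_ptr is a hypothetical helper, not a HiOp API):

```cpp
#include <cassert>
#include <vector>

// Build the CSR row-offset array from the row indices of a COO pattern that is
// already sorted by (row, col). The column (and value) arrays are reused as-is,
// which is only valid because the entries are already in row-major order.
std::vector<int> coo_rows_to_csr_ptr(const std::vector<int>& sorted_rows, int nrows) {
  std::vector<int> row_ptr(nrows + 1, 0);
  for (int r : sorted_rows) {
    assert(r >= 0 && r < nrows);
    ++row_ptr[r + 1];  // count entries in row r
  }
  for (int i = 0; i < nrows; ++i) {
    row_ptr[i + 1] += row_ptr[i];  // prefix sum turns counts into offsets
  }
  return row_ptr;
}
```

For sorted rows {0, 0, 1, 1} and three rows, the result is {0, 2, 4, 4}, with row 2 empty.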
Rebased and fixed merge conflicts.
Port equality constraint Jacobian from PETSc to RAJA GPU kernels