Skip to content

GPU implementation of sparse equality constraint Jacobian#48

Open
kswirydo wants to merge 5 commits intoolcf-hackathon-2026-devfrom
ks-eq-const-REBASED
Open

GPU implementation of sparse equality constraint Jacobian#48
kswirydo wants to merge 5 commits intoolcf-hackathon-2026-devfrom
ks-eq-const-REBASED

Conversation

@kswirydo
Copy link
Copy Markdown
Collaborator

@kswirydo kswirydo commented Apr 23, 2026

Rebased and fixed merge conflicts.

Port equality constraint Jacobian from PETSc to RAJA GPU kernels

Replace the PETSc-based equality constraint Jacobian computation in the
PBPOLRAJAHIOPSPARSE model with direct GPU kernels using RAJA, eliminating
the D2H-compute-H2D round trip. The sparsity pattern is now computed on
the host during setup and the values are computed entirely on device.

Key changes:

  • Add ComputeEqJacValuesGPU_PBPOLRAJAHIOPSPARSE in new gpu.cpp/hpp files
  • Add device arrays for flat-array indices (bus eqjacsp_selfidx, line
    eqjacsp_idx/eqjacsp_diag_idx/isdcline, gen xpdevidx/xpsetidx)
  • Fix nnz counting bugs (missing gen/load entries, off-by-one in line
    loop) and populate flat-array indices during model setup
  • Replace PETSc MatGetRow extraction in sparsity and values phases
  • Handle parallel lines by sharing off-diagonal positions with atomicAdd
  • Use pre-computed nnz in get_sparse_blocks_info instead of PETSc query
  • Add correctness test (test_eqjac_compare) and performance benchmark
    (test_eqjac_perf)

Made-with: Cursor

Replace the PETSc-based equality constraint Jacobian computation in the
PBPOLRAJAHIOPSPARSE model with direct GPU kernels using RAJA, eliminating
the D2H-compute-H2D round trip. The sparsity pattern is now computed on
the host during setup and the values are computed entirely on device.

Key changes:
- Add ComputeEqJacValuesGPU_PBPOLRAJAHIOPSPARSE in new gpu.cpp/hpp files
- Add device arrays for flat-array indices (bus eqjacsp_selfidx, line
  eqjacsp_idx/eqjacsp_diag_idx/isdcline, gen xpdevidx/xpsetidx)
- Fix nnz counting bugs (missing gen/load entries, off-by-one in line
  loop) and populate flat-array indices during model setup
- Replace PETSc MatGetRow extraction in sparsity and values phases
- Handle parallel lines by sharing off-diagonal positions with atomicAdd
- Use pre-computed nnz in get_sparse_blocks_info instead of PETSc query
- Add correctness test (test_eqjac_compare) and performance benchmark
  (test_eqjac_perf)

Made-with: Cursor
Integrates the inequality constraint Jacobian GPU port (#40) with
the equality constraint Jacobian GPU port. Both GPU kernels now
coexist in pbpolrajahiopsparse_gpu.cpp.

Made-with: Cursor
@kswirydo kswirydo requested review from nkoukpaizan and pelesh April 23, 2026 19:51
@pelesh
Copy link
Copy Markdown
Collaborator

pelesh commented Apr 23, 2026

All Frontier tests pass!

@pelesh pelesh changed the title Ks eq const rebased GPU implementation of sparse equality constraint Jacobian Apr 23, 2026
@pelesh pelesh added enhancement New feature or request opflow Related to ACOPF computations labels Apr 23, 2026
Copy link
Copy Markdown
Collaborator

@pelesh pelesh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is fully functional and the target branch is not develop, I suggest we merge this. We will need one refactoring/improved testing PR before the hackathon branch is ready to be merged to develop.

Thank you, @kswirydo !

@nkoukpaizan
Copy link
Copy Markdown
Collaborator

Since this is fully functional and the target branch is not develop, I suggest we merge this. We will need one refactoring/improved testing PR before the hackathon branch is ready to be merged to develop.

Thank you, @kswirydo !

This is currently failing the FUNCTIONALITY_TEST_OPFLOW_RAJAHIOP_SPARSE_GPU_TOML_TESTSUITE on the CUDA backend. I should find out why that is before merging to the target branch.

I agree the need for refactoring is not a blocker.

Copy link
Copy Markdown
Collaborator

@nkoukpaizan nkoukpaizan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found out the issue is that the equality Jacobian sparsity pattern is not sorted. The other tests pass, because we compare the indices and values in an unordered way, but the HiOp test fails because the solvers expect a sorted CSR.

This also made me realize that for the inequality Jacobian, we are still copying the sparsity pattern from the PETSc matrix.

@pelesh
Copy link
Copy Markdown
Collaborator

pelesh commented Apr 25, 2026

I found out the issue is that the equality Jacobian sparsity pattern is not sorted. The other tests pass, because we compare the indices and values in an unordered way, but the HiOp test fails because the solvers expect a sorted CSR.

This also made me realize that for the inequality Jacobian, we are still copying the sparsity pattern from the PETSc matrix.

Good catch! As far as I remember HiOp expects sorted COO (which is not a big difference from sorted CSR).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request opflow Related to ACOPF computations

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants