Chm02 CUDA: mainline correctness fixes, tests, and timing surface #84
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 302f13b036
    CUDA_CALL(cudaMalloc((void **)&DeviceFailures, sizeof(*DeviceFailures)));
    CUDA_CALL(cudaMemsetAsync(DeviceFailures, 0, sizeof(*DeviceFailures), Stream));
Free verify scratch buffer on CUDA API errors
After `DeviceFailures` is allocated, subsequent calls still use `CUDA_CALL(...)`, which returns immediately on failure instead of jumping to `End`. In this function that skips the `cudaFree(DeviceFailures)` cleanup path, so any failure in `cudaMemsetAsync` or in a later memcpy/sync call leaks device memory, and the leak accumulates across repeated verify attempts in long-running runs.
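The shape of the suggested fix can be sketched in plain C. This is a minimal illustration under stated assumptions, not the PR's actual code: `FakeMalloc`, `FakeMemset`, `FakeFree`, `LiveAllocations`, `CHECK_GOTO_END`, and `VerifySketch` are all hypothetical names standing in for `cudaMalloc`, `cudaMemsetAsync`, `cudaFree`, and whatever goto-style macro the repository may already provide. The stand-ins exist only so the sketch runs without a GPU.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the CUDA runtime (assumption: 0 means success). */
typedef int FakeError;

/* Counts outstanding "device" buffers so a leak is observable. */
static int LiveAllocations = 0;

static FakeError FakeMalloc(void **Pointer, size_t Size)
{
    *Pointer = malloc(Size);
    if (*Pointer == NULL) {
        return 1;
    }
    LiveAllocations++;
    return 0;
}

static FakeError FakeMemset(void *Pointer, size_t Size, int ShouldFail)
{
    if (ShouldFail) {
        return 2; /* simulated API failure after the allocation succeeded */
    }
    memset(Pointer, 0, Size);
    return 0;
}

static FakeError FakeFree(void *Pointer)
{
    /* Like cudaFree, freeing a null pointer is a no-op. */
    if (Pointer != NULL) {
        free(Pointer);
        LiveAllocations--;
    }
    return 0;
}

/* Hypothetical macro: record the error and jump to End instead of returning. */
#define CHECK_GOTO_END(Call)       \
    do {                           \
        Result = (Call);           \
        if (Result != 0) {         \
            goto End;              \
        }                          \
    } while (0)

static FakeError VerifySketch(int FailMemset)
{
    FakeError Result = 0;
    unsigned int *DeviceFailures = NULL;

    CHECK_GOTO_END(FakeMalloc((void **)&DeviceFailures,
                              sizeof(*DeviceFailures)));
    CHECK_GOTO_END(FakeMemset(DeviceFailures,
                              sizeof(*DeviceFailures),
                              FailMemset));

End:
    /* Single cleanup path, reached on both success and failure. */
    FakeFree(DeviceFailures);
    return Result;
}
```

With these stand-ins, `VerifySketch(1)` returns the simulated error code while `LiveAllocations` ends at zero, which is the property the review asks for. Initializing `DeviceFailures` to `NULL` keeps the cleanup safe even when the allocation itself fails.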
Closes #79
Summary
This PR mainlines the validated `Chm02` CUDA bring-up work as a focused correctness-first integration slice.

It promotes the legacy `Chm02` CUDA path from a CPU-assisted bring-up flow toward a first-class correctness path for single-graph runs by moving the major solve phases onto the GPU, fixing Linux compatibility issues, adding focused regression coverage, and exposing explicit CUDA timing fields.

Included
- `Graph.cu`/`GraphCu.c` fixes for CUDA add-keys, peel/order capture, assignment, and verify.
- `Chm02Compat` / file-work fixes needed for no-file-io and file-io parity.
- `Chm02`:`Assigned16` case
- `docs/chm02-cuda-mainline.md` describing scope, non-goals, compatibility notes, and staged acceptance.

Non-goals
- `Chm02` CUDA kernels.

Validation
Ran:

    cmake --build build-cuda -j2
    ctest --test-dir build-cuda --output-on-failure -R 'perfecthash\.cuda\.chm02'

Current focused CUDA `Chm02` test result:

Notes
This remains a correctness-first CUDA path. The serial CUDA kernels are intentionally bring-up oriented and should not be read as a throughput claim.