
Chm02 CUDA: mainline correctness fixes, tests, and timing surface#84

Open
tpn wants to merge 14 commits into main from issue-79-chm02-cuda-mainline

Conversation


@tpn tpn commented Mar 31, 2026

Closes #79

Summary

This PR mainlines the validated Chm02 CUDA bring-up work as a focused correctness-first integration slice.

It promotes the legacy Chm02 CUDA path from a CPU-assisted bring-up flow toward a first-class correctness path for single-graph runs by moving the major solve phases onto the GPU, fixing Linux compatibility issues, adding focused regression coverage, and exposing explicit CUDA timing fields.

Included

  • Graph.cu / GraphCu.c fixes for CUDA add-keys, peel/order capture, assignment, and verify.
  • Linux Chm02Compat / file-work fixes needed for no-file-io and file-io parity.
  • Focused regression coverage for CUDA Chm02:
    • Hologram known-seed no-file-io
    • Hologram known-seed file-io
    • generated non-Assigned16 case
    • timing-field presence
  • Explicit per-phase CUDA timing fields in CSV output.
  • docs/chm02-cuda-mainline.md describing scope, non-goals, compatibility notes, and staged acceptance.

Non-goals

  • The standalone batched GPU peeling POC.
  • Large-scale GPU constructor/performance work.
  • Throughput-optimized Chm02 CUDA kernels.

Validation

Ran:

  • cmake --build build-cuda -j2
  • ctest --test-dir build-cuda --output-on-failure -R 'perfecthash\.cuda\.chm02'

Current focused CUDA Chm02 test result:

  • 4/4 passed

Notes

This remains a correctness-first CUDA path. The serial CUDA kernels are intentionally bring-up oriented and should not be read as a throughput claim.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 302f13b036


Comment on lines +1281 to +1282
CUDA_CALL(cudaMalloc((void **)&DeviceFailures, sizeof(*DeviceFailures)));
CUDA_CALL(cudaMemsetAsync(DeviceFailures, 0, sizeof(*DeviceFailures), Stream));

P2: Free verify scratch buffer on CUDA API errors

After DeviceFailures is allocated, subsequent calls still use CUDA_CALL(...), which returns immediately on failure instead of jumping to End. In this function, returning early skips the cudaFree(DeviceFailures) cleanup path, so any failure in cudaMemsetAsync or the later memcpy/sync calls leaks device memory and can snowball across repeated verify attempts in long-running runs.




Development

Successfully merging this pull request may close these issues.

Mainline Chm02 CUDA bring-up fixes, tests, and timing surface
