Skip to content

perf: Sequential reads in commit_columns_bit_reversed#560

Draft
nicole-graus wants to merge 5 commits into
mainfrom
commit-columns-sequential-reads
Draft

perf: Sequential reads in commit_columns_bit_reversed#560
nicole-graus wants to merge 5 commits into
mainfrom
commit-columns-sequential-reads

Conversation

@nicole-graus
Copy link
Copy Markdown
Collaborator

Optimization extracted from PR #545.

Replaces the scattered columns[col][br(row)] reads inside commit_columns_bit_reversed with sequential columns[col][k] reads, then applies a single in_place_bit_reverse_permute over the hashed-digest vector at the end. The scatter moves from the ~2 GB column-read side to the ~64 MB digest-output side, producing a byte-identical Merkle root.

@github-actions
Copy link
Copy Markdown

Codex Code Review

No issues found in the PR diff.

The ordering rewrite in commit_columns_bit_reversed looks equivalent: hashing k in natural order and then applying in_place_bit_reverse_permute yields leaves indexed as hash(row[reverse_index(i)]), matching the previous behavior.

Verification note: I attempted cargo test -p stark commit --lib, but it could not run because rustup tried to write under read-only /home/runner/.rustup/tmp.

})
.collect();

in_place_bit_reverse_permute(&mut hashed_leaves);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Medium – No test validates the equivalence directly

The mathematical transformation is correct: sequential_hash[k] = hash(col_0[k] || ...), and after in_place_bit_reverse_permute, leaves[i] = hash(col_0[br(i)] || ...) — identical to the old scatter read. The prove-verify roundtrip tests cover this implicitly, but since this is a correctness-critical cryptographic primitive, a direct unit test comparing the Merkle root produced by both approaches (old scatter-read vs. new sequential + permute) would be valuable insurance against future regressions.

Low – in_place_bit_reverse_permute silently miscomputes for non-power-of-two lengths in release builds

The debug_assert! at line 414–417 only fires in debug mode. in_place_bit_reverse_permute itself has no such guard — it calls size.trailing_zeros() which is only log2(size) when size is a power of two. In a release build with a non-power-of-two num_rows, the permutation would silently produce a wrong (but non-panicking) digest ordering. This is a pre-existing issue (the old reverse_index call had the same gap), but it's now applied to the committed digest array. Consider promoting the assert or adding a release-mode guard in in_place_bit_reverse_permute.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 24, 2026

Review: perf: Sequential reads in commit_columns_bit_reversed

Summary

The optimization moves the bit-reversal work from a scattered column read inside the hashing loop (2 GB of field element data) to a single in-place permutation of the 32-byte digest vector (num_rows × 32 bytes). The transformation is mathematically correct and the resulting Merkle root is identical.

Correctness ✓

The equivalence holds:

  • Old: leaves[i] = hash(col_0[br(i)] || col_1[br(i)] || ...)
  • New: hash col_0[k] || col_1[k] || ...raw[k], then in_place_bit_reverse_permuteleaves[i] = raw[br(i)] = hash(col_0[br(i)] || ...)

Parallel safety is also fine — rayon's par_iter().collect::<Vec<_>>() preserves element order.

Issues

Medium – Missing unit test for the equivalence (see inline)
The prove-verify roundtrip tests cover this implicitly, but a focused test with known inputs that asserts both approaches produce the same Merkle root would be valuable for a correctness-critical cryptographic primitive.

Low – in_place_bit_reverse_permute has no release-mode power-of-two guard (see inline)
Pre-existing issue, but now applied to the committed digest array. Promoting the existing debug_assert! or adding a guard inside the function would eliminate silent wrong output in release builds for malformed inputs.

Overall

Clean, well-motivated change. The code is readable, the docstring explains the invariant clearly, and the diff is minimal. The two issues above are not blockers.

@nicole-graus
Copy link
Copy Markdown
Collaborator Author

/bench 10

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 24, 2026

Benchmark — fib_iterative_8M (median of 3)

Table parallelism: 1

Metric main PR Δ
Peak heap 49533 MB 49533 MB +0 MB (+0.0%) ⚪
Prove time 70.715s 71.978s +1.263s (+1.8%) ⚪

✅ No significant change.

✅ Low variance (time: 0.8%, heap: 0.0%)

Commit: 4df366b · Baseline: built from main · Runner: self-hosted bench

@nicole-graus
Copy link
Copy Markdown
Collaborator Author

/bench k=1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants