Parallelize inplace_batch_inverse #561
/bench
Codex Code Review
Found one issue.
Low - Partial mutation on
I didn’t find security issues in the successful inversion path.
Verification note: I attempted
Review: Parallelize inplace_batch_inverse
Good direction overall: Rayon-based chunked parallelism is a natural fit here and the algorithm is correct (each chunk is fully independent). Two issues to address before merging:
Medium - Unconditional
Benchmark: fib_iterative_8M (median of 3)
Table parallelism: 1
Commit: 52e16f2 · Baseline: built from main · Runner: self-hosted bench
/bench 10
/bench k=1
Codex Code Review
Found one issue.
Potential Bug / Low security impact
The new parallel path can partially mutate
This is especially risky because callers may handle the error and still reuse the original buffer. Keep the error behavior all-or-nothing, for example by validating/inverting chunk products before mutating chunks, or by falling back to the old sequential path when preserving that behavior matters.
I couldn’t run
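One way to read the all-or-nothing suggestion in this review is a zero pre-scan before any write. The sketch below is illustrative only: the toy modulus `P`, the `ZeroElement` error type, and the function name `batch_inverse_all_or_nothing` are hypothetical and not taken from the PR, and inputs are assumed to be already reduced (less than `P`).

```rust
/// Toy prime modulus (2^31 - 1), chosen only for illustration.
const P: u64 = 2_147_483_647;

/// Modular multiplication; operands are assumed to be < P, so the product fits in u64.
fn mul(a: u64, b: u64) -> u64 {
    a * b % P
}

/// Modular inverse of a non-zero element via Fermat's little theorem: a^(P-2) mod P.
fn inv(a: u64) -> u64 {
    let (mut base, mut exp, mut acc) = (a, P - 2, 1u64);
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul(acc, base);
        }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

/// Hypothetical error type: index of the first zero found.
#[derive(Debug, PartialEq)]
struct ZeroElement(usize);

/// Batch inversion that either inverts every element or leaves the slice untouched.
fn batch_inverse_all_or_nothing(values: &mut [u64]) -> Result<(), ZeroElement> {
    // Pre-scan: reject the whole batch before any element is overwritten.
    if let Some(i) = values.iter().position(|&v| v == 0) {
        return Err(ZeroElement(i));
    }

    // prefix[i] holds the product of all elements before index i.
    let mut prefix = Vec::with_capacity(values.len());
    let mut acc = 1u64;
    for &v in values.iter() {
        prefix.push(acc);
        acc = mul(acc, v);
    }

    // A single field inversion, then walk backwards peeling off one element at a time.
    let mut inv_acc = inv(acc);
    for i in (0..values.len()).rev() {
        let v = values[i];
        values[i] = mul(inv_acc, prefix[i]);
        inv_acc = mul(inv_acc, v);
    }
    Ok(())
}
```

With this shape, a caller that handles the error can keep using the original buffer, at the cost of one extra read-only pass over the input.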
Review: Parallelize inplace_batch_inverse
Overall: The approach is correct; chunked batch inversion is mathematically sound since each element's inverse is independent. The implementation is clean and consistent with how the existing FFT parallelism is structured.
Medium - No test coverage for the parallel path
The existing tests use vectors of ≤10 elements (far below the 65536 threshold), so the new
The FFT code sets a good precedent in
Low -
/claude
Review Summary
Overall: The approach is correct and well-tested. The Montgomery batch-inverse per chunk is a sound strategy: each chunk is independent (no zero divisors in a field, so the product of non-zero elements is non-zero within a chunk), and the all-or-nothing invariant is properly preserved via the zero pre-scan.
What works well
Issues
Medium - Double data traversal overhead
Low -
Low - Benchmark zero exclusion biases toward
if v == 0 { v = 1; }
Maps
This PR adds a parallel path to inplace_batch_inverse: when the parallel feature is enabled and the slice has at least 2^16 (65536) elements, it splits the input into one chunk per Rayon thread and runs the existing Montgomery batch inverse on each chunk in parallel. Otherwise it falls back to the sequential version.
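The chunking strategy described here can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the PR's code: it uses `std::thread::scope` in place of Rayon, a toy modulus `P` instead of the project's field type, and hypothetical names (`batch_inverse_seq`, `batch_inverse_chunked`, `ZeroInBatch`). Only the 2^16 threshold and the zero pre-scan come from the discussion above; inputs are assumed to be already reduced (less than `P`).

```rust
use std::thread;

/// Toy prime modulus (2^31 - 1), chosen only for illustration.
const P: u64 = 2_147_483_647;
/// Threshold from the PR description: below this, stay sequential.
const PARALLEL_THRESHOLD: usize = 1 << 16;

/// Modular multiplication; operands are assumed to be < P, so the product fits in u64.
fn mul(a: u64, b: u64) -> u64 {
    a * b % P
}

/// Modular inverse of a non-zero element via Fermat's little theorem.
fn inv(a: u64) -> u64 {
    let (mut base, mut exp, mut acc) = (a, P - 2, 1u64);
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul(acc, base);
        }
        base = mul(base, base);
        exp >>= 1;
    }
    acc
}

/// Sequential Montgomery batch inverse. Assumes the slice contains no zeros.
fn batch_inverse_seq(values: &mut [u64]) {
    // prefix[i] holds the product of all elements before index i.
    let mut prefix = Vec::with_capacity(values.len());
    let mut acc = 1u64;
    for &v in values.iter() {
        prefix.push(acc);
        acc = mul(acc, v);
    }
    let mut inv_acc = inv(acc);
    for i in (0..values.len()).rev() {
        let v = values[i];
        values[i] = mul(inv_acc, prefix[i]);
        inv_acc = mul(inv_acc, v);
    }
}

/// Hypothetical error type for a batch that contains a zero.
#[derive(Debug, PartialEq)]
struct ZeroInBatch;

/// Chunked batch inverse: one chunk per worker thread above the threshold.
fn batch_inverse_chunked(values: &mut [u64], num_threads: usize) -> Result<(), ZeroInBatch> {
    // Zero pre-scan keeps the error path all-or-nothing: nothing is written yet.
    if values.iter().any(|&v| v == 0) {
        return Err(ZeroInBatch);
    }
    if values.len() < PARALLEL_THRESHOLD || num_threads <= 1 {
        batch_inverse_seq(values);
        return Ok(());
    }
    // Ceiling division so that at most `num_threads` chunks are produced.
    let chunk_len = (values.len() + num_threads - 1) / num_threads;
    // Each chunk is a disjoint mutable slice, so the workers never share data.
    thread::scope(|s| {
        for chunk in values.chunks_mut(chunk_len) {
            s.spawn(move || batch_inverse_seq(chunk));
        }
    });
    Ok(())
}
```

Each chunk pays for its own field inversion, so the parallel path performs `num_threads` inversions instead of one. That trade-off is presumably why a size threshold is used before switching paths.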