Skip to content

cleanup(merkle-tree): reject non-power-of-two leaves, dedupe helpers#603

Open
diegokingston wants to merge 2 commits into
mainfrom
cleanup/merkle-tree
Open

cleanup(merkle-tree): reject non-power-of-two leaves, dedupe helpers#603
diegokingston wants to merge 2 commits into
mainfrom
cleanup/merkle-tree

Conversation

@diegokingston
Copy link
Copy Markdown
Collaborator

Behavior-preserving code-quality pass on crypto/crypto/src/merkle_tree (Poseidon backends, FriMerkleTree/Keccak256Backend, and the BatchProof / arity-4 machinery were explicitly kept).

Padding removed (the one real footgun)

  • build_from_hashed_leaves padded a non-power-of-two leaf set by repeating the last leaf. That makes the root of [.., x] collide with the root of [.., x, x] — the root stopped binding the leaf count. Every prover commitment is already over a power-of-two domain, so the padding never fired in production. build/build_from_hashed_leaves now return None for a non-power-of-two count; complete_until_power_of_two is deleted.

Helper dedup / std reuse

  • utils.rs had two index-helper pairs computing the same thing: sibling_index/parent_index (underflow at the root) and the root-safe get_sibling_pos/get_parent_pos. Consolidated onto the root-safe pair; build_merkle_path updated.
  • Deleted the custom is_power_of_two (reinvented usize::is_power_of_two, and underflowed at 0).
  • utils is now pub(crate) — internal plumbing, not public API.

Dead code

  • Removed Proof's hand-rolled Serializable/Deserializable: the live path is the #[derive(serde::*)] on Proof + bincode, these had zero callers, and the Deserializable impl's hardcoded chunks(8) was wrong for the actual 32-byte node type.
  • Inlined create_proof (a wrapper that always returned Some).

Misc

  • Pair/vector backends finalize digests with .finalize().into(), like FieldElementBackend already did (was [0u8; N] + copy_from_slice).
  • Comment fixes (banner comments -> doc comments, typos).
  • Tests updated to power-of-two inputs; the odd_set_of_leaves test now asserts the None rejection.

44 crypto + 124 stark lib tests pass; lint clean; disk-spill builds.

Behavior-preserving code-quality pass on `crypto/crypto/src/merkle_tree`
(Poseidon backends, FriMerkleTree/Keccak256Backend, and the BatchProof /
arity-4 machinery were explicitly kept).

Padding removed (the one real footgun)
- `build_from_hashed_leaves` padded a non-power-of-two leaf set by
  repeating the last leaf. That makes the root of `[.., x]` collide with
  the root of `[.., x, x]` — the root stopped binding the leaf count.
  Every prover commitment is already over a power-of-two domain, so the
  padding never fired in production. `build`/`build_from_hashed_leaves`
  now return `None` for a non-power-of-two count; `complete_until_power_of_two`
  is deleted.

Helper dedup / std reuse
- `utils.rs` had two index-helper pairs computing the same thing:
  `sibling_index`/`parent_index` (underflow at the root) and the
  root-safe `get_sibling_pos`/`get_parent_pos`. Consolidated onto the
  root-safe pair; `build_merkle_path` updated.
- Deleted the custom `is_power_of_two` (reinvented `usize::is_power_of_two`,
  and underflowed at 0).
- `utils` is now `pub(crate)` — internal plumbing, not public API.

Dead code
- Removed `Proof`'s hand-rolled `Serializable`/`Deserializable`: the live
  path is the `#[derive(serde::*)]` on `Proof` + bincode, these had zero
  callers, and the `Deserializable` impl's hardcoded `chunks(8)` was
  wrong for the actual 32-byte node type.
- Inlined `create_proof` (a wrapper that always returned `Some`).

Misc
- Pair/vector backends finalize digests with `.finalize().into()`, like
  `FieldElementBackend` already did (was `[0u8; N]` + `copy_from_slice`).
- Comment fixes (banner comments -> doc comments, typos).
- Tests updated to power-of-two inputs; the `odd_set_of_leaves` test now
  asserts the `None` rejection.

44 crypto + 124 stark lib tests pass; lint clean; disk-spill builds.
@github-actions
Copy link
Copy Markdown

Codex Code Review

Findings: none.

I did not find security vulnerabilities, functional bugs, or significant performance issues in the PR diff. The main behavior change, rejecting non-power-of-two Merkle leaf counts instead of padding, is explicit and covered by updated tests.

Verification: git diff --check passed. I could not run cargo check because the sandbox cannot write to /home/runner/.rustup / /home/runner/.cargo for rustup and git dependency setup.

@claude
Copy link
Copy Markdown

claude Bot commented May 21, 2026

Review: cleanup/merkle-tree

Good cleanup overall. The padding-collision fix is correct and important. Two issues worth addressing before or shortly after merge:


Bug [Medium] — BatchProof::verify panics on num_leaves = 0

File: crypto/crypto/src/merkle_tree/proof.rs, lines 80 and 106

let first_leaf_index = num_leaves - 1;    // wraps/underflows when num_leaves = 0
let num_levels = (2 * num_leaves).ilog2(); // always panics on 0 per stdlib contract

num_leaves is caller-supplied with no guard. In debug builds line 80 panics immediately; in release builds it wraps to usize::MAX, then line 106 unconditionally panics. A malformed proof triggers this.

Suggested fix — add at the top of verify:

if num_leaves == 0 || !num_leaves.is_power_of_two() {
    return false;
}

The doc comment already says "must be a power of 2" — this just enforces it.


Simplicity [Low] — .expect() inside an error-returning function

File: crypto/crypto/src/merkle_tree/merkle.rs, line 213

let sibling = get_sibling_pos(pos).expect("non-root node has a sibling");

The invariant holds (the while pos != ROOT guard ensures it), but the function already returns Result<_, Error>, so this .expect() is inconsistent with the surrounding style. Prefer:

let sibling = get_sibling_pos(pos).ok_or(Error::OutOfBounds)?;

Everything else is correct: the padding-collision fix, removal of the broken Deserializable impl (chunks(8) for 32-byte nodes), duplicate index helpers, and is_power_of_two(0) == true latent bug are all clean wins.

while pos != ROOT {
let Some(node) = self.node_get(sibling_index(pos)) else {
// `pos != ROOT` guarantees a sibling exists.
let sibling = get_sibling_pos(pos).expect("non-root node has a sibling");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This .expect() is inside a function that already returns Result<_, Error>. Since the invariant is maintained by the while pos != ROOT guard this won't panic today, but it's inconsistent with the surrounding style and is a latent panic if the loop condition ever changes.

Suggested change
let sibling = get_sibling_pos(pos).expect("non-root node has a sibling");
let sibling = get_sibling_pos(pos).ok_or(Error::OutOfBounds)?;

@diegokingston diegokingston marked this pull request as ready for review May 21, 2026 19:25
@github-actions
Copy link
Copy Markdown

Codex Code Review

No issues found in the PR diff.

I reviewed the Merkle tree changes for security, correctness, performance, and simplicity. The non-power-of-two rejection is consistent through implementation and tests, and I didn’t see introduced panics, memory safety problems, proof-verification regressions, or significant performance issues.

Verification:

  • git diff --check passed.
  • I could not run cargo test -p lambdaworks-crypto merkle --all-features because rustup tried to write to read-only /home/runner/.rustup.

@claude
Copy link
Copy Markdown

claude Bot commented May 21, 2026

Code Review

This is a solid cleanup PR. The security fix and dead-code removal are correct and well-motivated. A few findings below.


High – Fixed ✓ (Root collision via leaf padding)

build_from_hashed_leaves previously padded non-power-of-two leaf sets by repeating the last leaf, making root([.., x]) == root([.., x, x]). The root no longer bound the leaf count. Now None is returned for non-power-of-two inputs. The fix is correct.

Medium – Fixed ✓ (Buggy Deserializable impl removed)

The old Deserializable for Proof hardcoded chunks(8) but nodes are 32 or 64 bytes (Keccak256/SHA-512 backends). It would have silently deserialised corrupt data. Correctly removed; the #[derive(serde::*)] path is unaffected.

Low – Fixed ✓ (Underflow in old sibling_index / parent_index)

sibling_index(0) and parent_index(0) subtracted from a usize root — panic in debug, wraparound in release. Replaced with safe variants. Similarly, the custom is_power_of_two(0) would panic (0_usize - 1) in debug; now uses usize::is_power_of_two() which correctly returns false for 0.


Minor issues

is_empty() guard is redundant (merkle.rs:137)
0usize.is_power_of_two() is false, so the is_empty() branch can never fire once !len().is_power_of_two() is checked. Harmless, but slightly misleading — looks like it handles a distinct case.

get_parent_pos silent sentinel for root (utils.rs:18–21)
Returning node_index (i.e. 0) for the root is safe today because build_merkle_path guards while pos != ROOT. But the function signature (-> usize) makes it easy for a future caller to miss this. -> Option<usize> would make the "no parent" case impossible to ignore. Not blocking, but worth noting.

/// already over a power-of-two domain, so this is a caller-side invariant.
pub fn build_from_hashed_leaves(hashed_leaves: Vec<B::Node>) -> Option<Self> {
if hashed_leaves.is_empty() {
if hashed_leaves.is_empty() || !hashed_leaves.len().is_power_of_two() {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low: is_empty() is redundant here — 0usize.is_power_of_two() returns false, so the second condition already rejects empty inputs. The explicit check is harmless, but reads as if it guards a distinct case. Consider:

Suggested change
if hashed_leaves.is_empty() || !hashed_leaves.len().is_power_of_two() {
if !hashed_leaves.len().is_power_of_two() {

Comment on lines 18 to 20
pub fn get_parent_pos(node_index: usize) -> usize {
// Root node (index 0) has no parent, return itself to avoid underflow
if node_index == 0 {
return node_index;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Low: Returning self (0) for the root is a silent sentinel. The current caller (build_merkle_path) is safe because the loop guards while pos != ROOT, but -> Option<usize> would make the "no parent" case explicit and impossible for a future caller to miss. Not blocking for this PR.

Copy link
Copy Markdown
Contributor

@MauroToscano MauroToscano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one we should be careful, the padding is fine for the merkle tree, and we may need the serde serialization. The double serialization has the purpose of being able to serialize in binary modes or text modes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants