cleanup(merkle-tree): reject non-power-of-two leaves, dedupe helpers #603
diegokingston wants to merge 2 commits into
Behavior-preserving code-quality pass on `crypto/crypto/src/merkle_tree` (Poseidon backends, FriMerkleTree/Keccak256Backend, and the BatchProof / arity-4 machinery were explicitly kept).

Padding removed (the one real footgun)
- `build_from_hashed_leaves` padded a non-power-of-two leaf set by repeating the last leaf. That makes the root of `[.., x]` collide with the root of `[.., x, x]`, so the root stopped binding the leaf count. Every prover commitment is already over a power-of-two domain, so the padding never fired in production. `build`/`build_from_hashed_leaves` now return `None` for a non-power-of-two count; `complete_until_power_of_two` is deleted.

Helper dedup / std reuse
- `utils.rs` had two index-helper pairs computing the same thing: `sibling_index`/`parent_index` (underflow at the root) and the root-safe `get_sibling_pos`/`get_parent_pos`. Consolidated onto the root-safe pair; `build_merkle_path` updated.
- Deleted the custom `is_power_of_two` (reinvented `usize::is_power_of_two`, and underflowed at 0).
- `utils` is now `pub(crate)`: internal plumbing, not public API.

Dead code
- Removed `Proof`'s hand-rolled `Serializable`/`Deserializable`: the live path is the `#[derive(serde::*)]` on `Proof` + bincode, these had zero callers, and the `Deserializable` impl's hardcoded `chunks(8)` was wrong for the actual 32-byte node type.
- Inlined `create_proof` (a wrapper that always returned `Some`).

Misc
- Pair/vector backends finalize digests with `.finalize().into()`, like `FieldElementBackend` already did (was `[0u8; N]` + `copy_from_slice`).
- Comment fixes (banner comments -> doc comments, typos).
- Tests updated to power-of-two inputs; the `odd_set_of_leaves` test now asserts the `None` rejection.

44 crypto + 124 stark lib tests pass; lint clean; disk-spill builds.
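To make the padding collision concrete, here is a minimal sketch. It assumes a toy 64-bit pair hash built on `DefaultHasher` in place of the real backends, and the names `hash_pair`, `root_of`, `root_with_padding`, and `root_strict` are hypothetical, not the crate's API. It only illustrates why repeating the last leaf stops the root from binding the leaf count, and what the strict rejection looks like.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy node hash. ASSUMPTION: a stand-in for the real Merkle backends.
fn hash_pair(left: u64, right: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    (left, right).hash(&mut hasher);
    hasher.finish()
}

/// Folds a non-empty, power-of-two layer of hashed leaves down to one root.
fn root_of(mut layer: Vec<u64>) -> u64 {
    while layer.len() > 1 {
        layer = layer.chunks(2).map(|pair| hash_pair(pair[0], pair[1])).collect();
    }
    layer[0]
}

/// Sketch of the old behavior: pad by repeating the last leaf.
fn root_with_padding(mut leaves: Vec<u64>) -> Option<u64> {
    let last = *leaves.last()?; // empty input has no root
    let target = leaves.len().next_power_of_two();
    leaves.resize(target, last);
    Some(root_of(leaves))
}

/// Sketch of the new behavior: reject any count that is not a power of two.
fn root_strict(leaves: Vec<u64>) -> Option<u64> {
    if !leaves.len().is_power_of_two() {
        return None;
    }
    Some(root_of(leaves))
}
```

With padding, `root_with_padding(vec![1, 2, 3])` and `root_with_padding(vec![1, 2, 3, 3])` hash exactly the same four-leaf layer, so they return the same root. `root_strict` returns `None` for the three-leaf input, so the two leaf sets can no longer share a commitment.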
Codex Code Review
Findings: none. I did not find security vulnerabilities, functional bugs, or significant performance issues in the PR diff. The main behavior change, rejecting non-power-of-two Merkle leaf counts instead of padding, is explicit and covered by updated tests. Verification:
Review: cleanup/merkle-tree
Good cleanup overall. The padding-collision fix is correct and important. Two issues worth addressing before or shortly after merge: Bug [Medium] -
     while pos != ROOT {
    -    let Some(node) = self.node_get(sibling_index(pos)) else {
    +    // `pos != ROOT` guarantees a sibling exists.
    +    let sibling = get_sibling_pos(pos).expect("non-root node has a sibling");
This `.expect()` is inside a function that already returns `Result<_, Error>`. Since the invariant is maintained by the `while pos != ROOT` guard, this won't panic today, but it's inconsistent with the surrounding style and is a latent panic if the loop condition ever changes.
Suggested change: replace `let sibling = get_sibling_pos(pos).expect("non-root node has a sibling");` with `let sibling = get_sibling_pos(pos).ok_or(Error::OutOfBounds)?;`.
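The suggestion above can be sketched in isolation. This assumes a flat, root-at-index-0 binary layout, a stand-in `Error` enum, and simplified `u64` nodes; the helper bodies are guesses at the shape of the real `utils` functions, not copies of them. The missing-sibling case is propagated with `?` instead of panicking.

```rust
#[derive(Debug, PartialEq)]
enum Error {
    OutOfBounds,
}

const ROOT: usize = 0;

/// ASSUMED shape of the root-safe helper: the root has no sibling.
fn get_sibling_pos(pos: usize) -> Option<usize> {
    if pos == ROOT {
        None
    } else if pos % 2 == 1 {
        Some(pos + 1) // odd index is a left child; sibling sits to the right
    } else {
        Some(pos - 1) // even index is a right child; sibling sits to the left
    }
}

/// ASSUMED shape: the root returns itself, as in the reviewed code.
fn get_parent_pos(pos: usize) -> usize {
    if pos == ROOT { ROOT } else { (pos - 1) / 2 }
}

/// Collects the sibling nodes from `pos` up to (not including) the root.
fn build_merkle_path(nodes: &[u64], mut pos: usize) -> Result<Vec<u64>, Error> {
    let mut path = Vec::new();
    while pos != ROOT {
        // No `.expect()`: a broken invariant becomes an error, not a panic.
        let sibling = get_sibling_pos(pos).ok_or(Error::OutOfBounds)?;
        let node = nodes.get(sibling).ok_or(Error::OutOfBounds)?;
        path.push(*node);
        pos = get_parent_pos(pos);
    }
    Ok(path)
}
```

For a seven-node tree stored as `[10, 20, 30, 40, 50, 60, 70]`, the path for index 3 is `[50, 30]`, and an index whose sibling falls outside the slice yields `Err(Error::OutOfBounds)`.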
Codex Code Review
No issues found in the PR diff. I reviewed the Merkle tree changes for security, correctness, performance, and simplicity. The non-power-of-two rejection is consistent through implementation and tests, and I didn’t see introduced panics, memory safety problems, proof-verification regressions, or significant performance issues. Verification:
Code Review
This is a solid cleanup PR. The security fix and dead-code removal are correct and well-motivated. A few findings below.
High – Fixed ✓ (Root collision via leaf padding)
Medium – Fixed ✓ (Buggy
     /// already over a power-of-two domain, so this is a caller-side invariant.
     pub fn build_from_hashed_leaves(hashed_leaves: Vec<B::Node>) -> Option<Self> {
    -    if hashed_leaves.is_empty() {
    +    if hashed_leaves.is_empty() || !hashed_leaves.len().is_power_of_two() {
Low: `is_empty()` is redundant here: `0usize.is_power_of_two()` returns `false`, so the second condition already rejects empty inputs. The explicit check is harmless, but reads as if it guards a distinct case.
Suggested change: replace `if hashed_leaves.is_empty() || !hashed_leaves.len().is_power_of_two() {` with `if !hashed_leaves.len().is_power_of_two() {`.
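A short check of the standard-library behavior this comment relies on. `accepts_leaf_count` is a hypothetical stand-in for the guard, and `naive_is_power_of_two` is only an assumption about what a hand-rolled bit-trick helper typically looks like; the deleted helper's actual body is not shown in this thread.

```rust
/// Sketch of the guard with the redundant emptiness check dropped.
/// `usize::is_power_of_two` is false for 0, so empty input is rejected too.
fn accepts_leaf_count(len: usize) -> bool {
    len.is_power_of_two()
}

/// ASSUMPTION: a typical hand-rolled version using `n & (n - 1) == 0`.
/// The subtraction underflows at `n == 0`; `checked_sub` surfaces that
/// as `None` here instead of panicking in a debug build.
fn naive_is_power_of_two(n: usize) -> Option<bool> {
    let n_minus_one = n.checked_sub(1)?;
    Some((n & n_minus_one) == 0)
}
```

The standard method needs no special case for zero, which is the reason the single condition is enough.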
    pub fn get_parent_pos(node_index: usize) -> usize {
        // Root node (index 0) has no parent, return itself to avoid underflow
        if node_index == 0 {
            return node_index;
Low: Returning the input index (0) for the root is a silent sentinel. The current caller (`build_merkle_path`) is safe because its loop is guarded by `while pos != ROOT`, but `-> Option<usize>` would make the "no parent" case explicit and impossible for a future caller to miss. Not blocking for this PR.
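One possible shape for the `Option`-returning variant, as a sketch only: it assumes the same root-at-index-0 layout where the parent of `i` is `(i - 1) / 2`, and `depth_of` is a hypothetical caller added to show how the loop reads when the root case is encoded in the type.

```rust
/// Hypothetical variant: `None` states outright that the root has no parent.
fn get_parent_pos(node_index: usize) -> Option<usize> {
    // `checked_sub` yields None at index 0, so there is no underflow
    // and no sentinel value for a caller to misread.
    node_index.checked_sub(1).map(|i| i / 2)
}

/// Counts the edges between `pos` and the root; the loop stops on `None`.
fn depth_of(mut pos: usize) -> usize {
    let mut depth = 0;
    while let Some(parent) = get_parent_pos(pos) {
        pos = parent;
        depth += 1;
    }
    depth
}
```

A caller written this way cannot loop forever on the root, because there is no value it could mistake for a parent.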
MauroToscano left a comment
We should be careful with this one: the padding is fine for the Merkle tree, and we may need the serde serialization. The double serialization exists so that we can serialize in either binary or text modes.