feat: add Element::ProvableSumTree + AggregateSumOnRange query#661
feat: add Element::ProvableSumTree + AggregateSumOnRange query#661QuantumExplorer wants to merge 37 commits into
Conversation
…tension Phase 1 of the ProvableSumTree feature — the missing parallel to ProvableCountTree that bakes the per-node sum into the node hash, making aggregate-sum range queries cryptographically verifiable. This commit adds the types-only foundation (no hash divergence yet — Phase 2 will introduce node_hash_with_sum and the new proof Node variants). DISCRIMINANTS - Element::ProvableSumTree at variant index 17 / bincode discriminant 17 (next free after the NotSummed wrapper byte at 16). This will renumber to 19 when PR #657 (CountIndexedTree) lands and reclaims 17/18. - NonCountedProvableSumTree = 0x80 | 17 = 145. - The NonCounted twin range widened from 0x80..=0x8F (4-bit base) to 0x80..=0x9F (5-bit base) — is_non_counted() now checks the top 3 bits (& 0xe0 == 0x80) instead of the top 4. Existing twins 128..=142 stay put. - The NotSummed twin scheme rebases analogously: prefix 0xb0 -> 0xa0, base mask 0x0F -> 0x1F, family range 0xA0..=0xBF. Existing twins move: NotSummedSumTree 180 -> 164 NotSummedBigSumTree 181 -> 165 NotSummedCountSumTree 183 -> 167 NotSummedProvableCountSumTree 186 -> 170 Plus the new NotSummedProvableSumTree = 0xa0 | 17 = 177. Safe because V1 is pre-shipping. is_not_summed() now uses & 0xe0 == 0xa0. NEW APIS - ElementType::ProvableSumTree, ElementType::NonCountedProvableSumTree, ElementType::NotSummedProvableSumTree. - TreeType::ProvableSumTree (discriminant 11, is_sum_bearing = true, allows_sum_item = true, inner_node_type = ProvableSumNode). - NodeType::ProvableSumNode and TreeFeatureType::ProvableSummedMerkNode(i64) with encode tag byte 7 and a parallel zero_sum() helper alongside zero_count(). - Element::new_provable_sum_tree*, empty_provable_sum_tree*, plus helpers (as_provable_sum_tree_value, is_provable_sum_tree). - Element::NotSummed now accepts ProvableSumTree as a sum-tree inner type (constructor, serialize, deserialize). PROOF DISPATCH ProvableSumTree joins the "provable aggregate parent" family alongside ProvableCountTree / ProvableCountSumTree in ElementType::proof_node_type: subtree children use KvValueHashFeatureType and item children use KvCount. PHASE-1 SCOPE BOUNDARIES ProvableSumTree behaves identically to SumTree for storage, aggregation, and hashing in Phase 1. The divergent node_hash_with_sum and the new proof Node variants (KVSum, KVHashSum, etc.) land in Phase 2. TreeFeatureType::ProvableSummedMerkNode maps to AggregateData::Sum at the Element/aggregate level for now; Phase 2 may introduce a dedicated variant once the hash diverges. Workspace cargo test --all-features green (1497+ tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2 of the ProvableSumTree feature — bakes the per-node sum into the
node hash so it becomes cryptographically committed via the parent's
hash chain, parallel to how `node_hash_with_count` commits the count
for `ProvableCountTree`. After this commit a `ProvableSumTree` with the
same {key/value/sum} contents as a plain `SumTree` produces a different
root hash, which is the whole point of the Phase 2 divergence.
Phase 1 (commit c95cf74) was types-only; aggregation, storage, and
hashing all used the SumTree code paths. Phase 2 introduces the new
hash function, the new proof-node variants needed to transport sums
through proofs, and the dispatch wiring on both prover and verifier
sides. Phases 3 (insert/read), 4 (verify_grovedb walk), and 5
(AggregateSumOnRange) remain.
HASH DISPATCH
- `merk::tree::hash::node_hash_with_sum(kv, l, r, i64)` mirrors
`node_hash_with_count` byte-for-byte except the appended 8-byte
field is `i64::to_be_bytes()`. Negative sums hash via their
two's-complement BE form, which is platform-independent.
- New `AggregateData::ProvableSum(i64)` variant. The
`From<TreeFeatureType>` conversion now maps
`ProvableSummedMerkNode(v) -> ProvableSum(v)` (was `Sum(v)` in
Phase 1) so `Tree::hash_for_link` and the commit path can
dispatch through the new arm.
- `Tree::hash_for_link(TreeType::ProvableSumTree)` and both commit
paths (left/right Link::Modified arms) now call
`node_hash_with_sum` when the aggregate is `ProvableSum`.
`Tree::aggregate_data` for `ProvableSummedMerkNode` yields
`ProvableSum` instead of `Sum`.
- Helper updates: `child_aggregate_sum_data_as_i64` /
`child_aggregate_sum_data_as_i128` treat `ProvableSum`
identically to `Sum`; `child_aggregate_count_data_as_u64`
returns 0. `child_ref_and_sum_size` covers the new variant.
- `Link::encode_into` / `decode_into` learn tag byte 7 for
`AggregateData::ProvableSum` (parallel to the existing
`ProvableSummedMerkNode` tag byte 7 in `TreeFeatureType`).
- `grovedb::batch` `InsertTreeWithRootHash` now reconstructs an
`Element::ProvableSumTree` when seeing `AggregateData::ProvableSum`.
PROOF NODE VARIANTS
Five new `Node` enum variants in `grovedb-query/src/proofs/mod.rs`,
mirroring the Count family member-for-member but with `i64` sums:
- `KVSum(key, value, sum)` — sum analogue of `KVCount`
- `KVHashSum(kv_hash, sum)` — analogue of `KVHashCount`
- `KVRefValueHashSum(key, ref_value, ref_elem_hash, sum)`
- `KVDigestSum(key, value_hash, sum)` — analogue of `KVDigestCount`
- `HashWithSum(kv_hash, l, r, sum)` — analogue of `HashWithCount`
`merk::proofs::tree::Tree::hash()` now dispatches each new variant
through `node_hash_with_sum`. `KVValueHashFeatureType` /
`...WithChildHash` handling gains a `ProvableSummedMerkNode` arm so
proof-tree hashes recomputed from a Sum-bearing feature_type match
the Merk-tree side. `aggregate_data()` returns `ProvableSum(sum)`
for `KVSum` and `HashWithSum`; `key()` lists the three key-bearing
new variants alongside their Count counterparts.
`grovedb-element::ProofNodeType` gains `KvSum` and
`KvRefValueHashSum`; `ElementType::proof_node_type` now picks them
when the parent is `ProvableSumTree` (Phase 1 routed Sum-tree
children through the Count dispatch). Subtrees inside ProvableSum
still use `KvValueHashFeatureType` since the feature_type carries
the sum.
Proof generation in `merk/src/proofs/query/mod.rs` adds
`to_kv_sum_node`, `to_kvhash_sum_node`, `to_kvdigest_sum_node`
(parallel to the Count helpers) and an `is_provable_sum_tree`
branch that emits Sum-bearing variants. `chunks.rs`'s
`create_proof_node_for_chunk` dispatches the new ProofNodeType
arms.
GroveDB-side reference post-processing in
`grovedb/src/operations/proof/generate.rs` rewrites the merk-level
`KVValueHashFeatureType(_, _, _, ProvableSummedMerkNode(sum))` to
`KVRefValueHashSum`, mirroring the existing
`KVValueHashFeatureType -> KVRefValueHashCount` path. Both
ref-rewriting loops in that file are updated.
The regular query verifier in `merk/src/proofs/query/verify.rs`
rejects `HashWithSum` at non-aggregate positions (fail-fast,
matching the existing `HashWithCount` guard). `KVSum`,
`KVDigestSum`, and `KVRefValueHashSum` are dispatched via
`execute_node`. `KVHashSum` joins `KVHash` / `KVHashCount` in the
"non-data-bearing on path" branch and in the absence-proof
boundary set.
WIRE FORMAT
Tag bytes 0x30..=0x3D in the previously-unused 0x30..0x3F range:
Push variants (V0 short + V1 wrapper for KV-style large values):
0x30 = KVSum (small), 0x31 = KVSum (large)
0x32 = KVHashSum
0x33 = KVRefValueHashSum (small), 0x34 = KVRefValueHashSum (large)
0x35 = KVDigestSum
0x36 = HashWithSum
PushInverted parallel:
0x37 = KVSum (small), 0x38 = KVSum (large)
0x39 = KVHashSum
0x3a = KVRefValueHashSum (small), 0x3b = KVRefValueHashSum (large)
0x3c = KVDigestSum
0x3d = HashWithSum
0x3e and 0x3f are intentionally reserved.
The on-wire i64 sum uses varint (via `ed::Encode for i64`) for
compactness, matching the Count family. The hash recomputation in
`node_hash_with_sum` uses the fixed 8-byte big-endian form
independently — wire encoding and hash input are deliberately
decoupled. `encoding_length()` and `Decode` arms parallel the Count
family verbatim.
V0 wire format is unchanged. All new tags are V1-only.
TESTS
- `merk::tree::hash` (4): `node_hash_with_sum` differs from
`node_hash` even at sum=0; different sums give different hashes;
`i64::MIN` / `i64::MAX` are distinct; determinism.
- `merk::tree` (2): a `ProvableSummedMerkNode` tree aggregates to
`ProvableSum`, `hash_for_link(ProvableSumTree)` matches
`node_hash_with_sum(...)` and diverges from plain
`Tree::hash()`; mutating a node sum changes the root hash.
- `merk::proofs::tree` (4): forged sums on `HashWithSum`,
`KVSum`, `KVHashSum` change the recomputed node hash;
Phase 1 -> Phase 2 cornerstone — same {key/value/sum} contents
give a different ProvableSumTree hash than a plain SumTree.
- `grovedb-query::proofs::encoding` (4): round-trip every new
variant through `Op::Push` and `Op::PushInverted` at sum
values {`i64::MIN`, -42, -1, 0, 1, 42, `i64::MAX`}; tag-byte
sanity check for all 10 new tags.
- `merk::tree::tree_feature_type`: extended every existing
`AggregateData` test to cover the new `ProvableSum` variant.
Workspace `cargo test --all-features` green: 2881 tests passing,
zero failures.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1.6 — revert the Phase 1.5 mask widening. The NotSummed family returns to its original 4-bit prefix `0xb0` / range `0xB0..=0xBF`. The five legal sum-tree inner types are mapped to twin slots explicitly, 1-at-a-time, instead of via the `prefix | base` bitwise formula: SumTree (base 4) -> 180 (0xB4) BigSumTree (base 5) -> 181 (0xB5) CountSumTree (base 7) -> 183 (0xB7) ProvableCountSumTree (base 10) -> 186 (0xBA) ProvableSumTree (base 17) -> 177 (0xB1) Existing twins return to their original discriminant values; only the new ProvableSumTree slot is freshly assigned. Pre-shipping V1, so this discriminant churn is fine. WHY EXPLICIT MAPPING `prefix | inner_byte` can only generate twin slots when the inner discriminant fits in the prefix's complement nibble. For ProvableSumTree at base 17, the formula `0xb0 | 17` would produce `0xB1` AND then `disc & 0x0F` would invert it back to base 1 (Reference) — a collision. Widening the mask to 5 bits in Phase 1.5 rebased every existing twin discriminant; reverting to per-variant mapping keeps the historical values stable while still allowing arbitrary new slot assignments. CONSEQUENCES - `NOT_SUMMED_TWIN_PREFIX` stays as a const but is now only a family-range marker, never composed with a base byte. - `NOT_SUMMED_BASE_MASK` removed — no remaining callers. - `is_not_summed()` back to `& 0xf0 == 0xb0`. - `base()` for NotSummed now uses an explicit per-variant match. - `from_serialized_value` NotSummed branch uses an explicit `inner_byte → twin_variant` match. NonCounted is unaffected — it still uses the bitwise formula because all its bases fit cleanly in the low 5 bits under `0x80`. Workspace cargo test --all-features green (2881 tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 of the ProvableSumTree feature — wires the variant through the extension traits, cost calculator, reconstruction helper, batch propagation, and read-path subtree validation so that direct insertion, nested aggregation, and child-sum mutation behave correctly end-to-end. Phase 1 (commit c95cf74) added the variant and its twins; Phase 2 (commit 3364f08) introduced node_hash_with_sum and the proof Node family. The "behave like SumTree" fallback Phase 1 leaned on covered most surfaces but several dispatch sites guarded subsequent operations through explicit per-variant match arms — those sites would silently drop ProvableSumTree or fail to traverse into it. Phase 3 fills each in deliberately. EXTENSION TRAITS — merk/src/element/tree_type.rs ElementTreeTypeExtensions had six trait methods that enumerated tree variants explicitly: root_key_and_tree_type_owned, root_key_and_tree_type, tree_flags_and_type, tree_type, maybe_tree_type, tree_feature_type. Each was missing its ProvableSumTree arm — get_feature_type was the only one already wired (Phase 1). Adding the missing arms unblocks callers across get, batch, and visualize that thread tree types through these helpers to decide layout, hashing, and aggregate-data extraction. tree_feature_type now maps ProvableSumTree -> ProvableSummedMerkNode(sum) explicitly, matching the parallel ProvableCountedMerkNode wiring. COST CALCULATOR — merk/src/element/costs.rs get_specialized_cost, the layered_value_byte_cost path in specialized_costs_for_key_value, the layered_value_defined_cost type filter, and the value_defined_cost dispatch all enumerated the eight Merk-tree variants and would have either returned None or mis-sized a ProvableSumTree element. Added explicit ProvableSumTree => SUM_TREE_COST_SIZE arms (parity with SumTree as established in Phase 1) and the matching LayeredValueDefinedCost branch. RECONSTRUCTION — merk/src/element/reconstruct.rs ElementReconstructExtensions::reconstruct_with_root_key, used by batch propagation to rebuild a tree element after a root-key update, returned None for ProvableSumTree. Added the arm that pulls aggregate_data.as_sum_i64() into Element::ProvableSumTree(root, sum, flags); without this, batch operations that mutated a ProvableSumTree subtree would lose their tree element entirely during the parent's upward propagation. as_sum_i64 already handles the AggregateData::ProvableSum case (Phase 2). BATCH PROPAGATION — grovedb/src/batch/mod.rs The InsertTreeWithRootHash else-if chain that transcribes a Merk-tree mutation into the appropriate root-hash-bearing operation enumerated each tree-element variant explicitly. ProvableSumTree was missing, so a batch that mutated a ProvableSumTree subtree would fall through to the CommitmentTree arm — wrong shape entirely. Mirrored the ProvableCountSumTree arm directly. The accompanying tree-cost match list above (used by the apply_batch storage-cost callback) was also missing the variant. READ-PATH SUBTREE VALIDATION — grovedb/src/operations/get/mod.rs check_subtree_exists rejected paths whose final segment resolved to a ProvableSumTree because the variant wasn't in its accepted-tree match list. This would have broken every query that traversed INTO a ProvableSumTree. TESTS — grovedb/src/tests/provable_sum_tree_tests.rs Ten tests covering Phase 3's externally-observable surface: - Round-trip insert/read with aggregate-sum tracking. - Aggregation across mixed positive/negative/zero values + i64::MIN/ i64::MAX extremes. - Root-hash divergence vs a plain SumTree with identical children (the Phase 2 cornerstone, verified end-to-end via open_transactional_merk_at_path). - Nested ProvableSumTree[A] -> ProvableSumTree[B] aggregate propagation; mutation of B's children shifts the grovedb root hash. - Wrapper interactions: NonCounted(ProvableSumTree) contributes 0 to a CountTree parent; NotSummed(ProvableSumTree) contributes 0 to a SumTree parent; the wrapped tree's own aggregate is preserved (verified via get_raw, which retains the wrapper byte). - Deleting a SumItem child shifts the ProvableSumTree root hash because the aggregate sum is hash-bound. - Direct insert of a ProvableSumTree built from an existing template. The wrapper round-trip tests use db.get_raw rather than db.get because db.get strips wrappers via into_underlying — by design. Workspace cargo test --all-features green: 2891 tests passing (was 2881 in Phase 2 + 10 new), zero failures. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 4 of the ProvableSumTree feature. The existing
`verify_merk_and_submerks_in_transaction` walk is cryptographically
complete — `combine_hash(value_hash(parent_bytes), inner_merk_root) ==
stored_element_value_hash` catches any byte-level tampering, and for
ProvableSumTree the inner aggregate is bound into the inner Merk's
root_hash via `node_hash_with_sum` (Phase 2). What that walk did not
catch was the *software-consistency* class of drift: a parent
`ProvableSumTree(_, N, _)` whose stored sum field N disagrees with the
inner Merk's actual `aggregate_data()` value M. For provable variants
both N and M are bound into element_value_hash, but they live on disk
independently and could disagree if Phase 3's propagation logic drifts.
For non-provable variants (SumTree, BigSumTree, CountTree, CountSumTree)
the recorded aggregate isn't hash-bound at all, so a pure software bug
in propagation would silently corrupt the tree.
LIB.RS — verify_merk_and_submerks_in_transaction
After the existing cryptographic check, for any tree element whose
inner Merk holds actual data (i.e. excluding the non-Merk-data trees
CommitmentTree/MmrTree/BulkAppendTree/DenseTree, which already
short-circuit via `uses_non_merk_data_storage`), the verifier now opens
the inner Merk, reads its `aggregate_data()`, and compares against the
parent's recorded aggregate field via a new free helper
`aggregate_consistency_labels`. The helper covers all seven aggregate-
bearing tree variants:
- SumTree vs. AggregateData::Sum
- ProvableSumTree vs. AggregateData::ProvableSum
- BigSumTree vs. AggregateData::BigSum
- CountTree vs. AggregateData::Count
- CountSumTree vs. AggregateData::CountAndSum
- ProvableCountTree vs. AggregateData::ProvableCount
- ProvableCountSumTree vs. AggregateData::ProvableCountAndSum
Plus an empty-Merk identity case (NoAggregateData with zero recorded
aggregate matches), and a fallback that reports any variant-shape
mismatch (e.g. ProvableSumTree paired with AggregateData::Count(_)).
VERIFICATIONISSUES SHAPE — placeholder hashes, not type extension
`VerificationIssues` is a private type alias
`HashMap<Vec<Vec<u8>>, (CryptoHash, CryptoHash, CryptoHash)>` whose
shape is consumed by `visualize_verify_grovedb`. To avoid breaking
its callers and the visualize hex output, mismatched aggregates are
packed into deterministic placeholder CryptoHashes via
`blake3(format!("recorded ..."))` and `blake3(format!("inner ..."))`,
slotted into the "expected" and "actual" fields. The "root" slot
reuses the inner-Merk root_hash for path locality. Documented inline.
INTEGRITY WALK TESTS — 7 new tests
A new `integrity_walk_tests` module in `provable_sum_tree_tests.rs`
exercises the verifier end-to-end via two raw-storage tampering
helpers:
- `tamper_value_no_hash_update` decodes the on-disk TreeNode for a
leaf, replaces only its element bytes, re-encodes (leaving the
stored value_hash stale), writes back via the immediate storage
context. Simulates byte-level tampering caught by the SumItem
arm's `value_hash(bytes) != stored_value_hash` check.
- `tamper_parent_element_with_consistent_hashes` splices in fresh
element bytes AND recomputes hash + value_hash to remain
crypto-consistent with the inner Merk's existing root_hash. Used
for aggregate-mismatch scenarios — the crypto check passes,
but the new aggregate-consistency check fires. Offsets into the
on-disk TreeNodeInner encoding are derived from the decoded
`value_as_slice().len()`.
Scenarios covered:
1. Inner SumItem value tamper (different bytes) — crypto check
catches it.
2. Inner SumItem same-length value tamper — crypto check catches
it (assert: hashes, not lengths, are what's verified).
3. Parent ProvableSumTree aggregate mismatch (sum=999 stored vs.
40 actual) — new aggregate-consistency check fires.
4. Clean ProvableSumTree verifies clean (with mixed positive,
negative, zero, and large values).
5. Clean ProvableCountTree verifies clean.
6. Parent ProvableCountTree aggregate mismatch (count=9999 vs. 3)
— sanity check that the generalized helper handles the count
variant too.
7. Reload-after-write determinism: insert, drop the db handle,
reopen, verify_grovedb reports zero issues; the parent's
ProvableSumTree.sum_value field round-trips.
Workspace cargo test --all-features green: 2898 passing
(Phase 3 baseline of 2891 + 7 new), zero failures.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the marquee Phase 5 feature for ProvableSumTree: a query that asks "what's the cryptographically-verifiable signed sum of children with keys in range [a, b]?" against a ProvableSumTree, with proof size O(log n + |boundary|) and a verify path that returns the root hash plus the aggregate i64 sum. Mirrors AggregateCountOnRange line-for-line: - QueryItem::AggregateSumOnRange(Box<QueryItem>) variant (wire tag 11) - Query / SizedQuery / PathQuery::validate_aggregate_sum_on_range with the same nested-rejection, no-subquery, no-pagination, allowed-inner-range rules - merk/src/proofs/query/aggregate_sum.rs (~760 lines) implementing create_aggregate_sum_on_range_proof + verify_aggregate_sum_on_range_proof with the same Disjoint/Contained/Boundary classification, HashWithSum self-verifying compression at fully-inside/outside subtrees, and KVDigestSum at boundaries - grovedb/src/operations/proof/aggregate_sum.rs (~330 lines) for the GroveDB-level multi-layer envelope chain check - prove_query / verify_query dispatch in generate.rs and verify.rs - Tree-type rejection arms in BulkAppendTree, DenseTree, MMR for the new variant Key correctness points handled differently from count: - i128 accumulator throughout the verifier (sum can validly be 0 with non-zero children, so no "if sum == 0" short-circuit; final narrow to i64 with an explicit overflow error) - No checked_sub equivalent for own_sum derivation — signed sums make arithmetic-only corruption detection meaningless; the hash chain binds the values regardless - ProvableSumTree-only at the merk-level gate (Sum/BigSum use different hash dispatches and can't host this proof shape) Tests: 35 new tests total (14 merk-level in aggregate_sum.rs, 21 GroveDB- level in aggregate_sum_query_tests.rs) covering empty trees, single-key ranges, full/sub/boundary ranges, negative sums, mixed-sign extremes including i64::MAX + i64::MIN = -1, tampering rejection, wrong-tree rejection, validation rejection of nested/Key/RangeFull/orthogonal-aggregate inners, multi-layer paths, NotSummed-wrapped subtree exclusion, V0 envelope round-trip. Workspace test count: 2898 → 2938, zero failures. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final phase of the ProvableSumTree feature — documentation. Adds: - `docs/book/src/aggregate-sum-on-range-queries.md`: new dedicated chapter describing the AggregateSumOnRange query, the ProvableSumTree tree type it operates on, why the existing sum trees can't be queried this way, the proof node vocabulary (KVSum / KVHashSum / HashWithSum / KVDigestSum / KVRefValueHashSum at wire tags 0x30..=0x3D), and the signed-sum correctness notes (no zero-sum short-circuit; i128 accumulator with i64 narrowing at the entry points; overflow handling at i64::MAX extremes). - `docs/book/src/element-system.md`: ProvableSumTree row added to the aggregate-tree table; ProvableSummedMerkNode added to the TreeFeatureType enum block; NonCounted/NotSummed wrapper indices surfaced; explanation of when to choose ProvableSumTree over plain SumTree (sum is part of the protocol invariant vs metadata) and the rationale for the explicit `NotSummedProvableSumTree = 177` slot. - `docs/book/src/hashing.md`: parallel "Aggregate Hashing for ProvableSumTree" section showing node_hash_with_sum's i64 BE input layout and the wire-vs-hash encoding split. - `docs/book/src/appendix-a.md`: rows for NonCounted (15), NotSummed (16), and ProvableSumTree (17) added to the discriminant table. - `docs/book/src/aggregate-sum-queries.md`: disambiguation banner at the top distinguishing the existing sum-budget iterator from the new AggregateSumOnRange query, with a cross-link. - `docs/book/src/SUMMARY.md`: registers the new chapter. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds ProvableSumTree (sum bound into Merk hashes) and AggregateSumOnRange queries with full prover/verification, query APIs, proof op encoding, GroveDB plumbing, debugger types, docs, version flags, and extensive tests. ChangesProvableSumTree and Aggregate Sum on Range
Sequence Diagram(s)sequenceDiagram
participant Client as Client (rgba(66, 135, 245, 0.5))
participant GroveDb as GroveDb (rgba(72, 201, 176, 0.5))
participant Merk as Merk (rgba(245, 176, 65, 0.5))
Client->>GroveDb: PathQuery.new_aggregate_sum_on_range(path, range)
GroveDb->>GroveDb: validate_aggregate_sum_on_range
alt proof-based
GroveDb->>Merk: prove_aggregate_sum_on_range(inner_range)
Merk-->>GroveDb: ProofOps + sum i64
GroveDb-->>Client: verify_aggregate_sum_query -> (root_hash, sum)
else no-proof
GroveDb->>Merk: sum_aggregate_on_range(inner_range)
Merk-->>GroveDb: sum i64
GroveDb-->>Client: sum i64
end
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Poem
✨ Finishing Touches🧪 Generate unit tests (beta)
|
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## develop #661 +/- ##
===========================================
+ Coverage 90.76% 91.10% +0.34%
===========================================
Files 189 193 +4
Lines 56407 59993 +3586
===========================================
+ Hits 51196 54655 +3459
- Misses 5211 5338 +127
🚀 New features to boost your workflow:
|
Adds focused unit tests for production code added by this PR that the existing test suite happened to exercise indirectly. None of these tests add new behavior; they only pin down branches that codecov flagged as uncovered on the patch. Covers: - grovedb-query/src/proofs/encoding.rs: long-value (>= 65536 bytes) round-trips for the four KV-style ProvableSumTree wire variants (KVSum, KVRefValueHashSum in both Push and PushInverted directions -- tag bytes 0x31, 0x34, 0x38, 0x3b). - grovedb-query/src/proofs/mod.rs: Display tests for KVSum, KVHashSum, KVRefValueHashSum, KVDigestSum, and HashWithSum proof nodes. - grovedb-element/src/element_type.rs: proof_node_type dispatch on ProvableSumTree parents (Items -> KvSum, References -> KvRefValueHashSum), plus as_str / Display for ProvableSumTree, NonCountedProvableSumTree, NotSummedProvableSumTree. - grovedb-element/tests/element_constructors_helpers.rs: every ProvableSumTree constructor + is_provable_sum_tree / as_provable_sum_tree_value / into_provable_sum_tree_value helpers, including the wrong-element error paths. - grovedb-element/tests/element_display_and_serialization.rs: extends the all-variants Display test to include ProvableSumTree. - merk/src/proofs/tree.rs: forged-sum sensitivity for KVDigestSum and KVRefValueHashSum (the latter exercises the full combine(referenced_value_hash, node_value_hash) -> node_hash_with_sum path), aggregate_data() returning ProvableSum for KVSum / HashWithSum, and key() returning the right thing for all five sum node variants. - merk/src/proofs/query/aggregate_sum.rs: two new regression tests for Merk::prove (regular query) on a ProvableSumTree, asserting that the emitted proof contains KVSum + KVHashSum (and KVDigestSum for an absent-key boundary). These hit the to_kv_sum_node / to_kvhash_sum_node / to_kvdigest_sum_node helpers whose only callers are inside create_proof_internal's ProvableSumTree branches. All 2956 workspace tests pass (was 2938 before this commit). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 11
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
merk/src/proofs/tree.rs (1)
55-68:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPropagate
ProvableSumout of every sum-bearing proof node.
KVHashSum,KVDigestSum,KVRefValueHashSum, andHashWithSumstill collapse toNoAggregateDataorInvalidProofErrorhere. Once one of those variants is reconstructed as a child link, the parent loses that subtree's committed sum even though the proof node already carried it.Suggested fix
pub fn as_link(&self) -> Link { let (key, aggregate_data) = match &self.tree.node { Node::KV(key, _) | Node::KVValueHash(key, ..) => { (key.as_slice(), AggregateData::NoAggregateData) } Node::KVValueHashFeatureType(key, _, _, feature_type) | Node::KVValueHashFeatureTypeWithChildHash(key, _, _, feature_type, _) => { (key.as_slice(), (*feature_type).into()) } Node::KVCount(key, _, count) => (key.as_slice(), AggregateData::ProvableCount(*count)), - Node::KVSum(key, _, sum) => (key.as_slice(), AggregateData::ProvableSum(*sum)), + Node::KVSum(key, _, sum) + | Node::KVDigestSum(key, _, sum) + | Node::KVRefValueHashSum(key, _, _, sum) => { + (key.as_slice(), AggregateData::ProvableSum(*sum)) + } + Node::KVHashSum(_, sum) | Node::HashWithSum(.., sum) => { + (&[] as &[u8], AggregateData::ProvableSum(*sum)) + } // for the connection between the trunk and leaf chunks, we don't // have the child key so we must first write in an empty one. once // the leaf gets verified, we can write in this key to its parent _ => (&[] as &[u8], AggregateData::NoAggregateData), }; @@ pub(crate) fn aggregate_data(&self) -> Result<AggregateData, Error> { match &self.node { Node::KVValueHashFeatureType(.., feature_type) | Node::KVValueHashFeatureTypeWithChildHash(.., feature_type, _) => { Ok((*feature_type).into()) } Node::KVCount(_, _, count) => Ok(AggregateData::ProvableCount(*count)), Node::HashWithCount(.., count) => Ok(AggregateData::ProvableCount(*count)), - Node::KVSum(_, _, sum) => Ok(AggregateData::ProvableSum(*sum)), - Node::HashWithSum(.., sum) => Ok(AggregateData::ProvableSum(*sum)), + Node::KVSum(_, _, sum) + | Node::KVHashSum(_, sum) + | Node::KVDigestSum(_, _, sum) + | Node::KVRefValueHashSum(_, _, _, sum) + | Node::HashWithSum(.., sum) => Ok(AggregateData::ProvableSum(*sum)), Node::KV(..) | Node::KVValueHash(..) => Ok(AggregateData::NoAggregateData), _ => Err(Error::InvalidProofError( "Cannot extract aggregate data from this node type".to_string(), )), }Also applies to: 482-497
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@merk/src/proofs/tree.rs` around lines 55 - 68, The match on self.tree.node currently returns NoAggregateData for many sum-bearing variants, losing committed sums; update the match in tree.rs (the block matching Node variants) to propagate AggregateData::ProvableSum(sum) for all sum-bearing node variants (e.g., KVHashSum, KVDigestSum, KVRefValueHashSum, HashWithSum and any other *_Sum variants) instead of falling back to NoAggregateData or error, so the parent link preserves the subtree's sum when reconstructed; use the existing AggregateData::ProvableSum(...) constructor and the same key extraction pattern as other arms.merk/src/proofs/query/mod.rs (1)
379-419:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPreserve sum-aware queried nodes for non-Element Merk values.
If
ElementType::from_serialized_value(...)fails here, a matchedProvableSummedMerkNodestill falls back toProofNodeType::Kv. The verifier then recomputesnode_hash(...)instead ofnode_hash_with_sum(...), so raw MerkProvableSumTreeproofs for queried items cannot verify even though the stored link hash committed the sum. The same fallback already exists for count; this new path extends that mismatch to sum proofs too.Suggested fix
- let proof_node_type = ElementType::from_serialized_value(self.tree().value_as_slice()) - .map(|et| et.proof_node_type(parent_tree_type)) - .unwrap_or(ProofNodeType::Kv); // Default to tamper-resistant for raw Merk + let proof_node_type = ElementType::from_serialized_value(self.tree().value_as_slice()) + .map(|et| et.proof_node_type(parent_tree_type)) + .unwrap_or_else(|_| match parent_tree_type { + Some(ElementType::ProvableCountTree) => ProofNodeType::KvCount, + Some(ElementType::ProvableSumTree) => ProofNodeType::KvSum, + _ => ProofNodeType::Kv, + });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@merk/src/proofs/query/mod.rs` around lines 379 - 419, The proof_node_type fallback currently always uses ProofNodeType::Kv when ElementType::from_serialized_value(...) fails, which loses sum-awareness for ProvableSummedMerkNode; update the proof_node_type assignment so that when from_serialized_value returns Err you detect the node kind (the existing matched ProvableSummedMerkNode case) and choose the appropriate sum-aware ProofNodeType (e.g., ProofNodeType::KvSum or the KvRefValueHashSum variant for reference/summed nodes) instead of defaulting to Kv; change the ElementType::from_serialized_value(...) .map(...).unwrap_or(...) logic around the proof_node_type variable to inspect the current node/variant (the same condition that produced a ProvableSummedMerkNode) and return the correct sum proof node type, so downstream node generation (the match on proof_node_type that calls to_kv_sum_node(), to_kv_value_hash_feature_type_node(), etc.) preserves sum verification.
🧹 Nitpick comments (6)
merk/src/proofs/branch/mod.rs (1)
112-133: 💤 Low valueConsider deduplicating
get_key_from_nodeacrossTrunkQueryResultandBranchQueryResult.The two
get_key_from_nodeimplementations are now byte-identical and both had to be updated in lockstep for the new sum-bearing variants. Extracting a free function (or a private helper onNode) would eliminate the drift risk as more variants get added.♻️ Sketch
// merk/src/proofs/branch/mod.rs (module-private helper) #[cfg(any(feature = "minimal", feature = "verify"))] fn key_from_node(node: &Node) -> Option<Vec<u8>> { match node { Node::KV(key, _) | Node::KVValueHash(key, ..) | Node::KVValueHashFeatureType(key, ..) | Node::KVValueHashFeatureTypeWithChildHash(key, ..) | Node::KVDigest(key, _) | Node::KVDigestCount(key, ..) | Node::KVRefValueHash(key, ..) | Node::KVCount(key, ..) | Node::KVRefValueHashCount(key, ..) | Node::KVSum(key, ..) | Node::KVDigestSum(key, ..) | Node::KVRefValueHashSum(key, ..) => Some(key.clone()), Node::Hash(_) | Node::KVHash(_) | Node::KVHashCount(..) | Node::HashWithCount(..) | Node::KVHashSum(..) | Node::HashWithSum(..) => None, } }Then have both impls call
key_from_node(...).Also applies to: 383-404
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@merk/src/proofs/branch/mod.rs` around lines 112 - 133, Both TrunkQueryResult::get_key_from_node and BranchQueryResult::get_key_from_node are identical; extract a single helper (e.g., module-private fn key_from_node(node: &Node) -> Option<Vec<u8>>) or a private Node method and move the match there (include the same cfg attributes used now), then update both TrunkQueryResult::get_key_from_node and BranchQueryResult::get_key_from_node to delegate to that helper; apply the same change for the duplicate occurrence around the other block (the 383-404 region) so both call the shared helper.merk/src/merk/prove.rs (1)
201-206: ⚡ Quick winUse
cost_return_on_error!for the tree-type guard.This new
CostResultpath does a manual early return instead of the repo’s standard cost-accumulating macro. Switching the guard tocost_return_on_error!keeps the error path consistent with the rest of the costed API. As per coding guidelines, "**/*.rs: Usecost_return_on_error!macro for early returns with cost accumulation in Rust source files`."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@merk/src/merk/prove.rs` around lines 201 - 206, Replace the manual early return in the tree-type guard with the repo macro cost_return_on_error! so the error path accumulates cost consistently; specifically, where you check if !matches!(tree_type, crate::TreeType::ProvableSumTree) and return Error::InvalidProofError("AggregateSumOnRange..."), switch that to cost_return_on_error!(Error::InvalidProofError(format!(...))) using the same formatted message (referencing tree_type and ProvableSumTree) so the semantics and message remain identical but the CostResult path uses the standard macro.merk/src/element/tree_type.rs (1)
246-251: ⚡ Quick winExtend the wrapper regression to
ProvableSumTree.
get_feature_type(TreeType::ProvableSumTree)is the new sum-bearing branch, but the existingNotSummedregression tests still only enumerate the older parent types. Please addProvableSumTreethere as well so the zeroed-sum wrapper semantics stay pinned down.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@merk/src/element/tree_type.rs` around lines 246 - 251, The NotSummed wrapper-regression tests don’t include the new sum-bearing branch, so update the enumeration used by the regression to also include TreeType::ProvableSumTree; specifically, in the tests that call get_feature_type(...) and assert NotSummed behavior, add TreeType::ProvableSumTree to the list of TreeType variants being exercised so the zeroed-sum wrapper semantics remain pinned (look for the NotSummed regression/test block that enumerates parent types and add the ProvableSumTree variant).merk/src/tree/link.rs (1)
447-456: ⚡ Quick winAdd a round-trip test for the new wire tag.
Tag
7is introduced in both encode and decode, but this module’s unit tests still only exercise the older aggregate variants. A small round-trip forAggregateData::ProvableSum, including a negative value, would lock down the tag assignment and catch wire-format drift quickly.Also applies to: 654-658
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@merk/src/tree/link.rs` around lines 447 - 456, Add a unit test that round-trips the new wire tag 7: construct AggregateData::ProvableSum with a negative value (e.g., -42), serialize it using the same encode path that writes the tag (the code that calls out.write_all(&[7]) and out.write_varint), then deserialize using this module's decode path and assert the decoded AggregateData equals the original and that the serialized bytes start with 0x07; add a matching test for the other encode location mentioned (lines ~654-658) so both encoding sites for AggregateData::ProvableSum are exercised.grovedb/src/operations/proof/generate.rs (1)
337-341: ⚡ Quick winWrap the new aggregate-sum merk errors with call-site context.
Both new branches forward a bare
MerkError, which makes aggregate-sum proof failures hard to distinguish from the surrounding proof-generation paths. Please wrap these with a contextual message before returning.♻️ Suggested change
- .prove_aggregate_sum_on_range(&inner_range, grove_version) - .map_err(Error::MerkError) + .prove_aggregate_sum_on_range(&inner_range, grove_version) + .map_err(|e| { + Error::CorruptedData(format!( + "prove_aggregate_sum_on_range failed: {}", + e + )) + })As per coding guidelines "Wrap errors with context using
.map_err(|e| Error::CorruptedData(format!("context: {}", e)))pattern in Rust source files".Also applies to: 1169-1173
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grovedb/src/operations/proof/generate.rs` around lines 337 - 341, The current calls that map merk errors from subtree.prove_aggregate_sum_on_range to Error::MerkError lack context; replace .map_err(Error::MerkError) with a contextual wrapper such as .map_err(|e| Error::CorruptedData(format!("aggregate-sum proof failed: {}", e))) at each site (the call inside the cost_return_on_error! around subtree.prove_aggregate_sum_on_range and the other similar branch later), so the returned Error includes a clear message plus the original merk error for easier diagnosis.grovedb/src/tests/aggregate_sum_query_tests.rs (1)
578-590: ⚡ Quick winExercise the new prover gate, not just verifier-side validation.
This only feeds a dummy proof into
verify_aggregate_sum_query(). The new defense-in-depth behavior added inprove_query_non_serialized()is still unasserted here, so a regression in proof generation could slip through while this test keeps passing.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grovedb/src/tests/aggregate_sum_query_tests.rs` around lines 578 - 590, The test currently only calls GroveDb::verify_aggregate_sum_query with a dummy proof, so update it to also exercise the prover gate by calling the prover function (e.g., GroveDb::prove_query_non_serialized or the appropriate proof-generation method used in the codebase) with the constructed PathQuery (created via PathQuery::new_aggregate_sum_on_range and mutated via set_subquery) and assert that proof generation returns an Err; keep the existing verify_aggregate_sum_query assertion as well so both prover-side (prove_query_non_serialized) and verifier-side (GroveDb::verify_aggregate_sum_query) validations are asserted.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/book/src/appendix-a.md`:
- Around line 20-22: The table entry for `ProvableSumTree` has the wrong
TreeType discriminant (currently `11`); update the table row so the discriminant
is `17` to match the actual enum variant (`ProvableSumTree`) in the codebase,
keeping the rest of the columns unchanged (use `17` for the second column, keep
`(root_key, sum: i64, flags)` and `SUM_TREE_COST_SIZE` as-is) so the doc matches
the `ProvableSumTree` discriminant.
In `@grovedb-query/src/query_item/mod.rs`:
- Around line 165-166: The serializer is emitting PascalCase for the QueryItem
variant names (e.g., QueryItem::AggregateSumOnRange) via
serialize_newtype_variant but the deserializer uses #[serde(field_identifier,
rename_all = "snake_case")]; update the serialize_newtype_variant calls for
QueryItem::AggregateSumOnRange (and the other QueryItem variants in the same
match arm group) to use snake_case names (e.g., "aggregate_sum_on_range") so the
serialized variant names match deserialization expectations and round-trip
correctly.
In `@grovedb-query/src/query.rs`:
- Around line 413-420: The current loop over self.conditional_subquery_branches
only checks branch.subquery for has_aggregate_sum_on_range_anywhere(), so a
branch that contains AggregateSumOnRange in its selector/key is missed; update
the loop to also inspect the branch itself (not just branch.subquery) for
AggregateSumOnRange—e.g., for each branch in
conditional_subquery_branches.values() call a method or predicate on the branch
(e.g., branch.has_aggregate_sum_on_range_anywhere() or explicitly check
branch.selector/key structure) and return true if that check or the existing
branch.subquery.as_deref().has_aggregate_sum_on_range_anywhere() is true. Ensure
you reference conditional_subquery_branches, branch, branch.subquery and
has_aggregate_sum_on_range_anywhere in the fix.
In `@grovedb/src/lib.rs`:
- Around line 1061-1073: The current issues.insert call inside the
aggregate_consistency_labels branch overwrites any existing entry (such as one
created when combined_value_hash != element_value_hash) so callers lose the real
hash-chain mismatch; modify the logic around aggregate_consistency_labels(...)
and issues.insert(...) to detect an existing entry for new_path and either merge
the aggregate placeholders with the existing (root_hash, expected_placeholder,
actual_placeholder) tuple or append/replace with a composite value that
preserves both the original element_value_hash mismatch and the synthetic
aggregate placeholders; specifically update the code that calls
aggregate_consistency_labels(&element, &actual_aggregate) and the
issues.insert(new_path.to_vec(), ...) call to check
issues.get(&new_path.to_vec()) and preserve/merge previous data instead of
unconditionally inserting and overwriting.
In `@grovedb/src/operations/proof/aggregate_sum.rs`:
- Around line 193-195: The V1 layer walker is calling
verify_single_key_layer_proof_v0 from verify_v1_layer, which internally calls
execute_proof with a hardcoded 0 (V0); update verify_single_key_layer_proof_v0
(and its call site in verify_v1_layer) to use PROOF_VERSION_LATEST instead so V1
semantics are enforced consistently with verify_layer_proof_v1 and
verify_trunk_chunk_proof_v1; locate the execute_proof(...) invocation inside
verify_single_key_layer_proof_v0 and replace the literal 0 with
PROOF_VERSION_LATEST (or propagate a version constant through the call chain)
and run tests to ensure V1 proof shapes are rejected/accepted consistently.
In `@grovedb/src/operations/proof/verify.rs`:
- Around line 2694-2696: The match arm in get_key_value_from_node incorrectly
treats Node::KVRefValueHashSum as carrying an authenticated value; remove
KVRefValueHashSum from the arm that returns Some((key.clone(), value.clone()))
so opaque-hash sum nodes are not deserialized as values, and add the same
rejection/guard for Node::KVRefValueHashSum in extract_elements_and_leaf_keys
(mirror the existing handling for KVRefValueHashCount) to ensure trunk/branch
proofs never accept opaque reference_element_hash nodes as authenticated values;
update tests to cover both KVRefValueHashCount and KVRefValueHashSum being
rejected.
In `@grovedb/src/tests/provable_sum_tree_tests.rs`:
- Around line 601-651: The test currently only builds `template` via normal
inserts and never exercises the direct-insert branch; after you capture
`root_key` and `sum` from the `template` Element::ProvableSumTree, perform a
second insert that directly writes an
Element::ProvableSumTree(Some(root_key.clone()), sum, ...) (use the same Element
variant) into a new key (e.g., "direct") via `db.insert`, then `db.get` that new
key and assert it returns the same ProvableSumTree contents (root_key present
and sum equals 6) to validate the direct-insert path in
`direct_insert_provable_sum_tree_with_root_key_and_sum`.
- Around line 48-69: The test loop inserts sum items via db.insert
(Element::new_sum_item) and then fetches the parent with db.get and calls
fetched.as_provable_sum_tree_value(), but it never asserts the fetched aggregate
equals the running total; update the loop in provable_sum_tree_tests to compute
an expected_running_total (incrementing by value each iteration) and assert that
the value returned by fetched.as_provable_sum_tree_value() (from the
Element::ProvableSumTree) equals that expected_running_total after each insert
so intermediate propagation is verified.
In `@merk/src/proofs/query/aggregate_sum.rs`:
- Around line 1062-1071: The test currently returns early when
merk.apply(&entries, &[], None, v).unwrap().is_err(), which makes the overflow
regression test a no-op; instead update the branch to assert the expected
failure (e.g., assert matches the specific overflow error returned by merk.apply
for the two i64::MAX fixture) or, if the design expects Phase 1 to reject but
you still need verifier coverage, add a fabricated-proof path that constructs a
proof for the overflow case and always exercises the verifier; reference the
merk.apply call, the entries fixture containing two i64::MAX values, and the
verifier path in the verify layer when implementing the assertion or
fabricated-proof fallback.
- Around line 130-137: provable_sum_from_aggregate currently returns
Error::InvalidProofError for missing/wrong AggregateData, but these are
local/state invariant failures and should be Error::CorruptedData; change the
match arms in provable_sum_from_aggregate to return
Err(Error::CorruptedData(...)) with a descriptive message, and similarly replace
InvalidProofError uses in the other failing aggregate-handling sites (the blocks
around the ranges corresponding to lines ~240-250 and ~283-293) so they wrap
underlying errors/context using the .map_err(|e|
Error::CorruptedData(format!("context: {}", e))) pattern when calling
aggregate_data() and return CorruptedData on wrong enum variants (e.g.,
"expected ProvableSum aggregate data, got …") to correctly classify storage
corruption.
In `@merk/src/tree_type/mod.rs`:
- Around line 47-51: The doc comment for the ProvableSumTree variant currently
implies it's a Phase-1 alias of SumTree; update the comment on the
ProvableSumTree declaration to describe its actual behavior: state that it
implements a distinct provable-sum hash and proof path (not just a Phase-2
plan), mention which fields/types differ (inner_node_type and
empty_tree_feature_type) and that hashes/proofs are computed using the
provable-sum algorithm introduced in this PR, and remove the “Phase 2 will
diverge” wording so callers understand it is a separate tree type with its own
hashing/proof semantics.
---
Outside diff comments:
In `@merk/src/proofs/query/mod.rs`:
- Around line 379-419: The proof_node_type fallback currently always uses
ProofNodeType::Kv when ElementType::from_serialized_value(...) fails, which
loses sum-awareness for ProvableSummedMerkNode; update the proof_node_type
assignment so that when from_serialized_value returns Err you detect the node
kind (the existing matched ProvableSummedMerkNode case) and choose the
appropriate sum-aware ProofNodeType (e.g., ProofNodeType::KvSum or the
KvRefValueHashSum variant for reference/summed nodes) instead of defaulting to
Kv; change the ElementType::from_serialized_value(...) .map(...).unwrap_or(...)
logic around the proof_node_type variable to inspect the current node/variant
(the same condition that produced a ProvableSummedMerkNode) and return the
correct sum proof node type, so downstream node generation (the match on
proof_node_type that calls to_kv_sum_node(),
to_kv_value_hash_feature_type_node(), etc.) preserves sum verification.
In `@merk/src/proofs/tree.rs`:
- Around line 55-68: The match on self.tree.node currently returns
NoAggregateData for many sum-bearing variants, losing committed sums; update the
match in tree.rs (the block matching Node variants) to propagate
AggregateData::ProvableSum(sum) for all sum-bearing node variants (e.g.,
KVHashSum, KVDigestSum, KVRefValueHashSum, HashWithSum and any other *_Sum
variants) instead of falling back to NoAggregateData or error, so the parent
link preserves the subtree's sum when reconstructed; use the existing
AggregateData::ProvableSum(...) constructor and the same key extraction pattern
as other arms.
---
Nitpick comments:
In `@grovedb/src/operations/proof/generate.rs`:
- Around line 337-341: The current calls that map merk errors from
subtree.prove_aggregate_sum_on_range to Error::MerkError lack context; replace
.map_err(Error::MerkError) with a contextual wrapper such as .map_err(|e|
Error::CorruptedData(format!("aggregate-sum proof failed: {}", e))) at each site
(the call inside the cost_return_on_error! around
subtree.prove_aggregate_sum_on_range and the other similar branch later), so the
returned Error includes a clear message plus the original merk error for easier
diagnosis.
In `@grovedb/src/tests/aggregate_sum_query_tests.rs`:
- Around line 578-590: The test currently only calls
GroveDb::verify_aggregate_sum_query with a dummy proof, so update it to also
exercise the prover gate by calling the prover function (e.g.,
GroveDb::prove_query_non_serialized or the appropriate proof-generation method
used in the codebase) with the constructed PathQuery (created via
PathQuery::new_aggregate_sum_on_range and mutated via set_subquery) and assert
that proof generation returns an Err; keep the existing
verify_aggregate_sum_query assertion as well so both prover-side
(prove_query_non_serialized) and verifier-side
(GroveDb::verify_aggregate_sum_query) validations are asserted.
In `@merk/src/element/tree_type.rs`:
- Around line 246-251: The NotSummed wrapper-regression tests don’t include the
new sum-bearing branch, so update the enumeration used by the regression to also
include TreeType::ProvableSumTree; specifically, in the tests that call
get_feature_type(...) and assert NotSummed behavior, add
TreeType::ProvableSumTree to the list of TreeType variants being exercised so
the zeroed-sum wrapper semantics remain pinned (look for the NotSummed
regression/test block that enumerates parent types and add the ProvableSumTree
variant).
In `@merk/src/merk/prove.rs`:
- Around line 201-206: Replace the manual early return in the tree-type guard
with the repo macro cost_return_on_error! so the error path accumulates cost
consistently; specifically, where you check if !matches!(tree_type,
crate::TreeType::ProvableSumTree) and return
Error::InvalidProofError("AggregateSumOnRange..."), switch that to
cost_return_on_error!(Error::InvalidProofError(format!(...))) using the same
formatted message (referencing tree_type and ProvableSumTree) so the semantics
and message remain identical but the CostResult path uses the standard macro.
In `@merk/src/proofs/branch/mod.rs`:
- Around line 112-133: Both TrunkQueryResult::get_key_from_node and
BranchQueryResult::get_key_from_node are identical; extract a single helper
(e.g., module-private fn key_from_node(node: &Node) -> Option<Vec<u8>>) or a
private Node method and move the match there (include the same cfg attributes
used now), then update both TrunkQueryResult::get_key_from_node and
BranchQueryResult::get_key_from_node to delegate to that helper; apply the same
change for the duplicate occurrence around the other block (the 383-404 region)
so both call the shared helper.
In `@merk/src/tree/link.rs`:
- Around line 447-456: Add a unit test that round-trips the new wire tag 7:
construct AggregateData::ProvableSum with a negative value (e.g., -42),
serialize it using the same encode path that writes the tag (the code that calls
out.write_all(&[7]) and out.write_varint), then deserialize using this module's
decode path and assert the decoded AggregateData equals the original and that
the serialized bytes start with 0x07; add a matching test for the other encode
location mentioned (lines ~654-658) so both encoding sites for
AggregateData::ProvableSum are exercised.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e52b141b-25f4-4a67-a7a6-f7c5ab321b01
📒 Files selected for processing (57)
docs/book/src/SUMMARY.mddocs/book/src/aggregate-sum-on-range-queries.mddocs/book/src/aggregate-sum-queries.mddocs/book/src/appendix-a.mddocs/book/src/element-system.mddocs/book/src/hashing.mdgrovedb-bulk-append-tree/src/proof/mod.rsgrovedb-dense-fixed-sized-merkle-tree/src/proof/mod.rsgrovedb-element/src/element/constructor.rsgrovedb-element/src/element/helpers.rsgrovedb-element/src/element/mod.rsgrovedb-element/src/element/serialize.rsgrovedb-element/src/element/visualize.rsgrovedb-element/src/element_type.rsgrovedb-element/tests/element_constructors_helpers.rsgrovedb-element/tests/element_display_and_serialization.rsgrovedb-query/src/proofs/encoding.rsgrovedb-query/src/proofs/mod.rsgrovedb-query/src/proofs/tree_feature_type.rsgrovedb-query/src/query.rsgrovedb-query/src/query_item/intersect.rsgrovedb-query/src/query_item/mod.rsgrovedb/src/batch/mod.rsgrovedb/src/debugger.rsgrovedb/src/lib.rsgrovedb/src/operations/get/mod.rsgrovedb/src/operations/get/query.rsgrovedb/src/operations/insert/mod.rsgrovedb/src/operations/proof/aggregate_sum.rsgrovedb/src/operations/proof/generate.rsgrovedb/src/operations/proof/mod.rsgrovedb/src/operations/proof/verify.rsgrovedb/src/query/mod.rsgrovedb/src/tests/aggregate_sum_query_tests.rsgrovedb/src/tests/mod.rsgrovedb/src/tests/provable_count_sum_tree_tests.rsgrovedb/src/tests/provable_sum_tree_tests.rsgrovedbg-types/src/lib.rsmerk/src/element/costs.rsmerk/src/element/delete.rsmerk/src/element/get.rsmerk/src/element/reconstruct.rsmerk/src/element/tree_type.rsmerk/src/merk/chunks.rsmerk/src/merk/prove.rsmerk/src/proofs/branch/mod.rsmerk/src/proofs/chunk/chunk.rsmerk/src/proofs/query/aggregate_sum.rsmerk/src/proofs/query/mod.rsmerk/src/proofs/query/verify.rsmerk/src/proofs/tree.rsmerk/src/tree/hash.rsmerk/src/tree/link.rsmerk/src/tree/mod.rsmerk/src/tree/tree_feature_type.rsmerk/src/tree_type/costs.rsmerk/src/tree_type/mod.rs
| | 15 | `NonCounted` | wrapper | `Box<Element>` | inner + 1 byte | Opts inner out of parent count aggregation | | ||
| | 16 | `NotSummed` | wrapper | `Box<Element>` | inner + 1 byte | Opts inner out of parent sum aggregation | | ||
| | 17 | `ProvableSumTree` | 11 | `(root_key, sum: i64, flags)` | SUM_TREE_COST_SIZE | Sum baked into hash (see [Aggregate Sum on Range Queries](aggregate-sum-on-range-queries.md)) | |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Locate the TreeType enum definition.
ast-grep --pattern 'pub enum TreeType {
$$$
}'
# As fallback, search for "TreeType" enum variants with explicit discriminants.
rg -nP --type=rust -C2 '\bProvableSumTree\b.*=\s*\d+'
rg -nP --type=rust -B1 -A1 '\benum TreeType\b'Repository: dashpay/grovedb
Length of output: 3780
Correct TreeType discriminant for ProvableSumTree.
Line 22 lists ProvableSumTree with TreeType 11, but the actual discriminant is 17 (per grovedb-element/src/element_type.rs). Update to:
| 17 | `ProvableSumTree` | 17 | `(root_key, sum: i64, flags)` | SUM_TREE_COST_SIZE | Sum baked into hash (see [Aggregate Sum on Range Queries](aggregate-sum-on-range-queries.md)) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@docs/book/src/appendix-a.md` around lines 20 - 22, The table entry for
`ProvableSumTree` has the wrong TreeType discriminant (currently `11`); update
the table row so the discriminant is `17` to match the actual enum variant
(`ProvableSumTree`) in the codebase, keeping the rest of the columns unchanged
(use `17` for the second column, keep `(root_key, sum: i64, flags)` and
`SUM_TREE_COST_SIZE` as-is) so the doc matches the `ProvableSumTree`
discriminant.
The trunk/branch proof extractor was rejecting Node::KVRefValueHash and Node::KVRefValueHashCount as having an opaque value_hash, but the new Phase 2 KVRefValueHashSum variant was missing from the rejection arm. Without this guard, get_key_value_from_node still surfaces (key, value) for KVRefValueHashSum nodes and the verifier would deserialize and insert the value bytes into the elements map. The embedded node_value_hash is opaque (combine_hash of the node_value_hash and the referenced_value_hash) and cannot be recomputed from the value bytes alone, so a forged value could ride along while the per-node merk hash chain still appears valid. Add KVRefValueHashSum to the rejection arm in extract_elements_and_leaf_keys, alongside KVRefValueHash and KVRefValueHashCount, with a regression test that mirrors the existing KVRefValueHash trunk-proof rejection test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
has_aggregate_sum_on_range_anywhere previously walked only branch.subquery for each conditional branch, ignoring the branch selector (the IndexMap key). Selectors are themselves QueryItems and the type system permits an AggregateSumOnRange tag there even though it is not a meaningful conditional matcher. The shape check is meant to be exhaustive — if any ASOR is present "anywhere", the prover must refuse to route through the regular-proof path — so the walker must surface a selector-tagged ASOR too. Iterate `(selector, branch)` instead of `branches.values()` and short-circuit on `selector.is_aggregate_sum_on_range()` before recursing into `branch.subquery`. Add a regression test that mirrors the existing count walker test and explicitly covers the selector case. The same gap exists for has_aggregate_count_on_range_anywhere but is left as-is here since it predates this PR and changing it would be out of scope. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three small correctness improvements flagged by review:
* grovedb/src/lib.rs verify_merk_and_submerks_in_transaction: when both
the cryptographic combined_value_hash check and the
aggregate_consistency check fail for the same subtree path, the
aggregate-consistency branch's `issues.insert(...)` clobbered the
cryptographic mismatch entry. The real merk-hash chain mismatch is
the more diagnostic message, so switch to `.entry().or_insert(...)`
to preserve the first-inserted entry per path.
* merk/src/proofs/query/aggregate_sum.rs: the prover walks our *own*
in-memory merk. If `aggregate_data()` refuses to surface a
`ProvableSum` for a node in a tree we already gated as
`ProvableSumTree`, that is local storage/state corruption, not a
peer-supplied invalid proof. Reclassify three sites from
`Error::InvalidProofError` to `Error::CorruptedData` to match the
repo error-handling convention.
* grovedb/src/operations/proof/generate.rs: the two ASOR call sites
forwarded bare `Error::MerkError` for `prove_aggregate_sum_on_range`
failures, making them indistinguishable from surrounding
proof-generation merk errors. Wrap each with
`Error::CorruptedData(format!("prove_aggregate_sum_on_range failed: {}", e))`
per the repo convention.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address review feedback on test gaps in the ProvableSumTree / AggregateSumOnRange suite: * provable_sum_tree_tests::populated_provable_sum_tree_round_trips: the per-iteration loop fetched the parent psum but only asserted its variant — the running aggregate after each insert was never checked. Track an `expected_sum` accumulator and assert `as_provable_sum_tree_value() == expected_sum` after every insert so the running aggregate (7 → 20 → 40) is pinned down, not just the final sum. * direct_insert_provable_sum_tree_with_root_key_and_sum: the test was named for a direct-insert path but never actually performed one — it built a populated template tree and inspected its on-disk shape, then stopped. Capture the template's root_key + sum, then run a batch `insert_only_known_to_not_already_exist_op` over a ProvableSumTree element carrying those values at a fresh top-level key (the non-batch insert path forbids non-empty Tree elements with "a tree should be empty at the moment of insertion when not using batches", so the documented direct-insert semantics are only reachable via the batch API). Assert the round-tripped element preserves the captured root_key + sum. * element/tree_type.rs get_feature_type_zeros_sum_for_not_summed_in_sum_parents: extend the NotSummed wrapper regression to include `TreeType::ProvableSumTree` so the new Phase 2 sum-bearing parent variant's zero-sum semantics stay pinned alongside SumTree, BigSumTree, CountSumTree, and ProvableCountSumTree. * tree/link.rs round_trip_aggregate_data_provable_sum_negative: pin down the new wire tag 7 introduced for `AggregateData::ProvableSum`. Encode a negative-sum reference link (also exercises the signed-i64 varint path), assert tag 7 is present in the bytes, then decode and assert the link's fields round-trip identically. * aggregate_sum_query_tests::aggregate_sum_with_subquery_is_rejected_at_validation: previously only fed a dummy proof to the verifier-side validator, leaving the new prover-side gate (`prove_query_non_serialized` short-circuit) unasserted. Build the malformed PathQuery, set up the standard 15-key fixture, and assert `db.prove_query(...)` returns `Err` so a regression in the prover gate can't slip through with this test still passing. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The TreeType::ProvableSumTree doc said "Phase 1: behaves identically to SumTree everywhere except in inner_node_type / empty_tree_feature_type. Phase 2 will diverge the hash computation." Phase 2 has shipped — the hash dispatch now goes through node_hash_with_sum and the new proof-node families (KVSum, KVHashSum, KVDigestSum, KVRefValueHashSum, HashWithSum) plus the AggregateSumOnRange query are all in place. Rewrite the comment to describe the current (post-Phase-2) semantics: an i64 sum baked into every node's hash, making sum tampering catchable via proof verification, as the sum-side counterpart of ProvableCountTree. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
This is Claude addressing the review feedback. Pushed 2b8490e, ae7b971, 8cec7b5, fce7132, ff5645b. Fixed:
Skipped (false positives / out of scope):
Test totals: 2959 passing (was 2956). 3 new tests added across the fixes above. No regressions. Per project memory rule: identifying as Claude on this comment. The branch |
…ayer
Security finding (Codex): the `verify_aggregate_sum_query` and
`verify_aggregate_count_query` chain walkers only checked
`element.is_any_tree()` for path elements. At the terminal (leaf) layer
this is insufficient — if the honest tree at the queried path happens
to be an EMPTY Merk-backed tree of any type (NormalTree, SumTree,
BigSumTree, CountTree, CountSumTree, ProvableCountTree,
ProvableCountSumTree, ProvableSumTree), its stored
`value_hash = combine_hash(H(element_bytes), NULL_HASH)`. The merk
verifier accepts empty proof bytes as `(NULL_HASH, 0)`, so an attacker
can construct a forged proof with:
- layer 0: honest single-key proof of the leaf path key in its parent
- layer 1: empty bytes (forged)
and the chain check passes uniformly. The verifier returns `sum = 0`
(or `count = 0`) against the trusted root hash, even though the leaf
isn't a Provable{Sum,Count}Tree.
The numeric answer is correct (an empty tree has sum 0 / count 0), so
this isn't a value forgery — but it IS a type-confusion soundness gap:
a caller that infers "leaf is a ProvableSumTree" from "the aggregate
verifier accepted" is deceived. The prover-side gate in
`Merk::prove_aggregate_{sum,count}_on_range` already rejects
non-provable inputs, but the verifier didn't mirror that invariant.
THE FIX
In `enforce_lower_chain`, add an `is_terminal: bool` parameter. At
intermediate depths nothing changes (`is_any_tree()` still suffices —
the GroveDB grove can route through any tree type on the way down). At
the terminal depth — passed `is_terminal = true` when
`depth + 1 == path_keys.len()` — the verifier now requires:
- aggregate-sum: `matches!(element, Element::ProvableSumTree(..))`
- aggregate-count: `matches!(element, Element::ProvableCountTree(..) |
Element::ProvableCountSumTree(..))`
Wrapper variants (NonCounted, NotSummed) are stripped via the existing
`into_underlying()` so they continue to work transparently.
TESTS
Three new regression tests that surgically construct the forgery from a
real honest single-key envelope and confirm the verifier now rejects:
- `empty_leaf_type_confusion_forgery_rejected` (sum side, empty
NormalTree at leaf)
- `empty_provable_count_tree_at_leaf_rejected_for_sum` (sum side,
empty ProvableCountTree at leaf — confirms type-specificity)
- `empty_leaf_type_confusion_forgery_rejected` (count side, empty
NormalTree at leaf)
The path == 0 case is unaffected: the merk-level hash divergence
between `node_hash` and `node_hash_with_sum` / `node_hash_with_count`
makes it computationally infeasible to forge a proof that matches the
trusted root, so the path-elements check is unnecessary at the root.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
This is Claude. Pushed da53ef07 addressing the Codex finding about the aggregate-sum verifier's loose tree-type check at the terminal layer. Verified the finding is real. With the leaf at an empty Merk-backed tree of any type (not just empty Same bug existed on the count side. Pre-existing this PR; fixed in the same commit. Fix.
Wrapper variants ( Path == [] case. Not vulnerable. At depth 0 the chain check is bypassed (no parent layer), and the merk-level hash divergence between Regression tests (3 new, all use V0 envelope via
Each test honest-probes the leaf path with a single-key query to harvest valid layer-0 and layer-1 merk proof bytes, builds a forged Workspace: |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
grovedb/src/operations/proof/aggregate_count.rs (1)
120-123:⚠️ Potential issue | 🔴 CriticalAdd empty-path validation for AggregateCountOnRange queries.
The concern is valid: when
path_query.pathis empty,path_keysis also empty, causingdepth == path_keys.len()to be true immediately at entry (depth=0). This bypasses the terminal-type gate inenforce_lower_chain(lines 155-170), which validates that non-leaf descendants are ProvableCountTree or ProvableCountSumTree.While the validation chain (
PathQuery→SizedQuery→Query::validate_aggregate_count_on_range) currently rejects malformed inner ranges (Key, RangeFull, nested aggregates), no explicit empty-path check was found in the grovedb-level validation. The code also lacks test coverage for empty-path aggregate count queries.Two solutions:
- Reject empty paths explicitly in
PathQuery::validate_aggregate_count_on_range()or the underlyingQueryvalidation.- Document a caller-side guarantee that aggregate-count queries are never constructed with empty paths (root-layer queries).
Same applies to lines 194-196 in
verify_v1_layer.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grovedb/src/operations/proof/aggregate_count.rs` around lines 120 - 123, The AggregateCountOnRange flow currently allows empty paths to reach the leaf branch immediately (depth == path_keys.len()) and bypass enforce_lower_chain validation; update validation to reject empty paths by adding an explicit check in PathQuery::validate_aggregate_count_on_range (or the underlying Query validation) to return an error when path.is_empty(), and mirror the same check for the v1 verifier branch (verify_v1_layer) to prevent empty-path AggregateCountOnRange queries from being processed; reference verify_count_leaf and enforce_lower_chain to ensure the leaf/descendant gating remains correct after adding this validation.
♻️ Duplicate comments (1)
grovedb/src/operations/proof/aggregate_sum.rs (1)
204-206:⚠️ Potential issue | 🟠 Major | ⚡ Quick winUse V1 proof semantics for
LayerProofpath steps.
verify_v1_layerstill routes throughverify_single_key_layer_proof_v0, and that helper still hardcodesexecute_proof(..., 0). That means every non-leaf step in a V1 aggregate-sum proof is validated under V0 compatibility rules, so this verifier can still accept path-layer proof shapes that the rest of the V1 verifier rejects. Please thread the proof version through here and usePROOF_VERSION_LATESTon the V1 path.Based on learnings:
PROOF_VERSION_LATESTis a workspace-scoped constant for GroveDB V1 proof verification paths and should be used there rather than treating V1 as V0.Also applies to: 262-277
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grovedb/src/operations/proof/aggregate_sum.rs` around lines 204 - 206, The V1 layer verification is incorrectly delegating to verify_single_key_layer_proof_v0 which calls execute_proof with version 0; update the V1 path so verify_v1_layer (and the call sites around the shown snippet and the similar block at lines ~262-277) thread a proof version through to the helper and call the appropriate helper or overload that uses PROOF_VERSION_LATEST instead of hardcoding 0—specifically, change the call site that currently invokes verify_single_key_layer_proof_v0(merk_bytes, &next_key, path_query) to use a V1-aware verifier (or a new verify_single_key_layer_proof that accepts a proof_version param) and ensure execute_proof is invoked with PROOF_VERSION_LATEST for V1 path steps.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@grovedb/src/operations/proof/aggregate_sum.rs`:
- Around line 79-109: The code currently allows aggregate-sum queries with an
empty path which skips the terminal type check; fix by either rejecting
empty-path aggregate-sum queries early in validation (in
path_query.validate_aggregate_sum_on_range or immediately after) or by adding a
root-path type gate before calling verify_v0_layer/verify_v1_layer (call
enforce_lower_chain with is_terminal=true or perform the same ProvableSumTree
check when path_query.path.is_empty()), and add a regression test that submits
an empty-path aggregate-sum proof against a non-ProvableSumTree root to ensure
the verifier fails.
---
Outside diff comments:
In `@grovedb/src/operations/proof/aggregate_count.rs`:
- Around line 120-123: The AggregateCountOnRange flow currently allows empty
paths to reach the leaf branch immediately (depth == path_keys.len()) and bypass
enforce_lower_chain validation; update validation to reject empty paths by
adding an explicit check in PathQuery::validate_aggregate_count_on_range (or the
underlying Query validation) to return an error when path.is_empty(), and mirror
the same check for the v1 verifier branch (verify_v1_layer) to prevent
empty-path AggregateCountOnRange queries from being processed; reference
verify_count_leaf and enforce_lower_chain to ensure the leaf/descendant gating
remains correct after adding this validation.
---
Duplicate comments:
In `@grovedb/src/operations/proof/aggregate_sum.rs`:
- Around line 204-206: The V1 layer verification is incorrectly delegating to
verify_single_key_layer_proof_v0 which calls execute_proof with version 0;
update the V1 path so verify_v1_layer (and the call sites around the shown
snippet and the similar block at lines ~262-277) thread a proof version through
to the helper and call the appropriate helper or overload that uses
PROOF_VERSION_LATEST instead of hardcoding 0—specifically, change the call site
that currently invokes verify_single_key_layer_proof_v0(merk_bytes, &next_key,
path_query) to use a V1-aware verifier (or a new verify_single_key_layer_proof
that accepts a proof_version param) and ensure execute_proof is invoked with
PROOF_VERSION_LATEST for V1 path steps.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ed5bfdc7-5371-4238-872a-7e002c2093b6
📒 Files selected for processing (14)
grovedb-query/src/query.rsgrovedb/src/lib.rsgrovedb/src/operations/proof/aggregate_count.rsgrovedb/src/operations/proof/aggregate_sum.rsgrovedb/src/operations/proof/generate.rsgrovedb/src/operations/proof/verify.rsgrovedb/src/tests/aggregate_count_query_tests.rsgrovedb/src/tests/aggregate_sum_query_tests.rsgrovedb/src/tests/provable_sum_tree_tests.rsgrovedb/src/tests/trunk_proof_tests.rsmerk/src/element/tree_type.rsmerk/src/proofs/query/aggregate_sum.rsmerk/src/tree/link.rsmerk/src/tree_type/mod.rs
🚧 Files skipped from review as they are similar to previous changes (8)
- merk/src/tree/link.rs
- grovedb-query/src/query.rs
- merk/src/element/tree_type.rs
- grovedb/src/tests/provable_sum_tree_tests.rs
- merk/src/proofs/query/aggregate_sum.rs
- grovedb/src/lib.rs
- merk/src/tree_type/mod.rs
- grovedb/src/operations/proof/generate.rs
Codex follow-up + CodeRabbit: the previous fix added a terminal-type
gate in `enforce_lower_chain`, but `verify_v0_layer` and
`verify_v1_layer` short-circuit to the leaf verifier when
`depth == path_keys.len()`. With an empty path (`path == []`) that's
true at depth 0, so the type gate is never invoked.
In practice the empty-path case is already protected by hash divergence:
the GroveDB root merk is always a `NormalTree` (built with
`Element::empty_tree()` by API), so its root_hash uses `node_hash`. An
attacker's forged proof of `HashWithSum` / `HashWithCount` ops would
reconstruct via `node_hash_with_sum` / `node_hash_with_count` — distinct
hash functions, no collision. So the caller's root-hash compare catches
the forgery cryptographically.
But the defense-in-depth principle says: don't rely on the cryptographic
divergence implicitly. Reject up-front, before any proof handling.
PathQuery::validate_aggregate_{sum,count}_on_range now check
`self.path.is_empty()` and return a clear InvalidQuery error naming why
(root is always NormalTree, no valid Provable* target at root).
The check fires at the entry of `verify_aggregate_{sum,count}_query`
(which call `validate_*` first thing) and at `prove_query` (the
generator also validates the path query before dispatch).
TESTS
- `empty_path_aggregate_sum_rejected_at_validation`
- `empty_path_aggregate_count_rejected_at_validation`
Both pin the rejection at both the PathQuery validator and the verify
entrypoint. 2964 workspace tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
This is Claude. Pushed cc1828dd addressing CodeRabbit's empty-path follow-up on both aggregate verifiers. Verification of the finding. Real concern. In practice the empty-path case is already protected by hash divergence: the GroveDB root merk is always a Fix. Tests (2 new, pin rejection at both the
Workspace: 2964 pass / 0 fail (was 2962). On the V1-hardcoded-V0 duplicate. Still skipping in this PR for the reason given in the prior thread — |
Mirrors PR #662's `query_aggregate_count` for the signed-sum side. Callers that need a sum value but not a proof (e.g. server handlers answering `prove=false` sum requests) can now bypass proof construction, serialization, and verification entirely. The merk-level walk is `O(log n + |boundary|)` in the number of distinct keys, identical complexity to the prover but without the proof-op allocations or hash recomputations. The signed-sum arithmetic carries the same `i128` accumulator the prover and verifier use (so adversarial intermediate sums never wrap), and narrows to `i64` at the public entry point. An out-of-i64 result is classified as `Error::CorruptedData` since a real `ProvableSumTree` maintains every aggregate as `i64` at every level. NEW APIS - `Merk::sum_aggregate_on_range(&inner_range, grove_version) -> CostResult<i64, Error>` in `merk/src/merk/get.rs`. Checks `tree_type == ProvableSumTree`; rejects any other tree type with `Error::InvalidProofError`. Returns 0 for an empty merk. - `RefWalker::sum_aggregate_on_range(&inner_range, grove_version)` in `merk/src/proofs/query/aggregate_sum.rs`. Walks the same Contained / Disjoint / Boundary classification path as `create_aggregate_sum_on_range_proof`, but emits no proof ops. - `GroveDb::query_aggregate_sum(path_query, transaction, grove_version) -> CostResult<i64, Error>` in `grovedb/src/operations/get/query.rs`. Validates the PathQuery up-front via `validate_aggregate_sum_on_range` (same gate the prover and verifier use — catches malformed ASOR queries plus the empty-path rejection from the prior commit before any storage reads), opens the leaf merk at `path_query.path`, and delegates to the merk-level walk. - New `query_aggregate_sum_on_range` field on `GroveDBOperationsQueryVersions`, wired through v1/v2/v3 at version `0`. NotSummed-correctness is preserved via the same `own_sum = node_sum - left_struct - right_struct` derivation the prover uses. NotSummed-wrapped subtrees have stored aggregate 0, so the subtraction yields 0 at the wrapper boundary - they do not contribute to the in-range total. The returned sum is **not** independently verifiable: callers are trusting their own merk read path. For a verifiable sum, continue using `prove_query` + `verify_aggregate_sum_query`. Documented explicitly on both entry points. TESTS - 10 new merk-level cross-checks (`merk/src/proofs/query/aggregate_sum.rs::tests`): each range variant against `prove_aggregate_sum_on_range`'s computed sum, plus empty-merk-returns-0, NormalTree rejection, ProvableCountTree rejection (precise tree-type match, not "any provable aggregate tree"), and a mixed-positive/negative scenario that exercises the signed `own_sum` subtraction. - 11 new GroveDB-level cross-checks (`grovedb/src/tests/aggregate_sum_query_tests.rs::tests`): every range shape on a populated `ProvableSumTree`, empty subtree returns 0, negative-sum scenario, invalid-inner-range (`Key`) rejected with `InvalidQuery`, empty-path rejected with `InvalidQuery`, NormalTree leaf rejected with `MerkError` from the merk-level gate. Workspace `cargo test --all-features`: 2985 passing / 0 failing (was 2964 / 0). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…margulis-99463b # Conflicts: # grovedb-version/src/version/grovedb_versions.rs # grovedb-version/src/version/v1.rs # grovedb-version/src/version/v2.rs # grovedb-version/src/version/v3.rs # grovedb/src/tests/aggregate_count_query_tests.rs # merk/src/merk/get.rs
…OnRange Adds parallel coverage for the variant-11 dispatch paths in encode/ decode_with_depth/borrow_decode_with_depth/Display/serde, plus helper accessors (lower_bound, upper_bound, is_aggregate_sum_on_range, aggregate_sum_inner, ...). Mirrors the existing AggregateCountOnRange test set so each match arm has direct exercise. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds direct round-trip tests of get_flags_owned / get_flags_mut / set_flags on every aggregate-bearing variant (CountSumTree, ProvableCountTree, ProvableCountSumTree, ProvableSumTree, ItemWithSumItem, CommitmentTree, MmrTree, BulkAppendTree, DenseAppendOnlyFixedSizeTree, BigSumTree, CountTree) plus NotSummed / NonCounted delegation arms. Each test pins one match arm. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Direct unit tests for the previously untestable internal helper. Each aggregate-bearing tree variant now has both a matching (returns None) and mismatching (returns Some) case, plus tests for the empty-merk identity arms (zero-recorded with NoAggregateData), non-Merk data tree arms (always None), and the catch-all variant/shape mismatch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds 10 mutation-style tests for the GroveDB-side aggregate-sum verifier (verify_v0_layer / verify_v1_layer / verify_sum_leaf / verify_single_key_layer_proof_v0 / enforce_lower_chain). Each test pins one previously-uncovered error arm: - V1 unexpected non-merk leaf bytes - V0 and V1 missing lower_layer for path key - Malformed leaf sum proof (Phase 1 rejection) - Corrupted non-leaf merk bytes (single-key proof failure) - Non-leaf proof without target key - KV replaced by KVDigest in non-leaf (no value bytes) - Undeserializable value bytes on the path - Intermediate-tree non-tree element rejection - Unparsable envelope bincode decode error Plus mirrors of the count-side AggregateSumOnRange rejection tests in proof/generate.rs for dense/MMR/BulkAppend index helpers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Direct unit tests for previously-uncovered internal helpers in
merk/src/proofs/query/aggregate_sum.rs:
- provable_sum_from_aggregate Err arm for every non-ProvableSum
AggregateData variant (CorruptedData classification check)
- provable_sum_from_aggregate happy path including i64::MIN/MAX
- is_provable_sum_bearing false for every non-ProvableSumTree
TreeType variant
- classify_subtree additional disjoint-above / contained-within /
boundary-overlapping-upper cases
- key_strictly_inside unbounded endpoint and equality cases
- empty ProvableSumTree prove+verify round-trip
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds parallel-variant rejection tests in the BulkAppendTree and dense fixed-size Merkle tree proof modules. Both tree types have no count or sum commitment in their node hash, so their index-resolution helpers reject AggregateCountOnRange and AggregateSumOnRange query items outright. This exercises the previously-uncovered rejection arms in both proof modules. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two small targeted coverage additions:
- merk/src/element/tree_type.rs: direct per-variant tests for the
ProvableSumTree / CommitmentTree / BulkAppendTree /
DenseAppendOnlyFixedSizeTree / MmrTree arms of
root_key_and_tree_type, tree_flags_and_type, tree_type,
maybe_tree_type, and tree_feature_type, plus a
ProvableSumTree-through-NotSummed delegation test.
- grovedb/src/tests/aggregate_sum_query_tests.rs: tests that
drive node_to_string's KVSum / KVHashSum / KVDigestSum /
HashWithSum / KVRefValueHashSum Display arms in
grovedb/src/operations/proof/mod.rs by formatting real
ProvableSumTree proofs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (4)
grovedb-query/src/query_item/mod.rs (1)
320-322: 💤 Low valueStale inline comment in
NonAggregateInner::deserialize.The comment at lines 320-322 only mentions excluding
"AggregateCountOnRange", but theFieldenum andNON_AGGREGATE_VARIANTS(line 378-389) correctly exclude both aggregate variants. The struct-level doc at lines 297-310 was updated to reference both; this inline comment is the lone leftover.📝 Suggested wording tweak
- // Field set excludes "AggregateCountOnRange"; encountering that tag - // produces a serde "unknown variant" error before any inner - // recursion can happen. + // Field set excludes both `AggregateCountOnRange` and + // `AggregateSumOnRange`; encountering either tag produces a serde + // "unknown variant" error before any inner recursion can happen.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grovedb-query/src/query_item/mod.rs` around lines 320 - 322, Update the stale inline comment inside NonAggregateInner::deserialize to reflect that the Field set excludes both aggregate variants (e.g., "AggregateCountOnRange" and "AggregateSomethingElse") rather than only "AggregateCountOnRange"; reference the Field enum and NON_AGGREGATE_VARIANTS to confirm which aggregate variants are excluded and rewrite the comment to mention both aggregate variants so it matches the struct-level doc and the actual NON_AGGREGATE_VARIANTS definition.grovedb/src/tests/aggregate_sum_query_tests.rs (2)
1038-1041: 💤 Low valueRedundant disjunction in assertion.
msg.contains("must be a ProvableSumTree")is implied bymsg.contains("ProvableSumTree"). Either drop the second branch to pin the exact phrase, or drop the first since it's already covered.Proposed simplification
- assert!( - msg.contains("must be a ProvableSumTree") || msg.contains("ProvableSumTree"), - "verifier rejected as expected but with an unrelated message: {msg}" - ); + assert!( + msg.contains("must be a ProvableSumTree"), + "verifier rejected as expected but with an unrelated message: {msg}" + );🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grovedb/src/tests/aggregate_sum_query_tests.rs` around lines 1038 - 1041, The assertion in aggregate_sum_query_tests.rs redundantly checks msg.contains("must be a ProvableSumTree") || msg.contains("ProvableSumTree"); update the assert to use a single, precise check on the error string (either msg.contains("must be a ProvableSumTree") to pin the exact phrase, or msg.contains("ProvableSumTree") if you want a looser match) and adjust the failure message accordingly; locate the assert that references the local variable msg and replace the disjunction with the chosen single contains() call.
318-385: 💤 Low valueOverflow test can silently no-op if insertion is permissive.
If both i64::MAX inserts succeed (which the comment on Line 339-341 acknowledges is possible depending on Phase 1's aggregation rules), but the prover also doesn't detect the overflow and produces a proof, then the test exercises the verifier's narrowing check correctly. However, if both inserts succeed AND the prover never produces a proof that triggers overflow (e.g. Phase 1 silently saturates instead), the assertion on Line 379-382 only fires when
prove_resultisOk. The test will pass even if a regression makes overflow undetectable, because the test bails out at Line 365-369. Consider asserting that either insert fails or verify rejects — never the silent-no-op branch — to keep this test load-bearing.Proposed hardening
- if !ok1 || !ok2 { - // Insertion already rejected the overflow — that's the - // healthiest end state. Bail out. - return; - } + let either_insert_rejected = !ok1 || !ok2; let pq = PathQuery::new_aggregate_sum_on_range( vec![TEST_LEAF.to_vec(), b"st".to_vec()], QueryItem::RangeInclusive(b"a".to_vec()..=b"b".to_vec()), ); - let prove_result = db.grove_db.prove_query(&pq, None, v).unwrap(); - match prove_result { - Err(_) => { /* prover detected overflow — fine */ } - Ok(proof) => { - let verify_result = GroveDb::verify_aggregate_sum_query(&proof, &pq, v); - assert!( - verify_result.is_err(), - "verifier must reject a sum that doesn't fit in i64" - ); - } - } + if either_insert_rejected { + return; + } + let prove_result = db.grove_db.prove_query(&pq, None, v).unwrap(); + match prove_result { + Err(_) => { /* prover detected overflow — fine */ } + Ok(proof) => { + let verify_result = GroveDb::verify_aggregate_sum_query(&proof, &pq, v); + assert!( + verify_result.is_err(), + "verifier must reject a sum that doesn't fit in i64" + ); + } + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grovedb/src/tests/aggregate_sum_query_tests.rs` around lines 318 - 385, Test currently early-returns when both inserts succeed, allowing a silent-no-op regression to pass; change provable_sum_tree_overflow_at_i64_max_is_rejected so it asserts that at least one of these holds: insertion rejected ( !ok1 || !ok2 ), prover rejected (prove_result is Err), or verifier rejected (GroveDb::verify_aggregate_sum_query returns Err). Keep using PathQuery::new_aggregate_sum_on_range and the existing prove_result/verify_result flow but replace the early return with a final assertion that (!ok1 || !ok2) || prove_result.is_err() || verify_result.is_err(), so the test fails if all steps silently succeed.grovedb/src/operations/get/query.rs (1)
662-667: 💤 Low valueConsider wrapping
MerkErrorwith contextualCorruptedDataper repo convention.The PR description notes that callers of
prove_aggregate_sum_on_rangewere updated to wrap theMerkErrorwith contextualCorruptedDatamessages "per repo convention." The no-proof entry point here uses the same pattern as the existingquery_aggregate_count(.map_err(Error::MerkError)on line 821), so the in-file convention is consistent — but a callsite-specific wrap (e.g.Error::CorruptedData(format!("query_aggregate_sum at path {:?}: {}", path_slices, e))) would give operators better triage signal when a non-ProvableSumTreeresolves at this path or storage is corrupted. As per coding guidelines: "Wrap errors with context using.map_err(|e| Error::CorruptedData(format!(\"context: {}\", e)))pattern in Rust source files".Happy to also apply the same to
query_aggregate_count(line 817-822) in a follow-up if you'd like to keep the two mirrored.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grovedb/src/operations/get/query.rs` around lines 662 - 667, Replace the bare .map_err(Error::MerkError) on the subtree.sum_aggregate_on_range call with a contextual wrapper that maps the MerkError into Error::CorruptedData including identifying context (e.g., the path_slices and that this occurred in query_aggregate_sum / sum aggregate resolution); specifically change the error mapping for subtree.sum_aggregate_on_range(...) to .map_err(|e| Error::CorruptedData(format!("query_aggregate_sum at path {:?}: {}", path_slices, e))) so callers can triage corrupted or non-ProvableSumTree cases more easily while keeping the rest of the cost_return_on_error(...) call intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@grovedb-query/src/query_item/mod.rs`:
- Around line 320-322: Update the stale inline comment inside
NonAggregateInner::deserialize to reflect that the Field set excludes both
aggregate variants (e.g., "AggregateCountOnRange" and "AggregateSomethingElse")
rather than only "AggregateCountOnRange"; reference the Field enum and
NON_AGGREGATE_VARIANTS to confirm which aggregate variants are excluded and
rewrite the comment to mention both aggregate variants so it matches the
struct-level doc and the actual NON_AGGREGATE_VARIANTS definition.
In `@grovedb/src/operations/get/query.rs`:
- Around line 662-667: Replace the bare .map_err(Error::MerkError) on the
subtree.sum_aggregate_on_range call with a contextual wrapper that maps the
MerkError into Error::CorruptedData including identifying context (e.g., the
path_slices and that this occurred in query_aggregate_sum / sum aggregate
resolution); specifically change the error mapping for
subtree.sum_aggregate_on_range(...) to .map_err(|e|
Error::CorruptedData(format!("query_aggregate_sum at path {:?}: {}",
path_slices, e))) so callers can triage corrupted or non-ProvableSumTree cases
more easily while keeping the rest of the cost_return_on_error(...) call intact.
In `@grovedb/src/tests/aggregate_sum_query_tests.rs`:
- Around line 1038-1041: The assertion in aggregate_sum_query_tests.rs
redundantly checks msg.contains("must be a ProvableSumTree") ||
msg.contains("ProvableSumTree"); update the assert to use a single, precise
check on the error string (either msg.contains("must be a ProvableSumTree") to
pin the exact phrase, or msg.contains("ProvableSumTree") if you want a looser
match) and adjust the failure message accordingly; locate the assert that
references the local variable msg and replace the disjunction with the chosen
single contains() call.
- Around line 318-385: Test currently early-returns when both inserts succeed,
allowing a silent-no-op regression to pass; change
provable_sum_tree_overflow_at_i64_max_is_rejected so it asserts that at least
one of these holds: insertion rejected ( !ok1 || !ok2 ), prover rejected
(prove_result is Err), or verifier rejected (GroveDb::verify_aggregate_sum_query
returns Err). Keep using PathQuery::new_aggregate_sum_on_range and the existing
prove_result/verify_result flow but replace the early return with a final
assertion that (!ok1 || !ok2) || prove_result.is_err() ||
verify_result.is_err(), so the test fails if all steps silently succeed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1b061371-8b2d-4352-96a7-ee7c9c94e759
📒 Files selected for processing (14)
grovedb-element/src/element/helpers.rsgrovedb-query/src/query_item/mod.rsgrovedb-version/src/version/grovedb_versions.rsgrovedb-version/src/version/v1.rsgrovedb-version/src/version/v2.rsgrovedb-version/src/version/v3.rsgrovedb/src/lib.rsgrovedb/src/operations/get/query.rsgrovedb/src/operations/proof/generate.rsgrovedb/src/query/mod.rsgrovedb/src/tests/aggregate_count_query_tests.rsgrovedb/src/tests/aggregate_sum_query_tests.rsmerk/src/merk/get.rsmerk/src/proofs/query/aggregate_sum.rs
🚧 Files skipped from review as they are similar to previous changes (5)
- grovedb/src/lib.rs
- grovedb-element/src/element/helpers.rs
- grovedb/src/operations/proof/generate.rs
- grovedb/src/query/mod.rs
- merk/src/proofs/query/aggregate_sum.rs
Four low-value but clean tweaks from CodeRabbit on PR #661: - `grovedb-query/src/query_item/mod.rs`: refresh the stale `NonAggregateInner::deserialize` inline comment to mention both excluded aggregate variants (Count + Sum), matching the struct-level doc and `NON_AGGREGATE_VARIANTS`. - `grovedb/src/tests/aggregate_sum_query_tests.rs`: drop the redundant disjunction `msg.contains("must be a ProvableSumTree") || msg.contains("ProvableSumTree")` — the first clause already implies the second; pin the exact phrase. - `grovedb/src/tests/aggregate_sum_query_tests.rs`: harden `provable_sum_tree_overflow_at_i64_max_is_rejected` so it no longer silently passes when insert AND prove AND verify all accept an overflow. Replace the early-return-on-both-inserts-accepted with an explicit "at least one stage must reject" assertion. - `grovedb/src/operations/get/query.rs`: wrap the MerkError from `Merk::sum_aggregate_on_range` (and the count sibling) with contextual `CorruptedData(format!("query_aggregate_{sum,count} at path {:?}: {}", path_slices, e))` per the repo error-wrapping convention. Two test assertions updated from `MerkError(_)` to `CorruptedData(_)` to match. Workspace `cargo test --all-features`: 3102 pass / 0 fail (unchanged). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
This is Claude. Pushed b1027dcb addressing all 4 CodeRabbit nitpicks:
Workspace |
CodeRabbit flagged the TreeType column for ProvableSumTree as "should be
17" — the technical claim was wrong (11 is the correct TreeType
discriminant per `merk/src/tree_type/mod.rs:77`), but the confusion was
fair: the column header "TreeType" was ambiguous and the table had
several pre-existing inaccuracies in adjacent rows. This commit fixes
the ambiguity AND the bugs.
Changes:
- Rename column header from "TreeType" to "TreeType disc" and add an
intro paragraph explaining that "Element disc" and "TreeType disc"
are discriminants of two SEPARATE enums.
- Add the TreeType-variant label to every tree row for consistency
(some had it, most didn't). The new format is `N (VariantName)` —
e.g. `5 (ProvableCountTree)` — which CodeRabbit-style auto-review
can't misread.
- Fix three pre-existing wrong TreeType disc values:
`BigSumTree`: 4 -> 2
`CountTree`: 2 -> 3
`CountSumTree`: 3 -> 4
(These were drift from the actual `TreeType::discriminant()`
implementation; the file had `4 (BigSumTree)` etc. but those labels
were wrong.)
- Swap the row order at Element discriminants 8 and 9 to match the
actual `Element` enum order:
8 = `ProvableCountTree` (was incorrectly listed as `ItemWithSumItem`)
9 = `ItemWithSumItem` (was incorrectly listed as `ProvableCountTree`)
- Tighten the `ProvableCountSumTree` Purpose blurb to note "only count
in hash" since the sum is tracked metadata, not bound — this is the
half-step variant a future `ProvableCountAndSumTree` would replace.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
This is Claude. Going back through the CodeRabbit review thread to make sure nothing was silently dismissed. Just-pushed 86b527be addressing the appendix-a TreeType column. CodeRabbit's specific claim ("TreeType should be 17") was wrong (11 is the correct
Re: the two remaining
Both are documented for the cidx rebase / future PR. Everything else from the review has been addressed. |
…margulis-99463b # Conflicts: # grovedb-query/src/query.rs # grovedb/src/operations/proof/aggregate_count.rs # grovedb/src/operations/proof/generate.rs # grovedb/src/query/mod.rs
Codex security finding: the regular query verifier's lower- and
upper-bound `last_push` match arms in
`merk/src/proofs/query/verify.rs::execute_proof` (lines 206 / 241)
accept the Count-family boundary node variants (`KVCount`,
`KVDigestCount`, `KVRefValueHashCount`) but omit the parallel Sum
family (`KVSum`, `KVDigestSum`, `KVRefValueHashSum`). Regular proofs
against a `ProvableSumTree` can legitimately emit `KVDigestSum` as the
absence-boundary node for a queried key, so a multi-item query like
`Key("aa")` followed by `Range("g".."j")` would reject the perfectly
valid proof with `Cannot verify lower bound of queried range` whenever
the preceding boundary happened to be sum-flavored.
The downstream absence check at line ~572 already handled all six
node types (Count + Sum), making the omission an asymmetry between
the two checks within the same function.
THE FIX
Add `KVSum`, `KVDigestSum`, `KVRefValueHashSum` to both the lower- and
upper-bound `last_push` match arms. While at it, also extend
`boundaries_in_proof` (line ~742) to surface `KVDigestSum` boundary
keys alongside `KVDigest` and `KVDigestCount` — same class of
omission, same trivial extension.
TESTS
New `provable_sum_tree_bound_regression_tests` module at the bottom of
`verify.rs` covering:
- `key_plus_range_on_provable_sum_tree_left_to_right_verifies` — the
exact `[Key("aa"), Range("g".."j")]` shape Codex flagged, in
forward iteration. Without the fix this returns
`InvalidProofError("Cannot verify lower bound of queried range")`.
- `key_plus_range_on_provable_sum_tree_right_to_left_verifies` —
same query with `left_to_right = false`. The bug is symmetric, so
the regression coverage is too.
- `kv_digest_sum_appears_in_boundaries_in_proof` — proves that
`boundaries_in_proof` now surfaces `KVDigestSum`-flavor boundary
keys produced by `ProvableSumTree` proofs.
Workspace `cargo test --all-features`: 3150 pass / 0 fail (was 3147 /
0).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
merk/src/proofs/query/verify.rs (1)
884-901:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winKeep
key_exists_as_boundary_in_proofin sync with the new Sum-family boundary semantics.
boundaries_in_proof()now recognizesKVDigestSum, butkey_exists_as_boundary_in_proof()still only matchesKVDigest/KVDigestCount. That makes the two public helpers disagree on validProvableSumTreeabsence proofs.Suggested fix
match &op { Op::Push(Node::KVDigest(k, _)) | Op::PushInverted(Node::KVDigest(k, _)) | Op::Push(Node::KVDigestCount(k, _, _)) | Op::PushInverted(Node::KVDigestCount(k, _, _)) + | Op::Push(Node::KVDigestSum(k, _, _)) + | Op::PushInverted(Node::KVDigestSum(k, _, _)) if k.as_slice() == key => { return Ok(true); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@merk/src/proofs/query/verify.rs` around lines 884 - 901, The helper key_exists_as_boundary_in_proof is out of sync with boundaries_in_proof: update key_exists_as_boundary_in_proof to treat KVDigestSum the same as KVDigest and KVDigestCount so Sum-family boundary nodes are recognized; locate the match/case in key_exists_as_boundary_in_proof that currently matches Op::Push/Op::PushInverted with Node::KVDigest and Node::KVDigestCount and add corresponding branches for Node::KVDigestSum (with the same key extraction and equality check logic) so both helpers behave identically for ProvableSumTree absence proofs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@merk/src/proofs/query/verify.rs`:
- Around line 884-901: The helper key_exists_as_boundary_in_proof is out of sync
with boundaries_in_proof: update key_exists_as_boundary_in_proof to treat
KVDigestSum the same as KVDigest and KVDigestCount so Sum-family boundary nodes
are recognized; locate the match/case in key_exists_as_boundary_in_proof that
currently matches Op::Push/Op::PushInverted with Node::KVDigest and
Node::KVDigestCount and add corresponding branches for Node::KVDigestSum (with
the same key extraction and equality check logic) so both helpers behave
identically for ProvableSumTree absence proofs.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: fa52ced9-e02c-4968-b7a3-8797a17c5a19
📒 Files selected for processing (16)
docs/book/src/appendix-a.mdgrovedb-bulk-append-tree/src/proof/tests.rsgrovedb-dense-fixed-sized-merkle-tree/src/proof/tests.rsgrovedb-query/src/aggregate_count.rsgrovedb-query/src/query.rsgrovedb-query/src/query_item/mod.rsgrovedb/src/operations/get/query.rsgrovedb/src/operations/proof/aggregate_count/helpers.rsgrovedb/src/operations/proof/aggregate_count/leaf_chain.rsgrovedb/src/operations/proof/aggregate_count/per_key.rsgrovedb/src/operations/proof/generate.rsgrovedb/src/query/mod.rsgrovedb/src/tests/aggregate_count_query_tests.rsgrovedb/src/tests/aggregate_sum_query_tests.rsmerk/src/element/tree_type.rsmerk/src/proofs/query/verify.rs
🚧 Files skipped from review as they are similar to previous changes (6)
- grovedb/src/operations/get/query.rs
- grovedb/src/tests/aggregate_count_query_tests.rs
- grovedb/src/query/mod.rs
- grovedb/src/tests/aggregate_sum_query_tests.rs
- grovedb-query/src/query_item/mod.rs
- grovedb/src/operations/proof/generate.rs
Develop landed a regression test (`query_validation_error_to_static_str_projects_invalid_operation_and_catches_other_variants`, commit 7a64938) that pins the catch-all fallback string returned by `query_validation_error_to_static_str` to `"AggregateCountOnRange query validation failed"`. This PR had generalised the helper to serve both count and sum, returning `"aggregate query validation failed"`, which broke the develop test under GitHub's "merge into base" CI workflow. Split the helper into two so each aggregate variant's error surface stays self-describing: - `query_validation_error_to_static_str` — count side, restored to the `"AggregateCountOnRange query validation failed"` label so develop's regression test stays green. - `sum_query_validation_error_to_static_str` — new sum-side helper returning `"AggregateSumOnRange query validation failed"`. Used by `SizedQuery::validate_aggregate_sum_on_range`. Both follow the same projection contract: `InvalidOperation(msg)` passes the static string through unchanged; any other variant (unreachable from real validators) gets the variant-specific fallback. No behavior change at the InvalidOperation happy path, which is all real callers reach. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CodeRabbit symmetry finding on top of commit 5b3a046: that fix extended `boundaries_in_proof` to recognize `KVDigestSum` boundary nodes from `ProvableSumTree` proofs, but missed the parallel helper `key_exists_as_boundary_in_proof`. The two public helpers are documented to behave identically; without this both helpers disagreed on valid `ProvableSumTree` absence proofs. Add `Op::Push(Node::KVDigestSum(..))` and the PushInverted variant to the match in `key_exists_as_boundary_in_proof`. Tighten the doc-comment to spell out that the two helpers share node-type coverage. Extended the regression test `kv_digest_sum_appears_in_boundaries_in_proof` (now renamed to `kv_digest_sum_appears_in_both_boundary_helpers`) so every boundary key surfaced by `boundaries_in_proof` is also reported by `key_exists_as_boundary_in_proof`, pinning the symmetry. Workspace `cargo test --all-features` for the affected module: 3 of 3 regression tests pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ee code Remove "Phase 1 / Phase 2 / etc." prefixes that referred to the PR's implementation timeline. Retains "Phase 1 / Phase 2" labels that describe runtime decode-vs-walk algorithm steps of the aggregate-count/sum verifiers (documented in docs/book/src/aggregate-count-queries.md). In the test-fixture stack-builder (merk/src/proofs/tree.rs) and the provable_sum_tree direct-insert test, renamed enumeration-style "Phase N" labels to "Step N" for clarity. Also renamed the phase2_* test fn prefixes in encoding.rs and tree.rs to drop the timeline label. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CodeRabbit re-flagged a pre-existing asymmetry in `QueryItem`'s manual serde implementations: the `Serialize` impl emitted PascalCase variant tags via `serialize_newtype_variant` (e.g. `"AggregateSumOnRange"`, `"Range"`), but the `Deserialize` impl uses `Field` enums marked `#[serde(field_identifier, rename_all = "snake_case")]`, expecting snake_case (`"aggregate_sum_on_range"`, `"range"`). The asymmetry was invisible to bincode (the format GroveDB actually uses in proofs and storage) because bincode identifies variants by index, not by the textual tag. But it broke round-trip through every text-based format that carries variant names verbatim (JSON, YAML, TOML). An in-code comment at the existing token-stream test site even documented the issue as "a pre-existing mismatch ... that breaks JSON round-trip but is invisible to formats that don't carry variant names textually." THE FIX Change every `serialize_newtype_variant` (and the one `serialize_unit_variant` for `RangeFull`) call in `grovedb-query/src/query_item/mod.rs` to emit snake_case variant tags. The variant indices stay the same so the bincode wire format is unchanged — only textual formats see the new tag names. Affected variants: `Key`, `Range`, `RangeInclusive`, `RangeFull`, `RangeFrom`, `RangeTo`, `RangeToInclusive`, `RangeAfter`, `RangeAfterTo`, `RangeAfterToInclusive`, `AggregateCountOnRange`, `AggregateSumOnRange` — i.e. every variant, not just the aggregate ones. This is the symmetric fix; doing only the new `AggregateSumOnRange` variant would have diverged it from the existing `AggregateCountOnRange` (and from the other ten variants that have always been broken the same way). Also updated the in-code comment at the token-stream test site to reflect the new contract. TESTS Two new `serde_test::assert_tokens` round-trip regression tests pin the snake_case contract on both aggregate variants: - serde_round_trip_aggregate_sum_on_range_uses_snake_case_tag - serde_round_trip_aggregate_count_on_range_uses_snake_case_tag assert_tokens exercises both Serialize AND Deserialize against the same token stream, so any future regression on either side fails the test immediately. Workspace cargo test --all-features: 3152 pass / 0 fail (was 3150 / 0). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…0 envelopes V0 (`MerkOnlyLayerProof`) envelopes predate the aggregate-sum feature and cannot legitimately carry an aggregate-sum proof, so the V0 layer walker was unreachable in any honestly-produced proof. Mirror the count-side changes from PR #663: - Convert `aggregate_sum.rs` into a subdirectory mirroring `aggregate_count/`: `mod.rs` (public API + V0 rejection), `helpers.rs` (envelope decode, single-key layer verification, chain enforcement, leaf sum verification), `leaf_chain.rs` (V1 leaf-chain walker). Removes the dead `verify_v0_layer` path. - Add a prover-side V0 gate in `prove_query_non_serialized`: when grove_version dispatches to V0 and the path query carries an `AggregateSumOnRange` anywhere, return `NotSupported` instead of emitting a V0 envelope the verifier would (correctly) reject. - Update tests: replace the V0 round-trip test with a V0-rejection test; broaden the empty-leaf type-confusion test to accept either the V0-rejection or the terminal-type-gate error; remove the now- unreachable V0 missing-lower-layer test (V1 counterpart already pins the missing-layer behavior); refresh stale doc-comments that pointed at `aggregate_sum.rs` line numbers. No carrier-shape support yet — `AggregateSumOnRange` is still leaf-only on the merk side, so there is no `classification.rs` / `per_key.rs` mirror. The leaf-only shape can be extended later in parallel with matching merk-level work, just like the count side did. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
grovedb/src/tests/aggregate_sum_query_tests.rs (1)
957-1083: ⚡ Quick winThese regressions never reach the terminal-type check.
Both forged-proof tests build a
GROVE_V2/V0 envelope and treat the V0 rejection path as success, so they still pass even ifenforce_lower_chainstops checking forProvableSumTree. To actually pin the fix this PR introduced, forge the same shape under a V1 envelope and assert the terminal-type error specifically.Also applies to: 1128-1225
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grovedb/src/tests/aggregate_sum_query_tests.rs` around lines 957 - 1083, The tests currently build a V0 (GROVE_V2) envelope and thus succeed by hitting the V0 rejection path; change them to construct the same forged proof under a V1 envelope so the verifier exercises the terminal-type check in enforce_lower_chain. Specifically, in the test functions empty_leaf_type_confusion_forgery_rejected (and the other occurrence around 1128-1225), set the GroveVersion reference v to the V1 constant (use GROVE_V3 or the project’s V1 version constant) so the match arm expecting GroveDBProof::V1 is used, adjust the forged envelope construction to produce a GroveDBProof::V1 envelope shape, then call GroveDb::verify_aggregate_sum_query(&forged_bytes, &attack_pq, v) and assert that the returned error message contains the terminal-type text (e.g., "must be a ProvableSumTree") to ensure enforce_lower_chain is actually exercised.merk/src/tree/mod.rs (1)
1231-1255: ⚡ Quick winCentralize the provable-hash dispatch.
These match tables now duplicate
hash_for_link()’sAggregateData→hash mapping in three places. That is a fragile proof contract: one missed branch on the next tree-type addition will make stored child hashes and computed root hashes diverge. Please extract a single helper and reuse it here and inhash_for_link().Also applies to: 1279-1303
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@merk/src/tree/mod.rs` around lines 1231 - 1255, The match on AggregateData that selects node_hash_with_count/node_hash_with_sum is duplicated and must be centralized: extract a helper function (e.g., compute_provable_hash or provable_hash_for_aggregate) that takes (&AggregateData, &TreeTypeContext...) or the same parameters used by hash_for_link() and returns the computed hash (using node_hash_with_count, node_hash_with_sum, or tree.hash()), then replace the duplicated match blocks in this file (the block around the current match and the similar block at 1279-1303) to call that helper, and update hash_for_link() to call the same helper so all AggregateData→hash logic lives in one place.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@grovedb/src/operations/proof/aggregate_sum/leaf_chain.rs`:
- Around line 43-59: The proof walker accepts extra unexpected lower_layers in
V1 aggregate-sum chains; to harden it, validate that each non-terminal layer
contains exactly one lower_layer equal to the expected next_key and that the
terminal (leaf) layer contains zero lower_layers. In the function surrounding
verify_sum_leaf and verify_single_key_layer_proof_v0 where you access
layer.lower_layers (using variables depth, path_keys, next_key, layer,
path_query), add checks: if depth < path_keys.len() - 1 ensure
layer.lower_layers.len() == 1 and that the sole key equals next_key; if depth ==
path_keys.len() ensure layer.lower_layers.is_empty(); if any check fails return
Error::InvalidProof(path_query.clone(), format!(...)) with a clear message about
unexpected lower_layers.
In `@merk/src/tree/mod.rs`:
- Around line 674-691: For the TreeType::ProvableSumTree arm, do not silently
fall back to self.hash() when aggregate_data() is not
AggregateData::ProvableSum; instead fail closed by asserting or returning an
explicit error: replace the fallback branch with an assert/panic or propagate an
Err that clearly states that a ProvableSum was expected but aggregate_data()
returned something else (include the aggregate_data() value and node identity),
so node_hash_with_sum(self.inner.kv.hash(), self.child_hash(true),
self.child_hash(false), sum) is the only successful path for ProvableSumTree and
any mismatch is surfaced immediately; update callers if needed to handle the new
error path.
---
Nitpick comments:
In `@grovedb/src/tests/aggregate_sum_query_tests.rs`:
- Around line 957-1083: The tests currently build a V0 (GROVE_V2) envelope and
thus succeed by hitting the V0 rejection path; change them to construct the same
forged proof under a V1 envelope so the verifier exercises the terminal-type
check in enforce_lower_chain. Specifically, in the test functions
empty_leaf_type_confusion_forgery_rejected (and the other occurrence around
1128-1225), set the GroveVersion reference v to the V1 constant (use GROVE_V3 or
the project’s V1 version constant) so the match arm expecting GroveDBProof::V1
is used, adjust the forged envelope construction to produce a GroveDBProof::V1
envelope shape, then call GroveDb::verify_aggregate_sum_query(&forged_bytes,
&attack_pq, v) and assert that the returned error message contains the
terminal-type text (e.g., "must be a ProvableSumTree") to ensure
enforce_lower_chain is actually exercised.
In `@merk/src/tree/mod.rs`:
- Around line 1231-1255: The match on AggregateData that selects
node_hash_with_count/node_hash_with_sum is duplicated and must be centralized:
extract a helper function (e.g., compute_provable_hash or
provable_hash_for_aggregate) that takes (&AggregateData, &TreeTypeContext...) or
the same parameters used by hash_for_link() and returns the computed hash (using
node_hash_with_count, node_hash_with_sum, or tree.hash()), then replace the
duplicated match blocks in this file (the block around the current match and the
similar block at 1279-1303) to call that helper, and update hash_for_link() to
call the same helper so all AggregateData→hash logic lives in one place.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 6e4d5122-0ebe-46d7-afa4-5422fdfced48
📒 Files selected for processing (28)
grovedb-element/src/element_type.rsgrovedb-element/tests/element_constructors_helpers.rsgrovedb-query/src/proofs/encoding.rsgrovedb-query/src/proofs/mod.rsgrovedb-query/src/proofs/tree_feature_type.rsgrovedb-query/src/query_item/mod.rsgrovedb/src/debugger.rsgrovedb/src/operations/proof/aggregate_sum/helpers.rsgrovedb/src/operations/proof/aggregate_sum/leaf_chain.rsgrovedb/src/operations/proof/aggregate_sum/mod.rsgrovedb/src/operations/proof/generate.rsgrovedb/src/operations/proof/mod.rsgrovedb/src/query/mod.rsgrovedb/src/tests/aggregate_sum_query_tests.rsgrovedb/src/tests/provable_sum_tree_tests.rsgrovedb/src/tests/trunk_proof_tests.rsgrovedbg-types/src/lib.rsmerk/src/element/tree_type.rsmerk/src/merk/chunks.rsmerk/src/proofs/query/aggregate_sum.rsmerk/src/proofs/query/mod.rsmerk/src/proofs/query/verify.rsmerk/src/proofs/tree.rsmerk/src/tree/link.rsmerk/src/tree/mod.rsmerk/src/tree/tree_feature_type.rsmerk/src/tree_type/costs.rsmerk/src/tree_type/mod.rs
✅ Files skipped from review due to trivial changes (1)
- merk/src/merk/chunks.rs
🚧 Files skipped from review as they are similar to previous changes (20)
- merk/src/tree_type/costs.rs
- grovedb/src/operations/proof/mod.rs
- grovedb-query/src/proofs/mod.rs
- grovedbg-types/src/lib.rs
- merk/src/tree/tree_feature_type.rs
- grovedb/src/tests/trunk_proof_tests.rs
- grovedb-element/tests/element_constructors_helpers.rs
- merk/src/tree/link.rs
- grovedb-query/src/proofs/encoding.rs
- merk/src/proofs/query/verify.rs
- merk/src/element/tree_type.rs
- merk/src/proofs/query/mod.rs
- grovedb/src/query/mod.rs
- merk/src/tree_type/mod.rs
- grovedb/src/tests/provable_sum_tree_tests.rs
- merk/src/proofs/query/aggregate_sum.rs
- grovedb-query/src/proofs/tree_feature_type.rs
- grovedb/src/operations/proof/generate.rs
- merk/src/proofs/tree.rs
- grovedb-element/src/element_type.rs
| TreeType::ProvableSumTree => { | ||
| // For ProvableSumTree, include the aggregate sum in the hash | ||
| // via `node_hash_with_sum`. This is what makes the root hash | ||
| // diverge from a plain SumTree containing the same elements. | ||
| let aggregate_data = self | ||
| .aggregate_data() | ||
| .unwrap_or(AggregateData::NoAggregateData); | ||
| if let AggregateData::ProvableSum(sum) = aggregate_data { | ||
| node_hash_with_sum( | ||
| self.inner.kv.hash(), | ||
| self.child_hash(true), | ||
| self.child_hash(false), | ||
| sum, | ||
| ) | ||
| } else { | ||
| // Fallback to regular hash if aggregate data is unexpected | ||
| self.hash() | ||
| } |
There was a problem hiding this comment.
Fail closed when a ProvableSumTree cannot produce ProvableSum.
Lines 678-690 currently downgrade to self.hash() when aggregate_data() overflows or the node/tree_type pairing is inconsistent. That strips the sum commitment from a caller that explicitly requested ProvableSumTree hashing, which weakens the verification contract instead of surfacing corruption. Please make this path error/assert rather than emitting a plain node hash.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@merk/src/tree/mod.rs` around lines 674 - 691, For the
TreeType::ProvableSumTree arm, do not silently fall back to self.hash() when
aggregate_data() is not AggregateData::ProvableSum; instead fail closed by
asserting or returning an explicit error: replace the fallback branch with an
assert/panic or propagate an Err that clearly states that a ProvableSum was
expected but aggregate_data() returned something else (include the
aggregate_data() value and node identity), so
node_hash_with_sum(self.inner.kv.hash(), self.child_hash(true),
self.child_hash(false), sum) is the only successful path for ProvableSumTree and
any mismatch is surfaced immediately; update callers if needed to handle the new
error path.
Addresses CodeRabbit findings on PR #661. **Finding 1 (Major, fixed)** — `verify_v1_leaf_chain` accepted arbitrary `lower_layers` entries at non-leaf depths and under the leaf merk. That let two byte-distinct envelopes verify to the same `(root, sum)` because the smuggled siblings/children were never inspected, and gave downstream syntactic scanners an unverified surface to read from. Added a strict shape gate that mirrors the honest prover (`prove_aggregate_sum_on_range` short-circuit always emits `lower_layers: BTreeMap::new()` at the leaf, and each path-prefix wrapper inserts exactly one entry): - At `depth == path_keys.len()` (leaf merk): require `lower_layers.is_empty()`. - At non-leaf depths: require `lower_layers.len() == 1` and the sole key to equal the expected descent key `path_keys[depth]`. Two new regression tests (`sum_v1_envelope_with_extra_lower_layer_*` and `sum_v1_envelope_with_lower_layers_under_leaf_*`) construct byte-modified envelopes that the gate rejects. **Finding 4 (nitpick, addressed)** — the existing empty-leaf type-confusion tests build V0 envelopes and now hit the V0-rejection gate before the terminal-type gate in `enforce_lower_chain` runs. Added `empty_leaf_type_confusion_forgery_rejected_under_v1_envelope` which builds the same forgery under a V1 `LayerProof` envelope so the terminal-type gate fires directly. The test asserts the specific "must be a ProvableSumTree" error from `enforce_lower_chain` so future refactors that drop the gate are caught. **Finding 2 (Major, skipped with reason)** — CodeRabbit asks to make `hash_for_link` fail-closed (panic/Err) when a `ProvableSumTree` node's `aggregate_data()` doesn't return `ProvableSum`. The current fallback to `self.hash()` is identical across all three `Provable*` variants (`ProvableCountTree`, `ProvableCountSumTree`, `ProvableSumTree`) and also appears in commit-time dispatch (lines 1233-1304). Fixing only the sum arm creates asymmetry with the count side; the broader refactor (plus the matching dispatch-centralization in Finding 3) is out of scope for this sum-feature PR and is documented in MEMORY M1 as intentional. **Finding 3 (nitpick, skipped with reason)** — the duplicated `AggregateData → hash` dispatch in `merk/src/tree/mod.rs` predates this PR and applies to all three `Provable*` variants. Centralizing it would touch hot proof-emission paths; out of scope here. Also updated `sum_v1_envelope_with_missing_lower_layer_is_rejected` to accept the new strict-shape error message — removing an entry now trips the shape gate first instead of the older missing-layer arm. Both messages pin the same property. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… wrong-key
After splitting the strict-shape gate in \`verify_v1_leaf_chain\` into
two reachable error arms (entry-count vs. entry-key), the
ok_or_else-arm on the subsequent \`lower_layers.get(&next_key)\` became
naturally reachable for the wrong-key case (single entry but under
an unexpected key). Previously the combined-OR gate made both error
sites mutually exclusive, leaving the get-arm dead.
Coverage impact (aggregate_sum/ subdir, local tarpaulin):
- leaf_chain.rs: 36/41 (88%) -> 44/44 (100%)
- Subdir total: ~80% -> 89.94%
New tests in aggregate_sum_query_tests:
- sum_v1_envelope_with_wrong_keyed_lower_layer_is_rejected — single
\`lower_layers\` entry under \"impostor\" instead of \"st\". Exercises
the key-shape gate distinctly from the count-shape gate.
- sum_proof_with_trailing_bytes_is_rejected — mirror of
\`aggregate_count_proof_with_trailing_bytes_is_rejected\`. Pins the
canonical-decode invariant in \`decode_grovedb_proof\`.
Tightened assertion in
\`sum_v1_envelope_with_extra_lower_layer_is_rejected\` to match the new
\"lower-layer entries at depth N\" error string, and broadened the
assertion in \`sum_v1_envelope_with_missing_lower_layer_is_rejected\`
to accept either the old missing-layer message or the new
entry-count message (removing the only entry trips the count gate
first).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…d.rs
The same `decode_grovedb_proof` function lived in both
`aggregate_count/helpers.rs` and `aggregate_sum/helpers.rs`, differing
only in whether the error string named "aggregate-count" or
"aggregate-sum" as the offending shape. Two copies of the same
canonical-decode contract is a maintenance hazard — drift between the
two would mean one aggregate path could (e.g.) accept trailing bytes
that the other rejects, breaking the equality-by-bytes assumption the
contract is meant to guarantee.
Hoist the function to `operations/proof/mod.rs` as
`decode_grovedb_proof_canonical` with `pub(super)` visibility. Both
helper modules now call `super::decode_grovedb_proof_canonical`. The
error message generalizes from "aggregate-{count,sum} proof has N
trailing bytes" to "proof has N trailing bytes" since the call site
provides the surrounding context; the existing
`*_proof_with_trailing_bytes_is_rejected` tests assert
`msg.contains("trailing bytes")` and remain green.
No behavior change beyond the wording adjustment in the trailing-
bytes error. Tests: workspace 3088 / 0 fail; aggregate tests
223 / 0 fail (113 sum + 75 count + 35 others).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Migrate the four `bincode::decode_from_slice(...)?.0` sites in
`operations/proof/verify.rs` (`verify_query_with_options`,
`verify_query_get_parent_tree_info_with_options`, `verify_query_raw`,
`verify_trunk_chunk_proof`) to call
`super::decode_grovedb_proof_canonical(proof)?` — the same canonical
decoder the aggregate-count and aggregate-sum entry points already use.
Before this change, the general verifier silently accepted trailing
bytes after the encoded `GroveDBProof` envelope, because the `.0`
discarded the bincode `consumed` count. The aggregate-count and
aggregate-sum entry points rejected trailing bytes via their own
private decoder. That asymmetry meant the same logical proof could
have many distinct byte encodings through the general-verify surface
but only one through aggregate-verify — a malleability gap that
breaks any equality-by-bytes / caching / dedup assumption a consumer
might rely on. The cryptographic chain still bound the answer, so
this wasn't a soundness break.
The general-verify path now matches: trailing bytes are rejected with
`Error::CorruptedData("proof has N trailing bytes after the encoded
envelope")`.
Added `verify_query_rejects_proof_with_trailing_bytes` in
`proof_advanced_tests.rs` to lock the new behavior in place, mirror
of `sum_proof_with_trailing_bytes_is_rejected` and
`aggregate_count_proof_with_trailing_bytes_is_rejected`.
Tests: workspace 3089 / 0 fail (3088 + 1 new regression test).
No honest test/proof emitted trailing bytes, so no existing test
needed updating.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Issue being fixed or feature implemented
GroveDB had a symmetry gap in its aggregate trees:
This PR adds
Element::ProvableSumTree— the missing parallel toElement::ProvableCountTree. It bakes the per-node aggregate sum into the node hash vianode_hash_with_sum(kv, l, r, sum), making aggregate-sum range queries cryptographically verifiable in O(log n + |boundary|) proof size.The marquee user-visible feature is
AggregateSumOnRange— ask "what is the total sum of children with keys in [a, b]?" against a ProvableSumTree, with a verifiable proof that a peer can check from the root hash alone, without ever seeing the underlyingSumItemvalues.ProvableBigSumTree(i128 variant) is intentionally out of scope — documented as a follow-up.What was done?
Seven commits, each a clean Phase boundary:
Element::ProvableSumTree,ElementType,TreeType::ProvableSumTree,TreeFeatureType::ProvableSummedMerkNode,NodeType::ProvableSumNode,zero_sum()helpernode_hash_with_sum,AggregateData::ProvableSum, new proof Node variantsKVSum/KVHashSum/KVRefValueHashSum/KVDigestSum/HashWithSumat wire tags0x30..=0x3D0xb0prefix with explicit per-variant mapping rather than theprefix | baseformula.NotSummedProvableSumTree = 177hand-assigned.verify_grovedbaggregate-consistency check across all 7 aggregate-bearing tree variantsQueryItem::AggregateSumOnRange+ prove/verify machinery with i128 accumulator and i64-narrowing overflow checkaggregate-sum-on-range-queries.mdchapter plus updates toelement-system,hashing,appendix-a, andaggregate-sum-queries(disambiguation banner)Discriminants (current state)
Element::ProvableSumTree→ bincode disc 17ElementType::ProvableSumTree→ 17ElementType::NonCountedProvableSumTree→ 145 (0x80 | 17)ElementType::NotSummedProvableSumTree→ 177 (explicit slot, NOT0xb0 | 17— that would collide back toReferenceunder the legacy mask)TreeType::ProvableSumTree→ 11Note on PR #657 (CountIndexedTree). This PR lands FIRST; PR #657 will need to rebase against develop once this merges and shift its discriminants up by 1:
CountIndexedTree17 → 18 andProvableCountIndexedTree18 → 19, with the NonCounted twins shifting 145/146 → 146/147 accordingly. The renumber is mechanical, same pattern as the historical 2498b97f merge resolution — just in the opposite direction this time.NotSummed twin encoding change
The existing
NotSummedtwin family used a0xb0 | baseformula with a 4-bit base mask0x0F. ProvableSumTree at base 17 doesn't fit (mask collapses 17 → 1 = Reference). Rather than widen the mask (which would rebase every existing discriminant), this PR keeps the family range at0xB0..=0xBFand switches to explicit per-variant slot assignment. Existing twins keep their values (180, 181, 183, 186); ProvableSumTree gets the new explicit slot 177.Signed-sum correctness
Two arithmetic differences from the count machinery, documented inline:
if sum == 0 → short-circuit. Count can safely skip a 0-count subtree; sum can be 0 with non-zero children (e.g., +5 and -5). The prover descends regardless.i128end-to-end and narrow once at the outermost call viai64::try_from.i64::MAX + i64::MAXcorrectly returnsInvalidProofError;i64::MAX + i64::MIN = -1correctly fits.How Has This Been Tested?
cargo test --workspace --all-features: 2938 passed, 0 failed (40 new tests across phases 3/4/5).New test files / modules:
grovedb/src/tests/provable_sum_tree_tests.rs— 10 round-trip + wrapper + nested-propagation tests +integrity_walk_testsmodule (7 tampering tests for Phase 4)grovedb/src/tests/aggregate_sum_query_tests.rs— 21 end-to-end query tests including negative sums, i64::MAX overflow rejection, tampering detection, NotSummed exclusion, validation rejection pathsmerk/src/proofs/query/aggregate_sum.rstests — 14 Merk-level prover/verifier internalsMandatory test coverage from the original spec:
prove_aggregate_sum_queryround-tripInvalidProofError)verify_grovedbcatches deliberately-corrupted ProvableSumTree nodeElement::ProvableSumTreebyte-identical serialization round-tripNotSummed(ProvableSumTree)constructible + serializable + correctly suppresses parent sumcargo clippy --workspace --all-features --no-deps: zero warnings.cargo fmt --all -- --check: clean.Breaking Changes
Pre-shipping V1 only. The NotSummed twin discriminants did NOT change in this PR (they were temporarily widened to
0xA0..=0xBFduring one work-in-progress phase, then reverted in commit 46466d1c — existing twin values 180/181/183/186 are unchanged from before this PR).The only new on-disk byte ranges are V1-only additions:
ProvableSumTree)0x30..=0x3D(V1 proofs only)No V0 wire format changes.
Checklist
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
ProvableSumTreeelement type for cryptographic commitment of aggregate sums in node hashesAggregateSumOnRangequeries to prove and verify signed aggregate sums over key ranges without revealing individual valuesDocumentation
Tests