Skip to content

feat: add Element::ProvableSumTree + AggregateSumOnRange query#661

Open
QuantumExplorer wants to merge 37 commits into
developfrom
claude/distracted-margulis-99463b
Open

feat: add Element::ProvableSumTree + AggregateSumOnRange query#661
QuantumExplorer wants to merge 37 commits into
developfrom
claude/distracted-margulis-99463b

Conversation

@QuantumExplorer
Copy link
Copy Markdown
Member

@QuantumExplorer QuantumExplorer commented May 11, 2026

Issue being fixed or feature implemented

GroveDB had a symmetry gap in its aggregate trees:

Variant Aggregate In node hash?
SumTree sum (i64)
CountTree count (u64)
ProvableCountTree count (u64)
ProvableSumTree (new) sum (i64)
BigSumTree sum (i128)
CountSumTree count + sum ❌ (count only on ProvableCountSumTree)

This PR adds Element::ProvableSumTree — the missing parallel to Element::ProvableCountTree. It bakes the per-node aggregate sum into the node hash via node_hash_with_sum(kv, l, r, sum), making aggregate-sum range queries cryptographically verifiable in O(log n + |boundary|) proof size.

The marquee user-visible feature is AggregateSumOnRange — ask "what is the total sum of children with keys in [a, b]?" against a ProvableSumTree, with a verifiable proof that a peer can check from the root hash alone, without ever seeing the underlying SumItem values.

ProvableBigSumTree (i128 variant) is intentionally out of scope — documented as a follow-up.

What was done?

Seven commits, each a clean Phase boundary:

# Commit Phase
1 c95cf749 Types: Element::ProvableSumTree, ElementType, TreeType::ProvableSumTree, TreeFeatureType::ProvableSummedMerkNode, NodeType::ProvableSumNode, zero_sum() helper
2 3364f08c Merk: node_hash_with_sum, AggregateData::ProvableSum, new proof Node variants KVSum/KVHashSum/KVRefValueHashSum/KVDigestSum/HashWithSum at wire tags 0x30..=0x3D
3 46466d1c Refactor: NotSummed twin reverted to 0xb0 prefix with explicit per-variant mapping rather than the prefix | base formula. NotSummedProvableSumTree = 177 hand-assigned.
4 73697a8b Wire ProvableSumTree through insert/read/batch paths
5 40b3c168 verify_grovedb aggregate-consistency check across all 7 aggregate-bearing tree variants
6 9c7d5e18 QueryItem::AggregateSumOnRange + prove/verify machinery with i128 accumulator and i64-narrowing overflow check
7 e49fa100 Documentation: new aggregate-sum-on-range-queries.md chapter plus updates to element-system, hashing, appendix-a, and aggregate-sum-queries (disambiguation banner)

Discriminants (current state)

  • Element::ProvableSumTree → bincode disc 17
  • ElementType::ProvableSumTree17
  • ElementType::NonCountedProvableSumTree145 (0x80 | 17)
  • ElementType::NotSummedProvableSumTree177 (explicit slot, NOT 0xb0 | 17 — that would collide back to Reference under the legacy mask)
  • TreeType::ProvableSumTree11

Note on PR #657 (CountIndexedTree). This PR lands FIRST; PR #657 will need to rebase against develop once this merges and shift its discriminants up by 1: CountIndexedTree 17 → 18 and ProvableCountIndexedTree 18 → 19, with the NonCounted twins shifting 145/146 → 146/147 accordingly. The renumber is mechanical, same pattern as the historical 2498b97f merge resolution — just in the opposite direction this time.

NotSummed twin encoding change

The existing NotSummed twin family used a 0xb0 | base formula with a 4-bit base mask 0x0F. ProvableSumTree at base 17 doesn't fit (mask collapses 17 → 1 = Reference). Rather than widen the mask (which would rebase every existing discriminant), this PR keeps the family range at 0xB0..=0xBF and switches to explicit per-variant slot assignment. Existing twins keep their values (180, 181, 183, 186); ProvableSumTree gets the new explicit slot 177.

Signed-sum correctness

Two arithmetic differences from the count machinery, documented inline:

  • No if sum == 0 → short-circuit. Count can safely skip a 0-count subtree; sum can be 0 with non-zero children (e.g., +5 and -5). The prover descends regardless.
  • i128 accumulator with i64 narrowing at entry points. The prover/verifier carry i128 end-to-end and narrow once at the outermost call via i64::try_from. i64::MAX + i64::MAX correctly returns InvalidProofError; i64::MAX + i64::MIN = -1 correctly fits.

How Has This Been Tested?

cargo test --workspace --all-features: 2938 passed, 0 failed (40 new tests across phases 3/4/5).

New test files / modules:

  • grovedb/src/tests/provable_sum_tree_tests.rs — 10 round-trip + wrapper + nested-propagation tests + integrity_walk_tests module (7 tampering tests for Phase 4)
  • grovedb/src/tests/aggregate_sum_query_tests.rs — 21 end-to-end query tests including negative sums, i64::MAX overflow rejection, tampering detection, NotSummed exclusion, validation rejection paths
  • merk/src/proofs/query/aggregate_sum.rs tests — 14 Merk-level prover/verifier internals

Mandatory test coverage from the original spec:

  • ✅ Direct insert + read round-trips with positive, negative, and i64::MIN/i64::MAX sums
  • ✅ Aggregation up the tree (nested ProvableSumTree)
  • prove_aggregate_sum_query round-trip
  • ✅ Mixed positive/negative sums net correctly
  • ✅ Overflow path (two i64::MAX children → InvalidProofError)
  • verify_grovedb catches deliberately-corrupted ProvableSumTree node
  • ✅ Existing SumTree tests still pass
  • Element::ProvableSumTree byte-identical serialization round-trip
  • NotSummed(ProvableSumTree) constructible + serializable + correctly suppresses parent sum

cargo clippy --workspace --all-features --no-deps: zero warnings.
cargo fmt --all -- --check: clean.

Breaking Changes

Pre-shipping V1 only. The NotSummed twin discriminants did NOT change in this PR (they were temporarily widened to 0xA0..=0xBF during one work-in-progress phase, then reverted in commit 46466d1c — existing twin values 180/181/183/186 are unchanged from before this PR).

The only new on-disk byte ranges are V1-only additions:

  • Base discriminant 17 (ProvableSumTree)
  • NonCounted twin 145
  • NotSummed twin 177
  • Proof Node wire tags 0x30..=0x3D (V1 proofs only)

No V0 wire format changes.

Checklist

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas (twin scheme rationale, i128 narrowing, NotSummed wrapper semantics)
  • I have added or updated relevant unit/integration/functional/e2e tests
  • I have made corresponding changes to the documentation

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added ProvableSumTree element type for cryptographic commitment of aggregate sums in node hashes
    • Introduced AggregateSumOnRange queries to prove and verify signed aggregate sums over key ranges without revealing individual values
    • Added sum-bearing proof nodes with deterministic encoding for verifiable range aggregation
  • Documentation

    • Added guides for aggregate sum range queries, sum hashing semantics, and element system integration
  • Tests

    • Added comprehensive test suites covering sum proof generation, verification, tampering detection, and cross-layer path validation

Review Change Stack

QuantumExplorer and others added 7 commits May 11, 2026 18:09
…tension

Phase 1 of the ProvableSumTree feature — the missing parallel to
ProvableCountTree that bakes the per-node sum into the node hash, making
aggregate-sum range queries cryptographically verifiable. This commit adds
the types-only foundation (no hash divergence yet — Phase 2 will introduce
node_hash_with_sum and the new proof Node variants).

DISCRIMINANTS

- Element::ProvableSumTree at variant index 17 / bincode discriminant 17
  (next free after the NotSummed wrapper byte at 16). This will renumber
  to 19 when PR #657 (CountIndexedTree) lands and reclaims 17/18.
- NonCountedProvableSumTree = 0x80 | 17 = 145.
- The NonCounted twin range widened from 0x80..=0x8F (4-bit base) to
  0x80..=0x9F (5-bit base) — is_non_counted() now checks the top 3 bits
  (& 0xe0 == 0x80) instead of the top 4. Existing twins 128..=142 stay put.
- The NotSummed twin scheme rebases analogously: prefix 0xb0 -> 0xa0,
  base mask 0x0F -> 0x1F, family range 0xA0..=0xBF. Existing twins move:
  NotSummedSumTree                180 -> 164
  NotSummedBigSumTree             181 -> 165
  NotSummedCountSumTree           183 -> 167
  NotSummedProvableCountSumTree   186 -> 170
  Plus the new NotSummedProvableSumTree = 0xa0 | 17 = 177.
  Safe because V1 is pre-shipping. is_not_summed() now uses & 0xe0 == 0xa0.

NEW APIS

- ElementType::ProvableSumTree, ElementType::NonCountedProvableSumTree,
  ElementType::NotSummedProvableSumTree.
- TreeType::ProvableSumTree (discriminant 11, is_sum_bearing = true,
  allows_sum_item = true, inner_node_type = ProvableSumNode).
- NodeType::ProvableSumNode and TreeFeatureType::ProvableSummedMerkNode(i64)
  with encode tag byte 7 and a parallel zero_sum() helper alongside
  zero_count().
- Element::new_provable_sum_tree*, empty_provable_sum_tree*, plus helpers
  (as_provable_sum_tree_value, is_provable_sum_tree).
- Element::NotSummed now accepts ProvableSumTree as a sum-tree inner type
  (constructor, serialize, deserialize).

PROOF DISPATCH

ProvableSumTree joins the "provable aggregate parent" family alongside
ProvableCountTree / ProvableCountSumTree in ElementType::proof_node_type:
subtree children use KvValueHashFeatureType and item children use KvCount.

PHASE-1 SCOPE BOUNDARIES

ProvableSumTree behaves identically to SumTree for storage, aggregation,
and hashing in Phase 1. The divergent node_hash_with_sum and the new
proof Node variants (KVSum, KVHashSum, etc.) land in Phase 2.
TreeFeatureType::ProvableSummedMerkNode maps to AggregateData::Sum at the
Element/aggregate level for now; Phase 2 may introduce a dedicated variant
once the hash diverges.

Workspace cargo test --all-features green (1497+ tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 2 of the ProvableSumTree feature — bakes the per-node sum into the
node hash so it becomes cryptographically committed via the parent's
hash chain, parallel to how `node_hash_with_count` commits the count
for `ProvableCountTree`. After this commit a `ProvableSumTree` with the
same {key/value/sum} contents as a plain `SumTree` produces a different
root hash, which is the whole point of the Phase 2 divergence.

Phase 1 (commit c95cf74) was types-only; aggregation, storage, and
hashing all used the SumTree code paths. Phase 2 introduces the new
hash function, the new proof-node variants needed to transport sums
through proofs, and the dispatch wiring on both prover and verifier
sides. Phases 3 (insert/read), 4 (verify_grovedb walk), and 5
(AggregateSumOnRange) remain.

HASH DISPATCH

- `merk::tree::hash::node_hash_with_sum(kv, l, r, i64)` mirrors
  `node_hash_with_count` byte-for-byte except the appended 8-byte
  field is `i64::to_be_bytes()`. Negative sums hash via their
  two's-complement BE form, which is platform-independent.
- New `AggregateData::ProvableSum(i64)` variant. The
  `From<TreeFeatureType>` conversion now maps
  `ProvableSummedMerkNode(v) -> ProvableSum(v)` (was `Sum(v)` in
  Phase 1) so `Tree::hash_for_link` and the commit path can
  dispatch through the new arm.
- `Tree::hash_for_link(TreeType::ProvableSumTree)` and both commit
  paths (left/right Link::Modified arms) now call
  `node_hash_with_sum` when the aggregate is `ProvableSum`.
  `Tree::aggregate_data` for `ProvableSummedMerkNode` yields
  `ProvableSum` instead of `Sum`.
- Helper updates: `child_aggregate_sum_data_as_i64` /
  `child_aggregate_sum_data_as_i128` treat `ProvableSum`
  identically to `Sum`; `child_aggregate_count_data_as_u64`
  returns 0. `child_ref_and_sum_size` covers the new variant.
- `Link::encode_into` / `decode_into` learn tag byte 7 for
  `AggregateData::ProvableSum` (parallel to the existing
  `ProvableSummedMerkNode` tag byte 7 in `TreeFeatureType`).
- `grovedb::batch` `InsertTreeWithRootHash` now reconstructs an
  `Element::ProvableSumTree` when seeing `AggregateData::ProvableSum`.

PROOF NODE VARIANTS

Five new `Node` enum variants in `grovedb-query/src/proofs/mod.rs`,
mirroring the Count family member-for-member but with `i64` sums:

- `KVSum(key, value, sum)`             — sum analogue of `KVCount`
- `KVHashSum(kv_hash, sum)`            — analogue of `KVHashCount`
- `KVRefValueHashSum(key, ref_value, ref_elem_hash, sum)`
- `KVDigestSum(key, value_hash, sum)`  — analogue of `KVDigestCount`
- `HashWithSum(kv_hash, l, r, sum)`    — analogue of `HashWithCount`

`merk::proofs::tree::Tree::hash()` now dispatches each new variant
through `node_hash_with_sum`. `KVValueHashFeatureType` /
`...WithChildHash` handling gains a `ProvableSummedMerkNode` arm so
proof-tree hashes recomputed from a Sum-bearing feature_type match
the Merk-tree side. `aggregate_data()` returns `ProvableSum(sum)`
for `KVSum` and `HashWithSum`; `key()` lists the three key-bearing
new variants alongside their Count counterparts.

`grovedb-element::ProofNodeType` gains `KvSum` and
`KvRefValueHashSum`; `ElementType::proof_node_type` now picks them
when the parent is `ProvableSumTree` (Phase 1 routed Sum-tree
children through the Count dispatch). Subtrees inside ProvableSum
still use `KvValueHashFeatureType` since the feature_type carries
the sum.

Proof generation in `merk/src/proofs/query/mod.rs` adds
`to_kv_sum_node`, `to_kvhash_sum_node`, `to_kvdigest_sum_node`
(parallel to the Count helpers) and an `is_provable_sum_tree`
branch that emits Sum-bearing variants. `chunks.rs`'s
`create_proof_node_for_chunk` dispatches the new ProofNodeType
arms.

GroveDB-side reference post-processing in
`grovedb/src/operations/proof/generate.rs` rewrites the merk-level
`KVValueHashFeatureType(_, _, _, ProvableSummedMerkNode(sum))` to
`KVRefValueHashSum`, mirroring the existing
`KVValueHashFeatureType -> KVRefValueHashCount` path. Both
ref-rewriting loops in that file are updated.

The regular query verifier in `merk/src/proofs/query/verify.rs`
rejects `HashWithSum` at non-aggregate positions (fail-fast,
matching the existing `HashWithCount` guard). `KVSum`,
`KVDigestSum`, and `KVRefValueHashSum` are dispatched via
`execute_node`. `KVHashSum` joins `KVHash` / `KVHashCount` in the
"non-data-bearing on path" branch and in the absence-proof
boundary set.

WIRE FORMAT

Tag bytes 0x30..=0x3D in the previously-unused 0x30..0x3F range:

  Push variants (V0 short + V1 wrapper for KV-style large values):
    0x30 = KVSum (small),     0x31 = KVSum (large)
    0x32 = KVHashSum
    0x33 = KVRefValueHashSum (small), 0x34 = KVRefValueHashSum (large)
    0x35 = KVDigestSum
    0x36 = HashWithSum
  PushInverted parallel:
    0x37 = KVSum (small),     0x38 = KVSum (large)
    0x39 = KVHashSum
    0x3a = KVRefValueHashSum (small), 0x3b = KVRefValueHashSum (large)
    0x3c = KVDigestSum
    0x3d = HashWithSum
  0x3e and 0x3f are intentionally reserved.

The on-wire i64 sum uses varint (via `ed::Encode for i64`) for
compactness, matching the Count family. The hash recomputation in
`node_hash_with_sum` uses the fixed 8-byte big-endian form
independently — wire encoding and hash input are deliberately
decoupled. `encoding_length()` and `Decode` arms parallel the Count
family verbatim.

V0 wire format is unchanged. All new tags are V1-only.

TESTS

- `merk::tree::hash` (4): `node_hash_with_sum` differs from
  `node_hash` even at sum=0; different sums give different hashes;
  `i64::MIN` / `i64::MAX` are distinct; determinism.
- `merk::tree` (2): a `ProvableSummedMerkNode` tree aggregates to
  `ProvableSum`, `hash_for_link(ProvableSumTree)` matches
  `node_hash_with_sum(...)` and diverges from plain
  `Tree::hash()`; mutating a node sum changes the root hash.
- `merk::proofs::tree` (4): forged sums on `HashWithSum`,
  `KVSum`, `KVHashSum` change the recomputed node hash;
  Phase 1 -> Phase 2 cornerstone — same {key/value/sum} contents
  give a different ProvableSumTree hash than a plain SumTree.
- `grovedb-query::proofs::encoding` (4): round-trip every new
  variant through `Op::Push` and `Op::PushInverted` at sum
  values {`i64::MIN`, -42, -1, 0, 1, 42, `i64::MAX`}; tag-byte
  sanity check for all 10 new tags.
- `merk::tree::tree_feature_type`: extended every existing
  `AggregateData` test to cover the new `ProvableSum` variant.

Workspace `cargo test --all-features` green: 2881 tests passing,
zero failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1.6 — revert the Phase 1.5 mask widening. The NotSummed family
returns to its original 4-bit prefix `0xb0` / range `0xB0..=0xBF`. The
five legal sum-tree inner types are mapped to twin slots explicitly,
1-at-a-time, instead of via the `prefix | base` bitwise formula:

  SumTree (base 4)               -> 180  (0xB4)
  BigSumTree (base 5)            -> 181  (0xB5)
  CountSumTree (base 7)          -> 183  (0xB7)
  ProvableCountSumTree (base 10) -> 186  (0xBA)
  ProvableSumTree (base 17)      -> 177  (0xB1)

Existing twins return to their original discriminant values; only the
new ProvableSumTree slot is freshly assigned. Pre-shipping V1, so this
discriminant churn is fine.

WHY EXPLICIT MAPPING

`prefix | inner_byte` can only generate twin slots when the inner
discriminant fits in the prefix's complement nibble. For ProvableSumTree
at base 17, the formula `0xb0 | 17` would produce `0xB1` AND then
`disc & 0x0F` would invert it back to base 1 (Reference) — a collision.
Widening the mask to 5 bits in Phase 1.5 rebased every existing twin
discriminant; reverting to per-variant mapping keeps the historical
values stable while still allowing arbitrary new slot assignments.

CONSEQUENCES

- `NOT_SUMMED_TWIN_PREFIX` stays as a const but is now only a family-range
  marker, never composed with a base byte.
- `NOT_SUMMED_BASE_MASK` removed — no remaining callers.
- `is_not_summed()` back to `& 0xf0 == 0xb0`.
- `base()` for NotSummed now uses an explicit per-variant match.
- `from_serialized_value` NotSummed branch uses an explicit
  `inner_byte → twin_variant` match.

NonCounted is unaffected — it still uses the bitwise formula because all
its bases fit cleanly in the low 5 bits under `0x80`.

Workspace cargo test --all-features green (2881 tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 of the ProvableSumTree feature — wires the variant through the
extension traits, cost calculator, reconstruction helper, batch
propagation, and read-path subtree validation so that direct insertion,
nested aggregation, and child-sum mutation behave correctly end-to-end.

Phase 1 (commit c95cf74) added the variant and its twins; Phase 2
(commit 3364f08) introduced node_hash_with_sum and the proof Node
family. The "behave like SumTree" fallback Phase 1 leaned on covered
most surfaces but several dispatch sites guarded subsequent operations
through explicit per-variant match arms — those sites would silently
drop ProvableSumTree or fail to traverse into it. Phase 3 fills each
in deliberately.

EXTENSION TRAITS — merk/src/element/tree_type.rs

ElementTreeTypeExtensions had six trait methods that enumerated tree
variants explicitly: root_key_and_tree_type_owned, root_key_and_tree_type,
tree_flags_and_type, tree_type, maybe_tree_type, tree_feature_type. Each
was missing its ProvableSumTree arm — get_feature_type was the only one
already wired (Phase 1). Adding the missing arms unblocks callers across
get, batch, and visualize that thread tree types through these helpers
to decide layout, hashing, and aggregate-data extraction.
tree_feature_type now maps ProvableSumTree -> ProvableSummedMerkNode(sum)
explicitly, matching the parallel ProvableCountedMerkNode wiring.

COST CALCULATOR — merk/src/element/costs.rs

get_specialized_cost, the layered_value_byte_cost path in
specialized_costs_for_key_value, the layered_value_defined_cost type
filter, and the value_defined_cost dispatch all enumerated the eight
Merk-tree variants and would have either returned None or mis-sized a
ProvableSumTree element. Added explicit ProvableSumTree => SUM_TREE_COST_SIZE
arms (parity with SumTree as established in Phase 1) and the matching
LayeredValueDefinedCost branch.

RECONSTRUCTION — merk/src/element/reconstruct.rs

ElementReconstructExtensions::reconstruct_with_root_key, used by batch
propagation to rebuild a tree element after a root-key update, returned
None for ProvableSumTree. Added the arm that pulls
aggregate_data.as_sum_i64() into Element::ProvableSumTree(root, sum,
flags); without this, batch operations that mutated a ProvableSumTree
subtree would lose their tree element entirely during the parent's
upward propagation. as_sum_i64 already handles the AggregateData::ProvableSum
case (Phase 2).

BATCH PROPAGATION — grovedb/src/batch/mod.rs

The InsertTreeWithRootHash else-if chain that transcribes a Merk-tree
mutation into the appropriate root-hash-bearing operation enumerated
each tree-element variant explicitly. ProvableSumTree was missing,
so a batch that mutated a ProvableSumTree subtree would fall through
to the CommitmentTree arm — wrong shape entirely. Mirrored the
ProvableCountSumTree arm directly. The accompanying tree-cost match
list above (used by the apply_batch storage-cost callback) was also
missing the variant.

READ-PATH SUBTREE VALIDATION — grovedb/src/operations/get/mod.rs

check_subtree_exists rejected paths whose final segment resolved to a
ProvableSumTree because the variant wasn't in its accepted-tree
match list. This would have broken every query that traversed INTO a
ProvableSumTree.

TESTS — grovedb/src/tests/provable_sum_tree_tests.rs

Ten tests covering Phase 3's externally-observable surface:

- Round-trip insert/read with aggregate-sum tracking.
- Aggregation across mixed positive/negative/zero values + i64::MIN/
  i64::MAX extremes.
- Root-hash divergence vs a plain SumTree with identical children
  (the Phase 2 cornerstone, verified end-to-end via
  open_transactional_merk_at_path).
- Nested ProvableSumTree[A] -> ProvableSumTree[B] aggregate propagation;
  mutation of B's children shifts the grovedb root hash.
- Wrapper interactions: NonCounted(ProvableSumTree) contributes 0 to a
  CountTree parent; NotSummed(ProvableSumTree) contributes 0 to a
  SumTree parent; the wrapped tree's own aggregate is preserved
  (verified via get_raw, which retains the wrapper byte).
- Deleting a SumItem child shifts the ProvableSumTree root hash because
  the aggregate sum is hash-bound.
- Direct insert of a ProvableSumTree built from an existing template.

The wrapper round-trip tests use db.get_raw rather than db.get because
db.get strips wrappers via into_underlying — by design.

Workspace cargo test --all-features green: 2891 tests passing (was
2881 in Phase 2 + 10 new), zero failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 4 of the ProvableSumTree feature. The existing
`verify_merk_and_submerks_in_transaction` walk is cryptographically
complete — `combine_hash(value_hash(parent_bytes), inner_merk_root) ==
stored_element_value_hash` catches any byte-level tampering, and for
ProvableSumTree the inner aggregate is bound into the inner Merk's
root_hash via `node_hash_with_sum` (Phase 2). What that walk did not
catch was the *software-consistency* class of drift: a parent
`ProvableSumTree(_, N, _)` whose stored sum field N disagrees with the
inner Merk's actual `aggregate_data()` value M. For provable variants
both N and M are bound into element_value_hash, but they live on disk
independently and could disagree if Phase 3's propagation logic drifts.
For non-provable variants (SumTree, BigSumTree, CountTree, CountSumTree)
the recorded aggregate isn't hash-bound at all, so a pure software bug
in propagation would silently corrupt the tree.

LIB.RS — verify_merk_and_submerks_in_transaction

After the existing cryptographic check, for any tree element whose
inner Merk holds actual data (i.e. excluding the non-Merk-data trees
CommitmentTree/MmrTree/BulkAppendTree/DenseTree, which already
short-circuit via `uses_non_merk_data_storage`), the verifier now opens
the inner Merk, reads its `aggregate_data()`, and compares against the
parent's recorded aggregate field via a new free helper
`aggregate_consistency_labels`. The helper covers all seven aggregate-
bearing tree variants:

  - SumTree                vs. AggregateData::Sum
  - ProvableSumTree        vs. AggregateData::ProvableSum
  - BigSumTree             vs. AggregateData::BigSum
  - CountTree              vs. AggregateData::Count
  - CountSumTree           vs. AggregateData::CountAndSum
  - ProvableCountTree      vs. AggregateData::ProvableCount
  - ProvableCountSumTree   vs. AggregateData::ProvableCountAndSum

Plus an empty-Merk identity case (NoAggregateData with zero recorded
aggregate matches), and a fallback that reports any variant-shape
mismatch (e.g. ProvableSumTree paired with AggregateData::Count(_)).

VERIFICATIONISSUES SHAPE — placeholder hashes, not type extension

`VerificationIssues` is a private type alias
`HashMap<Vec<Vec<u8>>, (CryptoHash, CryptoHash, CryptoHash)>` whose
shape is consumed by `visualize_verify_grovedb`. To avoid breaking
its callers and the visualize hex output, mismatched aggregates are
packed into deterministic placeholder CryptoHashes via
`blake3(format!("recorded ..."))` and `blake3(format!("inner ..."))`,
slotted into the "expected" and "actual" fields. The "root" slot
reuses the inner-Merk root_hash for path locality. Documented inline.

INTEGRITY WALK TESTS — 7 new tests

A new `integrity_walk_tests` module in `provable_sum_tree_tests.rs`
exercises the verifier end-to-end via two raw-storage tampering
helpers:

  - `tamper_value_no_hash_update` decodes the on-disk TreeNode for a
    leaf, replaces only its element bytes, re-encodes (leaving the
    stored value_hash stale), writes back via the immediate storage
    context. Simulates byte-level tampering caught by the SumItem
    arm's `value_hash(bytes) != stored_value_hash` check.

  - `tamper_parent_element_with_consistent_hashes` splices in fresh
    element bytes AND recomputes hash + value_hash to remain
    crypto-consistent with the inner Merk's existing root_hash. Used
    for aggregate-mismatch scenarios — the crypto check passes,
    but the new aggregate-consistency check fires. Offsets into the
    on-disk TreeNodeInner encoding are derived from the decoded
    `value_as_slice().len()`.

Scenarios covered:

  1. Inner SumItem value tamper (different bytes) — crypto check
     catches it.
  2. Inner SumItem same-length value tamper — crypto check catches
     it (assert: hashes, not lengths, are what's verified).
  3. Parent ProvableSumTree aggregate mismatch (sum=999 stored vs.
     40 actual) — new aggregate-consistency check fires.
  4. Clean ProvableSumTree verifies clean (with mixed positive,
     negative, zero, and large values).
  5. Clean ProvableCountTree verifies clean.
  6. Parent ProvableCountTree aggregate mismatch (count=9999 vs. 3)
     — sanity check that the generalized helper handles the count
     variant too.
  7. Reload-after-write determinism: insert, drop the db handle,
     reopen, verify_grovedb reports zero issues; the parent's
     ProvableSumTree.sum_value field round-trips.

Workspace cargo test --all-features green: 2898 passing
(Phase 3 baseline of 2891 + 7 new), zero failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the marquee Phase 5 feature for ProvableSumTree: a query that asks
"what's the cryptographically-verifiable signed sum of children with keys
in range [a, b]?" against a ProvableSumTree, with proof size O(log n + |boundary|)
and a verify path that returns the root hash plus the aggregate i64 sum.

Mirrors AggregateCountOnRange line-for-line:
- QueryItem::AggregateSumOnRange(Box<QueryItem>) variant (wire tag 11)
- Query / SizedQuery / PathQuery::validate_aggregate_sum_on_range with the
  same nested-rejection, no-subquery, no-pagination, allowed-inner-range rules
- merk/src/proofs/query/aggregate_sum.rs (~760 lines) implementing
  create_aggregate_sum_on_range_proof + verify_aggregate_sum_on_range_proof
  with the same Disjoint/Contained/Boundary classification, HashWithSum
  self-verifying compression at fully-inside/outside subtrees, and
  KVDigestSum at boundaries
- grovedb/src/operations/proof/aggregate_sum.rs (~330 lines) for the
  GroveDB-level multi-layer envelope chain check
- prove_query / verify_query dispatch in generate.rs and verify.rs
- Tree-type rejection arms in BulkAppendTree, DenseTree, MMR for the new variant

Key correctness points handled differently from count:
- i128 accumulator throughout the verifier (sum can validly be 0 with
  non-zero children, so no "if sum == 0" short-circuit; final narrow to
  i64 with an explicit overflow error)
- No checked_sub equivalent for own_sum derivation — signed sums make
  arithmetic-only corruption detection meaningless; the hash chain binds
  the values regardless
- ProvableSumTree-only at the merk-level gate (Sum/BigSum use different
  hash dispatches and can't host this proof shape)

Tests: 35 new tests total (14 merk-level in aggregate_sum.rs, 21 GroveDB-
level in aggregate_sum_query_tests.rs) covering empty trees, single-key
ranges, full/sub/boundary ranges, negative sums, mixed-sign extremes
including i64::MAX + i64::MIN = -1, tampering rejection, wrong-tree
rejection, validation rejection of nested/Key/RangeFull/orthogonal-aggregate
inners, multi-layer paths, NotSummed-wrapped subtree exclusion, V0 envelope
round-trip. Workspace test count: 2898 → 2938, zero failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final phase of the ProvableSumTree feature — documentation. Adds:

- `docs/book/src/aggregate-sum-on-range-queries.md`: new dedicated chapter
  describing the AggregateSumOnRange query, the ProvableSumTree tree type
  it operates on, why the existing sum trees can't be queried this way,
  the proof node vocabulary (KVSum / KVHashSum / HashWithSum / KVDigestSum
  / KVRefValueHashSum at wire tags 0x30..=0x3D), and the signed-sum
  correctness notes (no zero-sum short-circuit; i128 accumulator with
  i64 narrowing at the entry points; overflow handling at i64::MAX
  extremes).
- `docs/book/src/element-system.md`: ProvableSumTree row added to the
  aggregate-tree table; ProvableSummedMerkNode added to the
  TreeFeatureType enum block; NonCounted/NotSummed wrapper indices
  surfaced; explanation of when to choose ProvableSumTree over plain
  SumTree (sum is part of the protocol invariant vs metadata) and the
  rationale for the explicit `NotSummedProvableSumTree = 177` slot.
- `docs/book/src/hashing.md`: parallel "Aggregate Hashing for
  ProvableSumTree" section showing node_hash_with_sum's i64 BE input
  layout and the wire-vs-hash encoding split.
- `docs/book/src/appendix-a.md`: rows for NonCounted (15), NotSummed
  (16), and ProvableSumTree (17) added to the discriminant table.
- `docs/book/src/aggregate-sum-queries.md`: disambiguation banner at the
  top distinguishing the existing sum-budget iterator from the new
  AggregateSumOnRange query, with a cross-link.
- `docs/book/src/SUMMARY.md`: registers the new chapter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 11, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds ProvableSumTree (sum bound into Merk hashes) and AggregateSumOnRange queries with full prover/verification, query APIs, proof op encoding, GroveDB plumbing, debugger types, docs, version flags, and extensive tests.

Changes

ProvableSumTree and Aggregate Sum on Range

Layer / File(s) Summary
Element types and hashing primitives
grovedb-element/src/*, merk/src/tree_type/*, merk/src/tree/*, grovedbg-types/src/lib.rs
Adds Element::ProvableSumTree, ElementType/TreeType/TreeFeatureType variants, node_hash_with_sum, AggregateData::ProvableSum, costs, serialization, and helpers.
Query shape and proof op encoding
grovedb-query/src/query*.rs, grovedb-query/src/proofs/*
Introduces QueryItem::AggregateSumOnRange, validators, and new sum-bearing proof Node variants with full encode/decode and display.
Merk proving and verifying
merk/src/merk/*, merk/src/proofs/**/*
Implements prove_aggregate_sum_on_range, no-proof sum, verifier for sum-only proofs, and integrates sum nodes in regular proofs and traversal.
GroveDB integration and APIs
grovedb/src/operations/*, grovedb/src/query/mod.rs, grovedb/src/lib.rs, grovedb/src/debugger.rs
Wires query validation, proof generation/verification, no-proof API (query_aggregate_sum), trunk/leaf-chain checks, debugger mappings, and aggregate consistency checks.
Docs, versions, tests, and rejections
docs/book/src/*, grovedb-version/src/version/*, */tests/*.rs, other trees’ proof modules
Adds docs for feature and hashing, version flags, extensive e2e and security tests, and explicit rejections on unsupported tree types (dense, MMR, bulk-append).

Sequence Diagram(s)

sequenceDiagram
  participant Client as Client (rgba(66, 135, 245, 0.5))
  participant GroveDb as GroveDb (rgba(72, 201, 176, 0.5))
  participant Merk as Merk (rgba(245, 176, 65, 0.5))
  Client->>GroveDb: PathQuery.new_aggregate_sum_on_range(path, range)
  GroveDb->>GroveDb: validate_aggregate_sum_on_range
  alt proof-based
    GroveDb->>Merk: prove_aggregate_sum_on_range(inner_range)
    Merk-->>GroveDb: ProofOps + sum i64
    GroveDb-->>Client: verify_aggregate_sum_query -> (root_hash, sum)
  else no-proof
    GroveDb->>Merk: sum_aggregate_on_range(inner_range)
    Merk-->>GroveDb: sum i64
    GroveDb-->>Client: sum i64
  end
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • dashpay/grovedb#663: Adjusts aggregate-count carrier validation; related to aggregate variant shape enforcement added here.

Poem

A rabbit counts the sums in leaves and boughs,
Adds them in the shadows where the Merk tree bows.
With proofs that hum and hashes that sing,
It carries each total on cryptographic wing.
Thump-thump—range sealed, sums true—what a spring! 🐇✨

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/distracted-margulis-99463b

@codecov
Copy link
Copy Markdown

codecov Bot commented May 11, 2026

Codecov Report

❌ Patch coverage is 92.98581% with 257 lines in your changes missing coverage. Please review.
✅ Project coverage is 91.10%. Comparing base (8755418) to head (6ebdb2c).
⚠️ Report is 1 commits behind head on develop.

Files with missing lines Patch % Lines
merk/src/proofs/query/aggregate_sum.rs 90.63% 82 Missing ⚠️
grovedb/src/operations/proof/generate.rs 70.17% 34 Missing ⚠️
merk/src/proofs/query/verify.rs 74.19% 24 Missing ⚠️
...vedb/src/operations/proof/aggregate_sum/helpers.rs 88.48% 16 Missing ⚠️
grovedb-query/src/query_item/mod.rs 96.96% 11 Missing ⚠️
grovedb/src/operations/proof/verify.rs 31.25% 11 Missing ⚠️
grovedb/src/operations/get/query.rs 72.22% 10 Missing ⚠️
grovedb-query/src/query.rs 92.45% 8 Missing ⚠️
merk/src/proofs/branch/mod.rs 30.00% 7 Missing ⚠️
grovedb/src/operations/proof/aggregate_sum/mod.rs 83.33% 6 Missing ⚠️
... and 16 more
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #661      +/-   ##
===========================================
+ Coverage    90.76%   91.10%   +0.34%     
===========================================
  Files          189      193       +4     
  Lines        56407    59993    +3586     
===========================================
+ Hits         51196    54655    +3459     
- Misses        5211     5338     +127     
Components Coverage Δ
grovedb-core 88.45% <89.25%> (-0.04%) ⬇️
merk 92.59% <90.90%> (+0.26%) ⬆️
storage 86.36% <ø> (ø)
commitment-tree 96.43% <ø> (ø)
mmr 96.76% <ø> (ø)
bulk-append-tree 89.77% <100.00%> (+0.62%) ⬆️
element 96.81% <99.32%> (+1.05%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Adds focused unit tests for production code added by this PR that the
existing test suite happened to exercise indirectly. None of these tests
add new behavior; they only pin down branches that codecov flagged as
uncovered on the patch.

Covers:

- grovedb-query/src/proofs/encoding.rs: long-value (>= 65536 bytes)
  round-trips for the four KV-style ProvableSumTree wire variants
  (KVSum, KVRefValueHashSum in both Push and PushInverted directions
  -- tag bytes 0x31, 0x34, 0x38, 0x3b).
- grovedb-query/src/proofs/mod.rs: Display tests for KVSum, KVHashSum,
  KVRefValueHashSum, KVDigestSum, and HashWithSum proof nodes.
- grovedb-element/src/element_type.rs: proof_node_type dispatch on
  ProvableSumTree parents (Items -> KvSum, References -> KvRefValueHashSum),
  plus as_str / Display for ProvableSumTree, NonCountedProvableSumTree,
  NotSummedProvableSumTree.
- grovedb-element/tests/element_constructors_helpers.rs: every
  ProvableSumTree constructor + is_provable_sum_tree /
  as_provable_sum_tree_value / into_provable_sum_tree_value helpers,
  including the wrong-element error paths.
- grovedb-element/tests/element_display_and_serialization.rs: extends
  the all-variants Display test to include ProvableSumTree.
- merk/src/proofs/tree.rs: forged-sum sensitivity for KVDigestSum and
  KVRefValueHashSum (the latter exercises the full
  combine(referenced_value_hash, node_value_hash) -> node_hash_with_sum
  path), aggregate_data() returning ProvableSum for KVSum / HashWithSum,
  and key() returning the right thing for all five sum node variants.
- merk/src/proofs/query/aggregate_sum.rs: two new regression tests for
  Merk::prove (regular query) on a ProvableSumTree, asserting that the
  emitted proof contains KVSum + KVHashSum (and KVDigestSum for an
  absent-key boundary). These hit the to_kv_sum_node / to_kvhash_sum_node
  / to_kvdigest_sum_node helpers whose only callers are inside
  create_proof_internal's ProvableSumTree branches.

All 2956 workspace tests pass (was 2938 before this commit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
merk/src/proofs/tree.rs (1)

55-68: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Propagate ProvableSum out of every sum-bearing proof node.

KVHashSum, KVDigestSum, KVRefValueHashSum, and HashWithSum still collapse to NoAggregateData or InvalidProofError here. Once one of those variants is reconstructed as a child link, the parent loses that subtree's committed sum even though the proof node already carried it.

Suggested fix
     pub fn as_link(&self) -> Link {
         let (key, aggregate_data) = match &self.tree.node {
             Node::KV(key, _) | Node::KVValueHash(key, ..) => {
                 (key.as_slice(), AggregateData::NoAggregateData)
             }
             Node::KVValueHashFeatureType(key, _, _, feature_type)
             | Node::KVValueHashFeatureTypeWithChildHash(key, _, _, feature_type, _) => {
                 (key.as_slice(), (*feature_type).into())
             }
             Node::KVCount(key, _, count) => (key.as_slice(), AggregateData::ProvableCount(*count)),
-            Node::KVSum(key, _, sum) => (key.as_slice(), AggregateData::ProvableSum(*sum)),
+            Node::KVSum(key, _, sum)
+            | Node::KVDigestSum(key, _, sum)
+            | Node::KVRefValueHashSum(key, _, _, sum) => {
+                (key.as_slice(), AggregateData::ProvableSum(*sum))
+            }
+            Node::KVHashSum(_, sum) | Node::HashWithSum(.., sum) => {
+                (&[] as &[u8], AggregateData::ProvableSum(*sum))
+            }
             // for the connection between the trunk and leaf chunks, we don't
             // have the child key so we must first write in an empty one. once
             // the leaf gets verified, we can write in this key to its parent
             _ => (&[] as &[u8], AggregateData::NoAggregateData),
         };
@@
     pub(crate) fn aggregate_data(&self) -> Result<AggregateData, Error> {
         match &self.node {
             Node::KVValueHashFeatureType(.., feature_type)
             | Node::KVValueHashFeatureTypeWithChildHash(.., feature_type, _) => {
                 Ok((*feature_type).into())
             }
             Node::KVCount(_, _, count) => Ok(AggregateData::ProvableCount(*count)),
             Node::HashWithCount(.., count) => Ok(AggregateData::ProvableCount(*count)),
-            Node::KVSum(_, _, sum) => Ok(AggregateData::ProvableSum(*sum)),
-            Node::HashWithSum(.., sum) => Ok(AggregateData::ProvableSum(*sum)),
+            Node::KVSum(_, _, sum)
+            | Node::KVHashSum(_, sum)
+            | Node::KVDigestSum(_, _, sum)
+            | Node::KVRefValueHashSum(_, _, _, sum)
+            | Node::HashWithSum(.., sum) => Ok(AggregateData::ProvableSum(*sum)),
             Node::KV(..) | Node::KVValueHash(..) => Ok(AggregateData::NoAggregateData),
             _ => Err(Error::InvalidProofError(
                 "Cannot extract aggregate data from this node type".to_string(),
             )),
         }

Also applies to: 482-497

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@merk/src/proofs/tree.rs` around lines 55 - 68, The match on self.tree.node
currently returns NoAggregateData for many sum-bearing variants, losing
committed sums; update the match in tree.rs (the block matching Node variants)
to propagate AggregateData::ProvableSum(sum) for all sum-bearing node variants
(e.g., KVHashSum, KVDigestSum, KVRefValueHashSum, HashWithSum and any other
*_Sum variants) instead of falling back to NoAggregateData or error, so the
parent link preserves the subtree's sum when reconstructed; use the existing
AggregateData::ProvableSum(...) constructor and the same key extraction pattern
as other arms.
merk/src/proofs/query/mod.rs (1)

379-419: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve sum-aware queried nodes for non-Element Merk values.

If ElementType::from_serialized_value(...) fails here, a matched ProvableSummedMerkNode still falls back to ProofNodeType::Kv. The verifier then recomputes node_hash(...) instead of node_hash_with_sum(...), so raw Merk ProvableSumTree proofs for queried items cannot verify even though the stored link hash committed the sum. The same fallback already exists for count; this new path extends that mismatch to sum proofs too.

Suggested fix
-            let proof_node_type = ElementType::from_serialized_value(self.tree().value_as_slice())
-                .map(|et| et.proof_node_type(parent_tree_type))
-                .unwrap_or(ProofNodeType::Kv); // Default to tamper-resistant for raw Merk
+            let proof_node_type = ElementType::from_serialized_value(self.tree().value_as_slice())
+                .map(|et| et.proof_node_type(parent_tree_type))
+                .unwrap_or_else(|_| match parent_tree_type {
+                    Some(ElementType::ProvableCountTree) => ProofNodeType::KvCount,
+                    Some(ElementType::ProvableSumTree) => ProofNodeType::KvSum,
+                    _ => ProofNodeType::Kv,
+                });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@merk/src/proofs/query/mod.rs` around lines 379 - 419, The proof_node_type
fallback currently always uses ProofNodeType::Kv when
ElementType::from_serialized_value(...) fails, which loses sum-awareness for
ProvableSummedMerkNode; update the proof_node_type assignment so that when
from_serialized_value returns Err you detect the node kind (the existing matched
ProvableSummedMerkNode case) and choose the appropriate sum-aware ProofNodeType
(e.g., ProofNodeType::KvSum or the KvRefValueHashSum variant for
reference/summed nodes) instead of defaulting to Kv; change the
ElementType::from_serialized_value(...) .map(...).unwrap_or(...) logic around
the proof_node_type variable to inspect the current node/variant (the same
condition that produced a ProvableSummedMerkNode) and return the correct sum
proof node type, so downstream node generation (the match on proof_node_type
that calls to_kv_sum_node(), to_kv_value_hash_feature_type_node(), etc.)
preserves sum verification.
🧹 Nitpick comments (6)
merk/src/proofs/branch/mod.rs (1)

112-133: 💤 Low value

Consider deduplicating get_key_from_node across TrunkQueryResult and BranchQueryResult.

The two get_key_from_node implementations are now byte-identical and both had to be updated in lockstep for the new sum-bearing variants. Extracting a free function (or a private helper on Node) would eliminate the drift risk as more variants get added.

♻️ Sketch
// merk/src/proofs/branch/mod.rs (module-private helper)
#[cfg(any(feature = "minimal", feature = "verify"))]
fn key_from_node(node: &Node) -> Option<Vec<u8>> {
    match node {
        Node::KV(key, _)
        | Node::KVValueHash(key, ..)
        | Node::KVValueHashFeatureType(key, ..)
        | Node::KVValueHashFeatureTypeWithChildHash(key, ..)
        | Node::KVDigest(key, _)
        | Node::KVDigestCount(key, ..)
        | Node::KVRefValueHash(key, ..)
        | Node::KVCount(key, ..)
        | Node::KVRefValueHashCount(key, ..)
        | Node::KVSum(key, ..)
        | Node::KVDigestSum(key, ..)
        | Node::KVRefValueHashSum(key, ..) => Some(key.clone()),
        Node::Hash(_)
        | Node::KVHash(_)
        | Node::KVHashCount(..)
        | Node::HashWithCount(..)
        | Node::KVHashSum(..)
        | Node::HashWithSum(..) => None,
    }
}

Then have both impls call key_from_node(...).

Also applies to: 383-404

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@merk/src/proofs/branch/mod.rs` around lines 112 - 133, Both
TrunkQueryResult::get_key_from_node and BranchQueryResult::get_key_from_node are
identical; extract a single helper (e.g., module-private fn key_from_node(node:
&Node) -> Option<Vec<u8>>) or a private Node method and move the match there
(include the same cfg attributes used now), then update both
TrunkQueryResult::get_key_from_node and BranchQueryResult::get_key_from_node to
delegate to that helper; apply the same change for the duplicate occurrence
around the other block (the 383-404 region) so both call the shared helper.
merk/src/merk/prove.rs (1)

201-206: ⚡ Quick win

Use cost_return_on_error! for the tree-type guard.

This new CostResult path does a manual early return instead of the repo’s standard cost-accumulating macro. Switching the guard to cost_return_on_error! keeps the error path consistent with the rest of the costed API. As per coding guidelines, "**/*.rs: Use cost_return_on_error! macro for early returns with cost accumulation in Rust source files`."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@merk/src/merk/prove.rs` around lines 201 - 206, Replace the manual early
return in the tree-type guard with the repo macro cost_return_on_error! so the
error path accumulates cost consistently; specifically, where you check if
!matches!(tree_type, crate::TreeType::ProvableSumTree) and return
Error::InvalidProofError("AggregateSumOnRange..."), switch that to
cost_return_on_error!(Error::InvalidProofError(format!(...))) using the same
formatted message (referencing tree_type and ProvableSumTree) so the semantics
and message remain identical but the CostResult path uses the standard macro.
merk/src/element/tree_type.rs (1)

246-251: ⚡ Quick win

Extend the wrapper regression to ProvableSumTree.

get_feature_type(TreeType::ProvableSumTree) is the new sum-bearing branch, but the existing NotSummed regression tests still only enumerate the older parent types. Please add ProvableSumTree there as well so the zeroed-sum wrapper semantics stay pinned down.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@merk/src/element/tree_type.rs` around lines 246 - 251, The NotSummed
wrapper-regression tests don’t include the new sum-bearing branch, so update the
enumeration used by the regression to also include TreeType::ProvableSumTree;
specifically, in the tests that call get_feature_type(...) and assert NotSummed
behavior, add TreeType::ProvableSumTree to the list of TreeType variants being
exercised so the zeroed-sum wrapper semantics remain pinned (look for the
NotSummed regression/test block that enumerates parent types and add the
ProvableSumTree variant).
merk/src/tree/link.rs (1)

447-456: ⚡ Quick win

Add a round-trip test for the new wire tag.

Tag 7 is introduced in both encode and decode, but this module’s unit tests still only exercise the older aggregate variants. A small round-trip for AggregateData::ProvableSum, including a negative value, would lock down the tag assignment and catch wire-format drift quickly.

Also applies to: 654-658

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@merk/src/tree/link.rs` around lines 447 - 456, Add a unit test that
round-trips the new wire tag 7: construct AggregateData::ProvableSum with a
negative value (e.g., -42), serialize it using the same encode path that writes
the tag (the code that calls out.write_all(&[7]) and out.write_varint), then
deserialize using this module's decode path and assert the decoded AggregateData
equals the original and that the serialized bytes start with 0x07; add a
matching test for the other encode location mentioned (lines ~654-658) so both
encoding sites for AggregateData::ProvableSum are exercised.
grovedb/src/operations/proof/generate.rs (1)

337-341: ⚡ Quick win

Wrap the new aggregate-sum merk errors with call-site context.

Both new branches forward a bare MerkError, which makes aggregate-sum proof failures hard to distinguish from the surrounding proof-generation paths. Please wrap these with a contextual message before returning.

♻️ Suggested change
-                    .prove_aggregate_sum_on_range(&inner_range, grove_version)
-                    .map_err(Error::MerkError)
+                    .prove_aggregate_sum_on_range(&inner_range, grove_version)
+                    .map_err(|e| {
+                        Error::CorruptedData(format!(
+                            "prove_aggregate_sum_on_range failed: {}",
+                            e
+                        ))
+                    })

As per coding guidelines "Wrap errors with context using .map_err(|e| Error::CorruptedData(format!("context: {}", e))) pattern in Rust source files".

Also applies to: 1169-1173

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@grovedb/src/operations/proof/generate.rs` around lines 337 - 341, The current
calls that map merk errors from subtree.prove_aggregate_sum_on_range to
Error::MerkError lack context; replace .map_err(Error::MerkError) with a
contextual wrapper such as .map_err(|e|
Error::CorruptedData(format!("aggregate-sum proof failed: {}", e))) at each site
(the call inside the cost_return_on_error! around
subtree.prove_aggregate_sum_on_range and the other similar branch later), so the
returned Error includes a clear message plus the original merk error for easier
diagnosis.
grovedb/src/tests/aggregate_sum_query_tests.rs (1)

578-590: ⚡ Quick win

Exercise the new prover gate, not just verifier-side validation.

This only feeds a dummy proof into verify_aggregate_sum_query(). The new defense-in-depth behavior added in prove_query_non_serialized() is still unasserted here, so a regression in proof generation could slip through while this test keeps passing.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@grovedb/src/tests/aggregate_sum_query_tests.rs` around lines 578 - 590, The
test currently only calls GroveDb::verify_aggregate_sum_query with a dummy
proof, so update it to also exercise the prover gate by calling the prover
function (e.g., GroveDb::prove_query_non_serialized or the appropriate
proof-generation method used in the codebase) with the constructed PathQuery
(created via PathQuery::new_aggregate_sum_on_range and mutated via set_subquery)
and assert that proof generation returns an Err; keep the existing
verify_aggregate_sum_query assertion as well so both prover-side
(prove_query_non_serialized) and verifier-side
(GroveDb::verify_aggregate_sum_query) validations are asserted.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/book/src/appendix-a.md`:
- Around line 20-22: The table entry for `ProvableSumTree` has the wrong
TreeType discriminant (currently `11`); update the table row so the discriminant
is `17` to match the actual enum variant (`ProvableSumTree`) in the codebase,
keeping the rest of the columns unchanged (use `17` for the second column, keep
`(root_key, sum: i64, flags)` and `SUM_TREE_COST_SIZE` as-is) so the doc matches
the `ProvableSumTree` discriminant.

In `@grovedb-query/src/query_item/mod.rs`:
- Around line 165-166: The serializer is emitting PascalCase for the QueryItem
variant names (e.g., QueryItem::AggregateSumOnRange) via
serialize_newtype_variant but the deserializer uses #[serde(field_identifier,
rename_all = "snake_case")]; update the serialize_newtype_variant calls for
QueryItem::AggregateSumOnRange (and the other QueryItem variants in the same
match arm group) to use snake_case names (e.g., "aggregate_sum_on_range") so the
serialized variant names match deserialization expectations and round-trip
correctly.

In `@grovedb-query/src/query.rs`:
- Around line 413-420: The current loop over self.conditional_subquery_branches
only checks branch.subquery for has_aggregate_sum_on_range_anywhere(), so a
branch that contains AggregateSumOnRange in its selector/key is missed; update
the loop to also inspect the branch itself (not just branch.subquery) for
AggregateSumOnRange—e.g., for each branch in
conditional_subquery_branches.values() call a method or predicate on the branch
(e.g., branch.has_aggregate_sum_on_range_anywhere() or explicitly check
branch.selector/key structure) and return true if that check or the existing
branch.subquery.as_deref().has_aggregate_sum_on_range_anywhere() is true. Ensure
you reference conditional_subquery_branches, branch, branch.subquery and
has_aggregate_sum_on_range_anywhere in the fix.

In `@grovedb/src/lib.rs`:
- Around line 1061-1073: The current issues.insert call inside the
aggregate_consistency_labels branch overwrites any existing entry (such as one
created when combined_value_hash != element_value_hash) so callers lose the real
hash-chain mismatch; modify the logic around aggregate_consistency_labels(...)
and issues.insert(...) to detect an existing entry for new_path and either merge
the aggregate placeholders with the existing (root_hash, expected_placeholder,
actual_placeholder) tuple or append/replace with a composite value that
preserves both the original element_value_hash mismatch and the synthetic
aggregate placeholders; specifically update the code that calls
aggregate_consistency_labels(&element, &actual_aggregate) and the
issues.insert(new_path.to_vec(), ...) call to check
issues.get(&new_path.to_vec()) and preserve/merge previous data instead of
unconditionally inserting and overwriting.

In `@grovedb/src/operations/proof/aggregate_sum.rs`:
- Around line 193-195: The V1 layer walker is calling
verify_single_key_layer_proof_v0 from verify_v1_layer, which internally calls
execute_proof with a hardcoded 0 (V0); update verify_single_key_layer_proof_v0
(and its call site in verify_v1_layer) to use PROOF_VERSION_LATEST instead so V1
semantics are enforced consistently with verify_layer_proof_v1 and
verify_trunk_chunk_proof_v1; locate the execute_proof(...) invocation inside
verify_single_key_layer_proof_v0 and replace the literal 0 with
PROOF_VERSION_LATEST (or propagate a version constant through the call chain)
and run tests to ensure V1 proof shapes are rejected/accepted consistently.

In `@grovedb/src/operations/proof/verify.rs`:
- Around line 2694-2696: The match arm in get_key_value_from_node incorrectly
treats Node::KVRefValueHashSum as carrying an authenticated value; remove
KVRefValueHashSum from the arm that returns Some((key.clone(), value.clone()))
so opaque-hash sum nodes are not deserialized as values, and add the same
rejection/guard for Node::KVRefValueHashSum in extract_elements_and_leaf_keys
(mirror the existing handling for KVRefValueHashCount) to ensure trunk/branch
proofs never accept opaque reference_element_hash nodes as authenticated values;
update tests to cover both KVRefValueHashCount and KVRefValueHashSum being
rejected.

In `@grovedb/src/tests/provable_sum_tree_tests.rs`:
- Around line 601-651: The test currently only builds `template` via normal
inserts and never exercises the direct-insert branch; after you capture
`root_key` and `sum` from the `template` Element::ProvableSumTree, perform a
second insert that directly writes an
Element::ProvableSumTree(Some(root_key.clone()), sum, ...) (use the same Element
variant) into a new key (e.g., "direct") via `db.insert`, then `db.get` that new
key and assert it returns the same ProvableSumTree contents (root_key present
and sum equals 6) to validate the direct-insert path in
`direct_insert_provable_sum_tree_with_root_key_and_sum`.
- Around line 48-69: The test loop inserts sum items via db.insert
(Element::new_sum_item) and then fetches the parent with db.get and calls
fetched.as_provable_sum_tree_value(), but it never asserts the fetched aggregate
equals the running total; update the loop in provable_sum_tree_tests to compute
an expected_running_total (incrementing by value each iteration) and assert that
the value returned by fetched.as_provable_sum_tree_value() (from the
Element::ProvableSumTree) equals that expected_running_total after each insert
so intermediate propagation is verified.

In `@merk/src/proofs/query/aggregate_sum.rs`:
- Around line 1062-1071: The test currently returns early when
merk.apply(&entries, &[], None, v).unwrap().is_err(), which makes the overflow
regression test a no-op; instead update the branch to assert the expected
failure (e.g., assert matches the specific overflow error returned by merk.apply
for the two i64::MAX fixture) or, if the design expects Phase 1 to reject but
you still need verifier coverage, add a fabricated-proof path that constructs a
proof for the overflow case and always exercises the verifier; reference the
merk.apply call, the entries fixture containing two i64::MAX values, and the
verifier path in the verify layer when implementing the assertion or
fabricated-proof fallback.
- Around line 130-137: provable_sum_from_aggregate currently returns
Error::InvalidProofError for missing/wrong AggregateData, but these are
local/state invariant failures and should be Error::CorruptedData; change the
match arms in provable_sum_from_aggregate to return
Err(Error::CorruptedData(...)) with a descriptive message, and similarly replace
InvalidProofError uses in the other failing aggregate-handling sites (the blocks
around the ranges corresponding to lines ~240-250 and ~283-293) so they wrap
underlying errors/context using the .map_err(|e|
Error::CorruptedData(format!("context: {}", e))) pattern when calling
aggregate_data() and return CorruptedData on wrong enum variants (e.g.,
"expected ProvableSum aggregate data, got …") to correctly classify storage
corruption.

In `@merk/src/tree_type/mod.rs`:
- Around line 47-51: The doc comment for the ProvableSumTree variant currently
implies it's a Phase-1 alias of SumTree; update the comment on the
ProvableSumTree declaration to describe its actual behavior: state that it
implements a distinct provable-sum hash and proof path (not just a Phase-2
plan), mention which fields/types differ (inner_node_type and
empty_tree_feature_type) and that hashes/proofs are computed using the
provable-sum algorithm introduced in this PR, and remove the “Phase 2 will
diverge” wording so callers understand it is a separate tree type with its own
hashing/proof semantics.

---

Outside diff comments:
In `@merk/src/proofs/query/mod.rs`:
- Around line 379-419: The proof_node_type fallback currently always uses
ProofNodeType::Kv when ElementType::from_serialized_value(...) fails, which
loses sum-awareness for ProvableSummedMerkNode; update the proof_node_type
assignment so that when from_serialized_value returns Err you detect the node
kind (the existing matched ProvableSummedMerkNode case) and choose the
appropriate sum-aware ProofNodeType (e.g., ProofNodeType::KvSum or the
KvRefValueHashSum variant for reference/summed nodes) instead of defaulting to
Kv; change the ElementType::from_serialized_value(...) .map(...).unwrap_or(...)
logic around the proof_node_type variable to inspect the current node/variant
(the same condition that produced a ProvableSummedMerkNode) and return the
correct sum proof node type, so downstream node generation (the match on
proof_node_type that calls to_kv_sum_node(),
to_kv_value_hash_feature_type_node(), etc.) preserves sum verification.

In `@merk/src/proofs/tree.rs`:
- Around line 55-68: The match on self.tree.node currently returns
NoAggregateData for many sum-bearing variants, losing committed sums; update the
match in tree.rs (the block matching Node variants) to propagate
AggregateData::ProvableSum(sum) for all sum-bearing node variants (e.g.,
KVHashSum, KVDigestSum, KVRefValueHashSum, HashWithSum and any other *_Sum
variants) instead of falling back to NoAggregateData or error, so the parent
link preserves the subtree's sum when reconstructed; use the existing
AggregateData::ProvableSum(...) constructor and the same key extraction pattern
as other arms.

---

Nitpick comments:
In `@grovedb/src/operations/proof/generate.rs`:
- Around line 337-341: The current calls that map merk errors from
subtree.prove_aggregate_sum_on_range to Error::MerkError lack context; replace
.map_err(Error::MerkError) with a contextual wrapper such as .map_err(|e|
Error::CorruptedData(format!("aggregate-sum proof failed: {}", e))) at each site
(the call inside the cost_return_on_error! around
subtree.prove_aggregate_sum_on_range and the other similar branch later), so the
returned Error includes a clear message plus the original merk error for easier
diagnosis.

In `@grovedb/src/tests/aggregate_sum_query_tests.rs`:
- Around line 578-590: The test currently only calls
GroveDb::verify_aggregate_sum_query with a dummy proof, so update it to also
exercise the prover gate by calling the prover function (e.g.,
GroveDb::prove_query_non_serialized or the appropriate proof-generation method
used in the codebase) with the constructed PathQuery (created via
PathQuery::new_aggregate_sum_on_range and mutated via set_subquery) and assert
that proof generation returns an Err; keep the existing
verify_aggregate_sum_query assertion as well so both prover-side
(prove_query_non_serialized) and verifier-side
(GroveDb::verify_aggregate_sum_query) validations are asserted.

In `@merk/src/element/tree_type.rs`:
- Around line 246-251: The NotSummed wrapper-regression tests don’t include the
new sum-bearing branch, so update the enumeration used by the regression to also
include TreeType::ProvableSumTree; specifically, in the tests that call
get_feature_type(...) and assert NotSummed behavior, add
TreeType::ProvableSumTree to the list of TreeType variants being exercised so
the zeroed-sum wrapper semantics remain pinned (look for the NotSummed
regression/test block that enumerates parent types and add the ProvableSumTree
variant).

In `@merk/src/merk/prove.rs`:
- Around line 201-206: Replace the manual early return in the tree-type guard
with the repo macro cost_return_on_error! so the error path accumulates cost
consistently; specifically, where you check if !matches!(tree_type,
crate::TreeType::ProvableSumTree) and return
Error::InvalidProofError("AggregateSumOnRange..."), switch that to
cost_return_on_error!(Error::InvalidProofError(format!(...))) using the same
formatted message (referencing tree_type and ProvableSumTree) so the semantics
and message remain identical but the CostResult path uses the standard macro.

In `@merk/src/proofs/branch/mod.rs`:
- Around line 112-133: Both TrunkQueryResult::get_key_from_node and
BranchQueryResult::get_key_from_node are identical; extract a single helper
(e.g., module-private fn key_from_node(node: &Node) -> Option<Vec<u8>>) or a
private Node method and move the match there (include the same cfg attributes
used now), then update both TrunkQueryResult::get_key_from_node and
BranchQueryResult::get_key_from_node to delegate to that helper; apply the same
change for the duplicate occurrence around the other block (the 383-404 region)
so both call the shared helper.

In `@merk/src/tree/link.rs`:
- Around line 447-456: Add a unit test that round-trips the new wire tag 7:
construct AggregateData::ProvableSum with a negative value (e.g., -42),
serialize it using the same encode path that writes the tag (the code that calls
out.write_all(&[7]) and out.write_varint), then deserialize using this module's
decode path and assert the decoded AggregateData equals the original and that
the serialized bytes start with 0x07; add a matching test for the other encode
location mentioned (lines ~654-658) so both encoding sites for
AggregateData::ProvableSum are exercised.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e52b141b-25f4-4a67-a7a6-f7c5ab321b01

📥 Commits

Reviewing files that changed from the base of the PR and between 1da6299 and dfa2046.

📒 Files selected for processing (57)
  • docs/book/src/SUMMARY.md
  • docs/book/src/aggregate-sum-on-range-queries.md
  • docs/book/src/aggregate-sum-queries.md
  • docs/book/src/appendix-a.md
  • docs/book/src/element-system.md
  • docs/book/src/hashing.md
  • grovedb-bulk-append-tree/src/proof/mod.rs
  • grovedb-dense-fixed-sized-merkle-tree/src/proof/mod.rs
  • grovedb-element/src/element/constructor.rs
  • grovedb-element/src/element/helpers.rs
  • grovedb-element/src/element/mod.rs
  • grovedb-element/src/element/serialize.rs
  • grovedb-element/src/element/visualize.rs
  • grovedb-element/src/element_type.rs
  • grovedb-element/tests/element_constructors_helpers.rs
  • grovedb-element/tests/element_display_and_serialization.rs
  • grovedb-query/src/proofs/encoding.rs
  • grovedb-query/src/proofs/mod.rs
  • grovedb-query/src/proofs/tree_feature_type.rs
  • grovedb-query/src/query.rs
  • grovedb-query/src/query_item/intersect.rs
  • grovedb-query/src/query_item/mod.rs
  • grovedb/src/batch/mod.rs
  • grovedb/src/debugger.rs
  • grovedb/src/lib.rs
  • grovedb/src/operations/get/mod.rs
  • grovedb/src/operations/get/query.rs
  • grovedb/src/operations/insert/mod.rs
  • grovedb/src/operations/proof/aggregate_sum.rs
  • grovedb/src/operations/proof/generate.rs
  • grovedb/src/operations/proof/mod.rs
  • grovedb/src/operations/proof/verify.rs
  • grovedb/src/query/mod.rs
  • grovedb/src/tests/aggregate_sum_query_tests.rs
  • grovedb/src/tests/mod.rs
  • grovedb/src/tests/provable_count_sum_tree_tests.rs
  • grovedb/src/tests/provable_sum_tree_tests.rs
  • grovedbg-types/src/lib.rs
  • merk/src/element/costs.rs
  • merk/src/element/delete.rs
  • merk/src/element/get.rs
  • merk/src/element/reconstruct.rs
  • merk/src/element/tree_type.rs
  • merk/src/merk/chunks.rs
  • merk/src/merk/prove.rs
  • merk/src/proofs/branch/mod.rs
  • merk/src/proofs/chunk/chunk.rs
  • merk/src/proofs/query/aggregate_sum.rs
  • merk/src/proofs/query/mod.rs
  • merk/src/proofs/query/verify.rs
  • merk/src/proofs/tree.rs
  • merk/src/tree/hash.rs
  • merk/src/tree/link.rs
  • merk/src/tree/mod.rs
  • merk/src/tree/tree_feature_type.rs
  • merk/src/tree_type/costs.rs
  • merk/src/tree_type/mod.rs

Comment thread docs/book/src/appendix-a.md Outdated
Comment on lines +20 to +22
| 15 | `NonCounted` | wrapper | `Box<Element>` | inner + 1 byte | Opts inner out of parent count aggregation |
| 16 | `NotSummed` | wrapper | `Box<Element>` | inner + 1 byte | Opts inner out of parent sum aggregation |
| 17 | `ProvableSumTree` | 11 | `(root_key, sum: i64, flags)` | SUM_TREE_COST_SIZE | Sum baked into hash (see [Aggregate Sum on Range Queries](aggregate-sum-on-range-queries.md)) |
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | 💤 Low value

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Locate the TreeType enum definition.
ast-grep --pattern 'pub enum TreeType {
  $$$
}'

# As fallback, search for "TreeType" enum variants with explicit discriminants.
rg -nP --type=rust -C2 '\bProvableSumTree\b.*=\s*\d+'
rg -nP --type=rust -B1 -A1 '\benum TreeType\b'

Repository: dashpay/grovedb

Length of output: 3780


Correct TreeType discriminant for ProvableSumTree.

Line 22 lists ProvableSumTree with TreeType 11, but the actual discriminant is 17 (per grovedb-element/src/element_type.rs). Update to:

| 17 | `ProvableSumTree` | 17 | `(root_key, sum: i64, flags)` | SUM_TREE_COST_SIZE | Sum baked into hash (see [Aggregate Sum on Range Queries](aggregate-sum-on-range-queries.md)) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/book/src/appendix-a.md` around lines 20 - 22, The table entry for
`ProvableSumTree` has the wrong TreeType discriminant (currently `11`); update
the table row so the discriminant is `17` to match the actual enum variant
(`ProvableSumTree`) in the codebase, keeping the rest of the columns unchanged
(use `17` for the second column, keep `(root_key, sum: i64, flags)` and
`SUM_TREE_COST_SIZE` as-is) so the doc matches the `ProvableSumTree`
discriminant.

Comment thread grovedb-query/src/query_item/mod.rs Outdated
Comment thread grovedb-query/src/query.rs
Comment thread grovedb/src/lib.rs Outdated
Comment thread grovedb/src/operations/proof/aggregate_sum.rs Outdated
Comment thread grovedb/src/tests/provable_sum_tree_tests.rs Outdated
Comment thread grovedb/src/tests/provable_sum_tree_tests.rs
Comment thread merk/src/proofs/query/aggregate_sum.rs
Comment thread merk/src/proofs/query/aggregate_sum.rs Outdated
Comment thread merk/src/tree_type/mod.rs Outdated
QuantumExplorer and others added 5 commits May 11, 2026 21:32
The trunk/branch proof extractor was rejecting Node::KVRefValueHash and
Node::KVRefValueHashCount as having an opaque value_hash, but the new
Phase 2 KVRefValueHashSum variant was missing from the rejection arm.
Without this guard, get_key_value_from_node still surfaces (key, value)
for KVRefValueHashSum nodes and the verifier would deserialize and
insert the value bytes into the elements map. The embedded
node_value_hash is opaque (combine_hash of the node_value_hash and the
referenced_value_hash) and cannot be recomputed from the value bytes
alone, so a forged value could ride along while the per-node merk
hash chain still appears valid.

Add KVRefValueHashSum to the rejection arm in
extract_elements_and_leaf_keys, alongside KVRefValueHash and
KVRefValueHashCount, with a regression test that mirrors the existing
KVRefValueHash trunk-proof rejection test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
has_aggregate_sum_on_range_anywhere previously walked only
branch.subquery for each conditional branch, ignoring the branch
selector (the IndexMap key). Selectors are themselves QueryItems and
the type system permits an AggregateSumOnRange tag there even though
it is not a meaningful conditional matcher. The shape check is meant
to be exhaustive — if any ASOR is present "anywhere", the prover must
refuse to route through the regular-proof path — so the walker must
surface a selector-tagged ASOR too.

Iterate `(selector, branch)` instead of `branches.values()` and
short-circuit on `selector.is_aggregate_sum_on_range()` before
recursing into `branch.subquery`. Add a regression test that mirrors
the existing count walker test and explicitly covers the selector
case.

The same gap exists for has_aggregate_count_on_range_anywhere but is
left as-is here since it predates this PR and changing it would be
out of scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three small correctness improvements flagged by review:

* grovedb/src/lib.rs verify_merk_and_submerks_in_transaction: when both
  the cryptographic combined_value_hash check and the
  aggregate_consistency check fail for the same subtree path, the
  aggregate-consistency branch's `issues.insert(...)` clobbered the
  cryptographic mismatch entry. The real merk-hash chain mismatch is
  the more diagnostic message, so switch to `.entry().or_insert(...)`
  to preserve the first-inserted entry per path.

* merk/src/proofs/query/aggregate_sum.rs: the prover walks our *own*
  in-memory merk. If `aggregate_data()` refuses to surface a
  `ProvableSum` for a node in a tree we already gated as
  `ProvableSumTree`, that is local storage/state corruption, not a
  peer-supplied invalid proof. Reclassify three sites from
  `Error::InvalidProofError` to `Error::CorruptedData` to match the
  repo error-handling convention.

* grovedb/src/operations/proof/generate.rs: the two ASOR call sites
  forwarded bare `Error::MerkError` for `prove_aggregate_sum_on_range`
  failures, making them indistinguishable from surrounding
  proof-generation merk errors. Wrap each with
  `Error::CorruptedData(format!("prove_aggregate_sum_on_range failed: {}", e))`
  per the repo convention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address review feedback on test gaps in the ProvableSumTree /
AggregateSumOnRange suite:

* provable_sum_tree_tests::populated_provable_sum_tree_round_trips:
  the per-iteration loop fetched the parent psum but only asserted its
  variant — the running aggregate after each insert was never checked.
  Track an `expected_sum` accumulator and assert
  `as_provable_sum_tree_value() == expected_sum` after every insert so
  the running aggregate (7 → 20 → 40) is pinned down, not just the
  final sum.

* direct_insert_provable_sum_tree_with_root_key_and_sum: the test was
  named for a direct-insert path but never actually performed one — it
  built a populated template tree and inspected its on-disk shape,
  then stopped. Capture the template's root_key + sum, then run a
  batch `insert_only_known_to_not_already_exist_op` over a
  ProvableSumTree element carrying those values at a fresh top-level
  key (the non-batch insert path forbids non-empty Tree elements with
  "a tree should be empty at the moment of insertion when not using
  batches", so the documented direct-insert semantics are only
  reachable via the batch API). Assert the round-tripped element
  preserves the captured root_key + sum.

* element/tree_type.rs get_feature_type_zeros_sum_for_not_summed_in_sum_parents:
  extend the NotSummed wrapper regression to include
  `TreeType::ProvableSumTree` so the new Phase 2 sum-bearing parent
  variant's zero-sum semantics stay pinned alongside SumTree,
  BigSumTree, CountSumTree, and ProvableCountSumTree.

* tree/link.rs round_trip_aggregate_data_provable_sum_negative: pin
  down the new wire tag 7 introduced for
  `AggregateData::ProvableSum`. Encode a negative-sum reference link
  (also exercises the signed-i64 varint path), assert tag 7 is
  present in the bytes, then decode and assert the link's fields
  round-trip identically.

* aggregate_sum_query_tests::aggregate_sum_with_subquery_is_rejected_at_validation:
  previously only fed a dummy proof to the verifier-side validator,
  leaving the new prover-side gate
  (`prove_query_non_serialized` short-circuit) unasserted. Build the
  malformed PathQuery, set up the standard 15-key fixture, and assert
  `db.prove_query(...)` returns `Err` so a regression in the prover
  gate can't slip through with this test still passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The TreeType::ProvableSumTree doc said "Phase 1: behaves identically
to SumTree everywhere except in inner_node_type /
empty_tree_feature_type. Phase 2 will diverge the hash computation."
Phase 2 has shipped — the hash dispatch now goes through
node_hash_with_sum and the new proof-node families (KVSum, KVHashSum,
KVDigestSum, KVRefValueHashSum, HashWithSum) plus the
AggregateSumOnRange query are all in place. Rewrite the comment to
describe the current (post-Phase-2) semantics: an i64 sum baked into
every node's hash, making sum tampering catchable via proof
verification, as the sum-side counterpart of ProvableCountTree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@QuantumExplorer
Copy link
Copy Markdown
Member Author

This is Claude addressing the review feedback. Pushed 2b8490e, ae7b971, 8cec7b5, fce7132, ff5645b.

Fixed:

  • 2b8490e (Critical feat: subtree get/insert #1): reject KVRefValueHashSum in extract_elements_and_leaf_keys alongside KVRefValueHash and KVRefValueHashCount. Without this, the trunk/branch verifier would deserialize and insert the value bytes from a forged KVRefValueHashSum node — the embedded node_value_hash is opaque (combine_hash of node_value_hash and referenced_value_hash) and can't be recomputed at the trunk layer. Added regression test in trunk_proof_tests.rs.
  • ae7b971 (Major fix: circular reference path #3): has_aggregate_sum_on_range_anywhere now also inspects the conditional-branch selector (map key) for AggregateSumOnRange, not only branch.subquery. The selector is itself a QueryItem and the type permits ASOR there. Added a regression test covering the selector case.
  • 8cec7b5 (Minor feat: abstract Merk storage #7): verify_merk_and_submerks_in_transaction uses .entry().or_insert(...) so the aggregate-consistency placeholder doesn't clobber a cryptographic combined_value_hash != element_value_hash entry that was inserted earlier for the same path.
  • 8cec7b5 (Major feat: node.js neon bindings #8): three Error::InvalidProofError sites in merk/src/proofs/query/aggregate_sum.rs reclassified to Error::CorruptedData — the prover walks our own merk, so an aggregate_data failure is local state corruption per the repo convention.
  • 8cec7b5 (Nitpick feat: proof serialization and deserialization #18): both ASOR prove_aggregate_sum_on_range call sites in grovedb/src/operations/proof/generate.rs wrap their MerkError with contextual CorruptedData(format!("prove_aggregate_sum_on_range failed: {}", e)) per the repo's error-context convention.
  • fce7132 (Minor feat: Simple checkpoint #9): the populated-tree round-trip loop now tracks a running expected_sum and asserts the ProvableSumTree's aggregate matches it after every insert (7 → 20 → 40), not just the final value.
  • fce7132 (Minor feat: proof verification #10): direct_insert_provable_sum_tree_with_root_key_and_sum now actually performs the direct-insert it was named for — the non-batch insert path rejects non-empty Tree elements ("a tree should be empty…"), so the test uses apply_batch with insert_only_known_to_not_already_exist_op to write a ProvableSumTree carrying the captured root_key + sum at a fresh top-level key, then asserts both fields round-trip.
  • fce7132 (Nitpick feat: multi path proof verification #16): the NotSummed wrapper zero-sum regression now also exercises TreeType::ProvableSumTree, asserting ProvableSummedMerkNode(0).
  • fce7132 (Nitpick feat: delete operation for elements and trees #17): new round-trip test for the wire tag 7 introduced for AggregateData::ProvableSum in merk/src/tree/link.rs — encodes a negative-sum reference link, asserts tag 7 is present, decodes and checks all fields round-trip.
  • fce7132 (Nitpick test: range proofs #19): aggregate_sum_with_subquery_is_rejected_at_validation now also calls prove_query(...) against the real 15-key fixture and asserts Err, so the prover-side gate (not just verifier-side) is regression-tested.
  • ff5645b (Minor feat: subtree pairs iterator #12): refreshed the stale "Phase 1 / Phase 2 will diverge" comment on TreeType::ProvableSumTree and the matching comment in element/tree_type.rs to describe the actual post-Phase-2 semantics (sum baked into node hash via node_hash_with_sum).

Skipped (false positives / out of scope):

  • chore: embed Merk #2 (serde variant name PascalCase vs snake_case): pre-existing intentional mismatch already documented in query_item/mod.rs:1427AggregateCountOnRange has the same shape; the project uses serde_test token streams for round-trip testing precisely because of this. Fixing only the new variant would diverge it from the existing one. Skipping for parity.
  • feat: Implement HADS using single RocksDB #4 (V0 hardcode in verify_single_key_layer_proof_v0): identical V0 hardcode in aggregate_count.rs at the same call sites. This is a pre-existing pattern, not introduced by this PR — the "v0" suffix in the function name is intentional. Fixing only the sum side would create count/sum inconsistency.
  • feat: Proof Construction #5 (sum variants collapse to NoAggregateData / InvalidProofError in proofs/tree.rs): the count counterparts (KVHashCount, KVDigestCount, KVRefValueHashCount) collapse the same way. The variants in question aren't produced by ProvableSumTree chunk generation (chunks emit KVSum / KVValueHashFeatureType), and the only aggregate_data() caller during restore (chunk_tree root) is a KVValueHashFeatureType. Pre-existing parallel-to-count pattern, no observable bug from this PR.
  • ci: Test Pipeline #6 (proof_node_type fallback drops sum-awareness for raw Merk): same fallback drops count-awareness for raw Merk too (pre-existing). The fallback documents itself as "Default to tamper-resistant for raw Merk" — fixing only the sum side would diverge from count. Out of scope to expand to both.
  • feat: insert if not exists #11 (overflow test no-op early return): the test author already acknowledges the apply-side may or may not catch overflow with the comment "this scenario is additionally exercised at the verify layer via fabricated proofs"; the verify-path test handles the gate. Strengthening the apply-path assertion is low value.
  • feat: transactions #13 (appendix-a.md TreeType discriminant): the "11" is the TreeType enum's own discriminant (see merk/src/tree_type/mod.rs:72), distinct from the Element discriminant. CodeRabbit's claim that it should be "17" is wrong — confirmed correct as-is.
  • typo on subtrees #14, feat: multi path + multi key proof construction #15 (deduplication / cost_return_on_error! reshape of pre-existing code): low value churn touching code outside this PR's strict scope.

Test totals: 2959 passing (was 2956). 3 new tests added across the fixes above. No regressions.

Per project memory rule: identifying as Claude on this comment. The branch claude/distracted-margulis-99463b only; no force pushes, no changes to develop.

…ayer

Security finding (Codex): the `verify_aggregate_sum_query` and
`verify_aggregate_count_query` chain walkers only checked
`element.is_any_tree()` for path elements. At the terminal (leaf) layer
this is insufficient — if the honest tree at the queried path happens
to be an EMPTY Merk-backed tree of any type (NormalTree, SumTree,
BigSumTree, CountTree, CountSumTree, ProvableCountTree,
ProvableCountSumTree, ProvableSumTree), its stored
`value_hash = combine_hash(H(element_bytes), NULL_HASH)`. The merk
verifier accepts empty proof bytes as `(NULL_HASH, 0)`, so an attacker
can construct a forged proof with:

  - layer 0: honest single-key proof of the leaf path key in its parent
  - layer 1: empty bytes (forged)

and the chain check passes uniformly. The verifier returns `sum = 0`
(or `count = 0`) against the trusted root hash, even though the leaf
isn't a Provable{Sum,Count}Tree.

The numeric answer is correct (an empty tree has sum 0 / count 0), so
this isn't a value forgery — but it IS a type-confusion soundness gap:
a caller that infers "leaf is a ProvableSumTree" from "the aggregate
verifier accepted" is deceived. The prover-side gate in
`Merk::prove_aggregate_{sum,count}_on_range` already rejects
non-provable inputs, but the verifier didn't mirror that invariant.

THE FIX

In `enforce_lower_chain`, add an `is_terminal: bool` parameter. At
intermediate depths nothing changes (`is_any_tree()` still suffices —
the GroveDB grove can route through any tree type on the way down). At
the terminal depth — passed `is_terminal = true` when
`depth + 1 == path_keys.len()` — the verifier now requires:

  - aggregate-sum:   `matches!(element, Element::ProvableSumTree(..))`
  - aggregate-count: `matches!(element, Element::ProvableCountTree(..) |
                                        Element::ProvableCountSumTree(..))`

Wrapper variants (NonCounted, NotSummed) are stripped via the existing
`into_underlying()` so they continue to work transparently.

TESTS

Three new regression tests that surgically construct the forgery from a
real honest single-key envelope and confirm the verifier now rejects:

  - `empty_leaf_type_confusion_forgery_rejected` (sum side, empty
    NormalTree at leaf)
  - `empty_provable_count_tree_at_leaf_rejected_for_sum` (sum side,
    empty ProvableCountTree at leaf — confirms type-specificity)
  - `empty_leaf_type_confusion_forgery_rejected` (count side, empty
    NormalTree at leaf)

The path == 0 case is unaffected: the merk-level hash divergence
between `node_hash` and `node_hash_with_sum` / `node_hash_with_count`
makes it computationally infeasible to forge a proof that matches the
trusted root, so the path-elements check is unnecessary at the root.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@QuantumExplorer
Copy link
Copy Markdown
Member Author

This is Claude. Pushed da53ef07 addressing the Codex finding about the aggregate-sum verifier's loose tree-type check at the terminal layer.

Verified the finding is real. With the leaf at an empty Merk-backed tree of any type (not just empty NormalTree — also empty SumTree, CountTree, ProvableCountTree, etc.), every such tree's stored value_hash = combine_hash(H(bytes), NULL_HASH) matches the verifier's recomputation when paired with a forged empty leaf proof. The bypass returns sum = 0 (the correct answer for an empty tree of any type), but the implicit claim "the leaf is a ProvableSumTree" is deceptive — a soundness gap, not a numeric forgery.

Same bug existed on the count side. Pre-existing this PR; fixed in the same commit.

Fix. enforce_lower_chain now takes is_terminal: bool. At intermediate depths the existing is_any_tree() check still applies (intermediate path elements may be any tree type). At the terminal depth the verifier requires:

  • aggregate-sum: matches!(element, Element::ProvableSumTree(..))
  • aggregate-count: matches!(element, Element::ProvableCountTree(..) | Element::ProvableCountSumTree(..))

Wrapper variants (NonCounted, NotSummed) continue to work transparently via the existing into_underlying() strip.

Path == [] case. Not vulnerable. At depth 0 the chain check is bypassed (no parent layer), and the merk-level hash divergence between node_hash and node_hash_with_sum/node_hash_with_count makes it computationally infeasible to forge a proof against the trusted root.

Regression tests (3 new, all use V0 envelope via GROVE_V2 for surgical proof reconstruction):

  • empty_leaf_type_confusion_forgery_rejected (sum, NormalTree leaf)
  • empty_provable_count_tree_at_leaf_rejected_for_sum (sum, ProvableCountTree leaf — confirms type-specificity)
  • empty_leaf_type_confusion_forgery_rejected (count, NormalTree leaf)

Each test honest-probes the leaf path with a single-key query to harvest valid layer-0 and layer-1 merk proof bytes, builds a forged GroveDBProofV0 with an empty leaf, encodes, and verifies the new gate rejects it. Without the fix all three would accept the forgery.

Workspace: cargo test --workspace --all-features → 2962 pass / 0 fail (was 2959 / 0).

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
grovedb/src/operations/proof/aggregate_count.rs (1)

120-123: ⚠️ Potential issue | 🔴 Critical

Add empty-path validation for AggregateCountOnRange queries.

The concern is valid: when path_query.path is empty, path_keys is also empty, causing depth == path_keys.len() to be true immediately at entry (depth=0). This bypasses the terminal-type gate in enforce_lower_chain (lines 155-170), which validates that non-leaf descendants are ProvableCountTree or ProvableCountSumTree.

While the validation chain (PathQuerySizedQueryQuery::validate_aggregate_count_on_range) currently rejects malformed inner ranges (Key, RangeFull, nested aggregates), no explicit empty-path check was found in the grovedb-level validation. The code also lacks test coverage for empty-path aggregate count queries.

Two solutions:

  1. Reject empty paths explicitly in PathQuery::validate_aggregate_count_on_range() or the underlying Query validation.
  2. Document a caller-side guarantee that aggregate-count queries are never constructed with empty paths (root-layer queries).

Same applies to lines 194-196 in verify_v1_layer.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@grovedb/src/operations/proof/aggregate_count.rs` around lines 120 - 123, The
AggregateCountOnRange flow currently allows empty paths to reach the leaf branch
immediately (depth == path_keys.len()) and bypass enforce_lower_chain
validation; update validation to reject empty paths by adding an explicit check
in PathQuery::validate_aggregate_count_on_range (or the underlying Query
validation) to return an error when path.is_empty(), and mirror the same check
for the v1 verifier branch (verify_v1_layer) to prevent empty-path
AggregateCountOnRange queries from being processed; reference verify_count_leaf
and enforce_lower_chain to ensure the leaf/descendant gating remains correct
after adding this validation.
♻️ Duplicate comments (1)
grovedb/src/operations/proof/aggregate_sum.rs (1)

204-206: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use V1 proof semantics for LayerProof path steps.

verify_v1_layer still routes through verify_single_key_layer_proof_v0, and that helper still hardcodes execute_proof(..., 0). That means every non-leaf step in a V1 aggregate-sum proof is validated under V0 compatibility rules, so this verifier can still accept path-layer proof shapes that the rest of the V1 verifier rejects. Please thread the proof version through here and use PROOF_VERSION_LATEST on the V1 path.

Based on learnings: PROOF_VERSION_LATEST is a workspace-scoped constant for GroveDB V1 proof verification paths and should be used there rather than treating V1 as V0.

Also applies to: 262-277

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@grovedb/src/operations/proof/aggregate_sum.rs` around lines 204 - 206, The V1
layer verification is incorrectly delegating to verify_single_key_layer_proof_v0
which calls execute_proof with version 0; update the V1 path so verify_v1_layer
(and the call sites around the shown snippet and the similar block at lines
~262-277) thread a proof version through to the helper and call the appropriate
helper or overload that uses PROOF_VERSION_LATEST instead of hardcoding
0—specifically, change the call site that currently invokes
verify_single_key_layer_proof_v0(merk_bytes, &next_key, path_query) to use a
V1-aware verifier (or a new verify_single_key_layer_proof that accepts a
proof_version param) and ensure execute_proof is invoked with
PROOF_VERSION_LATEST for V1 path steps.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@grovedb/src/operations/proof/aggregate_sum.rs`:
- Around line 79-109: The code currently allows aggregate-sum queries with an
empty path which skips the terminal type check; fix by either rejecting
empty-path aggregate-sum queries early in validation (in
path_query.validate_aggregate_sum_on_range or immediately after) or by adding a
root-path type gate before calling verify_v0_layer/verify_v1_layer (call
enforce_lower_chain with is_terminal=true or perform the same ProvableSumTree
check when path_query.path.is_empty()), and add a regression test that submits
an empty-path aggregate-sum proof against a non-ProvableSumTree root to ensure
the verifier fails.

---

Outside diff comments:
In `@grovedb/src/operations/proof/aggregate_count.rs`:
- Around line 120-123: The AggregateCountOnRange flow currently allows empty
paths to reach the leaf branch immediately (depth == path_keys.len()) and bypass
enforce_lower_chain validation; update validation to reject empty paths by
adding an explicit check in PathQuery::validate_aggregate_count_on_range (or the
underlying Query validation) to return an error when path.is_empty(), and mirror
the same check for the v1 verifier branch (verify_v1_layer) to prevent
empty-path AggregateCountOnRange queries from being processed; reference
verify_count_leaf and enforce_lower_chain to ensure the leaf/descendant gating
remains correct after adding this validation.

---

Duplicate comments:
In `@grovedb/src/operations/proof/aggregate_sum.rs`:
- Around line 204-206: The V1 layer verification is incorrectly delegating to
verify_single_key_layer_proof_v0 which calls execute_proof with version 0;
update the V1 path so verify_v1_layer (and the call sites around the shown
snippet and the similar block at lines ~262-277) thread a proof version through
to the helper and call the appropriate helper or overload that uses
PROOF_VERSION_LATEST instead of hardcoding 0—specifically, change the call site
that currently invokes verify_single_key_layer_proof_v0(merk_bytes, &next_key,
path_query) to use a V1-aware verifier (or a new verify_single_key_layer_proof
that accepts a proof_version param) and ensure execute_proof is invoked with
PROOF_VERSION_LATEST for V1 path steps.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ed5bfdc7-5371-4238-872a-7e002c2093b6

📥 Commits

Reviewing files that changed from the base of the PR and between dfa2046 and da53ef0.

📒 Files selected for processing (14)
  • grovedb-query/src/query.rs
  • grovedb/src/lib.rs
  • grovedb/src/operations/proof/aggregate_count.rs
  • grovedb/src/operations/proof/aggregate_sum.rs
  • grovedb/src/operations/proof/generate.rs
  • grovedb/src/operations/proof/verify.rs
  • grovedb/src/tests/aggregate_count_query_tests.rs
  • grovedb/src/tests/aggregate_sum_query_tests.rs
  • grovedb/src/tests/provable_sum_tree_tests.rs
  • grovedb/src/tests/trunk_proof_tests.rs
  • merk/src/element/tree_type.rs
  • merk/src/proofs/query/aggregate_sum.rs
  • merk/src/tree/link.rs
  • merk/src/tree_type/mod.rs
🚧 Files skipped from review as they are similar to previous changes (8)
  • merk/src/tree/link.rs
  • grovedb-query/src/query.rs
  • merk/src/element/tree_type.rs
  • grovedb/src/tests/provable_sum_tree_tests.rs
  • merk/src/proofs/query/aggregate_sum.rs
  • grovedb/src/lib.rs
  • merk/src/tree_type/mod.rs
  • grovedb/src/operations/proof/generate.rs

Comment thread grovedb/src/operations/proof/aggregate_sum.rs Outdated
Codex follow-up + CodeRabbit: the previous fix added a terminal-type
gate in `enforce_lower_chain`, but `verify_v0_layer` and
`verify_v1_layer` short-circuit to the leaf verifier when
`depth == path_keys.len()`. With an empty path (`path == []`) that's
true at depth 0, so the type gate is never invoked.

In practice the empty-path case is already protected by hash divergence:
the GroveDB root merk is always a `NormalTree` (built with
`Element::empty_tree()` by API), so its root_hash uses `node_hash`. An
attacker's forged proof of `HashWithSum` / `HashWithCount` ops would
reconstruct via `node_hash_with_sum` / `node_hash_with_count` — distinct
hash functions, no collision. So the caller's root-hash compare catches
the forgery cryptographically.

But the defense-in-depth principle says: don't rely on the cryptographic
divergence implicitly. Reject up-front, before any proof handling.
PathQuery::validate_aggregate_{sum,count}_on_range now check
`self.path.is_empty()` and return a clear InvalidQuery error naming why
(root is always NormalTree, no valid Provable* target at root).

The check fires at the entry of `verify_aggregate_{sum,count}_query`
(which call `validate_*` first thing) and at `prove_query` (the
generator also validates the path query before dispatch).

TESTS

  - `empty_path_aggregate_sum_rejected_at_validation`
  - `empty_path_aggregate_count_rejected_at_validation`

Both pin the rejection at both the PathQuery validator and the verify
entrypoint. 2964 workspace tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@QuantumExplorer
Copy link
Copy Markdown
Member Author

This is Claude. Pushed cc1828dd addressing CodeRabbit's empty-path follow-up on both aggregate verifiers.

Verification of the finding. Real concern. verify_v0_layer / verify_v1_layer short-circuit to the leaf when depth == path_keys.len(), which is true at depth 0 when path == []. The new terminal-type gate in enforce_lower_chain is never invoked in that case.

In practice the empty-path case is already protected by hash divergence: the GroveDB root merk is always a NormalTree (API guarantees this), so its root_hash = node_hash(...). A forged proof of HashWithSum / HashWithCount ops would reconstruct via node_hash_with_sum / node_hash_with_count — distinct hash functions, no collision. So the caller's trusted-root compare catches the forgery cryptographically. But defense-in-depth: don't rely on hash divergence implicitly. Reject up-front.

Fix. PathQuery::validate_aggregate_{sum,count}_on_range now check self.path.is_empty() and return a clear InvalidQuery error naming why (root is always NormalTree, no valid Provable* target at root). The check fires at the entry of verify_aggregate_{sum,count}_query (validators are the first call) and at proof generation (the prover also validates the path query before dispatch).

Tests (2 new, pin rejection at both the PathQuery validator and the verify_*_query entrypoint):

  • empty_path_aggregate_sum_rejected_at_validation
  • empty_path_aggregate_count_rejected_at_validation

Workspace: 2964 pass / 0 fail (was 2962).

On the V1-hardcoded-V0 duplicate. Still skipping in this PR for the reason given in the prior thread — verify_single_key_layer_proof_v0 is shared with the count side, and the same hardcoded execute_proof(..., 0) exists in the pre-existing aggregate_count.rs (introduced in PR #656). Fixing only the sum copy diverges them; fixing both is appropriately a follow-up PR scoped to "PROOF_VERSION_LATEST plumbing for layer-walking helpers" rather than landed inside this feature PR.

QuantumExplorer and others added 8 commits May 12, 2026 03:53
Mirrors PR #662's `query_aggregate_count` for the signed-sum side.
Callers that need a sum value but not a proof (e.g. server handlers
answering `prove=false` sum requests) can now bypass proof
construction, serialization, and verification entirely.

The merk-level walk is `O(log n + |boundary|)` in the number of
distinct keys, identical complexity to the prover but without the
proof-op allocations or hash recomputations. The signed-sum
arithmetic carries the same `i128` accumulator the prover and
verifier use (so adversarial intermediate sums never wrap), and
narrows to `i64` at the public entry point. An out-of-i64 result is
classified as `Error::CorruptedData` since a real `ProvableSumTree`
maintains every aggregate as `i64` at every level.

NEW APIS

- `Merk::sum_aggregate_on_range(&inner_range, grove_version)
   -> CostResult<i64, Error>` in `merk/src/merk/get.rs`. Checks
  `tree_type == ProvableSumTree`; rejects any other tree type with
  `Error::InvalidProofError`. Returns 0 for an empty merk.
- `RefWalker::sum_aggregate_on_range(&inner_range, grove_version)`
  in `merk/src/proofs/query/aggregate_sum.rs`. Walks the same
  Contained / Disjoint / Boundary classification path as
  `create_aggregate_sum_on_range_proof`, but emits no proof ops.
- `GroveDb::query_aggregate_sum(path_query, transaction, grove_version)
   -> CostResult<i64, Error>` in `grovedb/src/operations/get/query.rs`.
  Validates the PathQuery up-front via
  `validate_aggregate_sum_on_range` (same gate the prover and
  verifier use — catches malformed ASOR queries plus the
  empty-path rejection from the prior commit before any storage
  reads), opens the leaf merk at `path_query.path`, and delegates
  to the merk-level walk.
- New `query_aggregate_sum_on_range` field on
  `GroveDBOperationsQueryVersions`, wired through v1/v2/v3 at
  version `0`.

NotSummed-correctness is preserved via the same
`own_sum = node_sum - left_struct - right_struct` derivation the
prover uses. NotSummed-wrapped subtrees have stored aggregate 0, so
the subtraction yields 0 at the wrapper boundary - they do not
contribute to the in-range total.

The returned sum is **not** independently verifiable: callers are
trusting their own merk read path. For a verifiable sum, continue
using `prove_query` + `verify_aggregate_sum_query`. Documented
explicitly on both entry points.

TESTS

- 10 new merk-level cross-checks
  (`merk/src/proofs/query/aggregate_sum.rs::tests`): each range
  variant against `prove_aggregate_sum_on_range`'s computed sum,
  plus empty-merk-returns-0, NormalTree rejection, ProvableCountTree
  rejection (precise tree-type match, not "any provable aggregate
  tree"), and a mixed-positive/negative scenario that exercises the
  signed `own_sum` subtraction.
- 11 new GroveDB-level cross-checks
  (`grovedb/src/tests/aggregate_sum_query_tests.rs::tests`): every
  range shape on a populated `ProvableSumTree`, empty subtree
  returns 0, negative-sum scenario, invalid-inner-range
  (`Key`) rejected with `InvalidQuery`, empty-path rejected with
  `InvalidQuery`, NormalTree leaf rejected with `MerkError` from
  the merk-level gate.

Workspace `cargo test --all-features`: 2985 passing / 0 failing
(was 2964 / 0).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…margulis-99463b

# Conflicts:
#	grovedb-version/src/version/grovedb_versions.rs
#	grovedb-version/src/version/v1.rs
#	grovedb-version/src/version/v2.rs
#	grovedb-version/src/version/v3.rs
#	grovedb/src/tests/aggregate_count_query_tests.rs
#	merk/src/merk/get.rs
…OnRange

Adds parallel coverage for the variant-11 dispatch paths in encode/
decode_with_depth/borrow_decode_with_depth/Display/serde, plus helper
accessors (lower_bound, upper_bound, is_aggregate_sum_on_range,
aggregate_sum_inner, ...). Mirrors the existing AggregateCountOnRange
test set so each match arm has direct exercise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds direct round-trip tests of get_flags_owned / get_flags_mut /
set_flags on every aggregate-bearing variant (CountSumTree,
ProvableCountTree, ProvableCountSumTree, ProvableSumTree,
ItemWithSumItem, CommitmentTree, MmrTree, BulkAppendTree,
DenseAppendOnlyFixedSizeTree, BigSumTree, CountTree) plus
NotSummed / NonCounted delegation arms. Each test pins one match arm.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Direct unit tests for the previously untestable internal helper.
Each aggregate-bearing tree variant now has both a matching (returns
None) and mismatching (returns Some) case, plus tests for the
empty-merk identity arms (zero-recorded with NoAggregateData),
non-Merk data tree arms (always None), and the catch-all
variant/shape mismatch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds 10 mutation-style tests for the GroveDB-side aggregate-sum
verifier (verify_v0_layer / verify_v1_layer / verify_sum_leaf /
verify_single_key_layer_proof_v0 / enforce_lower_chain). Each test
pins one previously-uncovered error arm:

  - V1 unexpected non-merk leaf bytes
  - V0 and V1 missing lower_layer for path key
  - Malformed leaf sum proof (Phase 1 rejection)
  - Corrupted non-leaf merk bytes (single-key proof failure)
  - Non-leaf proof without target key
  - KV replaced by KVDigest in non-leaf (no value bytes)
  - Undeserializable value bytes on the path
  - Intermediate-tree non-tree element rejection
  - Unparsable envelope bincode decode error

Plus mirrors of the count-side AggregateSumOnRange rejection tests
in proof/generate.rs for dense/MMR/BulkAppend index helpers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Direct unit tests for previously-uncovered internal helpers in
merk/src/proofs/query/aggregate_sum.rs:

  - provable_sum_from_aggregate Err arm for every non-ProvableSum
    AggregateData variant (CorruptedData classification check)
  - provable_sum_from_aggregate happy path including i64::MIN/MAX
  - is_provable_sum_bearing false for every non-ProvableSumTree
    TreeType variant
  - classify_subtree additional disjoint-above / contained-within /
    boundary-overlapping-upper cases
  - key_strictly_inside unbounded endpoint and equality cases
  - empty ProvableSumTree prove+verify round-trip

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds parallel-variant rejection tests in the BulkAppendTree and
dense fixed-size Merkle tree proof modules. Both tree types have
no count or sum commitment in their node hash, so their
index-resolution helpers reject AggregateCountOnRange and
AggregateSumOnRange query items outright. This exercises the
previously-uncovered rejection arms in both proof modules.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two small targeted coverage additions:

  - merk/src/element/tree_type.rs: direct per-variant tests for the
    ProvableSumTree / CommitmentTree / BulkAppendTree /
    DenseAppendOnlyFixedSizeTree / MmrTree arms of
    root_key_and_tree_type, tree_flags_and_type, tree_type,
    maybe_tree_type, and tree_feature_type, plus a
    ProvableSumTree-through-NotSummed delegation test.

  - grovedb/src/tests/aggregate_sum_query_tests.rs: tests that
    drive node_to_string's KVSum / KVHashSum / KVDigestSum /
    HashWithSum / KVRefValueHashSum Display arms in
    grovedb/src/operations/proof/mod.rs by formatting real
    ProvableSumTree proofs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (4)
grovedb-query/src/query_item/mod.rs (1)

320-322: 💤 Low value

Stale inline comment in NonAggregateInner::deserialize.

The comment at lines 320-322 only mentions excluding "AggregateCountOnRange", but the Field enum and NON_AGGREGATE_VARIANTS (line 378-389) correctly exclude both aggregate variants. The struct-level doc at lines 297-310 was updated to reference both; this inline comment is the lone leftover.

📝 Suggested wording tweak
-        // Field set excludes "AggregateCountOnRange"; encountering that tag
-        // produces a serde "unknown variant" error before any inner
-        // recursion can happen.
+        // Field set excludes both `AggregateCountOnRange` and
+        // `AggregateSumOnRange`; encountering either tag produces a serde
+        // "unknown variant" error before any inner recursion can happen.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@grovedb-query/src/query_item/mod.rs` around lines 320 - 322, Update the stale
inline comment inside NonAggregateInner::deserialize to reflect that the Field
set excludes both aggregate variants (e.g., "AggregateCountOnRange" and
"AggregateSomethingElse") rather than only "AggregateCountOnRange"; reference
the Field enum and NON_AGGREGATE_VARIANTS to confirm which aggregate variants
are excluded and rewrite the comment to mention both aggregate variants so it
matches the struct-level doc and the actual NON_AGGREGATE_VARIANTS definition.
grovedb/src/tests/aggregate_sum_query_tests.rs (2)

1038-1041: 💤 Low value

Redundant disjunction in assertion.

msg.contains("must be a ProvableSumTree") is implied by msg.contains("ProvableSumTree"). Either drop the second branch to pin the exact phrase, or drop the first since it's already covered.

Proposed simplification
-                assert!(
-                    msg.contains("must be a ProvableSumTree") || msg.contains("ProvableSumTree"),
-                    "verifier rejected as expected but with an unrelated message: {msg}"
-                );
+                assert!(
+                    msg.contains("must be a ProvableSumTree"),
+                    "verifier rejected as expected but with an unrelated message: {msg}"
+                );
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@grovedb/src/tests/aggregate_sum_query_tests.rs` around lines 1038 - 1041, The
assertion in aggregate_sum_query_tests.rs redundantly checks msg.contains("must
be a ProvableSumTree") || msg.contains("ProvableSumTree"); update the assert to
use a single, precise check on the error string (either msg.contains("must be a
ProvableSumTree") to pin the exact phrase, or msg.contains("ProvableSumTree") if
you want a looser match) and adjust the failure message accordingly; locate the
assert that references the local variable msg and replace the disjunction with
the chosen single contains() call.

318-385: 💤 Low value

Overflow test can silently no-op if insertion is permissive.

If both i64::MAX inserts succeed (which the comment on Line 339-341 acknowledges is possible depending on Phase 1's aggregation rules), but the prover also doesn't detect the overflow and produces a proof, then the test exercises the verifier's narrowing check correctly. However, if both inserts succeed AND the prover never produces a proof that triggers overflow (e.g. Phase 1 silently saturates instead), the assertion on Line 379-382 only fires when prove_result is Ok. The test will pass even if a regression makes overflow undetectable, because the test bails out at Line 365-369. Consider asserting that either insert fails or verify rejects — never the silent-no-op branch — to keep this test load-bearing.

Proposed hardening
-        if !ok1 || !ok2 {
-            // Insertion already rejected the overflow — that's the
-            // healthiest end state. Bail out.
-            return;
-        }
+        let either_insert_rejected = !ok1 || !ok2;
         let pq = PathQuery::new_aggregate_sum_on_range(
             vec![TEST_LEAF.to_vec(), b"st".to_vec()],
             QueryItem::RangeInclusive(b"a".to_vec()..=b"b".to_vec()),
         );
-        let prove_result = db.grove_db.prove_query(&pq, None, v).unwrap();
-        match prove_result {
-            Err(_) => { /* prover detected overflow — fine */ }
-            Ok(proof) => {
-                let verify_result = GroveDb::verify_aggregate_sum_query(&proof, &pq, v);
-                assert!(
-                    verify_result.is_err(),
-                    "verifier must reject a sum that doesn't fit in i64"
-                );
-            }
-        }
+        if either_insert_rejected {
+            return;
+        }
+        let prove_result = db.grove_db.prove_query(&pq, None, v).unwrap();
+        match prove_result {
+            Err(_) => { /* prover detected overflow — fine */ }
+            Ok(proof) => {
+                let verify_result = GroveDb::verify_aggregate_sum_query(&proof, &pq, v);
+                assert!(
+                    verify_result.is_err(),
+                    "verifier must reject a sum that doesn't fit in i64"
+                );
+            }
+        }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@grovedb/src/tests/aggregate_sum_query_tests.rs` around lines 318 - 385, Test
currently early-returns when both inserts succeed, allowing a silent-no-op
regression to pass; change provable_sum_tree_overflow_at_i64_max_is_rejected so
it asserts that at least one of these holds: insertion rejected ( !ok1 || !ok2
), prover rejected (prove_result is Err), or verifier rejected
(GroveDb::verify_aggregate_sum_query returns Err). Keep using
PathQuery::new_aggregate_sum_on_range and the existing
prove_result/verify_result flow but replace the early return with a final
assertion that (!ok1 || !ok2) || prove_result.is_err() ||
verify_result.is_err(), so the test fails if all steps silently succeed.
grovedb/src/operations/get/query.rs (1)

662-667: 💤 Low value

Consider wrapping MerkError with contextual CorruptedData per repo convention.

The PR description notes that callers of prove_aggregate_sum_on_range were updated to wrap the MerkError with contextual CorruptedData messages "per repo convention." The no-proof entry point here uses the same pattern as the existing query_aggregate_count (.map_err(Error::MerkError) on line 821), so the in-file convention is consistent — but a callsite-specific wrap (e.g. Error::CorruptedData(format!("query_aggregate_sum at path {:?}: {}", path_slices, e))) would give operators better triage signal when a non-ProvableSumTree resolves at this path or storage is corrupted. As per coding guidelines: "Wrap errors with context using .map_err(|e| Error::CorruptedData(format!(\"context: {}\", e))) pattern in Rust source files".

Happy to also apply the same to query_aggregate_count (line 817-822) in a follow-up if you'd like to keep the two mirrored.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@grovedb/src/operations/get/query.rs` around lines 662 - 667, Replace the bare
.map_err(Error::MerkError) on the subtree.sum_aggregate_on_range call with a
contextual wrapper that maps the MerkError into Error::CorruptedData including
identifying context (e.g., the path_slices and that this occurred in
query_aggregate_sum / sum aggregate resolution); specifically change the error
mapping for subtree.sum_aggregate_on_range(...) to .map_err(|e|
Error::CorruptedData(format!("query_aggregate_sum at path {:?}: {}",
path_slices, e))) so callers can triage corrupted or non-ProvableSumTree cases
more easily while keeping the rest of the cost_return_on_error(...) call intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@grovedb-query/src/query_item/mod.rs`:
- Around line 320-322: Update the stale inline comment inside
NonAggregateInner::deserialize to reflect that the Field set excludes both
aggregate variants (e.g., "AggregateCountOnRange" and "AggregateSomethingElse")
rather than only "AggregateCountOnRange"; reference the Field enum and
NON_AGGREGATE_VARIANTS to confirm which aggregate variants are excluded and
rewrite the comment to mention both aggregate variants so it matches the
struct-level doc and the actual NON_AGGREGATE_VARIANTS definition.

In `@grovedb/src/operations/get/query.rs`:
- Around line 662-667: Replace the bare .map_err(Error::MerkError) on the
subtree.sum_aggregate_on_range call with a contextual wrapper that maps the
MerkError into Error::CorruptedData including identifying context (e.g., the
path_slices and that this occurred in query_aggregate_sum / sum aggregate
resolution); specifically change the error mapping for
subtree.sum_aggregate_on_range(...) to .map_err(|e|
Error::CorruptedData(format!("query_aggregate_sum at path {:?}: {}",
path_slices, e))) so callers can triage corrupted or non-ProvableSumTree cases
more easily while keeping the rest of the cost_return_on_error(...) call intact.

In `@grovedb/src/tests/aggregate_sum_query_tests.rs`:
- Around line 1038-1041: The assertion in aggregate_sum_query_tests.rs
redundantly checks msg.contains("must be a ProvableSumTree") ||
msg.contains("ProvableSumTree"); update the assert to use a single, precise
check on the error string (either msg.contains("must be a ProvableSumTree") to
pin the exact phrase, or msg.contains("ProvableSumTree") if you want a looser
match) and adjust the failure message accordingly; locate the assert that
references the local variable msg and replace the disjunction with the chosen
single contains() call.
- Around line 318-385: Test currently early-returns when both inserts succeed,
allowing a silent-no-op regression to pass; change
provable_sum_tree_overflow_at_i64_max_is_rejected so it asserts that at least
one of these holds: insertion rejected ( !ok1 || !ok2 ), prover rejected
(prove_result is Err), or verifier rejected (GroveDb::verify_aggregate_sum_query
returns Err). Keep using PathQuery::new_aggregate_sum_on_range and the existing
prove_result/verify_result flow but replace the early return with a final
assertion that (!ok1 || !ok2) || prove_result.is_err() ||
verify_result.is_err(), so the test fails if all steps silently succeed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1b061371-8b2d-4352-96a7-ee7c9c94e759

📥 Commits

Reviewing files that changed from the base of the PR and between da53ef0 and d1b4651.

📒 Files selected for processing (14)
  • grovedb-element/src/element/helpers.rs
  • grovedb-query/src/query_item/mod.rs
  • grovedb-version/src/version/grovedb_versions.rs
  • grovedb-version/src/version/v1.rs
  • grovedb-version/src/version/v2.rs
  • grovedb-version/src/version/v3.rs
  • grovedb/src/lib.rs
  • grovedb/src/operations/get/query.rs
  • grovedb/src/operations/proof/generate.rs
  • grovedb/src/query/mod.rs
  • grovedb/src/tests/aggregate_count_query_tests.rs
  • grovedb/src/tests/aggregate_sum_query_tests.rs
  • merk/src/merk/get.rs
  • merk/src/proofs/query/aggregate_sum.rs
🚧 Files skipped from review as they are similar to previous changes (5)
  • grovedb/src/lib.rs
  • grovedb-element/src/element/helpers.rs
  • grovedb/src/operations/proof/generate.rs
  • grovedb/src/query/mod.rs
  • merk/src/proofs/query/aggregate_sum.rs

Four low-value but clean tweaks from CodeRabbit on PR #661:

- `grovedb-query/src/query_item/mod.rs`: refresh the stale
  `NonAggregateInner::deserialize` inline comment to mention both
  excluded aggregate variants (Count + Sum), matching the struct-level
  doc and `NON_AGGREGATE_VARIANTS`.
- `grovedb/src/tests/aggregate_sum_query_tests.rs`: drop the redundant
  disjunction `msg.contains("must be a ProvableSumTree") ||
  msg.contains("ProvableSumTree")` — the first clause already implies
  the second; pin the exact phrase.
- `grovedb/src/tests/aggregate_sum_query_tests.rs`: harden
  `provable_sum_tree_overflow_at_i64_max_is_rejected` so it no longer
  silently passes when insert AND prove AND verify all accept an
  overflow. Replace the early-return-on-both-inserts-accepted with an
  explicit "at least one stage must reject" assertion.
- `grovedb/src/operations/get/query.rs`: wrap the MerkError from
  `Merk::sum_aggregate_on_range` (and the count sibling) with
  contextual `CorruptedData(format!("query_aggregate_{sum,count} at
  path {:?}: {}", path_slices, e))` per the repo
  error-wrapping convention. Two test assertions updated from
  `MerkError(_)` to `CorruptedData(_)` to match.

Workspace `cargo test --all-features`: 3102 pass / 0 fail (unchanged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@QuantumExplorer
Copy link
Copy Markdown
Member Author

This is Claude. Pushed b1027dcb addressing all 4 CodeRabbit nitpicks:

  • Stale comment in NonAggregateInner::deserialize: now names both aggregate variants.
  • Redundant disjunction in the empty-leaf forgery test: dropped the looser msg.contains("ProvableSumTree") clause; pinned the exact phrase.
  • Silent-no-op overflow test: replaced the early-return-on-both-inserts-accepted with an explicit "at least one of {insert, prove, verify} must reject" assertion. The test now fails if all three silently accept the i64::MAX + i64::MAX overflow.
  • MerkError context wrap: query_aggregate_sum (and the count sibling, as you offered) now wrap the merk-level error with Error::CorruptedData(format!("query_aggregate_{sum,count} at path {:?}: {}", path_slices, e)) per the repo's error-handling convention. Two test assertions updated from MerkError(_) to CorruptedData(_) to match the new contract.

Workspace cargo test --all-features: 3102 pass / 0 fail (unchanged).

CodeRabbit flagged the TreeType column for ProvableSumTree as "should be
17" — the technical claim was wrong (11 is the correct TreeType
discriminant per `merk/src/tree_type/mod.rs:77`), but the confusion was
fair: the column header "TreeType" was ambiguous and the table had
several pre-existing inaccuracies in adjacent rows. This commit fixes
the ambiguity AND the bugs.

Changes:
- Rename column header from "TreeType" to "TreeType disc" and add an
  intro paragraph explaining that "Element disc" and "TreeType disc"
  are discriminants of two SEPARATE enums.
- Add the TreeType-variant label to every tree row for consistency
  (some had it, most didn't). The new format is `N (VariantName)` —
  e.g. `5 (ProvableCountTree)` — which CodeRabbit-style auto-review
  can't misread.
- Fix three pre-existing wrong TreeType disc values:
    `BigSumTree`:        4 -> 2
    `CountTree`:         2 -> 3
    `CountSumTree`:      3 -> 4
  (These were drift from the actual `TreeType::discriminant()`
  implementation; the file had `4 (BigSumTree)` etc. but those labels
  were wrong.)
- Swap the row order at Element discriminants 8 and 9 to match the
  actual `Element` enum order:
    8 = `ProvableCountTree`  (was incorrectly listed as `ItemWithSumItem`)
    9 = `ItemWithSumItem`    (was incorrectly listed as `ProvableCountTree`)
- Tighten the `ProvableCountSumTree` Purpose blurb to note "only count
  in hash" since the sum is tracked metadata, not bound — this is the
  half-step variant a future `ProvableCountAndSumTree` would replace.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@QuantumExplorer
Copy link
Copy Markdown
Member Author

This is Claude. Going back through the CodeRabbit review thread to make sure nothing was silently dismissed.

Just-pushed 86b527be addressing the appendix-a TreeType column. CodeRabbit's specific claim ("TreeType should be 17") was wrong (11 is the correct TreeType enum discriminant per merk/src/tree_type/mod.rs:77), BUT the confusion was fair — the column header was ambiguous and I noticed the table had several pre-existing wrong values in adjacent rows. Fixed all of it:

  • Column header renamed TreeTypeTreeType disc with an explanatory intro paragraph.
  • Added N (VariantName) labels to every tree row for unambiguous reading.
  • Corrected three pre-existing wrong TreeType discriminants (BigSumTree, CountTree, CountSumTree).
  • Swapped rows 8 and 9 — Element disc 8 is ProvableCountTree, 9 is ItemWithSumItem (the original table had them backwards).
  • Tightened the ProvableCountSumTree Purpose to "only count in hash" since sum is unbound metadata in that variant.

Re: the two remaining Major-tagged findings I'm still skipping:

  1. PascalCase vs snake_case serde for AggregateSumOnRange — verified via JSON round-trip that AggregateCountOnRange has the exact same shape and round-trips correctly today; the field_identifier-only rename_all applies to keys, not variant names. Fixing only the new variant would diverge it from the count side, and the count side ships in production. Out of scope.

  2. V0 hardcode in verify_single_key_layer_proof_v0 called from verify_v1_layer — same helper exists in aggregate_count.rs from PR feat: add AggregateCountOnRange query for provable count trees #656 with the same hardcoded execute_proof(..., 0) for V1 layer steps. Fixing only the sum copy diverges the two. Appropriate as a follow-up PR scoped to "PROOF_VERSION_LATEST plumbing for aggregate-query layer-walking helpers" that touches both sides.

Both are documented for the cidx rebase / future PR. Everything else from the review has been addressed.

QuantumExplorer and others added 2 commits May 15, 2026 07:04
…margulis-99463b

# Conflicts:
#	grovedb-query/src/query.rs
#	grovedb/src/operations/proof/aggregate_count.rs
#	grovedb/src/operations/proof/generate.rs
#	grovedb/src/query/mod.rs
Codex security finding: the regular query verifier's lower- and
upper-bound `last_push` match arms in
`merk/src/proofs/query/verify.rs::execute_proof` (lines 206 / 241)
accept the Count-family boundary node variants (`KVCount`,
`KVDigestCount`, `KVRefValueHashCount`) but omit the parallel Sum
family (`KVSum`, `KVDigestSum`, `KVRefValueHashSum`). Regular proofs
against a `ProvableSumTree` can legitimately emit `KVDigestSum` as the
absence-boundary node for a queried key, so a multi-item query like
`Key("aa")` followed by `Range("g".."j")` would reject the perfectly
valid proof with `Cannot verify lower bound of queried range` whenever
the preceding boundary happened to be sum-flavored.

The downstream absence check at line ~572 already handled all six
node types (Count + Sum), making the omission an asymmetry between
the two checks within the same function.

THE FIX

Add `KVSum`, `KVDigestSum`, `KVRefValueHashSum` to both the lower- and
upper-bound `last_push` match arms. While at it, also extend
`boundaries_in_proof` (line ~742) to surface `KVDigestSum` boundary
keys alongside `KVDigest` and `KVDigestCount` — same class of
omission, same trivial extension.

TESTS

New `provable_sum_tree_bound_regression_tests` module at the bottom of
`verify.rs` covering:

  - `key_plus_range_on_provable_sum_tree_left_to_right_verifies` — the
    exact `[Key("aa"), Range("g".."j")]` shape Codex flagged, in
    forward iteration. Without the fix this returns
    `InvalidProofError("Cannot verify lower bound of queried range")`.
  - `key_plus_range_on_provable_sum_tree_right_to_left_verifies` —
    same query with `left_to_right = false`. The bug is symmetric, so
    the regression coverage is too.
  - `kv_digest_sum_appears_in_boundaries_in_proof` — proves that
    `boundaries_in_proof` now surfaces `KVDigestSum`-flavor boundary
    keys produced by `ProvableSumTree` proofs.

Workspace `cargo test --all-features`: 3150 pass / 0 fail (was 3147 /
0).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
merk/src/proofs/query/verify.rs (1)

884-901: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Keep key_exists_as_boundary_in_proof in sync with the new Sum-family boundary semantics.

boundaries_in_proof() now recognizes KVDigestSum, but key_exists_as_boundary_in_proof() still only matches KVDigest / KVDigestCount. That makes the two public helpers disagree on valid ProvableSumTree absence proofs.

Suggested fix
         match &op {
             Op::Push(Node::KVDigest(k, _))
             | Op::PushInverted(Node::KVDigest(k, _))
             | Op::Push(Node::KVDigestCount(k, _, _))
             | Op::PushInverted(Node::KVDigestCount(k, _, _))
+            | Op::Push(Node::KVDigestSum(k, _, _))
+            | Op::PushInverted(Node::KVDigestSum(k, _, _))
                 if k.as_slice() == key =>
             {
                 return Ok(true);
             }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@merk/src/proofs/query/verify.rs` around lines 884 - 901, The helper
key_exists_as_boundary_in_proof is out of sync with boundaries_in_proof: update
key_exists_as_boundary_in_proof to treat KVDigestSum the same as KVDigest and
KVDigestCount so Sum-family boundary nodes are recognized; locate the match/case
in key_exists_as_boundary_in_proof that currently matches
Op::Push/Op::PushInverted with Node::KVDigest and Node::KVDigestCount and add
corresponding branches for Node::KVDigestSum (with the same key extraction and
equality check logic) so both helpers behave identically for ProvableSumTree
absence proofs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@merk/src/proofs/query/verify.rs`:
- Around line 884-901: The helper key_exists_as_boundary_in_proof is out of sync
with boundaries_in_proof: update key_exists_as_boundary_in_proof to treat
KVDigestSum the same as KVDigest and KVDigestCount so Sum-family boundary nodes
are recognized; locate the match/case in key_exists_as_boundary_in_proof that
currently matches Op::Push/Op::PushInverted with Node::KVDigest and
Node::KVDigestCount and add corresponding branches for Node::KVDigestSum (with
the same key extraction and equality check logic) so both helpers behave
identically for ProvableSumTree absence proofs.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fa52ced9-e02c-4968-b7a3-8797a17c5a19

📥 Commits

Reviewing files that changed from the base of the PR and between d1b4651 and 5b3a046.

📒 Files selected for processing (16)
  • docs/book/src/appendix-a.md
  • grovedb-bulk-append-tree/src/proof/tests.rs
  • grovedb-dense-fixed-sized-merkle-tree/src/proof/tests.rs
  • grovedb-query/src/aggregate_count.rs
  • grovedb-query/src/query.rs
  • grovedb-query/src/query_item/mod.rs
  • grovedb/src/operations/get/query.rs
  • grovedb/src/operations/proof/aggregate_count/helpers.rs
  • grovedb/src/operations/proof/aggregate_count/leaf_chain.rs
  • grovedb/src/operations/proof/aggregate_count/per_key.rs
  • grovedb/src/operations/proof/generate.rs
  • grovedb/src/query/mod.rs
  • grovedb/src/tests/aggregate_count_query_tests.rs
  • grovedb/src/tests/aggregate_sum_query_tests.rs
  • merk/src/element/tree_type.rs
  • merk/src/proofs/query/verify.rs
🚧 Files skipped from review as they are similar to previous changes (6)
  • grovedb/src/operations/get/query.rs
  • grovedb/src/tests/aggregate_count_query_tests.rs
  • grovedb/src/query/mod.rs
  • grovedb/src/tests/aggregate_sum_query_tests.rs
  • grovedb-query/src/query_item/mod.rs
  • grovedb/src/operations/proof/generate.rs

QuantumExplorer and others added 5 commits May 15, 2026 16:37
Develop landed a regression test
(`query_validation_error_to_static_str_projects_invalid_operation_and_catches_other_variants`,
commit 7a64938) that pins the catch-all fallback string returned by
`query_validation_error_to_static_str` to
`"AggregateCountOnRange query validation failed"`. This PR had
generalised the helper to serve both count and sum, returning
`"aggregate query validation failed"`, which broke the develop test
under GitHub's "merge into base" CI workflow.

Split the helper into two so each aggregate variant's error surface
stays self-describing:

  - `query_validation_error_to_static_str` — count side, restored to
    the `"AggregateCountOnRange query validation failed"` label so
    develop's regression test stays green.
  - `sum_query_validation_error_to_static_str` — new sum-side helper
    returning `"AggregateSumOnRange query validation failed"`. Used by
    `SizedQuery::validate_aggregate_sum_on_range`.

Both follow the same projection contract: `InvalidOperation(msg)`
passes the static string through unchanged; any other variant
(unreachable from real validators) gets the variant-specific
fallback.

No behavior change at the InvalidOperation happy path, which is all
real callers reach.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CodeRabbit symmetry finding on top of commit 5b3a046: that fix
extended `boundaries_in_proof` to recognize `KVDigestSum` boundary
nodes from `ProvableSumTree` proofs, but missed the parallel helper
`key_exists_as_boundary_in_proof`. The two public helpers are
documented to behave identically; without this both helpers
disagreed on valid `ProvableSumTree` absence proofs.

Add `Op::Push(Node::KVDigestSum(..))` and the PushInverted variant
to the match in `key_exists_as_boundary_in_proof`. Tighten the
doc-comment to spell out that the two helpers share node-type
coverage.

Extended the regression test
`kv_digest_sum_appears_in_boundaries_in_proof` (now renamed to
`kv_digest_sum_appears_in_both_boundary_helpers`) so every boundary
key surfaced by `boundaries_in_proof` is also reported by
`key_exists_as_boundary_in_proof`, pinning the symmetry.

Workspace `cargo test --all-features` for the affected module: 3 of
3 regression tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ee code

Remove "Phase 1 / Phase 2 / etc." prefixes that referred to the PR's
implementation timeline. Retains "Phase 1 / Phase 2" labels that describe
runtime decode-vs-walk algorithm steps of the aggregate-count/sum verifiers
(documented in docs/book/src/aggregate-count-queries.md). In the
test-fixture stack-builder (merk/src/proofs/tree.rs) and the
provable_sum_tree direct-insert test, renamed enumeration-style "Phase N"
labels to "Step N" for clarity. Also renamed the phase2_* test fn prefixes
in encoding.rs and tree.rs to drop the timeline label.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CodeRabbit re-flagged a pre-existing asymmetry in `QueryItem`'s manual
serde implementations: the `Serialize` impl emitted PascalCase variant
tags via `serialize_newtype_variant` (e.g. `"AggregateSumOnRange"`,
`"Range"`), but the `Deserialize` impl uses `Field` enums marked
`#[serde(field_identifier, rename_all = "snake_case")]`, expecting
snake_case (`"aggregate_sum_on_range"`, `"range"`).

The asymmetry was invisible to bincode (the format GroveDB actually
uses in proofs and storage) because bincode identifies variants by
index, not by the textual tag. But it broke round-trip through every
text-based format that carries variant names verbatim (JSON, YAML,
TOML). An in-code comment at the existing token-stream test site
even documented the issue as "a pre-existing mismatch ... that
breaks JSON round-trip but is invisible to formats that don't carry
variant names textually."

THE FIX

Change every `serialize_newtype_variant` (and the one
`serialize_unit_variant` for `RangeFull`) call in
`grovedb-query/src/query_item/mod.rs` to emit snake_case variant
tags. The variant indices stay the same so the bincode wire format
is unchanged — only textual formats see the new tag names.

Affected variants: `Key`, `Range`, `RangeInclusive`, `RangeFull`,
`RangeFrom`, `RangeTo`, `RangeToInclusive`, `RangeAfter`,
`RangeAfterTo`, `RangeAfterToInclusive`, `AggregateCountOnRange`,
`AggregateSumOnRange` — i.e. every variant, not just the aggregate
ones. This is the symmetric fix; doing only the new
`AggregateSumOnRange` variant would have diverged it from the
existing `AggregateCountOnRange` (and from the other ten variants
that have always been broken the same way).

Also updated the in-code comment at the token-stream test site to
reflect the new contract.

TESTS

Two new `serde_test::assert_tokens` round-trip regression tests pin
the snake_case contract on both aggregate variants:

  - serde_round_trip_aggregate_sum_on_range_uses_snake_case_tag
  - serde_round_trip_aggregate_count_on_range_uses_snake_case_tag

assert_tokens exercises both Serialize AND Deserialize against the
same token stream, so any future regression on either side fails the
test immediately.

Workspace cargo test --all-features: 3152 pass / 0 fail (was 3150 / 0).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…0 envelopes

V0 (`MerkOnlyLayerProof`) envelopes predate the aggregate-sum feature and
cannot legitimately carry an aggregate-sum proof, so the V0 layer walker
was unreachable in any honestly-produced proof. Mirror the count-side
changes from PR #663:

  - Convert `aggregate_sum.rs` into a subdirectory mirroring
    `aggregate_count/`: `mod.rs` (public API + V0 rejection),
    `helpers.rs` (envelope decode, single-key layer verification,
    chain enforcement, leaf sum verification), `leaf_chain.rs` (V1
    leaf-chain walker). Removes the dead `verify_v0_layer` path.

  - Add a prover-side V0 gate in `prove_query_non_serialized`: when
    grove_version dispatches to V0 and the path query carries an
    `AggregateSumOnRange` anywhere, return `NotSupported` instead of
    emitting a V0 envelope the verifier would (correctly) reject.

  - Update tests: replace the V0 round-trip test with a V0-rejection
    test; broaden the empty-leaf type-confusion test to accept either
    the V0-rejection or the terminal-type-gate error; remove the now-
    unreachable V0 missing-lower-layer test (V1 counterpart already
    pins the missing-layer behavior); refresh stale doc-comments that
    pointed at `aggregate_sum.rs` line numbers.

No carrier-shape support yet — `AggregateSumOnRange` is still leaf-only
on the merk side, so there is no `classification.rs` / `per_key.rs`
mirror. The leaf-only shape can be extended later in parallel with
matching merk-level work, just like the count side did.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
grovedb/src/tests/aggregate_sum_query_tests.rs (1)

957-1083: ⚡ Quick win

These regressions never reach the terminal-type check.

Both forged-proof tests build a GROVE_V2/V0 envelope and treat the V0 rejection path as success, so they still pass even if enforce_lower_chain stops checking for ProvableSumTree. To actually pin the fix this PR introduced, forge the same shape under a V1 envelope and assert the terminal-type error specifically.

Also applies to: 1128-1225

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@grovedb/src/tests/aggregate_sum_query_tests.rs` around lines 957 - 1083, The
tests currently build a V0 (GROVE_V2) envelope and thus succeed by hitting the
V0 rejection path; change them to construct the same forged proof under a V1
envelope so the verifier exercises the terminal-type check in
enforce_lower_chain. Specifically, in the test functions
empty_leaf_type_confusion_forgery_rejected (and the other occurrence around
1128-1225), set the GroveVersion reference v to the V1 constant (use GROVE_V3 or
the project’s V1 version constant) so the match arm expecting GroveDBProof::V1
is used, adjust the forged envelope construction to produce a GroveDBProof::V1
envelope shape, then call GroveDb::verify_aggregate_sum_query(&forged_bytes,
&attack_pq, v) and assert that the returned error message contains the
terminal-type text (e.g., "must be a ProvableSumTree") to ensure
enforce_lower_chain is actually exercised.
merk/src/tree/mod.rs (1)

1231-1255: ⚡ Quick win

Centralize the provable-hash dispatch.

These match tables now duplicate hash_for_link()’s AggregateData→hash mapping in three places. That is a fragile proof contract: one missed branch on the next tree-type addition will make stored child hashes and computed root hashes diverge. Please extract a single helper and reuse it here and in hash_for_link().

Also applies to: 1279-1303

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@merk/src/tree/mod.rs` around lines 1231 - 1255, The match on AggregateData
that selects node_hash_with_count/node_hash_with_sum is duplicated and must be
centralized: extract a helper function (e.g., compute_provable_hash or
provable_hash_for_aggregate) that takes (&AggregateData, &TreeTypeContext...) or
the same parameters used by hash_for_link() and returns the computed hash (using
node_hash_with_count, node_hash_with_sum, or tree.hash()), then replace the
duplicated match blocks in this file (the block around the current match and the
similar block at 1279-1303) to call that helper, and update hash_for_link() to
call the same helper so all AggregateData→hash logic lives in one place.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@grovedb/src/operations/proof/aggregate_sum/leaf_chain.rs`:
- Around line 43-59: The proof walker accepts extra unexpected lower_layers in
V1 aggregate-sum chains; to harden it, validate that each non-terminal layer
contains exactly one lower_layer equal to the expected next_key and that the
terminal (leaf) layer contains zero lower_layers. In the function surrounding
verify_sum_leaf and verify_single_key_layer_proof_v0 where you access
layer.lower_layers (using variables depth, path_keys, next_key, layer,
path_query), add checks: if depth < path_keys.len() - 1 ensure
layer.lower_layers.len() == 1 and that the sole key equals next_key; if depth ==
path_keys.len() ensure layer.lower_layers.is_empty(); if any check fails return
Error::InvalidProof(path_query.clone(), format!(...)) with a clear message about
unexpected lower_layers.

In `@merk/src/tree/mod.rs`:
- Around line 674-691: For the TreeType::ProvableSumTree arm, do not silently
fall back to self.hash() when aggregate_data() is not
AggregateData::ProvableSum; instead fail closed by asserting or returning an
explicit error: replace the fallback branch with an assert/panic or propagate an
Err that clearly states that a ProvableSum was expected but aggregate_data()
returned something else (include the aggregate_data() value and node identity),
so node_hash_with_sum(self.inner.kv.hash(), self.child_hash(true),
self.child_hash(false), sum) is the only successful path for ProvableSumTree and
any mismatch is surfaced immediately; update callers if needed to handle the new
error path.

---

Nitpick comments:
In `@grovedb/src/tests/aggregate_sum_query_tests.rs`:
- Around line 957-1083: The tests currently build a V0 (GROVE_V2) envelope and
thus succeed by hitting the V0 rejection path; change them to construct the same
forged proof under a V1 envelope so the verifier exercises the terminal-type
check in enforce_lower_chain. Specifically, in the test functions
empty_leaf_type_confusion_forgery_rejected (and the other occurrence around
1128-1225), set the GroveVersion reference v to the V1 constant (use GROVE_V3 or
the project’s V1 version constant) so the match arm expecting GroveDBProof::V1
is used, adjust the forged envelope construction to produce a GroveDBProof::V1
envelope shape, then call GroveDb::verify_aggregate_sum_query(&forged_bytes,
&attack_pq, v) and assert that the returned error message contains the
terminal-type text (e.g., "must be a ProvableSumTree") to ensure
enforce_lower_chain is actually exercised.

In `@merk/src/tree/mod.rs`:
- Around line 1231-1255: The match on AggregateData that selects
node_hash_with_count/node_hash_with_sum is duplicated and must be centralized:
extract a helper function (e.g., compute_provable_hash or
provable_hash_for_aggregate) that takes (&AggregateData, &TreeTypeContext...) or
the same parameters used by hash_for_link() and returns the computed hash (using
node_hash_with_count, node_hash_with_sum, or tree.hash()), then replace the
duplicated match blocks in this file (the block around the current match and the
similar block at 1279-1303) to call that helper, and update hash_for_link() to
call the same helper so all AggregateData→hash logic lives in one place.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6e4d5122-0ebe-46d7-afa4-5422fdfced48

📥 Commits

Reviewing files that changed from the base of the PR and between 5b3a046 and da5c103.

📒 Files selected for processing (28)
  • grovedb-element/src/element_type.rs
  • grovedb-element/tests/element_constructors_helpers.rs
  • grovedb-query/src/proofs/encoding.rs
  • grovedb-query/src/proofs/mod.rs
  • grovedb-query/src/proofs/tree_feature_type.rs
  • grovedb-query/src/query_item/mod.rs
  • grovedb/src/debugger.rs
  • grovedb/src/operations/proof/aggregate_sum/helpers.rs
  • grovedb/src/operations/proof/aggregate_sum/leaf_chain.rs
  • grovedb/src/operations/proof/aggregate_sum/mod.rs
  • grovedb/src/operations/proof/generate.rs
  • grovedb/src/operations/proof/mod.rs
  • grovedb/src/query/mod.rs
  • grovedb/src/tests/aggregate_sum_query_tests.rs
  • grovedb/src/tests/provable_sum_tree_tests.rs
  • grovedb/src/tests/trunk_proof_tests.rs
  • grovedbg-types/src/lib.rs
  • merk/src/element/tree_type.rs
  • merk/src/merk/chunks.rs
  • merk/src/proofs/query/aggregate_sum.rs
  • merk/src/proofs/query/mod.rs
  • merk/src/proofs/query/verify.rs
  • merk/src/proofs/tree.rs
  • merk/src/tree/link.rs
  • merk/src/tree/mod.rs
  • merk/src/tree/tree_feature_type.rs
  • merk/src/tree_type/costs.rs
  • merk/src/tree_type/mod.rs
✅ Files skipped from review due to trivial changes (1)
  • merk/src/merk/chunks.rs
🚧 Files skipped from review as they are similar to previous changes (20)
  • merk/src/tree_type/costs.rs
  • grovedb/src/operations/proof/mod.rs
  • grovedb-query/src/proofs/mod.rs
  • grovedbg-types/src/lib.rs
  • merk/src/tree/tree_feature_type.rs
  • grovedb/src/tests/trunk_proof_tests.rs
  • grovedb-element/tests/element_constructors_helpers.rs
  • merk/src/tree/link.rs
  • grovedb-query/src/proofs/encoding.rs
  • merk/src/proofs/query/verify.rs
  • merk/src/element/tree_type.rs
  • merk/src/proofs/query/mod.rs
  • grovedb/src/query/mod.rs
  • merk/src/tree_type/mod.rs
  • grovedb/src/tests/provable_sum_tree_tests.rs
  • merk/src/proofs/query/aggregate_sum.rs
  • grovedb-query/src/proofs/tree_feature_type.rs
  • grovedb/src/operations/proof/generate.rs
  • merk/src/proofs/tree.rs
  • grovedb-element/src/element_type.rs

Comment thread grovedb/src/operations/proof/aggregate_sum/leaf_chain.rs
Comment thread merk/src/tree/mod.rs
Comment on lines +674 to +691
TreeType::ProvableSumTree => {
// For ProvableSumTree, include the aggregate sum in the hash
// via `node_hash_with_sum`. This is what makes the root hash
// diverge from a plain SumTree containing the same elements.
let aggregate_data = self
.aggregate_data()
.unwrap_or(AggregateData::NoAggregateData);
if let AggregateData::ProvableSum(sum) = aggregate_data {
node_hash_with_sum(
self.inner.kv.hash(),
self.child_hash(true),
self.child_hash(false),
sum,
)
} else {
// Fallback to regular hash if aggregate data is unexpected
self.hash()
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Fail closed when a ProvableSumTree cannot produce ProvableSum.

Lines 678-690 currently downgrade to self.hash() when aggregate_data() overflows or the node/tree_type pairing is inconsistent. That strips the sum commitment from a caller that explicitly requested ProvableSumTree hashing, which weakens the verification contract instead of surfacing corruption. Please make this path error/assert rather than emitting a plain node hash.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@merk/src/tree/mod.rs` around lines 674 - 691, For the
TreeType::ProvableSumTree arm, do not silently fall back to self.hash() when
aggregate_data() is not AggregateData::ProvableSum; instead fail closed by
asserting or returning an explicit error: replace the fallback branch with an
assert/panic or propagate an Err that clearly states that a ProvableSum was
expected but aggregate_data() returned something else (include the
aggregate_data() value and node identity), so
node_hash_with_sum(self.inner.kv.hash(), self.child_hash(true),
self.child_hash(false), sum) is the only successful path for ProvableSumTree and
any mismatch is surfaced immediately; update callers if needed to handle the new
error path.

QuantumExplorer and others added 4 commits May 15, 2026 18:06
Addresses CodeRabbit findings on PR #661.

**Finding 1 (Major, fixed)** — `verify_v1_leaf_chain` accepted arbitrary
`lower_layers` entries at non-leaf depths and under the leaf merk. That
let two byte-distinct envelopes verify to the same `(root, sum)` because
the smuggled siblings/children were never inspected, and gave downstream
syntactic scanners an unverified surface to read from. Added a strict
shape gate that mirrors the honest prover (`prove_aggregate_sum_on_range`
short-circuit always emits `lower_layers: BTreeMap::new()` at the leaf,
and each path-prefix wrapper inserts exactly one entry):

  - At `depth == path_keys.len()` (leaf merk): require
    `lower_layers.is_empty()`.
  - At non-leaf depths: require `lower_layers.len() == 1` and the sole
    key to equal the expected descent key `path_keys[depth]`.

Two new regression tests (`sum_v1_envelope_with_extra_lower_layer_*`
and `sum_v1_envelope_with_lower_layers_under_leaf_*`) construct
byte-modified envelopes that the gate rejects.

**Finding 4 (nitpick, addressed)** — the existing empty-leaf
type-confusion tests build V0 envelopes and now hit the V0-rejection
gate before the terminal-type gate in `enforce_lower_chain` runs.
Added `empty_leaf_type_confusion_forgery_rejected_under_v1_envelope`
which builds the same forgery under a V1 `LayerProof` envelope so the
terminal-type gate fires directly. The test asserts the specific
"must be a ProvableSumTree" error from `enforce_lower_chain` so
future refactors that drop the gate are caught.

**Finding 2 (Major, skipped with reason)** — CodeRabbit asks to make
`hash_for_link` fail-closed (panic/Err) when a `ProvableSumTree` node's
`aggregate_data()` doesn't return `ProvableSum`. The current fallback
to `self.hash()` is identical across all three `Provable*` variants
(`ProvableCountTree`, `ProvableCountSumTree`, `ProvableSumTree`) and
also appears in commit-time dispatch (lines 1233-1304). Fixing only the
sum arm creates asymmetry with the count side; the broader refactor
(plus the matching dispatch-centralization in Finding 3) is out of
scope for this sum-feature PR and is documented in MEMORY M1 as
intentional.

**Finding 3 (nitpick, skipped with reason)** — the duplicated
`AggregateData → hash` dispatch in `merk/src/tree/mod.rs` predates this
PR and applies to all three `Provable*` variants. Centralizing it
would touch hot proof-emission paths; out of scope here.

Also updated `sum_v1_envelope_with_missing_lower_layer_is_rejected`
to accept the new strict-shape error message — removing an entry now
trips the shape gate first instead of the older missing-layer arm.
Both messages pin the same property.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… wrong-key

After splitting the strict-shape gate in \`verify_v1_leaf_chain\` into
two reachable error arms (entry-count vs. entry-key), the
ok_or_else-arm on the subsequent \`lower_layers.get(&next_key)\` became
naturally reachable for the wrong-key case (single entry but under
an unexpected key). Previously the combined-OR gate made both error
sites mutually exclusive, leaving the get-arm dead.

Coverage impact (aggregate_sum/ subdir, local tarpaulin):
  - leaf_chain.rs: 36/41 (88%) -> 44/44 (100%)
  - Subdir total:  ~80%       -> 89.94%

New tests in aggregate_sum_query_tests:
  - sum_v1_envelope_with_wrong_keyed_lower_layer_is_rejected — single
    \`lower_layers\` entry under \"impostor\" instead of \"st\". Exercises
    the key-shape gate distinctly from the count-shape gate.
  - sum_proof_with_trailing_bytes_is_rejected — mirror of
    \`aggregate_count_proof_with_trailing_bytes_is_rejected\`. Pins the
    canonical-decode invariant in \`decode_grovedb_proof\`.

Tightened assertion in
\`sum_v1_envelope_with_extra_lower_layer_is_rejected\` to match the new
\"lower-layer entries at depth N\" error string, and broadened the
assertion in \`sum_v1_envelope_with_missing_lower_layer_is_rejected\`
to accept either the old missing-layer message or the new
entry-count message (removing the only entry trips the count gate
first).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…d.rs

The same `decode_grovedb_proof` function lived in both
`aggregate_count/helpers.rs` and `aggregate_sum/helpers.rs`, differing
only in whether the error string named "aggregate-count" or
"aggregate-sum" as the offending shape. Two copies of the same
canonical-decode contract is a maintenance hazard — drift between the
two would mean one aggregate path could (e.g.) accept trailing bytes
that the other rejects, breaking the equality-by-bytes assumption the
contract is meant to guarantee.

Hoist the function to `operations/proof/mod.rs` as
`decode_grovedb_proof_canonical` with `pub(super)` visibility. Both
helper modules now call `super::decode_grovedb_proof_canonical`. The
error message generalizes from "aggregate-{count,sum} proof has N
trailing bytes" to "proof has N trailing bytes" since the call site
provides the surrounding context; the existing
`*_proof_with_trailing_bytes_is_rejected` tests assert
`msg.contains("trailing bytes")` and remain green.

No behavior change beyond the wording adjustment in the trailing-
bytes error. Tests: workspace 3088 / 0 fail; aggregate tests
223 / 0 fail (113 sum + 75 count + 35 others).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Migrate the four `bincode::decode_from_slice(...)?.0` sites in
`operations/proof/verify.rs` (`verify_query_with_options`,
`verify_query_get_parent_tree_info_with_options`, `verify_query_raw`,
`verify_trunk_chunk_proof`) to call
`super::decode_grovedb_proof_canonical(proof)?` — the same canonical
decoder the aggregate-count and aggregate-sum entry points already use.

Before this change, the general verifier silently accepted trailing
bytes after the encoded `GroveDBProof` envelope, because the `.0`
discarded the bincode `consumed` count. The aggregate-count and
aggregate-sum entry points rejected trailing bytes via their own
private decoder. That asymmetry meant the same logical proof could
have many distinct byte encodings through the general-verify surface
but only one through aggregate-verify — a malleability gap that
breaks any equality-by-bytes / caching / dedup assumption a consumer
might rely on. The cryptographic chain still bound the answer, so
this wasn't a soundness break.

The general-verify path now matches: trailing bytes are rejected with
`Error::CorruptedData("proof has N trailing bytes after the encoded
envelope")`.

Added `verify_query_rejects_proof_with_trailing_bytes` in
`proof_advanced_tests.rs` to lock the new behavior in place, mirror
of `sum_proof_with_trailing_bytes_is_rejected` and
`aggregate_count_proof_with_trailing_bytes_is_rejected`.

Tests: workspace 3089 / 0 fail (3088 + 1 new regression test).
No honest test/proof emitted trailing bytes, so no existing test
needed updating.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant