feat: add AggregateCountOnRange query for provable count trees#656
Conversation
Adds a new QueryItem variant `AggregateCountOnRange(Box<QueryItem>)` that counts the elements matched by an inner range and returns a single u64 together with an O(log n) cryptographic proof, instead of returning the elements themselves. Targets `ProvableCountTree` and `ProvableCountSumTree` (plus their `NonCounted*` wrappers); rejects all other tree types at proof time. Why: counting a sub-range of keys in a provable count tree was previously forced through a regular query, paying O(result-size) bytes for an answer that's already cryptographically committed at every internal node via `node_hash_with_count`. This adds a dedicated proof shape that collapses fully-inside subtrees into a single self-verifying op. Mechanics: - New proof node `Node::HashWithCount(kv_hash, left_child_hash, right_child_hash, count)` that is *self-verifying*: the verifier recomputes `node_hash_with_count(...)` from the four committed fields, so a forged count diverges from the parent's expected hash and the Merkle-root chain check fails. - `Merk::prove_aggregate_count_on_range` walks the AVL tree, classifying each subtree (Disjoint / Contained / Boundary) using inherited exclusive key-bound windows; emits `Hash` / `HashWithCount` / `KVDigestCount` accordingly. - Multi-layer GroveDB proof glue routes leaf-merk emission inside both `prove_subqueries` and `prove_subqueries_v1` short-circuit paths; envelope structure is unchanged so existing serialization works. - `GroveDb::verify_aggregate_count_query(proof, path_query, version) -> Result<(CryptoHash, u64), Error>` walks the layer chain, verifies single-key existence proofs at each non-leaf layer, delegates to the merk count verifier at the leaf, and enforces the `combine_hash(H(value), lower_root) == parent_proof_hash` chain. - Validation enforced at `Query` / `SizedQuery` / `PathQuery`: AggregateCountOnRange must be the only item, no subqueries, no pagination, inner item not Key / RangeFull / nested AggregateCountOnRange. - `execute` refactored to expose `execute_with_options(verify_avl_balance: bool)`; count proofs intentionally collapse one side to height 1 while descending the other, so the AVL balance check is bypassed only for the count verifier (existing callers unchanged). Documentation: new GroveDB book chapter at `docs/book/src/aggregate-count-queries.md` covering the contract, allowed range variants, validation rules, and per-case proof shapes (open and closed ranges, with mermaid diagrams). Cross-linked from the query-system chapter. Tests: - 10 classification unit tests covering Disjoint/Contained/Boundary decisions across every range bound shape. - 13 merk-level integration tests covering every allowed range variant + empty merk + tree-type rejection + a count-forgery test that proves the cryptographic binding. - 11 GroveDB end-to-end tests at `grovedb/src/tests/aggregate_count_query_tests.rs` covering ProvableCountTree, ProvableCountSumTree, multi-layer paths, validation rejections, normal-tree rejection, and a GroveDB-level forgery test. Full workspace: builds clean, clippy clean, all 1465+ grovedb lib tests + 387+ merk lib tests pass with no regressions. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (5)
📝 WalkthroughWalkthroughThis PR implements end-to-end support for aggregate-count-on-range queries: adds QueryItem::AggregateCountOnRange and Node::HashWithCount, safe serialization and opcodes, merk-level prover and verifier with subtree classification, GroveDB multi-layer envelope verification, integration into proof-gen, and extensive tests and docs. ChangesAggregate Count On Range Query Implementation
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…t 1.95)
CI's clippy on Rust 1.95 caught a `collapsible_if` lint at
`grovedb-query/src/query.rs:402` that local clippy on an older toolchain
didn't surface. Rewrite the nested `if let Some(branches) = ... { if
!branches.is_empty() { ... } }` as a single let-chain `if let Some(...) &&
!branches.is_empty()`. No behavior change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…debugger
CI's clippy --all-features run surfaced two more `collapsible_if` lints in
the count-proof classifier (rust 1.95) plus a non-exhaustive match on the
new `Node::HashWithCount` variant in the debugger module that the default
feature set hadn't compiled previously.
- Collapse `if let (Some(s_hi), Some(r_lo)) = ... { if s_hi <= r_lo { ... } }`
into let-chain form. Same for the symmetric upper-bound check. No behavior
change.
- Add a HashWithCount arm to `merk_proof_node_to_grovedbg`. The debugger UI
doesn't have a dedicated rendering for the new variant yet, so we surface
its committed `node_hash_with_count(...)` and the count via the existing
`KVValueHashFeatureType` slot — same approach already used for
`KVHashCount` in this function.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## develop #656 +/- ##
===========================================
+ Coverage 90.54% 90.63% +0.09%
===========================================
Files 182 184 +2
Lines 52873 54763 +1890
===========================================
+ Hits 47875 49637 +1762
- Misses 4998 5126 +128
🚀 New features to boost your workflow:
|
|
Review findings:
|
|
Suggested fix direction for the findings above:
|
Codecov flagged this PR's patch coverage at 70.67%, below the 80% target. This commit raises it by adding focused unit + integration tests that exercise the previously-uncovered branches: grovedb-query/src/proofs/encoding.rs (+5 tests): - HashWithCount Push round-trip with all-different hashes and varint count. - HashWithCount PushInverted round-trip with u64::MAX count. - HashWithCount with count=0 + all-zero hashes (leaf-shaped collapsed subtree). - Decoder iterator covering a mixed Op stream of HashWithCount, Hash, KVDigestCount, Parent, Child, and PushInverted(HashWithCount). grovedb-query/src/query.rs (+10 tests): Cover every numbered rule in `Query::validate_aggregate_count_on_range` independently: happy path, extra items, non-ACOR-only item, inner Key, inner RangeFull, nested ACOR, default subquery branch, default subquery_path, conditional subquery branches non-empty, conditional branches empty-map (accepted). Plus a test for the `aggregate_count_on_range` helper's well-shaped detection. grovedb/src/query/mod.rs (+5 tests): SizedQuery rejects non-None limit and non-None offset; SizedQuery forwards Query-level rejections as InvalidQuery; SizedQuery happy path; PathQuery::validate_aggregate_count_on_range delegation; PathQuery::has_aggregate_count_on_range recognition. grovedb/src/tests/aggregate_count_query_tests.rs (+5 tests): - Three-layer path round trip (TEST_LEAF -> outer Tree -> inner ProvableCountTree) — exercises multi-layer chain enforcement. - Tampered non-leaf-layer byte rejected by the chain check. - Round trip through the V0 envelope (`MerkOnlyLayerProof`) by running against `GROVE_V2`, complementing the existing V1-envelope coverage. - Verifier rejects malformed path_query (inner Key) before any decoding. - Construction-time rejection of nested AggregateCountOnRange. All ~29 new tests pass; full workspace test suite remains green; clippy --workspace --all-features clean; cargo fmt clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three review findings from the PR review, all now addressed:
[P0] Count verifier accepts forged proof shapes
(merk/src/proofs/query/aggregate_count.rs)
The previous verifier only allowlisted node types (Hash / HashWithCount /
KVDigestCount) inside the execute() visit_node closure. That let a
malicious prover:
1. Send a single Push(Hash(expected_root)) for a non-empty tree and
receive (expected_root, 0) for any range.
2. Replace an in-range HashWithCount with a Hash carrying the same
node_hash (chain still matches), undercounting by the missing count.
3. Attach extra KVDigestCount children below a keyless Hash /
HashWithCount; their hash() ignores reconstructed children, so the
root still matches but a count-everything verifier would credit the
bogus child as +1.
The fix is a two-phase verifier:
- Phase 1: reconstruct the proof tree via execute_with_options, with a
coarse allowlist on node types but NO counting.
- Phase 2: walk the reconstructed tree with the same inherited
exclusive subtree-key bounds the prover used (None at the root),
calling classify_subtree(bounds, range) at each position and binding
the node type to the classification:
Disjoint -> must be a leaf Hash; contributes 0.
Contained -> must be a leaf HashWithCount; contributes its count.
Boundary -> must be a KVDigestCount whose key is strictly inside
the inherited bounds; recurse into both children with
tightened bounds, +1 if range.contains(key).
Counts are summed with checked_add (overflow -> invalid proof).
Three new tests prove the attacks are now rejected:
- shape_walk_rejects_single_hash_undercount
- shape_walk_rejects_hash_swap_for_contained_subtree
- shape_walk_rejects_keyless_node_with_attached_children
[P2] Prover bypassed aggregate-count validation
(grovedb/src/operations/proof/generate.rs, both v0 and v1 paths)
The short-circuit checked only `query.items.first()`, so a query with
extra items, subquery branches, limits, or an invalid inner (Key /
RangeFull / nested AggregateCountOnRange) silently produced a count
proof. Now the short-circuit fires whenever any item at the level is an
AggregateCountOnRange, and immediately calls
PathQuery::validate_aggregate_count_on_range() to extract the inner
range, surfacing the precise validation error if the surrounding
PathQuery is not a well-formed aggregate-count query. Same change in
both v0 and v1 prove_subqueries paths.
[P2] Recursive QueryItem decoding had no depth limit
(grovedb-query/src/query_item/mod.rs)
Variant 10 (AggregateCountOnRange) recursively decoded an inner
QueryItem before any validation, so a small payload of repeated
variant-10 bytes could stack-overflow the bincode / borrow decoder.
Added a bounded decode_with_depth + borrow_decode_with_depth helper
matching the existing Query / SubqueryBranch decode-depth guard pattern.
MAX_QUERY_ITEM_DECODE_DEPTH = 4 (single legal level + headroom).
Defense-in-depth: an inner AggregateCountOnRange is also rejected
immediately at decode time, since it is invalid by validation rules
anyway.
Three new tests:
- decode_rejects_nested_aggregate_count_on_range
- decode_caps_depth_for_malicious_payload
- decode_accepts_valid_one_level_aggregate_count_on_range
Workspace state: cargo build clean; cargo clippy --workspace
--all-features -D warnings clean; cargo fmt clean; all 1475+ grovedb
lib tests + 27 merk aggregate_count tests + 3 query_item decode tests
pass; full workspace test suite green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
This is Claude. Pushed [P0] Count verifier shape walk —
Counts use
[P2] Prover validation — both [P2] QueryItem decode depth + nested rejection — added CI:
|
|
Re-reviewed the latest head (
One remaining item I would still fix before merging: [P2] Serde The bincode decode path is now depth-bounded, but the optional Verification I ran locally:
|
…ange Follow-up to @QuantumExplorer's third review pass: the bincode decode path was depth-bounded in 5989cc4, but the optional `serde` `Deserialize` path still recursively accepted nested `AggregateCountOnRange` payloads via `variant_access.newtype_variant::<QueryItem>()`. With the `serde` feature enabled, a serde-backed client could send arbitrarily deep nested AggregateCountOnRange and exhaust the stack inside `QueryItem::deserialize` before any validation ran — re-introducing the same DoS class the bincode fix closed. Fix: when `Field::AggregateCountOnRange` dispatches in the QueryItem deserialize visitor, the inner item is now deserialized via a new `NonAggregateInner(QueryItem)` newtype wrapper. Its `Deserialize` impl mirrors the QueryItem variant set but **omits** `AggregateCountOnRange` from its `Field` enum entirely — so a nested-aggregate payload is rejected by serde's enum dispatcher immediately, with no recursion into `QueryItem::deserialize`. Tests (gated on `feature = "serde"`, using `serde_test`'s token-level driver to bypass an unrelated pre-existing PascalCase/snake_case mismatch in the `Serialize`/`Deserialize` impls that breaks textual formats): - `serde_decode_rejects_nested_aggregate_count_on_range` — token stream for `AggregateCountOnRange(AggregateCountOnRange(...))` produces an `unknown field 'aggregate_count_on_range'` error from `NonAggregateInner`'s field dispatcher. - `serde_decode_accepts_valid_one_level_aggregate_count_on_range` — token stream for the only legal shape (`AggregateCountOnRange` wrapping a non-aggregate range) deserializes successfully. Added `serde_test = "1.0"` as a dev-dependency. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
This is Claude. Pushed Fix: when Tests (gated on
Added Workspace state: `cargo clippy --workspace --all-features -D warnings` clean, `cargo test --workspace` green. |
The aggregate-count-queries chapter was written before the implementation landed and several sections drifted from what actually shipped. This commit reconciles them: - Result type: replace the AggregateCountQueryResult struct sketch with the actual bare-tuple API (verify_aggregate_count_query -> Result<(CryptoHash, u64), Error>). The struct was rejected in review in favor of the bare tuple since the count is a u64 and the path_query already echoes the inner range. - HashWithCount self-verifying form: update the node-types table, every mermaid diagram, and the role explanations to reflect the shipped HashWithCount(kv_hash, left_child_hash, right_child_hash, count) form. The earlier draft used "KVHashCount + 2 child Hash" ops, which the implementation discarded in favor of one self-verifying op. The diagrams no longer show e/g/i/k as separate Hash children under HashWithCount nodes (their hashes now live inside the parent HashWithCount's embedded child-hash fields). The closed-range example drops from "13 push ops" to "9 push ops" accordingly. - New "Verifier shape walk" section: documents the two-phase verification (decode -> shape walk against classify_subtree with inherited bounds) and explicitly enumerates the three attack classes the shape walk catches that a naive allowlist-only verifier would let through (single-Hash undercount, HashWithCount->Hash swap, keyless child smuggling). Mirrors the security findings addressed in 5989cc4. - New "Decode safety" section: documents the MAX_QUERY_ITEM_DECODE_DEPTH = 4 bound and NonAggregateInner serde wrapper that prevent stack-exhaustion via repeated variant-10 payloads. Mirrors the bincode + serde decode hardening from 5989cc4 and fd272c7. - API sketch: switch to PathQuery::new_aggregate_count_on_range helper; destructure the verifier result as (root, count) to match the actual signature. - "Open Design Questions" -> "Settled design choices": items 1-3 from the original list are now locked in by the shipped validation. Added a fourth bullet documenting the HashWithCount 4-field decision and the rationale (review rejection of the simpler (node_hash, count) form). Cost-limit interaction stays as the only remaining design note. Book builds cleanly via mdbook. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The section was a leftover from the pre-implementation "Open Design Questions" list. Every bullet is now redundant: - Items 1-3 (one ACOR per Query, add_parent_tree_on_subquery forbidden, SizedQuery limit/offset rejected) are covered by the numbered Validation rules earlier in the chapter. - Item 4 (HashWithCount is self-verifying) is covered by the "Why HashWithCount is self-verifying" callout under the node-types table and elaborated in the Verifier shape walk section. - Item 5 (cost-limit interaction) is covered by the Cost Model section. The section ended on a "noted for review" framing that no longer matches reality — everything in it is shipped, validated, and tested. Removing it tightens the chapter and avoids duplicating rationale across two sections that could drift apart. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@grovedb-query/src/query.rs`:
- Around line 330-335: The helper aggregate_count_on_range currently only
returns Some when self.items is exactly one ACOR item; change it to scan
self.items for any item where is_aggregate_count_on_range() is true and return a
reference to that item (e.g., find/iter().find(...)/iter().position then
&self.items[idx]) so malformed queries that include an ACOR still get detected;
keep shape enforcement delegated to validate_aggregate_count_on_range so that
only detection is changed.
In `@merk/src/proofs/query/aggregate_count.rs`:
- Around line 123-128: The match in is_provable_count_bearing only checks bare
ProvableCountTree/ProvableCountSumTree and rejects wrapped variants; before
matching, normalize tree_type by stripping any wrapper variants to the inner
tree kind (i.e., apply the same wrapper-unwrapping/normalization used elsewhere)
and then match against ProvableCountTree and ProvableCountSumTree; apply the
identical normalization/fix in Merk::prove_aggregate_count_on_range so wrapped
count trees are accepted there too.
In `@merk/src/proofs/query/verify.rs`:
- Around line 479-489: The match arm must reject Node::HashWithCount in the
plain query verifier: update the match handling around
Node::Hash(_)|Node::KVHash(_)|Node::KVHashCount(..)|Node::HashWithCount(..) so
that when a Node::HashWithCount(..) is encountered (and in_range is true) the
verifier fails fast (return an Err / propagate a verification failure) instead
of treating it like a benign collapsed subtree; this prevents forged proofs that
rely on Tree::hash() recomputing counts and avoids letting execute_node be
satisfied by hidden keyed children—reserve Node::HashWithCount for the
aggregate-count verifier only.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1fc5ce85-ba3b-4c5e-9725-2902ea6a8f7e
📒 Files selected for processing (30)
docs/book/book.tomldocs/book/mermaid-fixup.jsdocs/book/src/SUMMARY.mddocs/book/src/aggregate-count-queries.mddocs/book/src/query-system.mdgrovedb-bulk-append-tree/src/proof/mod.rsgrovedb-dense-fixed-sized-merkle-tree/src/proof/mod.rsgrovedb-query/Cargo.tomlgrovedb-query/src/proofs/encoding.rsgrovedb-query/src/proofs/mod.rsgrovedb-query/src/query.rsgrovedb-query/src/query_item/intersect.rsgrovedb-query/src/query_item/mod.rsgrovedb/src/debugger.rsgrovedb/src/operations/proof/aggregate_count.rsgrovedb/src/operations/proof/generate.rsgrovedb/src/operations/proof/mod.rsgrovedb/src/operations/proof/verify.rsgrovedb/src/query/mod.rsgrovedb/src/tests/aggregate_count_query_tests.rsgrovedb/src/tests/mod.rsgrovedb/src/tests/provable_count_sum_tree_tests.rsmerk/benches/branch_queries.rsmerk/src/merk/chunks.rsmerk/src/merk/prove.rsmerk/src/proofs/branch/mod.rsmerk/src/proofs/query/aggregate_count.rsmerk/src/proofs/query/mod.rsmerk/src/proofs/query/verify.rsmerk/src/proofs/tree.rs
👮 Files not reviewed due to content moderation or server errors (5)
- grovedb-query/src/query_item/mod.rs
- grovedb-bulk-append-tree/src/proof/mod.rs
- grovedb-dense-fixed-sized-merkle-tree/src/proof/mod.rs
- grovedb-query/src/query_item/intersect.rs
- grovedb/src/operations/proof/verify.rs
[Major] Detect ACOR presence even on malformed queries (grovedb-query/src/query.rs) `Query::aggregate_count_on_range()` previously returned `None` unless the query was already the well-shaped single-item ACOR form. Queries that *contain* AggregateCountOnRange plus extra items (or with ACOR not at items[0]) reported `None` and could be mistakenly routed through the regular-query path. The helper now scans the whole `items` vec for any ACOR item; shape enforcement remains delegated to `validate_aggregate_count_on_range`. This is detection-only — the prover-side routing (which already used `iter().any()`) was unaffected, but downstream callers using the helper as a precondition gate now correctly hand malformed queries to the validator. [Major] Reject HashWithCount in the plain query verifier (merk/src/proofs/query/verify.rs) The plain `Query::execute_proof` verifier was treating `HashWithCount` the same as `Hash`/`KVHash`/`KVHashCount` — accepted as a "non-keyed node, OK if not in_range". That was unsafe: `Tree::hash()` for `HashWithCount` recomputes its hash from the embedded `(kv_hash, l, r, count)` while ignoring any reconstructed children, so a malicious prover could include a `HashWithCount` in a regular query proof, hang fake KV pushes under it (which `execute_node` would credit as query results), and still preserve the parent's hash chain. `HashWithCount` is now split into its own match arm that fails fast with a clear error. The dedicated aggregate-count verifier (`verify_aggregate_count_on_range_proof`) remains the only path that accepts the variant; that path also enforces the shape walk so attached children are independently rejected. New test: `regular_query_verifier_rejects_hash_with_count_node` builds an honest range proof against a normal merk, splices a `HashWithCount` push at the front, and asserts `Query::execute_proof` returns an `InvalidProofError`. [Skipped — not a bug] is_provable_count_bearing missing NonCounted variants CodeRabbit suggested normalizing `tree_type` for `NonCounted*` wrapper variants. There is no such `TreeType` variant — `NonCounted` is an `ElementType` wrapper that the merk-open path strips via `Element::NonCounted(inner) => inner.root_key_and_tree_type_owned()` (merk/src/element/tree_type.rs:63). The merk's `tree_type` for any NonCounted-wrapped provable count tree element is already the bare `TreeType::ProvableCountTree` / `ProvableCountSumTree`, so the existing gate is correct. Workspace state: cargo build clean; cargo clippy --workspace --all-features -D warnings clean; cargo test --workspace green (including the two new tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
This is Claude. Pushed `02e75c15` addressing the CodeRabbit findings. [Major #1] Detect ACOR presence even on malformed queries — Fixed. `Query::aggregate_count_on_range` now scans the entire `items` vec via `iter().find`, so queries with extra items or with the ACOR not at index 0 are still detected. Shape enforcement remains in `validate_aggregate_count_on_range`. Updated the `aggregate_count_on_range_helper_*` test to cover ACOR-not-at-index-0 and ACOR-with-extra-item cases. [Major #3] Reject HashWithCount in the plain query verifier — Fixed. Real attack: `Tree::hash()` for `HashWithCount` recomputes from its embedded `(kv_hash, l, r, count)` and ignores any reconstructed children, so a malicious prover could include a `HashWithCount` in a regular query proof, attach fake KV pushes under it (which `execute_node` would credit as query results), and the parent's hash chain would still verify. Now `Node::HashWithCount(..)` has its own match arm in the plain verifier that fails fast with `"HashWithCount node is only valid in aggregate-count proofs; encountered in regular query verification"`. New test `regular_query_verifier_rejects_hash_with_count_node` exercises this end-to-end. [Major #2] is_provable_count_bearing missing NonCounted variants — Skipped, not a bug. CodeRabbit suggested normalizing `tree_type` for `NonCounted*` wrapper variants, but there's no such `TreeType` variant — `NonCounted` is an `ElementType` wrapper that the merk-open path strips via `Element::NonCounted(inner) => inner.root_key_and_tree_type_owned()` (merk/src/element/tree_type.rs:63). The merk's `tree_type` for any NonCounted-wrapped provable count tree element is already the bare `TreeType::ProvableCountTree` / `ProvableCountSumTree`, so the existing gate is correct. (The book wording "NonCounted* wrapper variants" was the source of confusion; it referred to ElementType wrappers, not TreeType.) `cargo clippy --workspace --all-features -D warnings` clean; full workspace test suite green including the two new tests. |
…+ proof size
Pushes test coverage from ~7/10 to ~9/10 by adding the four missing
classes the original review called out:
merk/src/proofs/query/aggregate_count.rs (+2 tests):
- fuzz_byte_mutation_no_silent_forgery: enumerates every byte of an
honest count proof, flips it to three different values, and asserts
the verifier never produces a "silent forgery" — Ok((honest_root,
count')) where count' != honest_count. Three safe outcomes are
permitted: rejection, root divergence, or same-(root, count) (which
happens for non-canonical re-encodings like Push <-> PushInverted).
The unsafe case panics with a precise location. Asserts both
rejection and divergence branches fire as a sanity check.
- fuzz_random_trees_and_ranges_round_trip: deterministic xorshift RNG
builds 16 ProvableCountTrees of varying sizes with random multi-byte
keys, runs 6 random ranges per tree (covering all 6 bounded /
half-bounded variants), and asserts the verifier's count matches a
brute-force keys.iter().filter(range.contains).count(). Catches
off-by-one / edge-of-tree / multi-byte-key bugs the example-based
tests would miss.
grovedb/src/tests/aggregate_count_query_tests.rs (+2 tests):
- non_counted_children_are_included_in_aggregate_count_v1_contract:
pins the v1 contract that AggregateCountOnRange counts every in-range
key including NonCounted-wrapped ones. Inserts 5 normal items + 1
NonCounted item into a ProvableCountTree and asserts the count is
6, not 5. The book chapter previously claimed exclusion; that doc
was aspirational and is corrected in the same commit. The test's
doc-comment points callers to read the parent
Element::ProvableCountTree(_, count, _) bytes directly when they
want the aggregate-excluding-NonCounted total.
- proof_size_snapshot_for_15_key_closed_range: pins the proof byte
size for the canonical 15-key + RangeInclusive("c"..="l") setup at
~650 bytes (window [300, 900]). Catches gross regressions in the
proof shape — e.g. if the count short-circuit stops firing or if
every node starts emitting full child hashes.
docs/book/src/aggregate-count-queries.md:
Replace the "NonCounted children are excluded by design" callout with
an accurate description of the v1 contract: in-range keys are counted
regardless of NonCounted-ness; callers wanting the parent's
aggregate-excluding-NonCounted total should read the parent element
bytes directly. Notes that a NonCounted-aware count mode could be
added in a future revision (would require tracking
structural-vs-in-range counts separately during the shape walk).
Workspace state: cargo build clean, cargo clippy --workspace
--all-features -D warnings clean, cargo test --workspace green
including the four new tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The `_verify_options_imported_marker` fn was a placeholder added to keep
the `VerifyOptions` import alive "for future count-aware verify
options." There's no concrete plan for those options on this branch,
and a placeholder fn that exists only to satisfy a dead import is
worse than dropping both — if/when the options arrive, restoring the
import is a one-liner.
Removed:
- the `_verify_options_imported_marker` fn
- `VerifyOptions` from the use block (kept `QueryProofVerify` since
`execute_proof` still depends on it)
Build + clippy clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
NonCounted-wrapped children should not contribute to the parent's aggregate count (they zero out their contribution at insertion time), but they still have a provable count equal to their left+right descendants. Implement the exclusion end-to-end: - Prover (`emit_count_proof`): Disjoint and Contained subtrees now both emit `HashWithCount(kv_hash, l, r, count)` so the structural count of every outside subtree is cryptographically bound to the parent's hash chain. Boundary nodes derive their own_count as `node_count - left_link_aggregate - right_link_aggregate`. - Verifier (`verify_count_shape`): returns `(in_range_count, structural_count)`. At Boundary nodes, `own_count` is derived via `checked_sub` from the parent aggregate and the left/right structural counts (rejects malformed proofs that would saturate). - Phase-1 allowlist no longer accepts plain `Node::Hash` in count proofs — only `HashWithCount` and `KVDigestCount`. Plain `Hash` carries no count, so a malicious prover could lie about the structural count and skew the parent's `own_count` derivation. - Test `non_counted_children_are_excluded_from_aggregate_count` inserts 5 normal + 1 NonCounted item and asserts count = 5. - Book chapter updated to document the new exclusion semantics and the `(in_range, structural)` verifier return type. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
grovedb/src/tests/aggregate_count_query_tests.rs (1)
282-297: ⚡ Quick winMutate a real
HashWithCountnode instead of the first matching byte.This loop scans the entire serialized GroveDB proof and flips the first
0x1e/0x1fbyte it finds, but those values can also occur inside hashes or envelope bytes. That means the test can fail for unrelated corruption while no longer proving that a forgedHashWithCountcount is rejected. Parsing down to the merk proof before mutating the count field would make this check much more precise.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grovedb/src/tests/aggregate_count_query_tests.rs` around lines 282 - 297, The test currently flips the first byte equal to 0x1e/0x1f in the serialized proof (variables proof, tampered, count_offset), which can hit bytes inside hashes/envelopes; instead parse the serialized merk proof into node/opcode boundaries and locate a genuine HashWithCount node before mutating its count varint. Update the loop to walk the proof with a cursor that reads an opcode then the exact node payload lengths (opcode 0x1e/0x1f => opcode | kv_hash[32] | left[32] | right[32] | count_varint) so you compute count_offset only when the opcode is at a true node boundary, then increment that varint and set tampered = true; ensure the parser skips over non-opcode bytes rather than matching raw bytes inside hashes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/book/src/aggregate-count-queries.md`:
- Around line 217-220: Update the excluded-subtree example diagrams and
descriptions so that any out-of-range or excluded leaf positions are shown and
treated as HashWithCount rather than plain Hash: replace occurrences of `Hash`
used to represent excluded subtrees in the excluded-subtree examples (including
the blocks referenced around the existing diagrams and the verifier walk) with
`HashWithCount`, and ensure the text and role-table consistency reflects that
excluded leaves must be disjoint positions represented by `HashWithCount` (since
count proofs explicitly reject plain `Hash`). Adjust the captions/labels and any
verifier-walk narrative that checks leaf types to reference `HashWithCount` in
those examples.
---
Nitpick comments:
In `@grovedb/src/tests/aggregate_count_query_tests.rs`:
- Around line 282-297: The test currently flips the first byte equal to
0x1e/0x1f in the serialized proof (variables proof, tampered, count_offset),
which can hit bytes inside hashes/envelopes; instead parse the serialized merk
proof into node/opcode boundaries and locate a genuine HashWithCount node before
mutating its count varint. Update the loop to walk the proof with a cursor that
reads an opcode then the exact node payload lengths (opcode 0x1e/0x1f => opcode
| kv_hash[32] | left[32] | right[32] | count_varint) so you compute count_offset
only when the opcode is at a true node boundary, then increment that varint and
set tampered = true; ensure the parser skips over non-opcode bytes rather than
matching raw bytes inside hashes.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d731dbfa-432c-46a4-9f1c-476a1b34be0c
📒 Files selected for processing (6)
docs/book/src/aggregate-count-queries.mdgrovedb-query/src/query.rsgrovedb/src/operations/proof/aggregate_count.rsgrovedb/src/tests/aggregate_count_query_tests.rsmerk/src/proofs/query/aggregate_count.rsmerk/src/proofs/query/verify.rs
🚧 Files skipped from review as they are similar to previous changes (3)
- merk/src/proofs/query/aggregate_count.rs
- grovedb/src/operations/proof/aggregate_count.rs
- grovedb-query/src/query.rs
- Book chapter: update the per-case example diagrams (Case 1 RangeFrom / RangeAfter, Case 2 RangeInclusive) so excluded-subtree leaves are shown as `HashWithCount` rather than plain `Hash`. The role table and verifier-walk section were already correct (out-of-range subtrees must use `HashWithCount` so the structural count is bound to the parent's hash chain), but the example diagrams still showed the old `Hash`-only shape and contradicted the security model. Verifier total tables also updated to include the outside subtrees with their +0 contributions. - count_forgery_is_caught_at_grovedb_level: rewrite to parse the GroveDB envelope (bincode) and the leaf merk proof (merk::proofs::Decoder) properly, mutating the count of a real `Op::Push(Node::HashWithCount)` op at a true op boundary instead of byte-scanning for 0x1e/0x1f. The previous approach could match a 0x1e byte inside an embedded 32-byte hash — the verifier would still reject, but for the wrong reason (root mismatch from a tampered hash, not a tampered count). The new approach actually proves the property the test name claims. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
This is Claude. Addressed both CodeRabbit comments in abf86d0: 1. Excluded-subtree examples ( 2. Forgery test (
This guarantees we mutate an actual count varint and not a All 19 grovedb-level + 30 merk-level aggregate-count tests pass. |
| { | ||
| let inner_range = cost_return_on_error_no_add!( | ||
| cost, | ||
| path_query.validate_aggregate_count_on_range().cloned() |
There was a problem hiding this comment.
[P2] Aggregate-count validation still runs too late
This validation only runs after traversal reaches the target query level. If the path is missing, the recursive prover never sees the AggregateCountOnRange item, so malformed aggregate-count queries can still return a regular path/absence proof instead of being rejected. I reproduced this with PathQuery::new_aggregate_count_on_range([TEST_LEAF, "missing"], QueryItem::Key(...)): prove_query returned Ok(Some(proof)) even though the inner Key is invalid for aggregate count.
I would fix this by validating aggregate-count path queries at the prove_query / v1 entry point before prove_subqueries starts, for example: if path_query.has_aggregate_count_on_range() { path_query.validate_aggregate_count_on_range()?; }. If aggregate-count items can be constructed inside nested subquery branches, make the detector recursive and reject those shapes too, since aggregate-count is terminal and should not be hidden under a regular subquery path.
…ejections
Adds nine new tests targeting previously-uncovered branches in the
aggregate-count code, lifting per-file line coverage on the new files
from 62.45%/92.17% to 76.68%/92.98%.
GroveDB-side (`tests/aggregate_count_query_tests.rs`):
- v1_envelope_with_non_merk_proof_bytes_is_rejected: swaps a leaf
layer's `ProofBytes::Merk(_)` for `MMR(_)` and asserts the V1 walker
rejects with InvalidProof.
- v1_envelope_with_missing_lower_layer_is_rejected: drops the leaf
layer's pointer entry in `lower_layers`.
- v1_envelope_with_corrupted_non_leaf_merk_bytes_is_rejected:
truncates a non-leaf merk proof to one byte to exercise the
single-key proof error path.
- v1_envelope_with_malformed_leaf_count_proof_is_rejected: replaces
the leaf merk proof with a single Push(Hash) op stream so the
leaf-level count verifier surfaces a wrapped `InvalidProof` error
rather than reaching the chain check.
Generate.rs (`operations/proof/generate.rs`):
- {dense_tree, mmr_tree, bulk_append_tree}_rejects_aggregate_count_on_range:
unit-test the `query_items_to_*` private helpers directly so we
exercise the AggregateCountOnRange rejection arms without needing
to set up populated dense/MMR/bulk-append trees in a real grove.
Merk-side (`proofs/query/aggregate_count.rs`):
- shape_walk_rejects_disjoint_hashwithcount_with_children: splices
a HashWithCount child under a Disjoint-position HashWithCount
(Phase-1-allowed but Phase-2-rejected) to exercise the leaf-only
invariant at Disjoint positions.
- shape_walk_rejects_non_hashwithcount_at_disjoint: swaps a
Disjoint HashWithCount for a plain Hash carrying the same node
hash; either Phase 1 (allowlist) or Phase 2 (expected-type)
rejects.
- shape_walk_rejects_kvdigestcount_outside_inherited_bounds:
rewrites a KVDigestCount key to fall outside the parent's
inherited subtree bounds.
26 grovedb-level + 33 merk-level aggregate-count tests now pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex finding (PR #656 review): the AggregateCountOnRange validation only fires inside `prove_subqueries` when recursion reaches the ACOR-bearing leaf level. If the path doesn't exist (e.g. a missing key under TEST_LEAF), the recursive prover never sees the ACOR item, so a malformed aggregate-count query — invalid inner range, ACOR hidden in a subquery branch, etc. — would silently route through the regular-proof path and return Ok with a regular path/absence proof. Reproduced as `aggregate_count_with_missing_path_and_invalid_inner_is_ rejected_at_entry`: a PathQuery with path = [TEST_LEAF, "missing"] and an inner `Key` (invalid for ACOR) returned `Ok(86)` (an 86-byte proof) before this fix. Fix: - Add `Query::has_aggregate_count_on_range_anywhere` — recursive detector that walks the top-level items, the default subquery branch, and every conditional subquery branch. - At the entry of `prove_query_non_serialized` (before v0/v1 dispatch), if the recursive detector finds any ACOR, immediately call `validate_aggregate_count_on_range`. This rejects: * invalid inner items (Key, RangeFull, nested ACOR) * ACOR with extra top-level items or with subquery branches * ACOR hidden inside a subquery branch (top-level shape isn't canonical, validate fails) Tests: - `has_aggregate_count_on_range_anywhere_walks_subqueries` (grovedb-query): plain query, top-level ACOR, ACOR in default subquery, ACOR in conditional subquery. - `aggregate_count_with_missing_path_and_invalid_inner_is_rejected_at_entry` (grovedb): the exact reproducer from Codex's review. - `aggregate_count_hidden_in_subquery_branch_is_rejected_at_entry` (grovedb): broader concern — terminal ACOR cannot be hidden under a subquery branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
This is Claude. Real bug — thanks for the catch and the precise reproducer. Fixed in c13b5f4. Reproduction confirmed first. Added Fix follows your recommended structure:
Coverage of the broader concern. Added Workspace: 1486 grovedb lib tests + 28 aggregate_count grovedb + 4 grovedb-query aggregate_count tests pass. |
…ure paths Adds four new tests targeting previously-uncovered defensive branches in `verify_single_key_layer_proof_v0` and `enforce_lower_chain`: - non_leaf_proof_without_target_key_is_rejected: replace the "ct" KV op with a Hash op so the merk single-key verifier returns Ok with no matching key in result_set, hitting the "did not contain the expected key" arm. - non_leaf_proof_with_kv_replaced_by_kvdigest_is_rejected: replace the "ct" KV variant with KVDigest (key + value_hash, no value), so the result_set contains "ct" but `value = None`, hitting the "no value bytes" arm. - non_leaf_proof_with_undeserializable_value_is_rejected: mutate the "ct" value bytes to garbage that no Element variant tag matches, hitting the deserialize-failure arm of `enforce_lower_chain`. - non_leaf_proof_with_non_tree_element_is_rejected: mutate the "ct" value bytes to a serialized Element::Item, exercising the `is_any_tree()` guard — aggregate-count proofs can only descend through tree elements. All four are gated to `Err(InvalidProof)` and accept either the specific defensive branch firing or an upstream merk verifier rejection, since the order in which the validation steps fail can shift across mutations. Either path closes the attack surface. Also adds a `mutate_test_leaf_layer_ops` test helper that decodes the V1 envelope, walks to the TEST_LEAF non-leaf merk proof bytes, runs an arbitrary closure over the parsed ops, and re-encodes — shared by all four new tests. Per-file line coverage on `grovedb/src/operations/proof/aggregate_count.rs`: 73.91% → 84.58%. 32 grovedb-level aggregate_count tests now pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…erify feature (#658) * feat(merk,grovedb): expose aggregate_count proof verification under verify feature The AggregateCountOnRange proof primitive (PR #656) had its modules gated behind feature = "minimal" in both grovedb-merk and grovedb, making the verifier entry points (`verify_aggregate_count_on_range_proof` and `GroveDb::verify_aggregate_count_query`) unreachable from downstream lean-verifier crates that depend on `grovedb`/`grovedb-merk` with `default-features = false, features = ["verify"]`. Widen the module gates to `any(feature = "minimal", feature = "verify")` to match the pattern already used by `query::common`, `query::merge`, `query::query_item`, and `query::verify` in the same file. Refactor aggregate_count.rs to keep prover-only items (`RefWalker`, `Fetch`, `AggregateData`, `emit_count_proof`, etc.) gated `feature = "minimal"` while exposing the verifier (`verify_aggregate_count_on_range_proof`, `verify_count_shape`, `classify_subtree`, `key_strictly_inside`) under the wider gate. Add a `cfg(all(test, feature = "verify"))` test module with hex-fixture round-trip, empty-merk, and byte-mutation no-silent-forgery tests that exercise the verifier without any prover dependency, plus an `#[ignore]`d helper to regenerate the fixture if the proof encoding ever changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: widen verify_only_tests gate so default CI runs them The verify_only_tests module was gated cfg(all(test, feature = "verify")), which hid it from CI runs that only enable the default `full` feature set (which doesn't include `verify`). Codecov flagged the test bodies as uncovered patch lines as a result. Widen to cfg(all(test, any(feature = "minimal", feature = "verify"))). The verifier function is available under both gates and these tests have no prover dependency, so the `verify`-only gate was buying nothing except missing CI coverage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: convert ignored fixture-dump helper into active parity check The #[ignore]d dump_verify_only_fixtures test never ran in CI, so its ~17 lines counted as uncovered patch lines. Replace it with an active verify_only_fixture_matches_fresh_prover_output test that runs every build, asserts the hardcoded fixture in verify_only_tests still matches fresh prover output byte-for-byte, and prints regeneration-ready constants on mismatch. This catches encoding drift automatically and gets the fixture maintenance lines into CI coverage. Promote the FIXTURE_*_HEX/COUNT constants to pub(super) so the parity test can compare against them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test: drop redundant feature gates on test modules Both test modules were gated with feature = "minimal" / any(minimal, verify), but the crate convention (e.g., merk/src/element/tree_type.rs) is plain #[cfg(test)] — tests rely on default = ["full"] (which pulls in minimal) being active at test time. The extra gates were defensive beyond what the surrounding code does. Also tighten the verify_only_tests doc comment now that the drift check runs on every build instead of being a separate ignored helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rely V0 proof envelopes (`GroveDBProofV0` / `MerkOnlyLayerProof`) predate the ACOR feature — ACOR was introduced in #656, well after V1 envelopes became the default for grove versions used by Dash Platform v12+. The V0+ACOR combination is impossible in any deployed Platform release, so the previous compatibility shims are dead code. This change makes that explicit: - Prover entry: `prove_query_non_serialized` rejects ACOR queries paired with grove versions that dispatch to V0 with `Error::NotSupported("requires V1 proof envelopes")`. The check fires before any work happens, so callers can't accidentally emit a V0 ACOR proof. - Verifier entries: both `verify_aggregate_count_query` (leaf) and `verify_aggregate_count_query_per_key` (leaf or carrier) refuse `GroveDBProof::V0` envelopes via a shared `require_v1_envelope` helper, surfacing `Error::InvalidProof`. A forged V0 envelope carrying ACOR-looking bytes can't even reach the verification logic. - Removed dead code: `verify_v0_leaf_chain` and `verify_v0_with_classification` are deleted. `MerkOnlyLayerProof` and `GroveDBProofV0` are no longer imported by `aggregate_count.rs`. - Tests updated: `provable_count_tree_works_on_grove_v2_envelope` inverts to `aggregate_count_rejects_grove_v2_envelope`, asserting the prover refuses the combination. `acor_carrier_rejects_v0_envelope` similarly tightens its assertion to expect `NotSupported` from the prover instead of `InvalidProof` from the verifier. The V0 short-circuit in `prove_subqueries` (the v0 path) stays as defense-in-depth — unreachable through the entry gate but a safety net for any future internal caller that bypasses `prove_query_non_serialized`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…663) * feat(grovedb,query): allow AggregateCountOnRange as carrier subquery Allow `AggregateCountOnRange` as a subquery item under an outer `Keys`/ `Range*` carrier query — the natural multi-outer-key extension of the existing single-leg ACOR. The carrier returns one `(outer_key, u64)` pair per matched outer key while leaving the leaf wire format and existing entry points byte-identical. Why: Dash Platform's count-query module needs to answer `SELECT brand, COUNT(*) ... WHERE brand IN (...) GROUP BY brand` against `byBrand / <brand> / color / <ProvableCountTree>` layouts. Today grovedb rejects the nested shape at `Query::validate_*`, forcing callers to either fan out one proof per outer key (k× larger proofs) or fall back to the unproven path. With this change the same GroveDB proof contains all k counts and is verified in one pass. Changes: - `grovedb-query`: split validation into a leaf validator (renamed from the old `validate_aggregate_count_on_range`) and a new carrier validator that requires `Key`/`Range*` outer items, a leaf-ACOR subquery, and no conditional branches. The top-level `validate_aggregate_count_on_range` now dispatches by query shape and returns the leaf inner range either way. - `grovedb`: add `validate_leaf_aggregate_count_on_range` at SizedQuery/PathQuery for entry points that must reject the carrier. No prover changes needed — the existing recursion naturally walks outer items -> subquery_path -> leaf ACOR via `query_items_at_path`. - `grovedb`: add `verify_aggregate_count_query_per_key` returning `(RootHash, Vec<(Vec<u8>, u64)>)`. Leaf queries surface as a single entry with empty key + the same count `verify_aggregate_count_query` returns. The legacy `verify_aggregate_count_query` now strictly validates the leaf shape; carrier queries must use the new entry. - Tests: 10 new carrier-shape validation tests in grovedb-query and 7 end-to-end carrier proof tests in grovedb (two-outer-key happy path, absent-outer-key returns present-only, both-levels rejection, Range-outer carrier, leaf symmetry, non-ACOR rejection, and count forgery rejection through the carrier envelope). Asymptotic complexity for |outer matches| = k, outer tree B, leaf tree C: O(k * (log B + log C)). Out of scope (deferred): conditional-branch carriers, multi-In on prefix (IN x IN), drive-side integration. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(grovedb,query): address review feedback on carrier ACOR - Move ACOR construction/validation out of `grovedb-query/src/query.rs` (it had grown past 1500 lines) into a focused `grovedb-query/src/aggregate_count.rs` with its own `impl Query` block and tests. `query.rs` shrinks from ~1580 to ~910 lines. - Per CodeRabbit major: route verifier validation through the PathQuery layer so `SizedQuery::limit`/`offset` are rejected up front for both leaf and carrier shapes. New tests pin the pagination rejection. - V0 proof envelopes can't carry the carrier shape (they're produced only by pre-feature grove versions); the new `verify_v0_with_classification` rejects carrier classification with `InvalidProof("only supported on V1 proof envelopes")` instead of attempting a never-emitted carrier walk. Removes the unused `verify_v0_carrier_layer` / `verify_v0_subquery_path`. - Per CodeRabbit minor: ordering doc on `verify_aggregate_count_query_per_key` now says "query-direction order" (matches the merk walker) instead of asserting strict ascending lex, since the carrier honors `Query::left_to_right`. - Per CodeRabbit nitpick: pin the malformed-leaf-inner-Key rejection message in the carrier path so future refactors can't silently re-route the error. - New end-to-end tests: V0+carrier rejection, long subquery_path (2 intermediate single-key layers), corrupted outer-layer merk bytes, undecodable envelope, legacy `verify_aggregate_count_query` rejection of carrier shapes, carrier pagination rejection. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(grovedb,query): boost carrier ACOR patch coverage Patch coverage was 89.86% (just under codecov's 90% target). The biggest gaps were untested error paths in the carrier verifier and a few carrier-validator branches the top-level dispatcher masks. New tests: grovedb (carrier proof verifier error paths): - `acor_carrier_missing_outer_lower_layer_is_rejected` — drops one `lower_layers[outer_key]` entry from a real envelope; verifier must surface "missing lower layer for outer key". - `acor_carrier_missing_subquery_path_layer_is_rejected` — drops the subquery_path "color" layer; exercises `verify_v1_subquery_path`'s missing-layer branch. - `acor_carrier_non_merk_proof_bytes_is_rejected` — swaps a `ProofBytes::Merk(...)` layer for `ProofBytes::MMR(...)`; the `expect_merk_bytes` helper rejects. grovedb-query (direct carrier-validator branches): - `validate_carrier_acor_direct_rejects_missing_subquery` - `validate_carrier_acor_direct_rejects_acor_outer_item` - `validate_carrier_acor_direct_rejects_range_full_outer` These call `validate_carrier_aggregate_count_on_range` directly to hit branches the top-level dispatcher routes around (the dispatcher classifies first and sends ACOR-outer-item shapes to the leaf validator). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(grovedb): reject V0 envelopes for AggregateCountOnRange entirely V0 proof envelopes (`GroveDBProofV0` / `MerkOnlyLayerProof`) predate the ACOR feature — ACOR was introduced in #656, well after V1 envelopes became the default for grove versions used by Dash Platform v12+. The V0+ACOR combination is impossible in any deployed Platform release, so the previous compatibility shims are dead code. This change makes that explicit: - Prover entry: `prove_query_non_serialized` rejects ACOR queries paired with grove versions that dispatch to V0 with `Error::NotSupported("requires V1 proof envelopes")`. The check fires before any work happens, so callers can't accidentally emit a V0 ACOR proof. - Verifier entries: both `verify_aggregate_count_query` (leaf) and `verify_aggregate_count_query_per_key` (leaf or carrier) refuse `GroveDBProof::V0` envelopes via a shared `require_v1_envelope` helper, surfacing `Error::InvalidProof`. A forged V0 envelope carrying ACOR-looking bytes can't even reach the verification logic. - Removed dead code: `verify_v0_leaf_chain` and `verify_v0_with_classification` are deleted. `MerkOnlyLayerProof` and `GroveDBProofV0` are no longer imported by `aggregate_count.rs`. - Tests updated: `provable_count_tree_works_on_grove_v2_envelope` inverts to `aggregate_count_rejects_grove_v2_envelope`, asserting the prover refuses the combination. `acor_carrier_rejects_v0_envelope` similarly tightens its assertion to expect `NotSupported` from the prover instead of `InvalidProof` from the verifier. The V0 short-circuit in `prove_subqueries` (the v0 path) stays as defense-in-depth — unreachable through the entry gate but a safety net for any future internal caller that bypasses `prove_query_non_serialized`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(grovedb): support right-to-left direction in carrier ACOR verifier The carrier-layer verifier was hardcoding `left_to_right=true` when calling `merk::execute_proof`, even though the rest of the carrier classification (`AcorClassification::carrier_left_to_right`) and the prover (`generate_merk_proof(..., query.left_to_right, ...)`) both honor the query's direction. Consequence: a carrier query with `left_to_right=false` would produce a correct multi-key proof emitted in reverse order, but the verifier would feed it through `execute_proof` walking left-to-right and discover only the last matched key (the merk walker terminates as soon as the first "impossible direction" boundary is hit), returning a single result instead of all matches. The single-key descent (`verify_single_key_layer_proof_v0`) is unaffected because it's exactly-one-key, so order doesn't matter. The fix is a one-line change: pass `left_to_right` through to `execute_proof` in `execute_carrier_layer_proof`. Adds a new test `acor_subquery_right_to_left_returns_descending_order` that builds a 3-brand carrier with `left_to_right=false` and asserts results come back in descending lex order (brand_002, brand_001, brand_000), each with the correct per-brand count. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * docs(grovedb): tone down carrier ACOR direction comment Strip the "CRITICAL:" framing and the hypothetical "hardcoding true would..." — a short factual note about what would actually go wrong is enough. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(grovedb,query): drop "ACOR" abbreviation for forward-compat The "ACOR" / "Acor" shorthand baked count into every identifier and prose mention — but the leaf/carrier shape generalizes to forthcoming aggregate variants (sum, average, …), and "ASOR" / "AAOR" don't make that lineage obvious. This commit drops the abbreviation so future aggregate modules can use parallel naming without inheriting a count-specific prefix. Renames: - `AcorClassification` → `AggregateCountClassification` - `classify_path_query` → `classify_aggregate_count_path_query` - `acor_subquery_*` test functions → `carrier_*` (the `aggregate_count_query_tests` module already scopes them) - `acor_carrier_*` → `carrier_*` - `acor_leaf_*` → `leaf_*` - `acor_per_key_*` → `per_key_*` - `carrier_acor_path_query` helper → `carrier_count_path_query` - `validate_acor_*` → `validate_aggregate_count_*` - `validate_carrier_acor_*` → `validate_carrier_aggregate_count_*` - `make_acor_query` → `make_aggregate_count_query` - `make_leaf_acor_subquery` → `make_leaf_aggregate_count_subquery` - `inner_acor*` → `inner_aggregate_count*` - `bad_inner_acor` → `bad_inner_aggregate_count` Docs/comments/error-message prose: "ACOR" → "aggregate-count" (or the full `AggregateCountOnRange` when referring to the QueryItem). Module docs note that future variants will live in sibling modules (`aggregate_sum`, `aggregate_average`, …) with parallel naming. No behavioral changes — pure rename. All 62 grovedb + 28 grovedb-query aggregate-count tests still pass; workspace tests + clippy clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * refactor(grovedb): split aggregate_count proof module into submodules The single 750-line `aggregate_count.rs` had grown to cover four distinct concerns: public API, classification, the leaf-only chain walker, and the per-key carrier walker (plus a pile of shared helpers). Split into a directory module with one file per concern. ``` grovedb/src/operations/proof/aggregate_count/ ├── mod.rs 227 lines — public API (impl GroveDb, │ verify_aggregate_count_query + │ verify_aggregate_count_query_per_key), │ require_v1_envelope, submodule wiring ├── classification.rs 77 lines — AggregateCountClassification struct + │ classify_aggregate_count_path_query ├── leaf_chain.rs 77 lines — verify_v1_leaf_chain (legacy │ single-u64 walker) ├── per_key.rs 221 lines — per-key carrier walker │ (with_classification, _per_key, │ _carrier_layer, _subquery_path) └── helpers.rs 270 lines — shared utilities (decode envelope, verify_count_leaf, expect_merk_bytes, verify_single_key_layer_proof_v0, OuterMatch + execute_carrier_layer_proof, enforce_lower_chain) ``` Visibility is `pub(super)` everywhere — the submodules are crate implementation detail, only `verify_aggregate_count_query` and `verify_aggregate_count_query_per_key` are public. Pure code reorganization — no behavior change. All 62 grovedb + 28 grovedb-query aggregate-count tests still pass; workspace + clippy clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(grovedb): canonical decoding for aggregate-count proofs `bincode::decode_from_slice` returns the number of bytes consumed but we were discarding it, so a valid proof with arbitrary trailing bytes appended would still decode to the same envelope and verify successfully. The chain-bound cryptographic check still gives the same `(RootHash, count)`, so this isn't a correctness/security problem — but it does mean a single logical proof has infinitely many byte encodings, which breaks any equality-by-bytes assumption (caching, deduplication, hashing the proof itself). `decode_grovedb_proof` now compares `consumed` to `proof.len()` and rejects with `Error::CorruptedData("trailing bytes after the encoded envelope")` when they differ. Test `aggregate_count_proof_with_trailing_bytes_is_rejected` pins the rejection at both entry points (`verify_aggregate_count_query` and `verify_aggregate_count_query_per_key`) with a real proof + one appended zero byte. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(grovedb,query): pin nested-carrier rejection The original spec explicitly defers the "Range × Range × ACOR" / "IN × IN on prefix" composition: a carrier whose subquery is itself another carrier (rather than a leaf `AggregateCountOnRange`). The carrier validator already rejects this — it calls `validate_leaf_aggregate_count_on_range` on the subquery, which catches the inner carrier's outer items — but we didn't have a test documenting that contract. Two new tests, one per layer: - `grovedb-query::validate_carrier_aggregate_count_rejects_nested_carrier` drives the static validator with `outer.set_subquery(inner_carrier)` and asserts `InvalidOperation`. - `grovedb::rejects_nested_carrier_range_range_aggregate_count` builds a full `PathQuery` with the same shape and asserts both the static `PathQuery::validate_aggregate_count_on_range` and the prover's entry-point gate (`prove_query` → `InvalidQuery`) refuse. If we ever decide to support the nested shape, these tests are the contract that has to change first. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(grovedb): pin SQL-style A=1, B>4, COUNT(C>4) carrier query Adds an explicit end-to-end test that mirrors the canonical SQL aggregate-count query SELECT COUNT(*) FROM t WHERE a = 1 AND b > 4 AND c > 4 against an `(a, b, c)`-indexed grove laid out as TEST_LEAF / byA / <a_val> / byB / <b_val> / byC / <count_tree> The mapping to the carrier ACOR shape is: - `A = 1` is a fixed prefix → `path_query.path` - `B > 4` is the variable outer dimension → carrier's `RangeAfter` - per matched `B`, walk `byC` → carrier's `subquery_path` - `COUNT(C > 4)` is the leaf aggregate-count subquery The tree contains both `A = 1` and `A = 2` so we also confirm the fixed prefix actually scopes the result (only `A = 1` rows count). Expected: two `(b_val, 5)` entries — `b_5` and `b_7` each have 5 `C > c_4` matches. This shape was already supported via the existing carrier machinery — the test just makes the SQL → grove mapping obvious for callers. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(grovedb): emit lower-layer for empty count tree under carrier ACOR When a carrier `AggregateCountOnRange` query's descent reached an empty `Element::ProvableCountTree(None, ..)` or `ProvableCountSumTree(None, ..)`, the V1 prover treated it as a terminal "tree with no children" result and inserted no `lower_layer` for that step. The carrier verifier then rejected the proof at the chain walk with `"carrier aggregate-count proof missing subquery_path layer for key …"`. The correct behavior is to return `count = 0` for that outer key — an existing empty leaf count tree is still a valid match. The fix is a new, narrow arm in `prove_subqueries_v1` that recurses into empty `Provable{Count,CountSum}Tree(None, ..)` elements when both: 1. The surrounding `PathQuery` is an aggregate-count query (`has_aggregate_count_on_range_anywhere() == true`), and 2. The current level's query has a subquery / matching subquery_path step on this key. The recursion opens the empty merk, hits the aggregate-count short-circuit, runs `prove_aggregate_count_on_range` on the empty subtree (which returns an empty op stream), and emits a `LayerProof { merk_proof: ProofBytes::Merk(empty), .. }`. The verifier's `verify_count_leaf` reads empty bytes as `(NULL_HASH, 0)`, and the chain check `combine_hash(H(empty_tree_value), NULL_HASH) == parent_value_hash` holds because the parent committed the tree element with the same `NULL_HASH` child root. Non-aggregate-count queries keep their existing "empty tree = terminal result" semantics. The narrow arm fires *before* the general empty-tree arm and only matches on the two provable-count flavors, so behavior is unchanged for every other tree type. Adds `carrier_returns_zero_count_for_empty_leaf_subtree` — inserts `byBrand/brand_000/color` as an empty `ProvableCountTree`, runs a carrier query, and asserts the returned vector is `[(b"brand_000", 0)]` with matching root hash. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(grovedb): tighten query_aggregate_count to leaf-only validation `query_aggregate_count` is the no-proof counterpart of `verify_aggregate_count_query`: it returns a single `u64` and has no way to surface per-outer-key carrier counts. But the validator it called — `path_query.validate_aggregate_count_on_range` — is the top-level dispatcher that accepts BOTH leaf and carrier shapes. A carrier-shape path query would slip past validation here, then the subsequent `count_aggregate_on_range` call would either error (wrong merk tree type at `path_query.path`) or silently miscompute because the function only ever counts the inner range against the single subtree at `path_query.path`. Same fix we applied to `verify_aggregate_count_query` earlier: switch to `validate_leaf_aggregate_count_on_range`, which rejects the carrier shape up front. The docstring is updated to point carrier-shape callers at `verify_aggregate_count_query_per_key` (the proof-based per-key entry). New test `no_proof_rejects_carrier_shape` constructs a valid carrier path query, sanity-checks that the dispatcher-level validator still accepts it (so the rejection below is specifically from the leaf-only tightening), and asserts the no-proof entry returns `InvalidQuery` with no storage reads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(grovedb): surface InvalidProof instead of panicking on classification drift `verify_v1_carrier_layer` was `.expect()`-ing that `classification.carrier_subquery_path` is `Some` whenever `carrier_outer_items` is `Some`. That invariant is held today by `classify_aggregate_count_path_query`, but a panic is the wrong failure mode if the invariant ever drifts — a verifier should surface verification failures, not abort. Replace with `ok_or_else` returning `Error::InvalidProof` with a contextual message. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * feat(grovedb): add no-proof query_aggregate_count_per_key entry point Closes the leaf/carrier asymmetry on the no-proof side. Before this commit: proof path: verify_aggregate_count_query (leaf only, -> u64) verify_aggregate_count_query_per_key (leaf or carrier, -> Vec<(key, u64)>) no-proof: query_aggregate_count (leaf only, -> u64) <missing carrier sibling> A carrier-shape caller that didn't need a proof had to go through `prove_query` + `verify_aggregate_count_query_per_key` anyway — paying proof generation, serialization, and chain verification just to throw the proof bytes away. `query_aggregate_count_per_key` mirrors the proof-path surface (returns `Vec<(Vec<u8>, u64)>`) and accepts the same shapes: - Leaf path query: delegates to `query_aggregate_count` and wraps as a one-entry vec with an empty key — matches the proof-path leaf-symmetry contract. - Carrier path query: opens the carrier subtree via `query_raw` with a "shallow" path query (carrier's outer items, no subquery) to enumerate matched outer keys. For each match, walks `subquery_path` manually and calls `count_aggregate_on_range` on the leaf merk. Absent outer keys contribute no entry; outer keys whose leaf is an empty `ProvableCountTree` contribute `(key, 0)`. Validation is the dispatcher-level `validate_aggregate_count_on_range` (accepts both shapes); non-ACOR queries get `InvalidQuery` before any storage reads. Non-tree matches on the carrier level surface `InvalidQuery` rather than the merk-level `count_aggregate_on_range` error. Six tests pin the contract: - `no_proof_per_key_leaf_matches_single_count` — leaf shape returns the same `u64` as `query_aggregate_count`, wrapped. - `no_proof_per_key_carrier_returns_per_outer_count` — two outer brands, 500 colors each -> two entries. - `no_proof_per_key_skips_absent_outer_keys` — absent outer key contributes no entry. - `no_proof_per_key_empty_leaf_returns_zero` — outer key exists, leaf is empty `ProvableCountTree` -> `(key, 0)`. - `no_proof_per_key_matches_proof_path_per_key` — element-for-element equality with `verify_aggregate_count_query_per_key` on a non-trivial 3-brand carrier query. - `no_proof_per_key_rejects_non_aggregate_count_query` — non-ACOR path queries rejected with `InvalidQuery`. Reuses the existing `query_aggregate_count_on_range` version gate since this is another entry point for the same feature, not a distinct protocol change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Issue being fixed or feature implemented
Counting a sub-range of keys in a
ProvableCountTreewas previously forced through a regular query, payingO(result-size)proof bytes for an answer that's already cryptographically committed at every internal node vianode_hash_with_count. This PR adds a dedicated count-only proof shape withO(log n)size regardless of how many keys fall in the range.What was done?
A new
QueryItemvariantAggregateCountOnRange(Box<QueryItem>)that wraps an inner range and asks for a singleu64count + cryptographic proof, instead of returning the elements themselves. Restricted toProvableCountTreeandProvableCountSumTree(plus theirNonCounted*wrapper variants); rejected for any other tree type at proof time.Cryptographic core:
Node::HashWithCount(kv_hash, left_child_hash, right_child_hash, count)which is self-verifying: the verifier recomputesnode_hash_with_count(...)from the four committed fields. A forged count diverges from the parent's expected hash and the Merkle-root chain check fails — so the count is bound by the proof, not trusted on faith. (An earlier draft used(node_hash, count)only, which would have been forge-able; that version was rejected during design review.)Proof generation (merk):
Merk::prove_aggregate_count_on_rangewalks the AVL tree, classifying each subtree asDisjoint/Contained/Boundaryusing inherited exclusive key-bound windows. Disjoint short-circuits at the link level (no I/O); Contained walks one level for its kv_hash + grandchild hashes; Boundary recurses. EmitsHash/HashWithCount/KVDigestCountaccordingly.Proof verification (merk):
verify_aggregate_count_on_range_proofreplays Ops via the newexecute_with_options(.., verify_avl_balance=false, ..). The AVL-balance check is bypassed only here because count proofs intentionally collapse one side of the tree into a single op while descending the other — the resulting reconstructed tree is unbalanced by design. All existing callers keep the balance check via the unchangedexecute()wrapper.Hash/HashWithCount/KVDigestCount; anything else isInvalidProofError.GroveDB integration:
prove_queryentry — short-circuits inside bothprove_subqueries(v0) andprove_subqueries_v1when the leaf-merk's items containAggregateCountOnRange. Envelope structure is unchanged so existing serialization works.GroveDb::verify_aggregate_count_query(proof, path_query, version) -> Result<(CryptoHash, u64), Error>walks the multi-layer envelope, verifies single-key existence proofs at each non-leaf layer, delegates to the merk count verifier at the leaf, and enforcescombine_hash(H(value), lower_root) == parent_proof_hashfor the chain.Validation:
Enforced at
Query::validate_aggregate_count_on_range/SizedQuery::validate_aggregate_count_on_range/PathQuery::validate_aggregate_count_on_range:Key(usehas_raw/get_raw)RangeFull(read the parentElement::ProvableCountTreebytes)AggregateCountOnRangedefault_subquery_branchconditional_subquery_branchesSizedQuery::limit/offsetleft_to_rightignoredHow Has This Been Tested?
Unit (10 tests): classification of every Disjoint / Contained / Boundary case across all range-bound shapes.
Merk integration (13 tests): every allowed range variant (
Range,RangeInclusive,RangeFrom,RangeAfter,RangeTo,RangeToInclusive,RangeAfterTo,RangeAfterToInclusive) + range-below-all-keys + range-above-all-keys + empty merk + tree-type rejection + a count-forgery test that tampers with the count in aHashWithCountop and asserts the reconstructed root diverges from the expected one.GroveDB integration (11 tests,
grovedb/src/tests/aggregate_count_query_tests.rs): end-to-endprove → encode → decode → verifyonProvableCountTreeandProvableCountSumTreeat multi-layer paths, plus validation rejections (Key inner, RangeFull inner, NormalTree target) and a GroveDB-level forgery test that flips a count byte in the encoded proof and asserts the verifier rejects it via the layer-chain check.Workspace state:
cargo build --workspace,cargo clippy --workspace --all-targets, andcargo test --workspaceall pass with no regressions (1465+ grovedb lib tests + 387+ merk lib tests).cargo fmtclean.Documentation
New chapter at
docs/book/src/aggregate-count-queries.mdcovering the contract, allowed range variants, validation rules, and per-case proof shapes (open and closed ranges, with mermaid diagrams using a 15-key example tree). Cross-linked from the query-system chapter.Breaking Changes
None for callers — the new variant is additive and existing
prove_query/verify_querypaths are untouched. Sites that pattern-match exhaustively onQueryItemorNodewere updated (one&'static strrejection arm each in dense-tree, bulk-append-tree, and the grovedb proof generator/verifier — none of those tree types support count queries).Checklist:
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Documentation
Tests