|
| 1 | +# Aggregate Sum on Range Queries |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +An **Aggregate Sum on Range Query** lets a caller ask: |
| 6 | + |
| 7 | +> "What is the total sum of children whose keys fall in this range, in this |
| 8 | +> `ProvableSumTree`?" |
| 9 | +
|
| 10 | +The answer is a signed `i64`, and on a `ProvableSumTree` it comes back with a |
| 11 | +cryptographic proof. A verifier holding the tree's root hash can compute the |
| 12 | +total from the proof in `O(log n + |boundary|)` work — without ever |
| 13 | +materializing the `SumItem` values themselves. |
| 14 | + |
| 15 | +This is the parallel to [Aggregate Count on Range](aggregate-count-queries.md) |
| 16 | +for sum trees. The two query types are orthogonal: an aggregate-sum query |
| 17 | +returns a sum, an aggregate-count query returns a count, and a single |
| 18 | +`PathQuery` may not contain both. |
| 19 | + |
| 20 | +> **Not to be confused with [Aggregate Sum Queries](aggregate-sum-queries.md).** |
| 21 | +> That existing API is a sum-budget iterator — it walks a SumTree returning |
| 22 | +> `(key, sum_value)` pairs until a running total is reached. `AggregateSumOnRange` |
| 23 | +> is a different feature: it answers "what is the verified total for keys in |
| 24 | +> this range?" without returning any values, and only against the |
| 25 | +> `ProvableSumTree` element type. |
| 26 | +
|
| 27 | +The feature is implemented as a `QueryItem` variant: |
| 28 | + |
| 29 | +```rust |
| 30 | +pub enum QueryItem { |
| 31 | + Key(Vec<u8>), |
| 32 | + Range(Range<Vec<u8>>), |
| 33 | + // ... existing variants ... |
| 34 | + AggregateCountOnRange(Box<QueryItem>), |
| 35 | + |
| 36 | + /// Sum the per-node sum contributions of children matched by the inner |
| 37 | + /// range, without returning them. Only valid on ProvableSumTree (and its |
| 38 | + /// `NonCounted` / `NotSummed` wrapper variants). |
| 39 | + AggregateSumOnRange(Box<QueryItem>), |
| 40 | +} |
| 41 | +``` |
| 42 | + |
| 43 | +The wrapped `QueryItem` is the **range to sum over**. As with |
| 44 | +`AggregateCountOnRange`, it must be one of the true range variants: |
| 45 | +`Range`, `RangeInclusive`, `RangeFrom`, `RangeTo`, `RangeToInclusive`, |
| 46 | +`RangeAfter`, `RangeAfterTo`, `RangeAfterToInclusive`. The single-key |
| 47 | +(`Key`), full-range (`RangeFull`), and self-nested (`AggregateSumOnRange`) |
| 48 | +variants are rejected — and `AggregateSumOnRange` may not wrap an |
| 49 | +`AggregateCountOnRange` either. |
| 50 | + |
| 51 | +> **Why are `Key` and `RangeFull` rejected?** |
| 52 | +> |
| 53 | +> - **`Key(k)`** would return either `0` or the single child's sum |
| 54 | +> contribution — degenerate cases the existing `get_raw` / |
| 55 | +> `verify_query_with_options` paths already handle more cheaply. |
| 56 | +> - **`RangeFull`** has its answer already exposed by the parent's |
| 57 | +> `Element::ProvableSumTree(_, sum, _)` bytes, which are hash-verified by |
| 58 | +> the parent Merk's proof. Going through `AggregateSumOnRange(RangeFull)` |
| 59 | +> would always produce a strictly heavier proof for an answer the caller |
| 60 | +> can read directly. |
| 61 | +
|
| 62 | +## Why this works only on ProvableSumTree |
| 63 | + |
| 64 | +GroveDB has several tree types that track a sum: |
| 65 | + |
| 66 | +| Tree type | Sum tracked? | Sum in node hash? | AggregateSumOnRange allowed? | |
| 67 | +|----------------------------------|:------------:|:-----------------:|:---------------------------:| |
| 68 | +| `SumTree` | yes | no | **no** | |
| 69 | +| `BigSumTree` | yes (i128) | no | **no** | |
| 70 | +| `CountSumTree` | yes | no | **no** | |
| 71 | +| `ProvableCountSumTree` | yes | no (count only) | **no** | |
| 72 | +| `ProvableSumTree` | yes | **yes** | **yes** | |
| 73 | +| `NonCountedProvableSumTree` | yes (inner) | yes (inner) | **yes** | |
| 74 | +| `NotSummedProvableSumTree` | yes (inner) | yes (inner) | **yes** | |
| 75 | + |
| 76 | +Only `ProvableSumTree` bakes the per-node sum into the node hash via |
| 77 | +`node_hash_with_sum(kv_hash, left, right, sum)`. Because every node's sum |
| 78 | +participates in the Merkle root, a verifier holding only the root hash can |
| 79 | +reconstruct enough of the tree from a proof to **trust** the sums embedded |
| 80 | +in it. |
| 81 | + |
| 82 | +`SumTree`, `BigSumTree`, `CountSumTree`, and `ProvableCountSumTree` all |
| 83 | +track sums in storage, but those sums are not committed in the node hash |
| 84 | +chain. (For `ProvableCountSumTree`, the count is in the hash but the sum |
| 85 | +is not.) A "proof" of those sums would be unverifiable, so we reject |
| 86 | +`AggregateSumOnRange` against them at query-construction time. |
| 87 | + |
| 88 | +The wrapper variants are accepted because the wrapper only changes how the |
| 89 | +**parent** aggregates this element — the inner is still a fully-fledged |
| 90 | +`ProvableSumTree`. |
| 91 | + |
| 92 | +> **Why not `BigSumTree`?** `BigSumTree` uses `i128` sums and would need a |
| 93 | +> separate hash dispatch (`node_hash_with_big_sum`) plus a different verify |
| 94 | +> path. It is a documented follow-up, not part of this PR. |
| 95 | +
|
| 96 | +## Query-Level Constraints |
| 97 | + |
| 98 | +`AggregateSumOnRange` is a **terminal** query item. Its presence reduces |
| 99 | +the enclosing `Query` to a single, well-defined operation: "sum, then |
| 100 | +return." |
| 101 | + |
| 102 | +If any `QueryItem::AggregateSumOnRange(_)` appears in `Query::items`, the |
| 103 | +query is well-formed only when: |
| 104 | + |
| 105 | +1. `items.len() == 1` — no other items, no other sums, no mixing with |
| 106 | + `AggregateCountOnRange`. |
| 107 | +2. The inner `QueryItem` is **not** `Key`, `RangeFull`, or another |
| 108 | + `AggregateSumOnRange` / `AggregateCountOnRange`. |
| 109 | +3. `default_subquery_branch.subquery.is_none()` and |
| 110 | + `subquery_path.is_none()`. |
| 111 | +4. `conditional_subquery_branches.is_none()` (or empty). |
| 112 | +5. The targeted subtree's `TreeType` is `ProvableSumTree`. |
| 113 | +6. The enclosing `SizedQuery` does not set `limit` or `offset`. Summing |
| 114 | + is aggregate over the matched range — pagination would silently change |
| 115 | + the answer and is rejected. |
| 116 | +7. `left_to_right` may be set either way; it is **ignored** (summing is direction-agnostic). |
| 117 | + |
| 118 | +Violations return `Error::InvalidQuery(...)` before any I/O. |
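
The constraints above can be sketched as one validation pass. This is a minimal illustration, not the GroveDB implementation: `Item`, `Tree`, `SketchQuery`, and `validate_aggregate_sum` are hypothetical, heavily simplified stand-ins for `QueryItem`, `TreeType`, `Query`/`SizedQuery`, and the real validator, and a plain `String` stands in for `Error::InvalidQuery`.

```rust
// Hypothetical, simplified stand-ins; NOT the real GroveDB types.
#[derive(Debug, Clone, PartialEq)]
enum Item {
    Key(Vec<u8>),
    Range(Vec<u8>, Vec<u8>), // stands in for every true range variant
    RangeFull,
    AggregateCountOnRange(Box<Item>),
    AggregateSumOnRange(Box<Item>),
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Tree {
    SumTree,
    ProvableSumTree,
}

struct SketchQuery {
    items: Vec<Item>,
    has_subquery: bool,             // default subquery or subquery_path set
    has_conditional_branches: bool, // non-empty conditional branches
    limit: Option<u16>,
    offset: Option<u16>,
}

// Applies rules 1-6 in order; rule 7 needs no check because the
// direction flag is simply ignored.
fn validate_aggregate_sum(q: &SketchQuery, tree: Tree) -> Result<(), String> {
    let has_sum = q
        .items
        .iter()
        .any(|i| matches!(i, Item::AggregateSumOnRange(_)));
    if !has_sum {
        return Ok(()); // not an aggregate-sum query; nothing to enforce here
    }
    if q.items.len() != 1 {
        return Err("AggregateSumOnRange must be the only query item".into());
    }
    let inner = match &q.items[0] {
        Item::AggregateSumOnRange(inner) => inner,
        _ => return Err("expected AggregateSumOnRange".into()),
    };
    match inner.as_ref() {
        Item::Range(..) => {}
        Item::Key(_)
        | Item::RangeFull
        | Item::AggregateCountOnRange(_)
        | Item::AggregateSumOnRange(_) => {
            return Err("inner item must be a true range variant".into());
        }
    }
    if q.has_subquery || q.has_conditional_branches {
        return Err("subqueries are not allowed".into());
    }
    if tree != Tree::ProvableSumTree {
        return Err("target must be a ProvableSumTree".into());
    }
    if q.limit.is_some() || q.offset.is_some() {
        return Err("limit/offset are not allowed".into());
    }
    Ok(())
}
```

In this sketch the checks need only the query and the target tree type, which mirrors the point that rejection happens before any proof work is done.
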
| 119 | + |
| 120 | +## API Surface |
| 121 | + |
| 122 | +```rust |
| 123 | +// Prove side — unchanged from regular queries: |
| 124 | +GroveDb::prove_query(&path_query, prove_options, grove_version) |
| 125 | + -> CostResult<Vec<u8>, Error> |
| 126 | + |
| 127 | +// Verify side — dedicated, returns (root_hash, sum): |
| 128 | +GroveDb::verify_aggregate_sum_query(proof, &path_query, grove_version) |
| 129 | + -> Result<(CryptoHash, i64), Error> |
| 130 | +``` |
| 131 | + |
| 132 | +A bare tuple is used rather than a wrapper struct: the sum is already an |
| 133 | +`i64` and the `path_query` echoes the inner range. |
| 134 | + |
| 135 | +> **Note on `NonCounted` and `NotSummed` children.** An |
| 136 | +> `Element::NotSummed(child)` wrapper tells the parent sum tree to skip the |
| 137 | +> wrapped element when aggregating its own sum. `AggregateSumOnRange` |
| 138 | +> honors this: every node in a `ProvableSumTree` carries an own-sum equal |
| 139 | +> to its own `SumItem` value or `0` if `NotSummed`-wrapped. The verifier |
| 140 | +> credits only the **own-sum** to the in-range total when the boundary key |
| 141 | +> falls in range. `NonCounted` is orthogonal to sums — it suppresses count |
| 142 | +> aggregation, not sum aggregation — so a `NonCounted` `SumItem` still |
| 143 | +> contributes its sum value normally. |
| 144 | +
|
| 145 | +## Proof Node Vocabulary |
| 146 | + |
| 147 | +For `ProvableSumTree`, every node hash commits to its subtree's aggregate |
| 148 | +sum via `node_hash_with_sum(kv_hash, left, right, sum)`. The proof-node |
| 149 | +vocabulary is parallel to the count family, with new variants carrying an |
| 150 | +`i64` sum field in place of the `u64` count: |
| 151 | + |
| 152 | +| Role in proof | Proof node type | What it carries | |
| 153 | +|----------------------------|------------------------------------------------------------------------------|----------------------------------------------------------------| |
| 154 | +| **On-path / boundary** | `KVDigestSum(key, value_hash, sum)` | key + value digest + subtree sum | |
| 155 | +| **Fully-inside / outside** | `HashWithSum(kv_hash, left_hash, right_hash, sum)` | the four fields needed to recompute `node_hash_with_sum` | |
| 156 | +| **Queried boundary item** | `KVSum(key, value, sum)` | leaf value at a boundary key, with subtree sum | |
| 157 | +| **Empty side** | (the empty-tree sentinel, no `Push` needed) | — | |
| 158 | + |
| 159 | +Wire format tag bytes (V1 only): `0x30..=0x3D` for the push and |
| 160 | +push-inverted variants. The on-the-wire sum field is `varint i64` (not |
| 161 | +fixed-width) for compactness; the **hash input** to `node_hash_with_sum` |
| 162 | +uses fixed 8-byte big-endian — wire and hash are deliberately decoupled. |
| 163 | + |
| 164 | +> **Why `HashWithSum` is self-verifying.** The `sum` value carried by a |
| 165 | +> `HashWithSum` op is *bound* to the parent merk's hash chain, not |
| 166 | +> trusted on faith. The verifier recomputes |
| 167 | +> `node_hash_with_sum(kv_hash, left, right, sum)` from the four fields |
| 168 | +> and uses the result as the subtree's committed `node_hash` for the |
| 169 | +> parent's hash recomputation. If the prover lies about `sum`, the |
| 170 | +> recomputed `node_hash` diverges from what the parent committed, and the |
| 171 | +> parent's Merkle-root check fails. |
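
The binding argument can be illustrated with a toy model. The sketch below is an assumption-laden stand-in: it uses the standard library's 64-bit `DefaultHasher` instead of the real cryptographic hash, `u64` values instead of 32-byte hashes, and the names `toy_node_hash_with_sum`, `ToyHashWithSum`, and `parent_accepts` are hypothetical. Only the shape of the check (recompute from the four fields, compare with what the parent committed) follows the text.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Toy stand-in for node_hash_with_sum: a non-cryptographic 64-bit hash.
// The sum is fed in as fixed 8-byte big-endian, as the text describes
// for the real hash input.
fn toy_node_hash_with_sum(kv_hash: u64, left: u64, right: u64, sum: i64) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(&kv_hash.to_be_bytes());
    h.write(&left.to_be_bytes());
    h.write(&right.to_be_bytes());
    h.write(&sum.to_be_bytes());
    h.finish()
}

// Hypothetical mirror of the four fields carried by a HashWithSum op.
struct ToyHashWithSum {
    kv_hash: u64,
    left_hash: u64,
    right_hash: u64,
    sum: i64,
}

// The parent committed to `committed_child_hash`; the claimed fields are
// accepted only if they hash back to exactly that value.
fn parent_accepts(committed_child_hash: u64, claimed: &ToyHashWithSum) -> bool {
    toy_node_hash_with_sum(
        claimed.kv_hash,
        claimed.left_hash,
        claimed.right_hash,
        claimed.sum,
    ) == committed_child_hash
}
```

Changing only the `sum` field of an otherwise honest op changes the recomputed hash, so the comparison fails; in the real scheme that failure surfaces as a root-hash mismatch further up the chain.
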
| 172 | +
|
| 173 | +The walk-by-example diagrams from |
| 174 | +[Aggregate Count on Range Queries](aggregate-count-queries.md) apply |
| 175 | +unchanged — substitute `KVDigestCount` → `KVDigestSum` and |
| 176 | +`HashWithCount` → `HashWithSum`. |
| 177 | + |
| 178 | +## Signed-Sum Arithmetic |
| 179 | + |
| 180 | +Two correctness points differ from the count machinery: |
| 181 | + |
| 182 | +### Negative sums |
| 183 | + |
| 184 | +A `ProvableSumTree` can hold negative `SumItem` values, and a range can |
| 185 | +sum to a negative or zero total. Two consequences: |
| 186 | + |
| 187 | +- **No `if sum == 0` short-circuit.** The count generator can skip an |
| 188 | + empty subtree (count = 0 means "no elements"), but `sum == 0` does |
| 189 | + **not** mean "no elements" — it can mean "+5 and -5 cancelled". The |
| 190 | + sum prover descends regardless. |
| 191 | +- **No `own_sum = aggregate − left_struct − right_struct` overflow |
| 192 | + check.** Count uses `checked_sub` to catch "children claim more than |
| 193 | + parent" as corruption. Signed sums can naturally have children's |
| 194 | + structural sums in any combination (`+200 + -150 = +50`), so the |
| 195 | + subtraction is allowed to wrap. The hash chain still binds every |
| 196 | + node, so arithmetic corruption changes the reconstructed root hash |
| 197 | + and the caller's root check catches it. |
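
A minimal sketch of the own-sum derivation described above. The function name `own_sum` is hypothetical, and the use of `wrapping_sub` is an assumption about how "allowed to wrap" is realized; the point is only that no `checked_sub`-style corruption check is applied to signed sums.

```rust
// Derive a node's own contribution from its subtree aggregate and the
// structural sums of its two children. Wrapping arithmetic is used because
// signed children can legitimately exceed the parent's aggregate in
// magnitude; integrity comes from the hash chain, not from this arithmetic.
fn own_sum(aggregate: i64, left_struct: i64, right_struct: i64) -> i64 {
    aggregate
        .wrapping_sub(left_struct)
        .wrapping_sub(right_struct)
}
```

For example, with children at `+200` and `-150` and an aggregate of `+57`, `own_sum(57, 200, -150)` yields `7`. With children at `+5` and `-5` and an aggregate of `0`, the result is `0` even though the subtree is not empty, which is why a zero sum cannot be used as a "skip this subtree" signal.
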
| 198 | + |
| 199 | +### i64 overflow at extremes |
| 200 | + |
| 201 | +A sum of two `i64::MAX` children does **not** fit in `i64`, so both the |
| 202 | +prove and verify paths accumulate in `i128` end-to-end: |
| 203 | + |
| 204 | +- The prover's internal recursion (`emit_sum_proof`) returns |
| 205 | + `CostResult<i128, Error>`. |
| 206 | +- The verifier's `verify_sum_shape` accumulates into an `i128`. |
| 207 | +- Both narrow to `i64` at the **outermost entry point** via |
| 208 | + `i64::try_from(sum_i128)`, returning `Error::InvalidProofError` if |
| 209 | + the i128 result doesn't fit. |
| 210 | + |
| 211 | +Tests cover the two interesting overflow shapes: |
| 212 | + |
| 213 | +- `i64::MAX + i64::MAX` → overflows i64, verify rejects with |
| 214 | + `InvalidProofError`. |
| 215 | +- `i64::MAX + i64::MIN` → `-1`, fits i64, verify succeeds. The |
| 216 | + intermediate i128 carries the difference safely. |
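
The widen-then-narrow pattern can be sketched in a few lines. `narrow_total` is a hypothetical helper, and the `String` error stands in for `Error::InvalidProofError`; the real code accumulates during the proof walk rather than over a slice.

```rust
use std::convert::TryFrom;

// Accumulate every contribution in i128, then narrow to i64 once at the end.
// A total outside the i64 range is reported as an error instead of wrapping.
fn narrow_total(contributions: &[i64]) -> Result<i64, String> {
    let total: i128 = contributions.iter().map(|&v| i128::from(v)).sum();
    i64::try_from(total).map_err(|_| format!("sum {} does not fit in i64", total))
}
```

With this helper, `narrow_total(&[i64::MAX, i64::MAX])` is an error, while `narrow_total(&[i64::MAX, i64::MIN])` returns `Ok(-1)`, matching the two overflow shapes listed above.
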
| 217 | + |
| 218 | +## Tests and Examples |
| 219 | + |
| 220 | +See: |
| 221 | + |
| 222 | +- `grovedb/src/tests/aggregate_sum_query_tests.rs` — 21 end-to-end |
| 223 | + GroveDB tests. |
| 224 | +- `merk/src/proofs/query/aggregate_sum.rs` — 14 Merk-level tests |
| 225 | + (classification, prover internals, single-`Hash` rejection, |
| 226 | + disjoint-with-children rejection, overflow at i64::MAX). |
| 227 | +- `grovedb/src/operations/proof/aggregate_sum.rs` — V0/V1 envelope walker |
| 228 | + with layer-chain validation. |
| 229 | + |
| 230 | +The marquee scenarios: |
| 231 | + |
| 232 | +| Scenario | Result | |
| 233 | +|-------------------------------------------------------|-------------------------------------| |
| 234 | +| Full range over `[1..=15]` | sum = 120 | |
| 235 | +| Subrange `[5..=10]` | sum = 45 | |
| 236 | +| Mixed `+50, -100, +30, -50` | sum = -70 | |
| 237 | +| All-negative subrange | sum = -10 | |
| 238 | +| `+5, -5` (non-zero children, zero sum) | sum = 0 (no short-circuit) | |
| 239 | +| `i64::MAX + i64::MAX` | `Error::InvalidProofError` | |
| 240 | +| `i64::MAX + i64::MIN` | sum = -1 | |
| 241 | +| Tampered `HashWithSum::sum` | rejected (root-hash divergence) | |
| 242 | +| `NotSummed(SumItem)` in range | excluded (matches tree's aggregate) | |
| 243 | +| Query with subquery / pagination / mixed aggregates | rejected at validation | |
| 244 | + |
| 245 | +## See Also |
| 246 | + |
| 247 | +- [Element System](element-system.md) — the `ProvableSumTree` element |
| 248 | + variant and `ProvableSummedMerkNode` feature type. |
| 249 | +- [Aggregate Count on Range Queries](aggregate-count-queries.md) — the |
| 250 | + symmetric count-only feature; most of the proof-shape walk diagrams |
| 251 | + apply unchanged. |
| 252 | +- [Aggregate Sum Queries](aggregate-sum-queries.md) — the existing |
| 253 | + sum-budget iterator (a different feature with a similar name). |
| 254 | +- [Hashing](hashing.md) — `node_hash_with_sum` and the broader |
| 255 | + hash-binding scheme. |