Commit e49fa10

docs: ProvableSumTree element + AggregateSumOnRange query
Final phase of the ProvableSumTree feature: documentation. Adds:

- `docs/book/src/aggregate-sum-on-range-queries.md`: new dedicated chapter describing the AggregateSumOnRange query, the ProvableSumTree tree type it operates on, why the existing sum trees can't be queried this way, the proof node vocabulary (KVSum / KVHashSum / HashWithSum / KVDigestSum / KVRefValueHashSum at wire tags 0x30..=0x3D), and the signed-sum correctness notes (no zero-sum short-circuit; i128 accumulator with i64 narrowing at the entry points; overflow handling at i64::MAX extremes).
- `docs/book/src/element-system.md`: ProvableSumTree row added to the aggregate-tree table; ProvableSummedMerkNode added to the TreeFeatureType enum block; NonCounted/NotSummed wrapper indices surfaced; explanation of when to choose ProvableSumTree over plain SumTree (sum is part of the protocol invariant vs metadata) and the rationale for the explicit `NotSummedProvableSumTree = 177` slot.
- `docs/book/src/hashing.md`: parallel "Aggregate Hashing for ProvableSumTree" section showing node_hash_with_sum's i64 BE input layout and the wire-vs-hash encoding split.
- `docs/book/src/appendix-a.md`: rows for NonCounted (15), NotSummed (16), and ProvableSumTree (17) added to the discriminant table.
- `docs/book/src/aggregate-sum-queries.md`: disambiguation banner at the top distinguishing the existing sum-budget iterator from the new AggregateSumOnRange query, with a cross-link.
- `docs/book/src/SUMMARY.md`: registers the new chapter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 9c7d5e1 commit e49fa10

6 files changed

Lines changed: 324 additions & 0 deletions

docs/book/src/SUMMARY.md

Lines changed: 1 addition & 0 deletions
@@ -12,6 +12,7 @@
 - [The Query System](query-system.md)
 - [Aggregate Sum Queries](aggregate-sum-queries.md)
 - [Aggregate Count Queries](aggregate-count-queries.md)
+- [Aggregate Sum on Range Queries](aggregate-sum-on-range-queries.md)
 - [Batch Operations](batch-operations.md)
 - [Cost Tracking](cost-tracking.md)
 - [The MMR Tree](mmr-tree.md)

docs/book/src/aggregate-sum-on-range-queries.md

Lines changed: 255 additions & 0 deletions
@@ -0,0 +1,255 @@
# Aggregate Sum on Range Queries

## Overview

An **Aggregate Sum on Range Query** lets a caller ask:

> "What is the total sum of children whose keys fall in this range, in this
> `ProvableSumTree`?"

The answer is a signed `i64` that comes back with a cryptographic proof. A
verifier holding the tree's root hash can compute the total from the proof in
`O(log n + |boundary|)` work, without ever materializing the `SumItem` values
themselves.

This is the sum-tree parallel to
[Aggregate Count on Range Queries](aggregate-count-queries.md). The two query
types are orthogonal: an aggregate-sum query returns a sum, an aggregate-count
query returns a count, and a single `PathQuery` may not contain both.

> **Not to be confused with [Aggregate Sum Queries](aggregate-sum-queries.md).**
> That existing API is a sum-budget iterator: it walks a `SumTree` returning
> `(key, sum_value)` pairs until a running total is reached.
> `AggregateSumOnRange` is a different feature. It answers "what is the
> verified total for keys in this range?" without returning any values, and
> only against the `ProvableSumTree` element type.

The feature is implemented as a `QueryItem` variant:

```rust
use std::ops::Range;

pub enum QueryItem {
    Key(Vec<u8>),
    Range(Range<Vec<u8>>),
    // ... existing variants ...
    AggregateCountOnRange(Box<QueryItem>),

    /// Sum the per-node sum contributions of children matched by the inner
    /// range, without returning them. Only valid on ProvableSumTree (and its
    /// `NonCounted` / `NotSummed` wrapper variants).
    AggregateSumOnRange(Box<QueryItem>),
}
```

The wrapped `QueryItem` is the **range to sum over**. As with
`AggregateCountOnRange`, it must be one of the true range variants:
`Range`, `RangeInclusive`, `RangeFrom`, `RangeTo`, `RangeToInclusive`,
`RangeAfter`, `RangeAfterTo`, or `RangeAfterToInclusive`. The single-key
(`Key`), full-range (`RangeFull`), and self-nested (`AggregateSumOnRange`)
variants are rejected, and `AggregateSumOnRange` may not wrap an
`AggregateCountOnRange` either.

> **Why are `Key` and `RangeFull` rejected?**
>
> - **`Key(k)`** would return either `0` or the single child's sum
>   contribution, degenerate cases that the existing `get_raw` /
>   `verify_query_with_options` paths already handle more cheaply.
> - **`RangeFull`** has its answer already exposed by the parent's
>   `Element::ProvableSumTree(_, sum, _)` bytes, which are hash-verified by
>   the parent Merk's proof. Going through `AggregateSumOnRange(RangeFull)`
>   would always produce a strictly heavier proof for an answer the caller
>   can read directly.

## Why this works only on ProvableSumTree

GroveDB has several tree types that track a sum:

| Tree type                   | Sum tracked? | Sum in node hash? | AggregateSumOnRange allowed? |
|-----------------------------|:------------:|:-----------------:|:----------------------------:|
| `SumTree`                   | yes          | no                | **no**                       |
| `BigSumTree`                | yes (i128)   | no                | **no**                       |
| `CountSumTree`              | yes          | no                | **no**                       |
| `ProvableCountSumTree`      | yes          | no (count only)   | **no**                       |
| `ProvableSumTree`           | yes          | **yes**           | **yes**                      |
| `NonCountedProvableSumTree` | yes (inner)  | yes (inner)       | **yes**                      |
| `NotSummedProvableSumTree`  | yes (inner)  | yes (inner)       | **yes**                      |

Only `ProvableSumTree` (including its wrapped forms) bakes the per-node sum
into the node hash via `node_hash_with_sum(kv_hash, left, right, sum)`.
Because every node's sum participates in the Merkle root, a verifier holding
only the root hash can reconstruct enough of the tree from a proof to
**trust** the sums embedded in it.

`SumTree`, `BigSumTree`, `CountSumTree`, and `ProvableCountSumTree` all
track sums in storage, but those sums are not committed in the node hash
chain. (For `ProvableCountSumTree`, the count is in the hash but the sum
is not.) A "proof" of those sums would be unverifiable, so we reject
`AggregateSumOnRange` against them at query-construction time.

The wrapper variants are accepted because the wrapper only changes how the
**parent** aggregates this element; the inner element is still a
fully-fledged `ProvableSumTree`.

> **Why not `BigSumTree`?** `BigSumTree` uses `i128` sums and would need a
> separate hash dispatch (`node_hash_with_big_sum`) plus a different verify
> path. It is a documented follow-up, not part of this PR.

## Query-Level Constraints

`AggregateSumOnRange` is a **terminal** query item. Its presence reduces
the enclosing `Query` to a single, well-defined operation: "sum, then
return."

If any `QueryItem::AggregateSumOnRange(_)` appears in `Query::items`, the
query is well-formed only when:

1. `items.len() == 1`: no other items, no other sums, and no mixing with
   `AggregateCountOnRange`.
2. The inner `QueryItem` is **not** `Key`, `RangeFull`, or another
   `AggregateSumOnRange` / `AggregateCountOnRange`.
3. `default_subquery_branch.subquery.is_none()` and
   `default_subquery_branch.subquery_path.is_none()`.
4. `conditional_subquery_branches` is `None` (or empty).
5. The targeted subtree's `TreeType` is `ProvableSumTree`.
6. The enclosing `SizedQuery` does not set `limit` or `offset`. Summing
   is an aggregate over the whole matched range, so pagination would
   silently change the answer; it is rejected.

`left_to_right` is **ignored** rather than validated, because summing is
direction-agnostic.

Violations return `Error::InvalidQuery(...)` before any I/O.
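The rules above can be sketched as a single validation pass. This is a minimal illustration, not GroveDB's validator: `SketchQuery`, the trimmed-down `QueryItem` and `TreeType` enums, and `validate_aggregate_sum` are hypothetical stand-ins, with the subquery fields collapsed into booleans and errors reported as plain strings instead of `Error::InvalidQuery`.

```rust
use std::ops::{Range, RangeInclusive};

// Hypothetical, trimmed-down stand-ins; the real enums have more variants.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum QueryItem {
    Key(Vec<u8>),
    Range(Range<Vec<u8>>),
    RangeInclusive(RangeInclusive<Vec<u8>>),
    RangeFull,
    AggregateCountOnRange(Box<QueryItem>),
    AggregateSumOnRange(Box<QueryItem>),
}

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum TreeType {
    SumTree,
    BigSumTree,
    ProvableSumTree,
}

// Hypothetical flattening of Query + SizedQuery: the booleans stand for
// "a default subquery / subquery_path is set" and "conditional branches exist".
struct SketchQuery {
    items: Vec<QueryItem>,
    has_subquery: bool,
    has_conditional_branches: bool,
    limit: Option<u16>,
    offset: Option<u16>,
}

impl SketchQuery {
    fn single(item: QueryItem) -> Self {
        SketchQuery {
            items: vec![item],
            has_subquery: false,
            has_conditional_branches: false,
            limit: None,
            offset: None,
        }
    }
}

/// Checks rules 1-6; `left_to_right` is deliberately not inspected.
fn validate_aggregate_sum(q: &SketchQuery, target: TreeType) -> Result<(), String> {
    let Some(inner) = q.items.iter().find_map(|item| match item {
        QueryItem::AggregateSumOnRange(inner) => Some(inner.as_ref()),
        _ => None,
    }) else {
        return Ok(()); // not an aggregate-sum query: these rules do not apply
    };
    if q.items.len() != 1 {
        return Err("AggregateSumOnRange must be the only query item".into());
    }
    if matches!(
        inner,
        QueryItem::Key(_)
            | QueryItem::RangeFull
            | QueryItem::AggregateSumOnRange(_)
            | QueryItem::AggregateCountOnRange(_)
    ) {
        return Err("inner item must be a true range variant".into());
    }
    if q.has_subquery || q.has_conditional_branches {
        return Err("subqueries are not allowed".into());
    }
    if q.limit.is_some() || q.offset.is_some() {
        return Err("limit/offset would change the sum".into());
    }
    if target != TreeType::ProvableSumTree {
        return Err("target subtree must be a ProvableSumTree".into());
    }
    Ok(())
}
```

All checks run on the query shape plus the target tree type, which is why a malformed aggregate-sum query can be refused without touching any proof data.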

## API Surface

```rust
// Prove side: unchanged from regular queries.
GroveDb::prove_query(&path_query, prove_options, grove_version)
    -> CostResult<Vec<u8>, Error>

// Verify side: dedicated, returns (root_hash, sum).
GroveDb::verify_aggregate_sum_query(proof, &path_query, grove_version)
    -> Result<(CryptoHash, i64), Error>
```

A bare tuple is used rather than a wrapper struct because the sum is already
a plain `i64` and the `path_query` already echoes the inner range.

> **Note on `NonCounted` and `NotSummed` children.** An
> `Element::NotSummed(child)` wrapper tells the parent sum tree to skip the
> wrapped element when aggregating its own sum. `AggregateSumOnRange`
> honors this: every node in a `ProvableSumTree` carries an own-sum equal
> to its own `SumItem` value, or `0` if it is `NotSummed`-wrapped. When a
> boundary key falls in range, the verifier credits only that node's
> **own-sum** to the in-range total. `NonCounted` is orthogonal to sums (it
> suppresses count aggregation, not sum aggregation), so a `NonCounted`
> `SumItem` still contributes its sum value normally.

## Proof Node Vocabulary

For `ProvableSumTree`, every node hash commits to its subtree's aggregate
sum via `node_hash_with_sum(kv_hash, left, right, sum)`. The proof-node
vocabulary parallels the count family, with new variants carrying an
`i64` sum field in place of the `u64` count:

| Role in proof              | Proof node type                                    | What it carries                                          |
|----------------------------|----------------------------------------------------|----------------------------------------------------------|
| **On-path / boundary**     | `KVDigestSum(key, value_hash, sum)`                | key + value digest + subtree sum                         |
| **Fully-inside / outside** | `HashWithSum(kv_hash, left_hash, right_hash, sum)` | the four fields needed to recompute `node_hash_with_sum` |
| **Queried boundary item**  | `KVSum(key, value, sum)`                           | leaf value at a boundary key, with subtree sum           |
| **Empty side**             | (empty-tree sentinel)                              | nothing; no `Push` needed                                |

Wire format tag bytes (V1 only): `0x30..=0x3D` for the push and
push-inverted variants. The on-the-wire sum field is a `varint` `i64` (not
fixed-width) for compactness, while the **hash input** to
`node_hash_with_sum` uses a fixed 8-byte big-endian encoding. Wire and hash
encodings are deliberately decoupled.
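To make the wire-vs-hash split concrete, the sketch below contrasts the fixed-width hash input with a variable-length wire form. The chapter only states that the wire field is a varint `i64`; the zigzag + LEB128 scheme used here is an assumption chosen for illustration, and both function names are hypothetical.

```rust
/// Hash-input encoding: always 8 bytes, big-endian two's complement.
fn hash_input_bytes(sum: i64) -> [u8; 8] {
    sum.to_be_bytes()
}

/// Hypothetical wire encoding: zigzag-map the signed value so small
/// magnitudes (positive or negative) become small unsigned numbers,
/// then emit LEB128 (7 bits per byte, high bit = "more bytes follow").
fn wire_varint_sketch(sum: i64) -> Vec<u8> {
    let mut v = ((sum << 1) ^ (sum >> 63)) as u64; // zigzag: 0, -1, 1, -2 -> 0, 1, 2, 3
    let mut out = Vec::new();
    loop {
        let byte = (v & 0x7f) as u8;
        v >>= 7;
        if v == 0 {
            out.push(byte);
            return out;
        }
        out.push(byte | 0x80);
    }
}
```

Small sums such as `5` or `-1` take a single wire byte, while the hash input stays 8 bytes for every value. That fixed length is what lets the verifier rebuild the exact pre-image regardless of how the proof op was encoded.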

> **Why `HashWithSum` is self-verifying.** The `sum` value carried by a
> `HashWithSum` op is *bound* to the parent Merk's hash chain, not
> trusted on faith. The verifier recomputes
> `node_hash_with_sum(kv_hash, left, right, sum)` from the four fields
> and uses the result as the subtree's committed `node_hash` for the
> parent's hash recomputation. If the prover lies about `sum`, the
> recomputed `node_hash` diverges from what the parent committed, and the
> parent's Merkle-root check fails.
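The binding argument can be demonstrated with a toy two-level tree. This sketch assumes a 64-bit `DefaultHasher` (SipHash, which is not cryptographic) as a stand-in for BLAKE3 and for the 32-byte `CryptoHash`; `HashWithSum` here is a hypothetical plain struct rather than the real proof-op type, and the empty right side is modeled as hash `0`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Stand-in for the real 32-byte CryptoHash. DefaultHasher (SipHash) is NOT a
// cryptographic hash; it only plays the role of blake3 in this sketch.
type Hash64 = u64;

/// Toy node_hash_with_sum: the sum is part of the hashed pre-image.
fn node_hash_with_sum(kv: Hash64, left: Hash64, right: Hash64, sum: i64) -> Hash64 {
    let mut h = DefaultHasher::new();
    h.write(&kv.to_be_bytes());
    h.write(&left.to_be_bytes());
    h.write(&right.to_be_bytes());
    h.write(&sum.to_be_bytes());
    h.finish()
}

/// Hypothetical plain struct holding the four fields of a HashWithSum op.
struct HashWithSum {
    kv_hash: Hash64,
    left: Hash64,
    right: Hash64,
    sum: i64,
}

/// Rebuild a two-level root: a parent whose left child is summarized by a
/// HashWithSum op and whose right side is empty (modeled as hash 0).
fn recompute_root(parent_kv: Hash64, parent_own_sum: i64, child: &HashWithSum) -> Hash64 {
    let child_hash = node_hash_with_sum(child.kv_hash, child.left, child.right, child.sum);
    node_hash_with_sum(parent_kv, child_hash, 0, parent_own_sum + child.sum)
}
```

Because the claimed `sum` feeds the child's hash, and that hash feeds the parent's, changing `sum` alone cannot reproduce the trusted root.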

The walk-by-example diagrams from
[Aggregate Count on Range Queries](aggregate-count-queries.md) apply
unchanged; substitute `KVDigestCount` → `KVDigestSum` and
`HashWithCount` → `HashWithSum`.
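The proof shape this vocabulary supports can be mirrored by a plain in-memory walk: a subtree whose whole key span lies inside the range contributes its committed aggregate (the `HashWithSum` case), a subtree entirely outside contributes nothing, and each boundary node credits only its own-sum. This is a minimal sketch over a hypothetical sum-augmented BST with `u32` keys; it computes the total directly rather than generating or verifying proof ops, and a `NotSummed` child is modeled simply as an own-sum of `0`.

```rust
/// Hypothetical sum-augmented BST node, standing in for a ProvableSumTree node.
struct Node {
    key: u32,
    own_sum: i64,   // SumItem value, or 0 for a NotSummed-wrapped child
    aggregate: i64, // own_sum + both children's aggregates (what the hash commits to)
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

/// Build a balanced tree from (key, own_sum) pairs sorted by key.
fn build(items: &[(u32, i64)]) -> Option<Box<Node>> {
    if items.is_empty() {
        return None;
    }
    let mid = items.len() / 2;
    let (key, own_sum) = items[mid];
    let left = build(&items[..mid]);
    let right = build(&items[mid + 1..]);
    let agg = |c: &Option<Box<Node>>| c.as_ref().map_or(0, |n| n.aggregate);
    let aggregate = own_sum + agg(&left) + agg(&right);
    Some(Box::new(Node { key, own_sum, aggregate, left, right }))
}

/// Sum of own-sums for keys in [lo, hi]. `min..=max` is the key span this
/// subtree is known to cover, inherited from its ancestors.
fn range_sum(node: &Option<Box<Node>>, lo: u32, hi: u32, min: u32, max: u32) -> i128 {
    let Some(n) = node else { return 0 }; // empty side
    if max < lo || min > hi {
        return 0; // fully outside: contributes nothing
    }
    if lo <= min && max <= hi {
        return i128::from(n.aggregate); // fully inside: take the committed aggregate
    }
    // Boundary node: credit only its own-sum, then descend both sides.
    let mut total = 0i128;
    if (lo..=hi).contains(&n.key) {
        total += i128::from(n.own_sum);
    }
    total += range_sum(&n.left, lo, hi, min, n.key.saturating_sub(1));
    total += range_sum(&n.right, lo, hi, n.key.saturating_add(1), max);
    total
}
```

For keys `1..=15` with values equal to their keys, the `[5..=10]` walk touches only a handful of boundary nodes and takes whole aggregates for the subtrees `{5, 6, 7}` and `{9}`, which is the `O(log n + |boundary|)` behavior described in the overview.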

## Signed-Sum Arithmetic

Two correctness points differ from the count machinery:

### Negative sums

A `ProvableSumTree` can hold negative `SumItem` values, and a range can
sum to a negative or zero total. Two consequences follow:

- **No `if sum == 0` short-circuit.** The count generator can skip an
  empty subtree (count = 0 means "no elements"), but `sum == 0` does
  **not** mean "no elements"; it can mean "+5 and -5 cancelled". The
  sum prover descends regardless.
- **No `own_sum = aggregate − left_struct − right_struct` overflow
  check.** Count uses `checked_sub` to catch "children claim more than
  the parent" as corruption. With signed sums, children's structural sums
  can legitimately come in any combination (`+200 + -150 = +50`), so the
  subtraction is allowed to wrap. The hash chain still binds every
  node, so arithmetic corruption changes the reconstructed root hash
  and the caller's root check catches it.
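The difference between the count and sum derivations fits in a few small functions. All of them are hypothetical helpers written for this illustration, not the prover's code: the count version treats underflow as corruption, while the sum version uses wrapping arithmetic and leaves tamper detection to the hash chain.

```rust
/// Count side: children can never hold more elements than their parent,
/// so an underflow is treated as corruption (None).
fn own_count(aggregate: u64, left: u64, right: u64) -> Option<u64> {
    aggregate.checked_sub(left)?.checked_sub(right)
}

/// Sum side: any sign combination is legitimate, so there is no
/// "children exceed parent" check; the subtraction simply wraps.
/// A dishonest aggregate is caught later by the root-hash comparison.
fn own_sum(aggregate: i64, left: i64, right: i64) -> i64 {
    aggregate.wrapping_sub(left).wrapping_sub(right)
}

/// Skipping rule: a zero count proves emptiness...
fn count_subtree_can_skip(count: u64) -> bool {
    count == 0
}

/// ...but a zero sum proves nothing, so the sum prover always descends.
fn sum_subtree_can_skip(_sum: i64) -> bool {
    false // 0 may be "+5 and -5 cancelled"
}
```

For example, a node whose subtree aggregate is `+50` with children summing to `+200` and `-150` has an own-sum of `0`; the count-style `checked_sub` would have flagged the `+200` child as "larger than its parent".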

### i64 overflow at extremes

A sum of two `i64::MAX` children does **not** fit in `i64`, so both the
prove and verify paths accumulate in `i128` end-to-end:

- The prover's internal recursion (`emit_sum_proof`) returns
  `CostResult<i128, Error>`.
- The verifier's `verify_sum_shape` accumulates into an `i128`.
- Both narrow to `i64` only at the **outermost entry point**, via
  `i64::try_from(sum_i128)`, returning `Error::InvalidProofError` if
  the `i128` result doesn't fit.

Tests cover the two interesting overflow shapes:

- `i64::MAX + i64::MAX` → overflows `i64`; verify rejects with
  `InvalidProofError`.
- `i64::MAX + i64::MIN` → `-1`, which fits in `i64`; verify succeeds. The
  intermediate `i128` carries the mixed-sign extremes safely.
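The widen-then-narrow pattern can be sketched as follows. `total_sum` and `SketchError` are hypothetical: the sketch flattens the recursive prover and verifier into one accumulation over per-node contributions, then narrows once with `i64::try_from`, in the same way the entry points described above do.

```rust
/// Hypothetical error type standing in for Error::InvalidProofError.
#[derive(Debug, PartialEq)]
enum SketchError {
    InvalidProof(String),
}

/// Sum per-node contributions in i128 and narrow to i64 once, at the end.
fn total_sum(contributions: &[i64]) -> Result<i64, SketchError> {
    // Widen every term first; i128 has ample headroom for i64-sized terms.
    let wide: i128 = contributions.iter().map(|&c| i128::from(c)).sum();
    // Narrow exactly once, at the "outermost entry point".
    i64::try_from(wide)
        .map_err(|_| SketchError::InvalidProof(format!("sum {wide} does not fit in i64")))
}
```

Widening also matters when the final result fits: contributions `[i64::MAX, 1, -1]` total `i64::MAX`, but adding them left to right in plain `i64` would overflow at the first step, while the `i128` accumulator absorbs that intermediate value.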

## Tests and Examples

See:

- `grovedb/src/tests/aggregate_sum_query_tests.rs`: 21 end-to-end
  GroveDB tests.
- `merk/src/proofs/query/aggregate_sum.rs`: 14 Merk-level tests
  (classification, prover internals, single-`Hash` rejection,
  disjoint-with-children rejection, overflow at `i64::MAX`).
- `grovedb/src/operations/proof/aggregate_sum.rs`: V0/V1 envelope walker
  with layer-chain validation.

The marquee scenarios:

| Scenario                                            | Result                              |
|-----------------------------------------------------|-------------------------------------|
| Range covering all keys `[1..=15]`                  | sum = 120                           |
| Subrange `[5..=10]`                                 | sum = 45                            |
| Mixed `+50, -100, +30, -50`                         | sum = -70                           |
| All-negative subrange                               | sum = -10                           |
| `+5, -5` (non-zero children, zero sum)              | sum = 0 (no short-circuit)          |
| `i64::MAX + i64::MAX`                               | `Error::InvalidProofError`          |
| `i64::MAX + i64::MIN`                               | sum = -1                            |
| Tampered `HashWithSum::sum`                         | rejected (root-hash divergence)     |
| `NotSummed(SumItem)` in range                       | excluded (matches tree's aggregate) |
| Query with subquery / pagination / mixed aggregates | rejected at validation              |

## See Also

- [Element System](element-system.md): the `ProvableSumTree` element
  variant and `ProvableSummedMerkNode` feature type.
- [Aggregate Count on Range Queries](aggregate-count-queries.md): the
  symmetric count-only feature; most of the proof-shape walk diagrams
  apply unchanged.
- [Aggregate Sum Queries](aggregate-sum-queries.md): the existing
  sum-budget iterator (a different feature with a similar name).
- [Hashing](hashing.md): `node_hash_with_sum` and the broader
  hash-binding scheme.

docs/book/src/aggregate-sum-queries.md

Lines changed: 8 additions & 0 deletions
@@ -1,5 +1,13 @@
 # Aggregate Sum Queries

+> **Heads up: two different features.** This page covers the
+> sum-budget iterator: walk a `SumTree` returning `(key, sum_value)` pairs
+> until a running total is reached. If you instead want a **cryptographically
+> verifiable total** for a key range against a `ProvableSumTree`, see
+> [Aggregate Sum on Range Queries](aggregate-sum-on-range-queries.md).
+> The two features are independent: the iterator does not produce a
+> proof of the running total, only the elements that contributed to it.
+
 ## Overview

 Aggregate Sum Queries are a specialized query type designed for **SumTrees** in GroveDB.

docs/book/src/appendix-a.md

Lines changed: 3 additions & 0 deletions
@@ -17,6 +17,9 @@
 | 12 | `MmrTree` | 8 | `(mmr_size: u64, flags)` | 11 | Append-only MMR log |
 | 13 | `BulkAppendTree` | 9 | `(total_count: u64, chunk_power: u8, flags)` | 12 | High-throughput append-only log |
 | 14 | `DenseAppendOnlyFixedSizeTree` | 10 | `(count: u16, height: u8, flags)` | 6 | Dense fixed-capacity Merkle storage |
+| 15 | `NonCounted` | wrapper | `Box<Element>` | inner + 1 byte | Opts inner out of parent count aggregation |
+| 16 | `NotSummed` | wrapper | `Box<Element>` | inner + 1 byte | Opts inner out of parent sum aggregation |
+| 17 | `ProvableSumTree` | 11 | `(root_key, sum: i64, flags)` | `SUM_TREE_COST_SIZE` | Sum baked into hash (see [Aggregate Sum on Range Queries](aggregate-sum-on-range-queries.md)) |

 **Notes:**
 - Discriminants 11–14 are **non-Merk trees**: data lives outside a child Merk subtree

docs/book/src/element-system.md

Lines changed: 24 additions & 0 deletions
@@ -25,6 +25,9 @@ pub enum Element {
     MmrTree(u64, Option<ElementFlags>), // [12]
     BulkAppendTree(u64, u8, Option<ElementFlags>), // [13]
     DenseAppendOnlyFixedSizeTree(u16, u8, Option<ElementFlags>), // [14]
+    NonCounted(Box<Element>), // [15] wrapper byte
+    NotSummed(Box<Element>), // [16] wrapper byte
+    ProvableSumTree(Option<Vec<u8>>, SumValue, Option<ElementFlags>), // [17]
 }
 ```

@@ -155,11 +158,31 @@ Additional aggregate tree types:
 | `BigSumTree` | `BigSummedMerkNode(i128)` | 128-bit sum for large values |
 | `ProvableCountTree` | `ProvableCountedMerkNode(u64)` | Count baked into hash |
 | `ProvableCountSumTree` | `ProvableCountedSummedMerkNode(u64, i64)` | Count in hash + sum |
+| `ProvableSumTree` | `ProvableSummedMerkNode(i64)` | Sum baked into hash |

 **ProvableCountTree** is special: its count is included in the `node_hash`
 computation (via `node_hash_with_count`), so a proof can verify the count without
 revealing any values.

+**ProvableSumTree** is the parallel for sums: each node's aggregate sum is
+included in the `node_hash` via `node_hash_with_sum(kv_hash, left, right, sum)`,
+so a proof can return the verified total of any key range without revealing
+the underlying `SumItem` values. Use this when the sum is part of the
+protocol invariant (stake weights, fee priorities, vote tallies) and a
+peer needs to verify totals from the root hash alone. Use plain `SumTree`
+when the sum is bookkeeping metadata that doesn't need cryptographic
+binding. The per-node hashing cost is a small fixed addition over plain
+`SumTree`. See [Aggregate Sum on Range Queries](aggregate-sum-on-range-queries.md)
+for the verifiable range-sum query that this element enables.
+
+Like its count counterpart, `ProvableSumTree` accepts the `NotSummed`
+wrapper so a sum-bearing child can opt out of contributing to its parent's
+running sum. The `NotSummed` ElementType twin lives at slot 177 (`0xB1`) in
+the `0xB0..=0xBF` family range. The slot and its inverse mapping are
+assigned explicitly rather than via the `prefix | base` formula used elsewhere:
+ProvableSumTree's base discriminant `17` (`0x11`) overflows the low nibble, so
+`0xB0 | 17 = 0xB1` would mask back to `Reference` (`1`) under the legacy `& 0x0F` inverse.
+
 ## Element Serialization

 Elements are serialized using **bincode** with big-endian byte order:
@@ -198,6 +221,7 @@ pub enum TreeFeatureType {
     CountedSummedMerkNode(u64, i64), // Count + sum
     ProvableCountedMerkNode(u64), // Count in hash
     ProvableCountedSummedMerkNode(u64, i64), // Count in hash + sum
+    ProvableSummedMerkNode(i64), // Sum in hash
 }
 ```

docs/book/src/hashing.md

Lines changed: 33 additions & 0 deletions
@@ -231,4 +231,37 @@ pub fn node_hash_with_count(
 This means a proof of count doesn't require revealing the actual data — the count
 is baked into the cryptographic commitment.

+## Aggregate Hashing for ProvableSumTree
+
+`ProvableSumTree` is the sum parallel: each node's aggregate sum is bound
+into the node hash:
+
+```rust
+pub fn node_hash_with_sum(
+    kv: &CryptoHash,
+    left: &CryptoHash,
+    right: &CryptoHash,
+    sum: i64,
+) -> CostContext<CryptoHash> {
+    let mut hasher = blake3::Hasher::new();
+    hasher.update(kv); // 32 bytes
+    hasher.update(left); // 32 bytes
+    hasher.update(right); // 32 bytes
+    hasher.update(&sum.to_be_bytes()); // 8 bytes (signed i64 BE)
+    // ... finalize + cost elided: 104-byte input = 2 hash ops, as node_hash_with_count
+}
+```
+
+Hashing uses fixed 8-byte big-endian `i64::to_be_bytes()` (signed),
+**not** the varint encoding used for wire-format compactness in proof
+ops. The two are deliberately decoupled: the wire encoding optimizes for
+size, while the hash input must be canonical and fixed-length so the
+verifier reconstructs the exact pre-image. Negative sums hash correctly
+because two's-complement big-endian is a deterministic, injective encoding
+(no order preservation is needed).
+
+A proof against a `ProvableSumTree` can return the verified total of any
+key range without revealing the underlying `SumItem` values; see
+[Aggregate Sum on Range Queries](aggregate-sum-on-range-queries.md).
+
 ---
