Skip to content

Commit 40b3c16

Browse files
feat(grovedb): verify_grovedb consistency check for aggregate fields
Phase 4 of the ProvableSumTree feature. The existing `verify_merk_and_submerks_in_transaction` walk is cryptographically complete — `combine_hash(value_hash(parent_bytes), inner_merk_root) == stored_element_value_hash` catches any byte-level tampering, and for ProvableSumTree the inner aggregate is bound into the inner Merk's root_hash via `node_hash_with_sum` (Phase 2). What that walk did not catch was the *software-consistency* class of drift: a parent `ProvableSumTree(_, N, _)` whose stored sum field N disagrees with the inner Merk's actual `aggregate_data()` value M. For provable variants both N and M are bound into element_value_hash, but they live on disk independently and could disagree if Phase 3's propagation logic drifts. For non-provable variants (SumTree, BigSumTree, CountTree, CountSumTree) the recorded aggregate isn't hash-bound at all, so a pure software bug in propagation would silently corrupt the tree. LIB.RS — verify_merk_and_submerks_in_transaction After the existing cryptographic check, for any tree element whose inner Merk holds actual data (i.e. excluding the non-Merk-data trees CommitmentTree/MmrTree/BulkAppendTree/DenseTree, which already short-circuit via `uses_non_merk_data_storage`), the verifier now opens the inner Merk, reads its `aggregate_data()`, and compares against the parent's recorded aggregate field via a new free helper `aggregate_consistency_labels`. The helper covers all seven aggregate- bearing tree variants: - SumTree vs. AggregateData::Sum - ProvableSumTree vs. AggregateData::ProvableSum - BigSumTree vs. AggregateData::BigSum - CountTree vs. AggregateData::Count - CountSumTree vs. AggregateData::CountAndSum - ProvableCountTree vs. AggregateData::ProvableCount - ProvableCountSumTree vs. AggregateData::ProvableCountAndSum Plus an empty-Merk identity case (NoAggregateData with zero recorded aggregate matches), and a fallback that reports any variant-shape mismatch (e.g. ProvableSumTree paired with AggregateData::Count(_)). VERIFICATIONISSUES SHAPE — placeholder hashes, not type extension `VerificationIssues` is a private type alias `HashMap<Vec<Vec<u8>>, (CryptoHash, CryptoHash, CryptoHash)>` whose shape is consumed by `visualize_verify_grovedb`. To avoid breaking its callers and the visualize hex output, mismatched aggregates are packed into deterministic placeholder CryptoHashes via `blake3(format!("recorded ..."))` and `blake3(format!("inner ..."))`, slotted into the "expected" and "actual" fields. The "root" slot reuses the inner-Merk root_hash for path locality. Documented inline. INTEGRITY WALK TESTS — 7 new tests A new `integrity_walk_tests` module in `provable_sum_tree_tests.rs` exercises the verifier end-to-end via two raw-storage tampering helpers: - `tamper_value_no_hash_update` decodes the on-disk TreeNode for a leaf, replaces only its element bytes, re-encodes (leaving the stored value_hash stale), writes back via the immediate storage context. Simulates byte-level tampering caught by the SumItem arm's `value_hash(bytes) != stored_value_hash` check. - `tamper_parent_element_with_consistent_hashes` splices in fresh element bytes AND recomputes hash + value_hash to remain crypto-consistent with the inner Merk's existing root_hash. Used for aggregate-mismatch scenarios — the crypto check passes, but the new aggregate-consistency check fires. Offsets into the on-disk TreeNodeInner encoding are derived from the decoded `value_as_slice().len()`. Scenarios covered: 1. Inner SumItem value tamper (different bytes) — crypto check catches it. 2. Inner SumItem same-length value tamper — crypto check catches it (assert: hashes, not lengths, are what's verified). 3. Parent ProvableSumTree aggregate mismatch (sum=999 stored vs. 40 actual) — new aggregate-consistency check fires. 4. Clean ProvableSumTree verifies clean (with mixed positive, negative, zero, and large values). 5. Clean ProvableCountTree verifies clean. 6. Parent ProvableCountTree aggregate mismatch (count=9999 vs. 3) — sanity check that the generalized helper handles the count variant too. 7. Reload-after-write determinism: insert, drop the db handle, reopen, verify_grovedb reports zero issues; the parent's ProvableSumTree.sum_value field round-trips. Workspace cargo test --all-features green: 2898 passing (Phase 3 baseline of 2891 + 7 new), zero failures. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 73697a8 commit 40b3c16

2 files changed

Lines changed: 862 additions & 0 deletions

File tree

grovedb/src/lib.rs

Lines changed: 235 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1020,6 +1020,60 @@ impl GroveDb {
10201020
);
10211021
}
10221022

1023+
// Software-consistency check: the aggregate fields
1024+
// stored in the parent's tree element (e.g.
1025+
// `sum_value` in `ProvableSumTree(_, sum_value, _)`)
1026+
// must agree with the inner Merk's actual
1027+
// `aggregate_data()`. This is distinct from the
1028+
// cryptographic check above: for the provable
1029+
// variants, both the recorded aggregate field AND
1030+
// the actual inner aggregate are bound into
1031+
// element_value_hash, but they are independently
1032+
// representable on disk and could disagree if a
1033+
// propagation bug (or storage corruption) drifts
1034+
// them out of sync. For non-provable variants, the
1035+
// aggregate field is stored alongside but not bound
1036+
// into the hash; an out-of-sync field is therefore
1037+
// a pure software bug, and the cryptographic check
1038+
// would not catch it.
1039+
//
1040+
// Non-Merk data trees (CommitmentTree, MmrTree,
1041+
// BulkAppendTree, DenseTree) keep an empty inner
1042+
// Merk by design, so their `aggregate_data()` is
1043+
// always `NoAggregateData`. Skip them here; the
1044+
// recursion below is already skipped for them via
1045+
// `uses_non_merk_data_storage`.
1046+
//
1047+
// For aggregate-mismatch logging we reuse the
1048+
// existing `VerificationIssues` shape
1049+
// (HashMap<path, (CryptoHash, CryptoHash,
1050+
// CryptoHash)>) by packing the recorded vs. actual
1051+
// aggregate values into a deterministic placeholder
1052+
// hash via blake3. This avoids breaking the type
1053+
// signature and all callers (including
1054+
// `visualize_verify_grovedb`), at the cost of the
1055+
// hex output being a placeholder rather than a
1056+
// real Merk hash. The recorded-aggregate hash is
1057+
// placed in the "expected" slot and the
1058+
// actual-aggregate hash in the "actual" slot; the
1059+
// "root" slot reuses the inner-Merk root hash for
1060+
// path-locality.
1061+
if !element.uses_non_merk_data_storage() {
1062+
let actual_aggregate = inner_merk.aggregate_data().map_err(MerkError)?;
1063+
if let Some((recorded_label, actual_label)) =
1064+
aggregate_consistency_labels(&element, &actual_aggregate)
1065+
{
1066+
let expected_placeholder: CryptoHash =
1067+
blake3::hash(recorded_label.as_bytes()).into();
1068+
let actual_placeholder: CryptoHash =
1069+
blake3::hash(actual_label.as_bytes()).into();
1070+
issues.insert(
1071+
new_path.to_vec(),
1072+
(root_hash, expected_placeholder, actual_placeholder),
1073+
);
1074+
}
1075+
}
1076+
10231077
// Non-Merk data trees (CommitmentTree, MmrTree,
10241078
// BulkAppendTree, DenseTree) store data in the data
10251079
// namespace as non-Element entries. Recursing into
@@ -1203,6 +1257,187 @@ impl GroveDb {
12031257
}
12041258
}
12051259

1260+
/// Inspect a tree-bearing Element together with the actual aggregate data of
1261+
/// its inner Merk. Returns `Some((recorded_label, actual_label))` if the
1262+
/// aggregate field(s) stored in the element disagree with `actual`, or
1263+
/// `None` if they match (or if `element` is not a tree variant that carries
1264+
/// an aggregate field reflecting the inner Merk's `aggregate_data()`).
1265+
///
1266+
/// The string labels are intended to be hashed into deterministic placeholder
1267+
/// `CryptoHash` values for inclusion in `VerificationIssues`.
1268+
///
1269+
/// Coverage:
1270+
/// - `SumTree(_, n, _)` vs. `AggregateData::Sum(m)`.
1271+
/// - `ProvableSumTree(_, n, _)` vs. `AggregateData::ProvableSum(m)`.
1272+
/// - `BigSumTree(_, n, _)` vs. `AggregateData::BigSum(m)`.
1273+
/// - `CountTree(_, n, _)` vs. `AggregateData::Count(m)`.
1274+
/// - `CountSumTree(_, c, s, _)` vs. `AggregateData::CountAndSum(cm, sm)`.
1275+
/// - `ProvableCountTree(_, n, _)` vs. `AggregateData::ProvableCount(m)`.
1276+
/// - `ProvableCountSumTree(_, c, s, _)` vs.
1277+
/// `AggregateData::ProvableCountAndSum(cm, sm)`.
1278+
///
1279+
/// A plain `Element::Tree(..)` has no aggregate field; the inner Merk's
1280+
/// `aggregate_data` is `NoAggregateData` by construction, and any other
1281+
/// value would be a separate corruption (caught by the type/feature checks
1282+
/// elsewhere). We return `None` for it here.
1283+
///
1284+
/// A variant/aggregate-shape mismatch (e.g. `ProvableSumTree` whose inner
1285+
/// Merk reports `AggregateData::Count(_)` instead of `ProvableSum(_)`) is
1286+
/// also reported, because the inner Merk's tree-type has drifted from what
1287+
/// the parent element claims.
1288+
#[cfg(feature = "minimal")]
1289+
fn aggregate_consistency_labels(
1290+
element: &Element,
1291+
actual: &AggregateData,
1292+
) -> Option<(String, String)> {
1293+
match (element, actual) {
1294+
// --- Plain Tree: no aggregate, never reports a mismatch.
1295+
(Element::Tree(..), AggregateData::NoAggregateData) => None,
1296+
1297+
// --- SumTree variants ---
1298+
(Element::SumTree(_, recorded, _), AggregateData::Sum(actual_sum)) => {
1299+
if recorded == actual_sum {
1300+
None
1301+
} else {
1302+
Some((
1303+
format!("SumTree recorded sum {}", recorded),
1304+
format!("inner aggregate Sum {}", actual_sum),
1305+
))
1306+
}
1307+
}
1308+
(Element::ProvableSumTree(_, recorded, _), AggregateData::ProvableSum(actual_sum)) => {
1309+
if recorded == actual_sum {
1310+
None
1311+
} else {
1312+
Some((
1313+
format!("ProvableSumTree recorded sum {}", recorded),
1314+
format!("inner aggregate ProvableSum {}", actual_sum),
1315+
))
1316+
}
1317+
}
1318+
(Element::BigSumTree(_, recorded, _), AggregateData::BigSum(actual_sum)) => {
1319+
if recorded == actual_sum {
1320+
None
1321+
} else {
1322+
Some((
1323+
format!("BigSumTree recorded sum {}", recorded),
1324+
format!("inner aggregate BigSum {}", actual_sum),
1325+
))
1326+
}
1327+
}
1328+
(Element::CountTree(_, recorded, _), AggregateData::Count(actual_count)) => {
1329+
if recorded == actual_count {
1330+
None
1331+
} else {
1332+
Some((
1333+
format!("CountTree recorded count {}", recorded),
1334+
format!("inner aggregate Count {}", actual_count),
1335+
))
1336+
}
1337+
}
1338+
(
1339+
Element::CountSumTree(_, recorded_count, recorded_sum, _),
1340+
AggregateData::CountAndSum(actual_count, actual_sum),
1341+
) => {
1342+
if recorded_count == actual_count && recorded_sum == actual_sum {
1343+
None
1344+
} else {
1345+
Some((
1346+
format!(
1347+
"CountSumTree recorded count {} sum {}",
1348+
recorded_count, recorded_sum
1349+
),
1350+
format!(
1351+
"inner aggregate CountAndSum count {} sum {}",
1352+
actual_count, actual_sum
1353+
),
1354+
))
1355+
}
1356+
}
1357+
(
1358+
Element::ProvableCountTree(_, recorded, _),
1359+
AggregateData::ProvableCount(actual_count),
1360+
) => {
1361+
if recorded == actual_count {
1362+
None
1363+
} else {
1364+
Some((
1365+
format!("ProvableCountTree recorded count {}", recorded),
1366+
format!("inner aggregate ProvableCount {}", actual_count),
1367+
))
1368+
}
1369+
}
1370+
(
1371+
Element::ProvableCountSumTree(_, recorded_count, recorded_sum, _),
1372+
AggregateData::ProvableCountAndSum(actual_count, actual_sum),
1373+
) => {
1374+
if recorded_count == actual_count && recorded_sum == actual_sum {
1375+
None
1376+
} else {
1377+
Some((
1378+
format!(
1379+
"ProvableCountSumTree recorded count {} sum {}",
1380+
recorded_count, recorded_sum
1381+
),
1382+
format!(
1383+
"inner aggregate ProvableCountAndSum count {} sum {}",
1384+
actual_count, actual_sum
1385+
),
1386+
))
1387+
}
1388+
}
1389+
1390+
// --- Empty-merk edge case: an empty Merk returns NoAggregateData
1391+
// for any tree type. This is the correct initial state for a
1392+
// freshly-inserted tree element. Treat as not-mismatching as long
1393+
// as the recorded aggregate is the identity for that variant
1394+
// (zero / zero counts). Anything else is a real mismatch. ---
1395+
(Element::SumTree(_, recorded, _), AggregateData::NoAggregateData) if *recorded == 0 => {
1396+
None
1397+
}
1398+
(Element::ProvableSumTree(_, recorded, _), AggregateData::NoAggregateData)
1399+
if *recorded == 0 =>
1400+
{
1401+
None
1402+
}
1403+
(Element::BigSumTree(_, recorded, _), AggregateData::NoAggregateData) if *recorded == 0 => {
1404+
None
1405+
}
1406+
(Element::CountTree(_, recorded, _), AggregateData::NoAggregateData) if *recorded == 0 => {
1407+
None
1408+
}
1409+
(
1410+
Element::CountSumTree(_, recorded_count, recorded_sum, _),
1411+
AggregateData::NoAggregateData,
1412+
) if *recorded_count == 0 && *recorded_sum == 0 => None,
1413+
(Element::ProvableCountTree(_, recorded, _), AggregateData::NoAggregateData)
1414+
if *recorded == 0 =>
1415+
{
1416+
None
1417+
}
1418+
(
1419+
Element::ProvableCountSumTree(_, recorded_count, recorded_sum, _),
1420+
AggregateData::NoAggregateData,
1421+
) if *recorded_count == 0 && *recorded_sum == 0 => None,
1422+
1423+
// --- Non-Merk data trees: caller skips us via
1424+
// `uses_non_merk_data_storage`; if we end up here anyway, do not
1425+
// report. ---
1426+
(Element::CommitmentTree(..), _)
1427+
| (Element::MmrTree(..), _)
1428+
| (Element::BulkAppendTree(..), _)
1429+
| (Element::DenseAppendOnlyFixedSizeTree(..), _) => None,
1430+
1431+
// --- Anything else is a variant/aggregate-shape mismatch (e.g.
1432+
// the inner Merk's tree-type has drifted from what the parent
1433+
// claims). Report with descriptive labels. ---
1434+
(element, actual) => Some((
1435+
format!("element variant {}", element.type_str()),
1436+
format!("inner aggregate variant {:?}", actual),
1437+
)),
1438+
}
1439+
}
1440+
12061441
/// Test-only helpers for verifying internal storage state.
12071442
#[cfg(all(test, feature = "minimal"))]
12081443
impl GroveDb {

0 commit comments

Comments
 (0)