feat: add CountIndexedTree element with auto-cascading secondary index (#657)

QuantumExplorer wants to merge 69 commits
Conversation
Adds two new GroveDB element types — CountIndexedTree and ProvableCountIndexedTree — that pair a CountTree-shaped primary Merk with a count-ordered secondary Merk for sub-linear top-k and count-range queries.

Each element points at two child Merks. The parent Merk binds both via H1-A composition: combined_value_hash = Blake3(actual_value_hash || primary_root_hash || secondary_root_hash). The secondary is itself a ProvableCountTree (each entry contributes count = 1), so the existing AggregateCountOnRange machinery applies natively. Storage prefix derivation (S2-B): the primary keeps the existing build_prefix(path); the secondary is Blake3(primary_prefix || 0x01).

Public API:
- insert_into_count_indexed_tree / delete_from_count_indexed_tree — dedicated direct APIs that mirror to the secondary inline and chain the H1-A combine into the parent.
- count_indexed_top_k / count_indexed_count_range — read APIs walking the secondary in count order.
- reconcile_count_indexed_tree_secondary — rebuilds the secondary from the primary on demand; used after batch operations that bypass the dedicated write path.
- prove_count_indexed_top_k / verify_count_indexed_top_k — proof generation and verification for top-k queries, binding the secondary range proof to the GroveDB root hash via the H1-A composition.
- Empty CountIndexedTree elements can be created via apply_batch.

Auto-cascading: propagate_changes_with_transaction is now CountIndexed-aware. When the propagation pass crosses a CountIndexedTree primary level, it mirrors the count delta to that level's secondary; when a CountIndexedTree element needs reconstruction, it uses the H1-A three-input combine. Nested CountIndexedTrees and deep db.insert paths through sub-trees of a cidx primary cascade correctly.

The design doc at docs/book/src/count-indexed-tree.md captures the ratified decisions (H1-A, S2-B, V1-A, Q1-A, S1-A, Q2 with conditional subqueries deferred). A spike note at docs/spikes/cascading-aggregation-spike.md records the architectural analysis for the propagation refactor.

Tests: 27 dedicated tests covering empty creation, insert/update/delete with count deltas, NonCounted handling, deep cascading through sub-trees, nested CountIndexedTrees, top-k and count-range queries, reconciliation, batch creation, proof round-trips, and forge tests (tampered bytes, wrong path). Workspace: 2615 lib tests pass, no regressions.

Deferred for follow-up:
- Item-level batch inserts INTO a cidx primary (use the dedicated API)
- Replication / chunk restoration support for two-Merk subtrees
- Conditional-by-count subqueries within CountIndexedQuery (Q2.3)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
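The H1-A combine and S2-B prefix derivation above come down to two fixed byte layouts. A minimal sketch of those preimages (illustrative only: the function names are ours, and the real code feeds these bytes to Blake3):

```rust
/// H1-A: the parent's combined value hash covers the element bytes'
/// value hash plus BOTH child Merk roots, so tampering with either
/// the primary or the secondary breaks the chain to the GroveDB root.
fn h1a_preimage(
    actual_value_hash: &[u8; 32],
    primary_root_hash: &[u8; 32],
    secondary_root_hash: &[u8; 32],
) -> Vec<u8> {
    let mut preimage = Vec::with_capacity(96);
    preimage.extend_from_slice(actual_value_hash);
    preimage.extend_from_slice(primary_root_hash);
    preimage.extend_from_slice(secondary_root_hash);
    preimage // combined_value_hash = Blake3(preimage)
}

/// S2-B: the secondary Merk's storage prefix is derived from the
/// primary's path-derived prefix plus a 0x01 tag byte, giving a
/// fixed 33-byte preimage distinct from any path-derived preimage.
fn s2b_preimage(primary_prefix: &[u8; 32]) -> Vec<u8> {
    let mut preimage = Vec::with_capacity(33);
    preimage.extend_from_slice(primary_prefix);
    preimage.push(0x01);
    preimage // secondary_prefix = Blake3(preimage)
}
```

The fixed 33-byte S2-B preimage is what keeps secondary prefixes domain-separated from path-derived ones.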
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
Actionable comments posted: 11
🧹 Nitpick comments (4)
merk/src/tree/ops.rs (1)
389-410: ⚡ Quick win: Add focused unit coverage for the new op variants.
This module's local tests still exercise only the legacy `Put`/`Delete` paths, so regressions in `new_with_layered_value_hash_three(...)` or `put_value_with_two_reference_value_hashes_and_value_cost(...)` would currently slip through here. A pair of tests that hits both `apply_to(None, ...)` and update-on-existing-node would lock down the new hashing path well.
Also applies to: 561-589
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@merk/src/tree/ops.rs` around lines 389 - 410, Add unit tests that exercise the new op variants PutLayeredCountIndexedReference and ReplaceLayeredCountIndexedReference so the layered hashing path is covered: write tests that call the op's apply_to(None, ...) to create a fresh node and then apply the op again against an existing node (update-on-existing-node) to exercise TreeNode::new_with_layered_value_hash_three and the put_value_with_two_reference_value_hashes_and_value_cost code paths; assert expected node hashes, costs, and stored references (use mid_key/mid_value equivalents from the diff) and mirror these tests for both variants to prevent regressions.
merk/src/element/reconstruct.rs (1)
97-125: ⚡ Quick win: Add a direct test for `reconstruct_with_two_root_keys`.
This helper is now the reconstruction path for count-indexed parents, but the test module still exercises only `reconstruct_with_root_key`. A small test for both raw and `NonCounted`-wrapped count-indexed elements would catch swapped root keys or wrapper loss early.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@merk/src/element/reconstruct.rs` around lines 97 - 125, Add unit tests that call reconstruct_with_two_root_keys directly: create both CountIndexedTree and ProvableCountIndexedTree Element instances (and their NonCounted(Box::new(...)) wrapped variants), call reconstruct_with_two_root_keys with distinct primary_root_key and secondary_root_key and an AggregateData that yields a known count, and assert the returned Element preserves the correct variant, wrapper (NonCounted present when expected), and that primary_root_key and secondary_root_key are placed in the reconstructed Element in the correct order (i.e., not swapped). Use the existing AggregateData helpers and Element constructors to build inputs and compare reconstructed fields to expected values.
grovedb/src/operations/count_indexed_tree.rs (2)
316-381: ⚡ Quick win: Extract the nested-secondary mirror path into one helper.
The grandparent lookup, parent-secondary mirror, and deferred-secondary seeding logic is duplicated almost verbatim in both insert and delete. This path is subtle, and keeping two copies in sync will be error-prone as the CountIndexedTree propagation rules evolve. A shared helper returning the initial deferred-secondary state would reduce drift risk here.
Also applies to: 1090-1155
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grovedb/src/operations/count_indexed_tree.rs` around lines 316 - 381, The code that computes initial_deferred_secondary (the grandparent lookup, extracting parent_secondary_root_key_before from gp_element, opening parent_secondary_merk via open_count_indexed_secondary_at_path, calling mirror_to_secondary, and returning (sh, sk) from parent_secondary_merk.root_hash_key_and_aggregate_data()) is duplicated in insert and delete; extract this into a single helper (e.g., compute_initial_deferred_secondary or seed_nested_secondary) that accepts parent_path, parent_merk, count_indexed_key, old_count_in_parent, new_count_in_parent, transaction, batch, grove_version and returns Option<(sh, sk)> or an error, then replace both duplicated blocks with a call to that helper and reuse it from the same call sites (keeping references to mirror_to_secondary, open_transactional_merk_at_path, open_count_indexed_secondary_at_path and Element::CountIndexedTree/ProvableCountIndexedTree logic inside the helper).
155-163: ⚡ Quick win: Use `cost_return_on_error!` for these early exits.
These branches hand-roll `return Err(...).wrap_with_cost(cost)` instead of using the repo-standard early-return helper that the rest of the Rust codebase expects for cost accounting. Converting these sites would make the file consistent with the project convention.
As per coding guidelines: `**/*.rs`: Use `cost_return_on_error!` macro for early returns with cost accumulation in Rust source files.
Also applies to: 478-486, 847-855, 933-941
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grovedb/src/operations/count_indexed_tree.rs` around lines 155 - 163, Replace the manual early-return that does `return Err(Error::InvalidPath(...)).wrap_with_cost(cost)` after calling `path.derive_parent()` with the project-standard macro `cost_return_on_error!`, e.g. invoke `cost_return_on_error!(Error::InvalidPath("cannot insert into count-indexed tree at the root path".to_string()), cost)` so cost accounting is applied consistently; apply the same change to the other analogous early-exit sites in this file that wrap `Err(...).wrap_with_cost(cost)` (the other occurrences around the count-indexed-tree logic) so all early returns use `cost_return_on_error!` instead of hand-rolled `wrap_with_cost(cost)`.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/book/src/count-indexed-tree.md`:
- Around line 3-7: Update the status banner string "Status: design ratified,
awaiting implementation." in the docs/book/src/count-indexed-tree.md to reflect
that the feature is implemented and available (e.g., change to "Status:
implemented and available" or similar); locate the exact banner line containing
that phrase and replace it with the new implemented/available wording so the
docs match the delivered code and tests.
- Around line 212-217: Replace the absolute phrase "Collision-free secondary"
and any wording that claims absolute collision-freedom with language that
accurately describes domain separation and collision resistance for the
`secondary prefix` derived via Blake3; e.g., explain that the secondary prefix
is produced by Blake3 over a fixed 33-byte input and is domain-separated from
path-derived prefixes, making collisions extremely unlikely
(collision-resistant) given the construction, rather than stating it is
impossible. Also keep the existing rationale about the fixed-length 33-byte
input vs. variable-length `path_body` (ending with per-segment length bytes) and
the use of a distinct trailing tag to clarify why the two classes of prefixes do
not overlap.
In `@grovedb/src/batch/mod.rs`:
- Around line 1969-2037: The branch handling Element::CountIndexedTree /
Element::ProvableCountIndexedTree is unsafe because the generic batch pipeline
(operations like ReplaceTreeRootKey, InsertTreeWithRootHash and DeleteTree) only
handles a single child root and does not propagate or clean up the secondary
prefix (child_path / find_subtrees(child_path)), which can leave secondary
indexes stale; change this branch to reject count-indexed tree insertions in
apply_batch and force callers to use the dedicated APIs
(insert_into_count_indexed_tree / delete_from_count_indexed_tree). Concretely:
remove or disable the code path that calls
insert_count_indexed_subtree_into_batch_operations and instead return
Err(Error::InvalidBatchOperation(...)) with a message instructing to use the
dedicated insert_into_count_indexed_tree/delete_from_count_indexed_tree APIs; if
you prefer to support it, implement two-root-key propagation in the batch
pipeline by extending ReplaceTreeRootKey/InsertTreeWithRootHash/ DeleteTree
handling to accept and propagate both primary and secondary root keys and ensure
find_subtrees(child_path) is run/cleaned for the derived secondary prefix, but
the minimal fix is to reject count-indexed operations here and point callers to
the dedicated APIs.
In `@grovedb/src/lib.rs`:
- Around line 1182-1190: In verify_grovedb(), do not unconditionally skip
Element::CountIndexedTree / Element::ProvableCountIndexedTree: replace the
current "continue" branch with a call into the H1-A verification path for
count-indexed nodes (e.g. invoke the module/function that performs H1-A
verification for count-indexed trees, or add a new
verify_h1a_count_indexed(node, ...) function and call it from the
Element::CountIndexedTree / Element::ProvableCountIndexedTree arm); if the H1-A
verifier is not yet implemented, fail closed by returning an
Err(VerificationError::UnsupportedCountIndexedNode or similar) from
verify_grovedb() instead of treating the node as verified. Ensure you reference
and propagate errors from the H1-A verifier so verify_grovedb() reports
corruption rather than silently continuing.
In `@grovedb/src/operations/count_indexed_tree.rs`:
- Around line 793-831: The current logic uses Query::new() + insert_all() and
then post-filters by lo_count/hi_count which causes full scans; instead
construct a query that seeks directly to the encoded secondary-key bounds so the
iterator starts inside the requested window. Replace the insert_all() usage in
the count-indexed scan (where KVIterator::new(..., &all_query) is created) with
a Query configured to start at the encoded lower or upper secondary key (use the
same secondary-key encoding used by decode_secondary_key) depending on
descending: for ascending, build a start key based on
encode_secondary_key(lo_count, minimal_original_key) and an optional end key
based on encode_secondary_key(hi_count, maximal_original_key); for descending,
start the query at the encoded upper bound and iterate left_to_right=false.
Ensure inclusivity semantics for counts equal to lo_count/hi_count and keep the
same decode_secondary_key/count checks, but the iterator will no longer scan
from the collection edge.
In `@grovedb/src/operations/get/query.rs`:
- Around line 557-560: In function query_item_value_or_sum, the
reference-resolution branch currently doesn't handle Element::CountIndexedTree
and Element::ProvableCountIndexedTree, causing referenced counts to fall through
to InvalidQuery; update the reference-handling match (the branch that resolves
referenced elements) to mirror the direct-element branch by matching
Element::CountIndexedTree(.., count_value, _) and
Element::ProvableCountIndexedTree(.., count_value, _) and returning
QueryItemOrSumReturnType::CountValue(count_value) so referenced count elements
are handled consistently.
In `@grovedb/src/operations/proof/count_indexed.rs`:
- Around line 41-64: The CountIndexedRangeProof envelope currently only carries
a single primary_root_hash, so nested count-indexed ancestors cannot be attested
when building the chain in combine_hash (see combine_hash and the path[..last]
chaining); fix by extending the proof to include per-ancestor H1-A attestation
data (e.g. replace primary_root_hash: [u8;32] with a Vec<[u8;32]> or
primary_root_hashs: Vec<[u8;32]> aligned with layer_proofs) and update the
verifier logic that iterates path layers (the code at lines that use
combine_hash over layer_proofs/path) to consume the corresponding primary
attestation for each layer instead of always using a single primary_root_hash so
nested CountIndexedTree ancestors validate correctly.
In `@grovedb/src/operations/proof/generate.rs`:
- Around line 1463-1465: The code in generate.rs currently treats
Element::CountIndexedTree and Element::ProvableCountIndexedTree like append-only
or fixed-size trees by falling into the final continue arm, which silently
allows V1 subqueries that will produce proofs failing verification; update the
match so that CountIndexedTree and ProvableCountIndexedTree are handled the same
way as the other rejected subquery variants (i.e., return an error/abort the
subquery attempt) instead of continuing – locate the match over Element in the
proof generation function (the arm with
Ok(Element::DenseAppendOnlyFixedSizeTree(..)) |
Ok(Element::CountIndexedTree(..)) | Ok(Element::ProvableCountIndexedTree(..)) =>
continue) and move or duplicate the CountIndexedTree and
ProvableCountIndexedTree variants into the branch that rejects unsupported
subqueries for V1 so non-empty count-indexed trees produce an immediate error
rather than proceeding.
In `@grovedb/src/tests/count_indexed_tree_tests.rs`:
- Around line 829-871: Update the test reconcile_rebuilds_secondary_from_scratch
to first corrupt/clear the secondary index before calling
reconcile_count_indexed_tree_secondary so you actually test rebuilding: after
inserting the CountIndexedTree and its entries (using db.insert and
db.insert_into_count_indexed_tree), explicitly invalidate the secondary (for
example by deleting secondary nodes or overwriting the secondary element for the
path [TEST_LEAF, b"cidx"] with a broken/empty secondary using available db
remove/insert APIs), then call db.reconcile_count_indexed_tree_secondary(...)
and finally assert that db.count_indexed_top_k(...) returns the expected top-k
result; reference functions: reconcile_rebuilds_secondary_from_scratch,
reconcile_count_indexed_tree_secondary, count_indexed_top_k,
db.insert_into_count_indexed_tree.
In `@merk/src/tree/hash.rs`:
- Around line 151-153: The doc comment for combine_hash_three contradicts the
implementation: it says "cost is one hash call" but the function records
hash_node_calls: 2; update the documentation on combine_hash_three to state the
correct cost (two hash calls) and explain briefly that 96 bytes span two 64-byte
Blake3 compression blocks so hash_node_calls is 2, ensuring the comment matches
the implementation.
---
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: eb221ea0-9ae9-4765-9e7e-b2c6aec12c7b
📒 Files selected for processing (35)
- docs/book/src/SUMMARY.md
- docs/book/src/appendix-a.md
- docs/book/src/count-indexed-tree.md
- docs/spikes/cascading-aggregation-spike.md
- grovedb-element/src/element/constructor.rs
- grovedb-element/src/element/helpers.rs
- grovedb-element/src/element/mod.rs
- grovedb-element/src/element/visualize.rs
- grovedb-element/src/element_type.rs
- grovedb/src/batch/mod.rs
- grovedb/src/lib.rs
- grovedb/src/operations/count_indexed_tree.rs
- grovedb/src/operations/get/query.rs
- grovedb/src/operations/insert/mod.rs
- grovedb/src/operations/mod.rs
- grovedb/src/operations/proof/count_indexed.rs
- grovedb/src/operations/proof/generate.rs
- grovedb/src/operations/proof/mod.rs
- grovedb/src/operations/proof/verify.rs
- grovedb/src/tests/count_indexed_tree_tests.rs
- grovedb/src/tests/mod.rs
- merk/src/element/costs.rs
- merk/src/element/delete.rs
- merk/src/element/get.rs
- merk/src/element/insert.rs
- merk/src/element/reconstruct.rs
- merk/src/element/tree_type.rs
- merk/src/tree/hash.rs
- merk/src/tree/kv.rs
- merk/src/tree/mod.rs
- merk/src/tree/ops.rs
- merk/src/tree/walk/mod.rs
- merk/src/tree_type/costs.rs
- merk/src/tree_type/mod.rs
- storage/src/rocksdb_storage/storage.rs
```rust
// CountIndexedTree / ProvableCountIndexedTree own two child Merks
// (primary + secondary). On direct insertion we accept only the
// empty case (both root keys = None, count = 0) because there is
// no two-Merk batch-cascade machinery in this code path; full
// batch / cascading-aggregation support lives in the batch
// propagation work.
Element::CountIndexedTree(primary, secondary, count_value, _)
| Element::ProvableCountIndexedTree(primary, secondary, count_value, _) => {
    if primary.is_some() || secondary.is_some() || *count_value != 0 {
        return Err(Error::InvalidCodeExecution(
            "a CountIndexedTree must be empty at the moment of direct insertion (both \
             primary_root_key and secondary_root_key must be None and count = 0); \
             non-empty insertion requires batch operations",
        ))
        .wrap_with_cost(cost);
    }
    cost_return_on_error_into!(
        &mut cost,
        element.insert_count_indexed_subtree(
            &mut subtree_to_insert_into,
            key,
            NULL_HASH,
            NULL_HASH,
            Some(options.as_merk_options()),
            grove_version,
        )
    );
}
```
We should allow for direct insertion
```rust
        .to_string(),
    ))
    .wrap_with_cost(cost);
}
```
We need to do this.
Fixes CI lint failure (debugger.rs match arms) and ten CodeRabbit review items on the CountIndexedTree implementation:
- Doc status banner: "awaiting implementation" → "implemented"
- Doc wording: "collision-free" → "domain-separated" for hash-derived prefixes
- verify_grovedb: fail closed (NotSupported) for cidx instead of silently skipping; integrity verification needs the H1-A three-input combine and dual-Merk traversal, which is not yet wired
- V1 prove_subqueries_v1: explicitly reject subqueries into cidx with NotSupported instead of silently emitting an unverifiable proof; callers must use prove_count_indexed_top_k
- Batch DeleteTree on cidx: reject, because the standard delete path only cleans up one child Merk and would orphan the secondary storage namespace
- Generic batch path: document the cidx overwrite footgun (same shape as other tree types when the override-protection flag is off)
- count_indexed_count_range: replace the full secondary scan with a bounded Query::insert_range using big-endian count bytes, falling back to insert_range_from when hi_count == u64::MAX
- query_item_value_or_sum reference branch: include cidx variants alongside the direct-element branch
- prove_count_indexed_top_k: reject nested cidx on the proven path with NotSupported (the envelope only carries H1-A attestation data for the terminal cidx); the verifier naturally fails the chain check if a forged envelope smuggles a nested cidx
- combine_hash_three: correct the doc comment to match the cost constant; 96 bytes spans two 64-byte Blake3 blocks (the previous comment incorrectly conflated blocks with chunks)
- reconcile test: rename to reconcile_after_query_returns_correct_top_k to reflect what the test actually verifies (a true desync test requires unavailable internal APIs)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
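The bounded-range replacement works because big-endian integer bytes sort lexicographically in the same order as the numbers. A sketch of the idea (the `count_be || original_key` layout is our assumption about the secondary-key shape, not code copied from the PR):

```rust
// Assumed secondary-key shape: 8 big-endian count bytes followed by the
// original key. Because big-endian bytes compare lexicographically in
// numeric order, a plain byte-range query over the encoded bounds visits
// exactly the requested count window without scanning from the edge.
fn encode_secondary_key(count: u64, original_key: &[u8]) -> Vec<u8> {
    let mut key = count.to_be_bytes().to_vec();
    key.extend_from_slice(original_key);
    key
}

fn decode_count(secondary_key: &[u8]) -> u64 {
    let mut be = [0u8; 8];
    be.copy_from_slice(&secondary_key[..8]);
    u64::from_be_bytes(be)
}
```

With an ASCII or little-endian encoding, count 10 would sort before count 2; big-endian avoids that, and the `hi_count == u64::MAX` case needs `insert_range_from` only because `hi + 1` would overflow.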
Codecov Report
❌ Patch coverage is
Additional details and impacted files:

```
@@ Coverage Diff @@
##          develop     #657    +/-   ##
===========================================
+ Coverage    90.87%   90.94%   +0.07%
===========================================
  Files          189      192       +3
  Lines        57139    60598    +3459
===========================================
+ Hits         51923    55113    +3190
- Misses        5216     5485     +269
```
The direct (non-batch) insert path previously rejected any CountIndexedTree element whose primary_root_key, secondary_root_key, or count_value was non-zero, with an error claiming non-empty insertion required the batch path (which itself does not yet support non-empty cidx). This is the migration / restore-from-backup direct-insertion path.

For non-empty cidx elements, open the existing primary and secondary Merks at the new path, validate that the caller's declared root keys match the on-disk state, and read the actual root hashes for the H1-A combined value hash so the parent's value_hash is consistent with disk. Mismatched root keys fail loudly.

Also delete docs/spikes/cascading-aggregation-spike.md — internal and external dev-relevant content for cidx lives in the book chapter (docs/book/src/count-indexed-tree.md).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lifts patch coverage on the cidx PR by adding focused tests for the error paths and rejections introduced over the last few commits, plus two extra cidx behaviors that were not yet exercised:
- direct_insert_rejects_mismatched_secondary_root_key (mismatch on secondary key, mirroring the existing primary-key test)
- batch_delete_tree_on_cidx_is_rejected (DeleteTree on cidx via batch must error to avoid orphaning secondary storage)
- verify_grovedb_fails_closed_for_cidx (NotSupported instead of silent skip)
- prove_count_indexed_top_k_at_root_path_errors
- prove_count_indexed_top_k_on_non_cidx_target_errors
- count_indexed_top_k_on_non_cidx_target_errors
- count_indexed_count_range_on_non_cidx_target_errors
- reconcile_on_non_cidx_target_errors
- delete_from_count_indexed_tree_on_non_cidx_target_errors
- delete_from_count_indexed_tree_returns_false_for_unknown_key
- count_indexed_count_range_descending_returns_descending_order (covers the descending bounded-range branch)
- test_v1_proof_rejects_count_indexed_tree_subquery (V1 generic prove path rejects cidx subqueries)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lifts patch coverage above the codecov 80% threshold by hitting the 0%-covered Display impls, the gated Visualize impls, helper queries on Element, the count_range / top_k edge cases, and the verifier's error paths:
- count_indexed_tree_display_renders_fields
- provable_count_indexed_tree_display_renders_fields
- count_indexed_tree_helpers_report_count_and_type (is_count_indexed_tree, is_any_tree, element_type, NonCounted look-through)
- test_visualize_count_indexed_tree_empty (visualize feature)
- test_visualize_count_indexed_tree_with_keys (visualize feature)
- test_visualize_provable_count_indexed_tree (visualize feature)
- count_indexed_count_range_with_lo_greater_than_hi_returns_empty
- count_indexed_count_range_with_hi_count_u64_max_uses_range_from
- count_indexed_count_range_respects_limit
- count_indexed_top_k_with_zero_returns_empty
- count_indexed_top_k_at_root_path_errors
- count_indexed_count_range_at_root_path_errors
- verify_count_indexed_top_k_rejects_corrupt_proof_bytes
- verify_count_indexed_top_k_rejects_path_length_mismatch

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
V0 is a frozen on-the-wire proof format. Adding cidx descent to it would be a wire-format change, so V0 will never learn cidx subqueries. Reword the V0 prover and verifier comments / error messages to make that explicit instead of implying the work is pending in a follow-up PR. The dedicated `prove_count_indexed_top_k` / `verify_count_indexed_top_k` entry points and the (still TODO) V1 generic path remain the supported routes for cidx queries. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
verify_grovedb: replace fail-closed NotSupported with the actual H1-A integrity walk for cidx nodes. Open both child Merks, read their root hashes, verify the parent's recorded value_hash equals combine_hash_three(value_hash(cidx_bytes), primary_root, secondary_root), then recurse into the primary normally.

While doing this, fix a pre-existing bug in insert_into_count_indexed_tree: it called Element::insert (Op::Put, no combine) regardless of element kind. For tree subtree elements, that meant the cidx primary's merk node stored value_hash = value_hash(serialized) instead of combine_hash(value_hash, NULL_HASH), breaking the merkle invariant of the cidx primary until a deep insert later updated it via propagation. Dispatch on element kind so trees take Element::insert_subtree, nested cidx takes Element::insert_count_indexed_subtree, and references and items keep the prior path. Now the cidx primary's root hash is correct immediately after creation, and verify_grovedb can recurse cleanly.

prove_count_indexed_top_k: extend CountIndexedRangeProof with ancestor_cidx_secondary_root_hashes (Vec<Option<[u8;32]>> aligned with intermediate layers). When building, capture each cidx ancestor's secondary root hash. When verifying, chain via combine_hash_three at cidx ancestor layers and combine_hash elsewhere. Removes the prior nested-cidx prover-side rejection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
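The shape of that integrity check can be sketched as follows. The combine here is a deliberately trivial stand-in (XOR with rotation) so the snippet is self-contained and a tampered child root provably changes the result; GroveDB's real combine_hash_three is Blake3 over the 96-byte concatenation:

```rust
type Hash = [u8; 32];

// Stand-in for combine_hash_three: NOT Blake3. Chosen only so the sketch
// runs without the blake3 crate and a single tampered input byte is
// guaranteed to change the output.
fn combine_three_stub(value_hash: &Hash, primary_root: &Hash, secondary_root: &Hash) -> Hash {
    let mut out = [0u8; 32];
    for i in 0..32 {
        out[i] = value_hash[i] ^ primary_root[(i + 1) % 32] ^ secondary_root[(i + 2) % 32];
    }
    out
}

// H1-A check as described in the commit: the parent's recorded value_hash
// for a cidx element must equal the three-input combine of the element
// bytes' hash and both child Merk roots.
fn cidx_binding_ok(
    recorded: &Hash,
    value_hash: &Hash,
    primary_root: &Hash,
    secondary_root: &Hash,
) -> bool {
    &combine_three_stub(value_hash, primary_root, secondary_root) == recorded
}
```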
Subqueries into CountIndexedTree via the generic V1 PathQuery pipeline now produce a verifiable proof. The cidx primary is the descent target; queries against the secondary still go through the dedicated prove_count_indexed_top_k path. Wire format: - New ProofBytes::CountIndexedTree(secondary_root || primary_proof) variant. The 32-byte secondary attestation is captured from the cidx's secondary Merk root hash at proof-build time; the primary proof bytes are a standard Merk proof of the subquery results generated by prove_subqueries_v1 against the cidx primary. - LayerProof and ProofBytes derive Clone so the verifier can synthesize a sibling Merk-shaped LayerProof from the cidx-prefixed bytes and recurse into the existing verify_layer_proof_v1. Generate (V1): replace the previous NotSupported with descent that calls prove_subqueries_v1 on the cidx primary, opens the secondary to capture its root hash, and re-wraps the resulting Merk proof bytes with the secondary attestation prefix. Verify (V1): when a lower_layer's parent element is a cidx, require ProofBytes::CountIndexedTree, split off the 32-byte secondary attestation, synthesize a Merk LayerProof for the primary, recurse to obtain primary_root_hash, then chain via combine_hash_three(value_hash, primary_root, secondary_root) instead of combine_hash. Reject any other ProofBytes variant under a cidx parent and any ProofBytes::CountIndexedTree under a non-cidx parent. V0 still rejects (V0 wire format is frozen). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The level-by-level batch propagation has no two-Merk hook for CountIndexedTree primaries: applying mutation ops directly to the primary updates the primary's root hash but leaves the secondary index stale, breaking both the H1-A composition stored in the parent's cidx element bytes and the count-ordered query semantics.

Reject mutation ops (Insert/Replace/Patch/Delete/RefreshReference) in execute_ops_on_path when the merk's tree_type is a cidx primary, with a clear NotSupported message pointing callers to the dedicated APIs (insert_into_count_indexed_tree / delete_from_count_indexed_tree). Up-bubbled internal ops (ReplaceTreeRootKey, InsertTreeWithRootHash, etc.) remain allowed; those represent a child subtree's response to its own change and are handled correctly by the existing propagate_changes_with_transaction_with_initial_deferred path that already mirrors to the secondary at the cidx element boundary.

Full batch integration of cidx primary mutations would require a new GroveOp variant carrying both primary and secondary state plus a refactor of the per-level propagation pass; that is a substantial piece of work and belongs in its own follow-up. Until then, fail-closed is preferable to silently corrupting the index.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Several module/function comments still claimed cidx features were "a follow-up" or "not yet wired" after this PR's earlier commits implemented them. Update wording to reflect current state:
- count_indexed_tree.rs module doc: clarify that the dedicated APIs are required for direct cidx primary mutations and that the batch path fails closed until full batch integration lands; deep ops under sub-trees of cidx primaries propagate correctly today.
- count_indexed_top_k doc: drop the "no proofs yet" note and point at prove_count_indexed_top_k / verify_count_indexed_top_k.
- count_indexed_tree_tests.rs module doc: drop the PR-2-staging banner that claimed item insertion / cascading aggregation were unexercised.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI codecov/patch is failing at 79.45% (target 80%). Add focused tests targeting recently-added paths that were not yet exercised:
- insert_into_count_indexed_tree_with_reference_to_missing_target_errors: covers the new reference-resolution path for cidx primary inserts when the target does not exist.
- deep_insert_under_nested_cidx_propagates_through_both_levels: covers the nested-cidx propagation path end-to-end (deep insert three levels under outer cidx -> inner cidx -> sub count tree), including the new H1-A walk in verify_grovedb at both cidx levels.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov patch is at 79.91% (target 80%), 13 hits short. Add four focused tests covering paths recently added but not yet exercised:
- delete_from_count_indexed_tree_round_trip_with_proof: end-to-end delete + prove + verify.
- verify_count_indexed_top_k_rejects_truncated_proof: covers the bincode decode error branch.
- verify_grovedb_walks_provable_count_indexed_tree: same H1-A walk on the ProvableCountIndexedTree variant.
- test_v0_proof_rejects_count_indexed_tree_subquery: covers the V0 prover's cidx subquery rejection arm.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lands the structural pieces for cidx primary batch-path support. No user-visible behavior change: the new op variant is never produced yet (the rejection at execute_ops_on_path:1862 still fires for cidx primary mutations), but the parent-level handler is in place so that emitting the op from a future bubble-up hook is mechanical.
- GroveOp::ReplaceCountIndexedTreeRootKeys: new internal op variant. Carries both primary and secondary new-state (root_hash + root_key + count_aggregate). Marked #[non_exhaustive] like the other internal variants. Sort weight 17, debug formatter, all match arms in references / preprocessing / format / cost / sort logic exhaustively cover it (rejected as 'internal only' from user-facing entry points).
- update_count_indexed_tree_item_preserve_flag_into_batch_operations: parallels update_tree_item_preserve_flag_into_batch_operations but reconstructs via reconstruct_with_two_root_keys (cidx) and emits Op::ReplaceLayeredCountIndexedReference (combine_hash_three / H1-A) instead of Op::ReplaceLayeredReference. Preserves flags.
- Parent-level handler: when execute_ops_on_path sees the new op at a parent merk, it calls the helper above to recompute the cidx element's value_hash via H1-A.

Subsequent commits will: (a) wire a get_secondary_merk_fn closure through TreeCacheMerkByPath, (b) detect cidx primaries in execute_ops_on_path and mirror item-level mutations to the secondary, (c) modify the bubble-up to emit the new op variant when the just-finished level was a cidx primary. Tests for the end-to-end behavior land alongside (c).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lifts the level-by-level batch path from rejecting cidx primary mutations to supporting them end-to-end. A batch op that inserts / replaces / patches / deletes / refreshes items inside a cidx primary now correctly mirrors to the secondary index and updates the cidx element on the parent merk via H1-A composition.

Implementation:
1. TreeCacheMerkByPath gained a get_secondary_merk_fn closure (opens the cidx secondary by primary path, looks up secondary_root_key from the parent merk's cidx element internally) and a side-channel cidx_secondary_after_apply: HashMap<Vec<Vec<u8>>, ...> populated by execute_ops_on_path when the level was a cidx primary.
2. execute_ops_on_path: when in_tree_type is a cidx primary, captures pre-state (per-key old count_value via merk.get) before the apply pass. After apply_with_specialized_costs returns, it re-reads each key's post-apply element, opens the secondary, runs mirror_to_secondary_for_batch (a new helper handling all four insert/update/delete/no-op cases), and stores the secondary's state in the side-channel.
3. Bubble-up: pulls the cidx state via the new take_cidx_secondary_after_apply trait method. When present, emits GroveOp::ReplaceCountIndexedTreeRootKeys instead of ReplaceTreeRootKey at the parent level (covers all four bubble-up paths: Vacant, Occupied, missing parent map, missing level-above).
4. Parent execute_ops_on_path: handles the new op via update_count_indexed_tree_item_preserve_flag_into_batch_operations, which reconstructs with the new root keys + count and emits Op::ReplaceLayeredCountIndexedReference for combine_hash_three.
5. open_count_indexed_secondary_for_batch helper on GroveDb: convenience wrapper used by the closure that does the parent merk lookup + secondary open in one call.

The batch_insert_into_cidx_primary_works test verifies this end-to-end. verify_grovedb walks the H1-A chain and finds no issues afterward.
Still TODO (separate follow-up): DeleteTree on cidx primary, cidx overwrite via Replace, comprehensive atomicity tests. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
prove_count_indexed_top_k was a special case (full-range, ascending
or descending). Lift it to a thin wrapper around a general
prove_count_indexed_query that takes any MerkQuery over the cidx
secondary's keyspace (keys are count_value_be ‖ original_key, so
callers can express count == X, count in [lo, hi], count >= X,
count == X AND original_key starts with Y, etc. by building the
query in those bytes).
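The keyspace described above makes count predicates expressible as plain byte ranges. A minimal sketch, assuming the 8-byte big-endian count prefix the commit text states; the helper names are illustrative, not GroveDB's actual API:

```rust
use std::ops::Range;

// Secondary Merk key: 8-byte big-endian count followed by the original key.
// Big-endian makes lexicographic key order equal numeric count order.
fn make_secondary_key(count: u64, original_key: &[u8]) -> Vec<u8> {
    let mut k = count.to_be_bytes().to_vec();
    k.extend_from_slice(original_key);
    k
}

// Byte-range bounds for `count in [lo, hi]` (inclusive, assuming hi < u64::MAX):
// [lo_be .. (hi+1)_be) covers every key whose 8-byte count prefix is in [lo, hi].
fn count_range_bounds(lo: u64, hi: u64) -> Range<Vec<u8>> {
    lo.to_be_bytes().to_vec()..(hi + 1).to_be_bytes().to_vec()
}

fn main() {
    // count == 3 expressed as the half-open byte range [3_be, 4_be)
    let eq3 = count_range_bounds(3, 3);
    assert!(eq3.contains(&make_secondary_key(3, b"a")));
    assert!(!eq3.contains(&make_secondary_key(4, b"a")));
}
```

A `count == X AND original_key starts with Y` predicate would narrow the same range further by appending Y to the lower bound, which is why an arbitrary MerkQuery over these bytes subsumes the top-k special case.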
Refactored the inner build_count_indexed_proof to take
(secondary_query, limit) instead of (k, descending); the
user-supplied query.left_to_right is echoed in the envelope's
`descending` field for the existing top-k convenience field, and
limit's None gets stored as 0 (the verifier treats 0 as no-limit).
Symmetric verifier change: split verify_count_indexed_top_k into a
thin wrapper + verify_count_indexed_inner generic core, and added
verify_count_indexed_query taking the same MerkQuery the prover used
(positional binding requires identical query at both ends).
Test prove_count_indexed_query_with_count_range covers a non-trivial
case: a cidx with five items at counts {1,2,3,5,8}, query
[3, 6) inclusive of 3 and 5, exclusive of 8. Verifier returns
exactly (3, c), (5, d).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes a real corruption gap: when validate_insertion_does_not_override_tree was off, a batch InsertOrReplace / Replace / Patch could silently overwrite an existing cidx element. The merk node value would change, but the cidx primary's storage namespace and the secondary's storage namespace (Blake3(primary_prefix || 0x01)) would be left behind. Future inserts under the new cidx's primary_root_key could then collide with the orphaned data, and the secondary index on the old data would be unreachable.

When the override-protection flag is on (the typical case), the existing rejection of "attempting to overwrite a tree" already catches cidx, since is_any_tree() returns true. When the flag is off, however, the path silently corrupts. Add an unconditional cidx-specific check that fires for InsertOrReplace / Replace / Patch ops on non-reference elements when the override flag is off: read the existing element at the key once, and if it decodes to CountIndexedTree / ProvableCountIndexedTree, reject with NotSupported pointing at the delete_from_count_indexed_tree / delete_up_tree workflow. Other tree-type overwrites remain permitted under the existing opt-out semantics for backwards compatibility; this stricter treatment is specific to cidx because cidx owns two storage namespaces and the corruption is qualitatively worse.

Updates one cost test (+1 seek, +129 storage_loaded_bytes) where the new check fires. The new test batch_overwrite_existing_cidx_with_item_is_rejected verifies the guard.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
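The decision order the guard above establishes can be modeled in a few lines. This is a toy sketch: the Element enum and function names are stand-ins for GroveDB's real types, and the point is only that cidx is rejected regardless of the override-protection flag while other trees keep the opt-out.

```rust
// Toy stand-ins for GroveDB's element kinds; illustrative only.
#[derive(Debug)]
#[allow(dead_code)]
enum Element {
    Item,
    Tree,
    CountIndexedTree,
}

fn check_batch_overwrite(
    existing: Option<&Element>,
    override_protection: bool,
) -> Result<(), String> {
    match existing {
        // cidx owns two storage namespaces; overwriting orphans the secondary,
        // so reject regardless of the override-protection flag.
        Some(Element::CountIndexedTree) => {
            Err("overwriting a CountIndexedTree is not supported; delete it first".into())
        }
        // Other trees are only protected when the flag is on (the opt-out is
        // kept for backwards compatibility).
        Some(Element::Tree) if override_protection => {
            Err("attempting to overwrite a tree".into())
        }
        _ => Ok(()),
    }
}

fn main() {
    assert!(check_batch_overwrite(Some(&Element::CountIndexedTree), false).is_err());
    assert!(check_batch_overwrite(Some(&Element::Tree), false).is_ok());
    assert!(check_batch_overwrite(None, true).is_ok());
}
```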
Codecov patch is at 79.75% (target 80%), 7 hits short. Add two focused tests covering the new batch cidx code paths:
- batch_delete_item_from_cidx_primary_works: covers the Delete arm of mirror_to_secondary_for_batch (new_count = None) and the pre-state capture for Delete ops.
- batch_multiple_inserts_into_cidx_primary_in_one_call: covers the multi-key pre-state capture loop and the per-key mirror loop in execute_ops_on_path on a cidx primary path.

Both run verify_grovedb afterward to walk the H1-A chain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes a real corruption gap: db.delete() of a CountIndexedTree element walked the primary's storage namespace via find_subtrees + storage.clear() but left the secondary's storage namespace (Blake3(primary_prefix || 0x01)) untouched. After the cidx element was removed from the parent merk, the secondary's data became unreachable but stayed on disk; if the user later re-created a cidx at the same path, queries against the secondary could observe stale entries from the previous incarnation.

Add a cidx-specific cleanup branch in delete_internal_on_transaction (the standard tree-delete code path). When the deleted element's tree_type is a cidx primary, derive the secondary prefix via the existing RocksDbStorage::secondary_prefix_for helper, open storage at that prefix, and call .clear(). Runs unconditionally (not gated on is_empty) so empty-cidx deletes also clear the secondary's root metadata for consistency.

Two new tests verify the cleanup end-to-end via the re-create-and-query pattern: if the secondary wasn't cleaned, the new cidx's top-k query would return stale entries.
- direct_delete_empty_cidx_cleans_up_secondary_storage
- direct_delete_non_empty_cidx_cleans_up_both_namespaces

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
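The invariant this fix (and the later batch-path variant) enforces is that a cidx delete must clear both namespaces. A toy model, with storage reduced to a map from namespace prefix to entries; the real secondary prefix is Blake3(primary_prefix || 0x01), which this sketch stands in for by appending the 0x01 domain byte, and all names are illustrative:

```rust
use std::collections::HashMap;

// Toy storage: namespace prefix -> key/value entries within that namespace.
type Store = HashMap<Vec<u8>, HashMap<Vec<u8>, Vec<u8>>>;

// Stand-in for Blake3(primary_prefix || 0x01); only the pairing matters here.
fn secondary_prefix_for(primary_prefix: &[u8]) -> Vec<u8> {
    let mut p = primary_prefix.to_vec();
    p.push(0x01);
    p
}

// Deleting a cidx primary must clear BOTH namespaces; clearing only the
// primary orphans the secondary (the gap the commit above closes).
fn delete_cidx(store: &mut Store, primary_prefix: &[u8]) {
    store.remove(primary_prefix);
    store.remove(&secondary_prefix_for(primary_prefix));
}

fn main() {
    let mut store: Store = HashMap::new();
    store.insert(b"p".to_vec(), HashMap::from([(b"k".to_vec(), b"v".to_vec())]));
    store.insert(
        secondary_prefix_for(b"p"),
        HashMap::from([(b"c".to_vec(), b"1".to_vec())]),
    );
    delete_cidx(&mut store, b"p");
    assert!(store.is_empty()); // no orphaned secondary namespace left behind
}
```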
Lifts the rejection of DeleteTree on CountIndexedTree / ProvableCountIndexedTree in the batch path. Previously batch users were forced to fall back to db.delete() outside the batch; fine for single-cidx workflows, but it breaks atomicity when DeleteTree is mixed with other batch ops.

Implementation parallels the H1-A delete fix in commit 6b7ec21 (direct path): the existing tree-delete cleanup pipeline collects deleted Merk paths into `merk_delete_paths` and runs find_subtrees + storage.clear() on each post-apply. Since find_subtrees only walks primary keys, the cidx secondary storage namespace at Blake3(primary_prefix ‖ 0x01) was orphaned. Add a parallel cidx_primary_delete_paths collector that captures cidx primary DeleteTree ops at validation time, then runs an explicit secondary-prefix .clear() in the post-apply pass alongside the primary cleanup. Done in both apply_batch_with_element_flags_update and apply_partial_batch (the partial-batch variant).

Two new tests use the re-create-and-query pattern to verify the cleanup:
- batch_delete_tree_on_empty_cidx_works
- batch_delete_tree_on_non_empty_cidx_works

Both query the new cidx's secondary index after re-creation; if the old secondary weren't cleaned, the queries would return stale entries.

Cidx overwrite via batch (Replace cidx → cidx / non-cidx) remains rejected. The semantics of replacing an existing cidx element where the new element references on-disk data are ambiguous, and the safe subset will land separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Now that batch DeleteTree on cidx works (commit 0688731), the recommended workaround for overwriting an existing cidx is:
1. delete_from_count_indexed_tree to empty it
2. DeleteTree via batch (now supported)
3. Re-create in a follow-up batch

Update the rejection error message to point at this clean workaround instead of the older "delete_up_tree outside of a batch" guidance.

The full safe subset of cidx overwrites (cidx → non-cidx, cidx → empty cidx) requires moving cidx-overwrite detection into the pre-apply scan alongside the DeleteTree discovery loop, plus careful sequencing of post-apply cleanup vs. new-element write. That is left for a follow-up; the workaround above is clean today.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov patch is at 79.52% (target 80%), 14 hits short. Add two focused tests covering newly-added batch paths not yet exercised:
- batch_overwrite_cidx_rejected_with_override_protection_on: covers the validate_insertion_does_not_override_tree=true branch hitting cidx (the existing-element-is-tree path).
- batch_delete_tree_on_cidx_then_recreate_in_separate_batch_works: covers the recommended cidx-overwrite workaround end-to-end (DeleteTree the cidx in batch 1, re-create empty in batch 2, populate in batch 3) and verifies via verify_grovedb that the H1-A chain is consistent throughout.

The recreate test highlights an important sequencing detail: a cidx and ops INSIDE the cidx primary cannot share a single batch, because deeper-path ops execute before the cidx itself exists. This is documented in the test's structure (3 separate batches).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov patch is at 79.52% (target 80%). Earlier tests exercised the apply_batch cidx-cleanup path, but the parallel cleanup pass in apply_partial_batch and the DontCheckWithNoCleanup branch were untested. Add two focused tests:
- apply_partial_batch_with_delete_tree_on_cidx_cleans_up_secondary: routes through apply_partial_batch and verifies the secondary cleanup ran via the re-create-and-query pattern.
- batch_delete_tree_on_cidx_dont_check_with_no_cleanup_still_clears_secondary: covers the DontCheckWithNoCleanup branch, which skips the primary find_subtrees pass but must still clear the cidx secondary prefix (a different namespace not covered by find_subtrees).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two parallel polish items:

DOCS: refresh the book chapter to reflect what shipped. The chapter was design-spec style (Status: implemented, but conceptual APIs that don't match the actual code). Update the API code blocks to the shipped function signatures (count_indexed_top_k, count_indexed_count_range, prove_count_indexed_top_k, and the new prove_count_indexed_query taking an arbitrary MerkQuery), replace the hypothetical CountIndexedQuery struct with the two-route subquery description (V1 generic PathQuery + dedicated cidx proof), add a new "Batch path semantics" section documenting supported / rejected ops plus the cidx-overwrite workaround, and update the Implementation-detail items table from "Recommended default" to "Resolution" reflecting what landed (W1: specialized propagation through propagate_changes_with_transaction_with_initial_deferred + GroveOp::ReplaceCountIndexedTreeRootKeys at the bubble-up).

ATOMICITY: five new stress tests for batches mixing cidx + non-cidx ops. GroveDB batches are atomic by design (validation runs over the full op list before any writes hit storage). These tests verify the cidx-aware paths preserve that invariant under mixed workloads:
- batch_mixed_cidx_and_non_cidx_ops_apply_atomically
- batch_failure_in_non_cidx_op_rolls_back_cidx_mutations
- batch_with_multiple_cidx_primaries_each_get_updated
- batch_cidx_delete_with_concurrent_cidx_inserts_atomic
- batch_failure_after_cidx_delete_tree_rolls_back

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Real audit finding: the cidx primary pre-state capture in
execute_ops_on_path lists every mutating op variant EXCEPT the new
GroveOp::ReplaceCountIndexedTreeRootKeys variant introduced for
the cidx-aware bubble-up. When a NESTED cidx primary bubbles up
to its OUTER cidx primary via the batch path:
Level N (inner cidx primary): mutates fire, secondary mirrored,
bubble emits ReplaceCountIndexedTreeRootKeys to level N-1.
Level N-1 (outer cidx primary): receives the op at key=inner_key;
the handler `update_count_indexed_tree_item_preserve_flag_into_batch_operations`
correctly updates the inner_key element's bytes
(new primary_root_key, secondary_root_key, count_value).
But pre-state capture skipped this op type, so post-apply mirror
walked an empty deltas list. Outer's secondary was not updated.
The corruption was silent: H1-A integrity (verify_grovedb) still
passed because the outer's stored value_hash is recomputed from
the actual on-disk secondary root hash — the secondary just has
stale content. Top-k / count-range queries on the outer returned
stale counts.
Fix: add the variant to the mutates match. With the fix, the outer's
secondary entry for inner_key correctly moves from
(old_count_be ‖ inner_key) to (new_count_be ‖ inner_key) when the
inner's count changes.
Test batch_insert_into_nested_cidx_primary_bubbles_count_up_outer_secondary
fails BEFORE the fix (asserts top[0] == (1, b"inner_cidx")
but gets (0, b"inner_cidx")) and passes AFTER. Found via audit
of the new code paths; there was no batch-path nested-cidx test
before.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
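The four-case mirroring logic at the heart of this bug (insert / move / delete / no-op against the secondary, decided by the old and new counts) can be sketched over a BTreeMap standing in for the secondary Merk. This models mirror_to_secondary_for_batch's decision table only; names and types are illustrative, not the real implementation:

```rust
use std::collections::BTreeMap;

// Toy secondary: key = count_be || original_key. Values are irrelevant here.
type Secondary = BTreeMap<Vec<u8>, ()>;

fn sec_key(count: u64, original_key: &[u8]) -> Vec<u8> {
    let mut k = count.to_be_bytes().to_vec();
    k.extend_from_slice(original_key);
    k
}

// Mirror one primary-key change into the secondary. (old, new) counts decide:
//   (None, Some)          -> insert new entry
//   (Some, Some), differ  -> move entry (delete old key, insert new key)
//   (Some, None)          -> delete entry
//   equal or (None, None) -> no-op
fn mirror(secondary: &mut Secondary, key: &[u8], old: Option<u64>, new: Option<u64>) {
    if old == new {
        return; // no-op
    }
    if let Some(o) = old {
        secondary.remove(&sec_key(o, key));
    }
    if let Some(n) = new {
        secondary.insert(sec_key(n, key), ());
    }
}

fn main() {
    let mut s = Secondary::new();
    mirror(&mut s, b"inner_cidx", None, Some(0)); // inner cidx created empty
    mirror(&mut s, b"inner_cidx", Some(0), Some(1)); // inner count bubbled up
    // entry moved from (0 || key) to (1 || key), as the fix above requires
    assert!(!s.contains_key(&sec_key(0, b"inner_cidx")));
    assert!(s.contains_key(&sec_key(1, b"inner_cidx")));
}
```

The bug was equivalent to never calling `mirror` for the bubbled-up op variant: the element bytes changed but the secondary kept the old-count key.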
Lock down the ReplaceCountIndexedTreeRootKeys-mutates fix from the prior commit with three additional nesting tests:
- direct_insert_into_nested_cidx_primary_bubbles_count_up_outer_secondary
- batch_insert_into_triple_nested_cidx_propagates_through_all_levels
- batch_insert_through_cidx_then_regular_tree_then_cidx (cidx → regular CountTree → cidx mixed nesting)

All 1566 grovedb tests pass; the release-mode build also passes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
clippy::collapsible_match flagged the nested `if p.is_some() || s.is_some() || *c != 0` inside the cidx variant arm. Convert to a match guard pattern with the same semantics; purely stylistic, no behavior change. All 129 cidx tests still pass; clippy --workspace --all-features is clean under -D warnings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up audit on current head
The previous 247-byte direct/batch write-path issue, nested propagation, overwrite cleanup, duplicate-secondary detection, direct generic insert rejection, and bounded proof decode all look fixed to me now. Validation run locally on
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
grovedb/src/operations/delete/mod.rs (1)
825-915: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Verify that the `is_empty` branch also clears the cidx secondary namespace, and consider consolidating redundant secondary cleanup.

Two concerns about the cidx secondary cleanup logic:

Potential gap in the `is_empty` branch: the dedicated cidx-primary secondary clear (lines 881-915) lives only in the `if !is_empty { ... }` block. The `else` branch (lines 965-993) deletes the primary entry without touching the secondary namespace. If an empty cidx primary can accumulate stale secondary KVs, the secondary would be orphaned. Confirm this is safe (that secondary mirroring guarantees an empty primary has an empty secondary), or extend cleanup to cover both branches unconditionally.

Redundant secondary clear for the top-level target: `find_subtrees` returns `[target_path, ...nested_paths]` (it includes the target itself). The loop at lines 825-879 thus clears the target's cidx secondary when `p == target_path`. The explicit block at lines 881-915 then clears it again. Idempotent, but redundant I/O; consider either removing the explicit block or restructuring the loop to skip the first element if the top-level cidx cleanup is preferred.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grovedb/src/operations/delete/mod.rs` around lines 825 - 915, The is_empty branch currently doesn't clear the CountIndexedTree secondary namespace while the non-empty branch clears secondaries both inside the subtrees_paths loop and again for the top-level target; update delete logic so secondary cleanup always runs (also in the is_empty branch) and remove the redundant clear by consolidating into a single path: either 1) perform secondary cleanup once per prefix discovered by find_subtrees (subtrees_paths loop that builds p and calls RocksDbStorage::secondary_prefix_for + get_transactional_storage_context_by_subtree_prefix and secondary_storage.clear()), or 2) keep the explicit tree_type.is_count_indexed_primary() block but skip the target element in subtrees_paths to avoid double-clearing the top-level prefix (adjust handling of subtree_merk_path_ref accordingly).
🧹 Nitpick comments (3)
grovedb/benches/cidx_benchmark.rs (3)
207-231: 💤 Low value

Consider wrapping setup and measurement errors with context.

The `.unwrap()` chains at lines 207-218 (setup) and 222-231 (measurement) drop error context. While benchmark code conventionally panics on unexpected failures, using `.expect("...")` or contextual wrapping would aid debugging.

As per coding guidelines, "Wrap errors with context using `.map_err(|e| Error::CorruptedData(format!(\"context: {}\", e)))` pattern in Rust source files".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grovedb/benches/cidx_benchmark.rs` around lines 207 - 231, Replace the naked .unwrap() calls in the benchmark setup and measurement with contextualized error handling: for TempDir::new(), GroveDb::open(), and db.insert(...) (including the EMPTY_PATH and Element::empty_count_tree() usages and the gv argument) use .expect("...") with a clear message or map the error to a contextual Error (e.g., map_err(|e| Error::CorruptedData(format!("creating temp dir: {}", e)))) so failures include actionable context instead of panicking with no details; update both the setup block (TempDir::new(), GroveDb::open(), first db.insert) and the measurement block (second db.insert) accordingly.
131-143: 💤 Low value

Consider wrapping benchmark measurement errors with context.

The `.unwrap().unwrap()` chains at lines 133-135 and 140-142 drop error context from the measured code path. While benchmarks conventionally panic on unexpected failures, wrapping with `.expect("count_indexed_top_k failed: ...")` or similar would preserve context for debugging.

As per coding guidelines, "Wrap errors with context using `.map_err(|e| Error::CorruptedData(format!(\"context: {}\", e)))` pattern in Rust source files".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grovedb/benches/cidx_benchmark.rs` around lines 131 - 143, The benchmark uses double .unwrap().unwrap() when calling count_indexed_top_k inside the group.bench_function closures which strips error context; replace these unwrap chains with a single expect (or otherwise map the error to include context) so failures in count_indexed_top_k surface a descriptive message. Locate the two occurrences calling db.count_indexed_top_k(...) inside the n=... k=10 and k=100 bench closures and change them to use .expect("count_indexed_top_k failed for n=<n> k=<k>: {error}") (or equivalent contextual mapping) so the benchmark panic includes the function name and parameters.
169-192: 💤 Low value

Consider wrapping setup and measurement errors with context.

The `.unwrap()` chains at lines 169-180 (setup) and 184-192 (measurement) drop error context. While benchmark code conventionally panics on unexpected failures, using `.expect("...")` or contextual wrapping would aid debugging.

As per coding guidelines, "Wrap errors with context using `.map_err(|e| Error::CorruptedData(format!(\"context: {}\", e)))` pattern in Rust source files".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grovedb/benches/cidx_benchmark.rs` around lines 169 - 192, Replace the naked .unwrap() calls in the benchmark setup and measurement blocks with contextual error handling: change TempDir::new().unwrap(), GroveDb::open(...).unwrap(), the db.insert(...) unwrap chain, and the db.insert_into_count_indexed_tree(...) unwrap chain to either .expect("...") with a concise message identifying the operation (e.g., "creating temp dir", "opening GroveDb", "inserting count-indexed tree for cidx", "inserting into count-indexed tree for k") or map the errors using the project's pattern (e.g., .map_err(|e| Error::CorruptedData(format!("context: {}", e)))) so failures in creating EMPTY_PATH/Element::empty_count_indexed_tree/Element::empty_count_tree or DB operations report clear context instead of panicking with no details.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@grovedb/benches/cidx_benchmark.rs`:
- Around line 82-123: Change populate_plain_count_tree to return
Result<(TempDir, GroveDb, &'static GroveVersion), Error> instead of unwrapping;
replace every chained .unwrap() on TempDir::new(), GroveDb::open(), and each
db.insert(...) call with propagation using .map_err(|e|
Error::CorruptedData(format!("populate_plain_count_tree: <operation> failed:
{}", e))) so each failure is wrapped with context (use distinct messages like
"create tempdir", "open db", "insert root ct", "insert key ct", "insert inner
item"). Keep the same control flow and return Ok((dir, db, grove_version)) on
success, referencing the function name populate_plain_count_tree and methods
TempDir::new, GroveDb::open, and GroveDb::insert to locate the spots to change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 09e41fc7-fb65-46d4-8838-b985f23e36d0
📒 Files selected for processing (15)
- docs/book/src/count-indexed-tree.md
- grovedb-element/src/element/mod.rs
- grovedb-element/src/element_type.rs
- grovedb/benches/cidx_benchmark.rs
- grovedb/src/batch/mod.rs
- grovedb/src/lib.rs
- grovedb/src/operations/count_indexed_tree.rs
- grovedb/src/operations/delete/mod.rs
- grovedb/src/operations/insert/mod.rs
- grovedb/src/operations/proof/count_indexed.rs
- grovedb/src/operations/proof/generate.rs
- grovedb/src/operations/proof/mod.rs
- grovedb/src/operations/proof/verify.rs
- grovedb/src/tests/count_indexed_tree_tests.rs
- grovedb/src/tests/v1_proof_tests.rs
🚧 Files skipped from review as they are similar to previous changes (10)
- grovedb/src/operations/proof/mod.rs
- grovedb/src/operations/proof/verify.rs
- grovedb/src/operations/insert/mod.rs
- grovedb/src/lib.rs
- grovedb/src/operations/proof/generate.rs
- grovedb/src/operations/proof/count_indexed.rs
- grovedb/src/tests/v1_proof_tests.rs
- grovedb/src/operations/count_indexed_tree.rs
- grovedb/src/batch/mod.rs
- grovedb-element/src/element_type.rs
Addresses two audit findings from PR #657 comment 4422941255.

P1: cidx proof verification was trusting envelope-supplied query semantics. A malicious prover could answer a (k=10, descending=true) request with a valid (k=5, descending=false) proof reconstructing the same root via the same path but exposing different content.
- CountIndexedRangeProof.requested_limit: u16 -> Option<u16>; encoding no longer conflates None with Some(0).
- verify_count_indexed_top_k(proof, path, expected_k, expected_descending): authenticates the envelope against caller intent and rejects mismatches with CorruptedData("...limit/direction mismatch...").
- verify_count_indexed_query(proof, query, expected_limit, path): same treatment; direction is also re-derived from secondary_query.left_to_right for symmetry.

P2: reconcile_count_indexed_tree_secondary_on_transaction iterated primary entries with no length check, so a legacy/corrupt/externally-injected primary key > 247 bytes would drive make_secondary_key to synthesize a >= 256-byte secondary key, violating Merk's < 256 invariant.
- Reconcile validates each primary key length before secondary synthesis and fails closed with CorruptedData.
- verify_grovedb's cidx walk records a `__cidx_primary_key_oversize__` sentinel for oversize primary keys with the actual length encoded in the diagnostic hash slot.

Tests cover both code paths:
- verify_count_indexed_top_k rejects wrong expected_descending and wrong expected_k.
- verify_count_indexed_query rejects wrong expected_limit.
- verify_count_indexed_query distinguishes None from Some(0).
- reconcile_rejects_oversized_primary_key.
- verify_grovedb_flags_oversized_primary_key.
- corrupt_primary_insert_helper sanity test for the injection helper.

All ~15 existing call sites of the verifier functions were updated to pass matching expected_k/expected_descending/expected_limit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
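The 247-byte bound in the P2 fix follows from the key arithmetic: the secondary key is the 8-byte count prefix plus the primary key, and Merk requires keys under 256 bytes, so the largest admissible primary key is 255 - 8 = 247 bytes. A hedged sketch of the fail-closed validation (names and error type are illustrative):

```rust
// Merk requires keys < 256 bytes; the secondary key is
// count_be (8 bytes) || primary_key, so the primary key must be <= 247.
const MAX_CIDX_PRIMARY_KEY_LEN: usize = 255 - 8;

fn check_primary_key_len(key: &[u8]) -> Result<(), String> {
    if key.len() > MAX_CIDX_PRIMARY_KEY_LEN {
        // Fail closed, mirroring the reconcile path's CorruptedData error.
        return Err(format!(
            "cidx primary key of {} bytes would make a {}-byte secondary key (>= 256)",
            key.len(),
            key.len() + 8
        ));
    }
    Ok(())
}

fn main() {
    assert!(check_primary_key_len(&[0u8; 247]).is_ok()); // 247 + 8 = 255 < 256
    assert!(check_primary_key_len(&[0u8; 248]).is_err()); // 248 + 8 = 256, rejected
}
```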
Addresses CodeRabbit review comments on PR #657.

1) delete/mod.rs (outside-diff Major): the cidx secondary cleanup at the end of `delete_internal`'s tree branch lived inside the `if !is_empty { ... }` block. When deleting an EMPTY cidx primary that had a drifted secondary (e.g. an orphan injected by a bug that mirrored deletes into the primary but failed to mirror into the secondary), the secondary namespace was left untouched on delete, leaving stale entries that could collide with a later cidx recreation at the same path.

Hoist the explicit `if tree_type.is_count_indexed_primary()` clear out of the `if !is_empty` gate so it runs unconditionally on every cidx primary delete. Remove the now-redundant copy that lived inside the non-empty branch; the per-prefix cleanup inside the `find_subtrees` loop still handles nested cidx primaries and the hoisted block re-handles the target (both clears are idempotent on empty namespaces — intentional defense-in-depth, documented in the comment).

Regression test `direct_delete_empty_cidx_with_drifted_secondary_clears_namespace` injects an orphan into the secondary of an empty cidx, deletes the cidx, and verifies the secondary namespace is empty via a raw RocksDB scan over the S2-B prefix. Verified to fail on the pre-fix code path.

2) cidx_benchmark.rs (Major nit + several nitpicks): replace bare `.unwrap()` on TempDir/GroveDb/insert and inner measurement calls with `.expect("...")` carrying a descriptive context message. A panic in the bench harness now surfaces which operation failed rather than just a backtrace-only "called Option::unwrap on a None". Applied consistently across populate_cidx, populate_plain_count_tree, bench_cidx_top_k, bench_insert_into_cidx, and bench_insert_into_plain_count_tree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Raises count_indexed_tree.rs coverage 86.1% → 92.2% and proof/count_indexed.rs coverage 84.1% → 87.9% (locally measured via cargo-llvm-cov on the grovedb crate's test suite). Targets ~80 newly covered lines on PR-introduced code, which directly improves the codecov patch-coverage metric for #657.

New tests (all 12 passing; full cidx suite 147 tests green):

direct_insert path:
- direct_insert_into_cidx_overwrites_nested_cidx_entry_and_cleans_secondary
  Covers count_indexed_tree.rs:407-428 (existing_is_cidx overwrite-cleanup branch).

direct_delete path:
- direct_delete_from_cidx_removes_nested_cidx_entry_and_cleans_secondary
  Covers count_indexed_tree.rs:1412-1438 (deleted_was_cidx_primary branch in delete_from_count_indexed_tree).

reconcile loops:
- reconcile_repairs_missing_secondary_entry_via_insert_loop
  Covers the insert loop at count_indexed_tree.rs:927-937 by deleting a real mirror first.
- reconcile_removes_orphan_secondary_entry_via_delete_loop
  Covers the delete loop at count_indexed_tree.rs:909-919 by injecting an orphan.
- reconcile_errors_on_undecodable_element_bytes_in_primary
  Covers the raw_decode error path (842-845) by writing garbage bytes directly to the primary's storage namespace via StorageContext::put.

secondary key drift / query error paths:
- count_indexed_top_k_errors_on_short_secondary_key_drift
  Covers count_indexed_tree.rs:1061-1065, 1750.
- count_indexed_count_range_errors_on_short_secondary_key_drift
  Covers count_indexed_tree.rs:1160-1164 plus the unbounded-upper branch (lo=0, hi=u64::MAX) of the range builder.
- count_indexed_count_range_returns_empty_when_lo_greater_than_hi
  Covers the lo>hi early-return at 1096-1098.

proof verifier error paths:
- verify_count_indexed_top_k_rejects_proof_with_short_secondary_key_drift
  Covers proof/count_indexed.rs:540-545 (verifier rejects < 8-byte proved key).
- verify_count_indexed_top_k_rejects_tampered_primary_root_hash
  Covers H1-A cidx-layer chain mismatch at 566-572 (envelope tampering).
- verify_count_indexed_top_k_rejects_tampered_intermediate_layer_proof
  Covers shallower-layer chain mismatch at 600-613.
- verify_count_indexed_top_k_rejects_tampered_secondary_root_hash_via_query
  Covers ancestor_cidx_secondary_root_hashes length-mismatch at 580-585 (via a nested-cidx layout so last_idx > 0).

Remaining uncovered lines are dominated by defensive error closures (.map_err(|e| Error::CorruptedData(format!(...))) bodies that only fire on storage-level failures) and unreachable!() / CorruptedCodeExecution guards that require contradictory state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov patch coverage was 83.75% on the previous head (`15260967`). develop's commit #660 raised the patch-coverage floor from 80% to 90%, so the codecov/patch check fails. This commit adds 7 tests targeting the two biggest under-covered patch areas:

1. Direct cidx insertion non-empty path (insert/mod.rs:314-410). The migration / restore-from-backup path for inserting a CountIndexedTree element directly via db.insert(...) with concrete primary/secondary root_keys was at 43% patch coverage. Five new tests:
- direct_insert_non_empty_cidx_with_matching_roots_succeeds
- direct_insert_partial_cidx_with_one_root_none_rejected (covers (Some, None), (None, Some), (None, None, count>0))
- direct_insert_cidx_with_mismatched_primary_root_key_rejected
- direct_insert_cidx_with_mismatched_secondary_root_key_rejected
- direct_insert_provable_count_indexed_tree_with_matching_roots_succeeds (covers the ProvableCountIndexedTree arm of the same pattern)
Coverage of insert/mod.rs jumped 93.0% -> 96.2% locally.

2. V1 proof verifier cidx-error branches (proof/verify.rs:540-602). Two new tampering tests build a valid V1 cidx-subquery proof, decode the GroveDBProof envelope, mutate the cidx sublayer, and re-encode:
- v1_verify_rejects_cidx_subquery_proof_with_non_cidx_lower_layer_bytes
  Covers 547-553 (lower_layer.merk_proof must be ProofBytes::CountIndexedTree).
- v1_verify_rejects_cidx_subquery_proof_with_short_cidx_bytes
  Covers 555-561 (cidx_bytes must be >= 32 bytes for the secondary_root attestation prefix).

Full cidx suite: 154/154 passing. Full grovedb library: 1665/1665.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov patch coverage was 85.41% on a0185f4. Develop's #660 requires 90%. This commit adds 4 more tests targeting:

1. proof/count_indexed.rs verifier branches (472-494, 534-537):
- verify_count_indexed_query_rejects_wrong_expected_descending
  (the _query variant of the direction-mismatch reject; the _top_k variant is already covered).
- verify_count_indexed_top_k_rejects_proof_with_layer_count_mismatch
  (env.layer_proofs.len() != path.len()).
- verify_count_indexed_top_k_rejects_proof_with_corrupted_secondary_proof
  (envelope's secondary_proof bytes replaced with garbage so execute_proof errors).
proof/count_indexed.rs locally moved 89.6% -> 90.2%.

2. lib.rs cidx cascading aggregation propagation path (lib.rs:840-998):
- deep_insert_under_triple_nested_cidx_propagates_all_levels
  A 3-level cidx layout (outer/middle/inner cidx) with a leaf CountTree at the bottom; a single item insert bubbles count updates through three cidx secondaries.

Full cidx suite: 158/158 passing. Full grovedb lib: 1669/1669.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov patch coverage was 85.62% on 3b9ffbf. Targeting the remaining ~143 lines needed for the 90% threshold:

V1 proof terminal cidx path (proof/verify.rs is_empty_cidx block):
- v1_proof_round_trips_for_empty_cidx_terminal_query
  Queries an empty CountIndexedTree as a terminal (no subquery). Forces the combine_hash_three(H(value), NULL_HASH, NULL_HASH) check used by V1 for empty-cidx parent elements.
- v1_proof_round_trips_for_provable_empty_cidx_terminal_query
  Same shape for ProvableCountIndexedTree.
- v1_proof_query_with_limit_terminates_early_at_cidx_subquery
  limit_left == Some(0) break inside the cidx subquery handler (proof/verify.rs around 521 and 604).

count_indexed_tree.rs query paths:
- count_indexed_top_k_descending_returns_largest_counts_first
  Exercises the descending top-k scan path; populates a cidx with varied count_values and checks ordering.
- count_indexed_count_range_filters_to_inclusive_band
  Concrete (lo, hi) bounds (not the lo=0, hi=u64::MAX case), exercises the Some(upper_bytes) branch of the range builder.
- count_indexed_count_range_with_limit_cuts_short
  Limit-respecting path in the range scan.
- cidx_top_k_with_k_larger_than_entries_returns_all
  Iterator-exhausted termination branch.

Batch wrapper path:
- batch_insert_non_counted_wrapped_into_count_indexed_tree
  Inserts a NonCounted-wrapped CountTree into a cidx primary via apply_batch. Exercises the wrapper-element handling in the cidx propagation code (batch/mod.rs around 2750-2807).

Full cidx suite: 166/166 passing. Full grovedb lib: 1677/1677.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
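The combine_hash_three(H(value), NULL_HASH, NULL_HASH) shape these terminal-query tests exercise is the H1-A three-input composition from the design doc: the parent's combined value hash commits to the element's value hash plus both child root hashes, with NULL_HASH (all zeros) standing in for each absent child of an empty cidx. A minimal stdlib-only sketch of that shape, using `DefaultHasher` as a stand-in for Blake3 (GroveDB uses Blake3 here; all names below are illustrative):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

type Hash32 = [u8; 32];
const NULL_HASH: Hash32 = [0u8; 32];

// Stand-in 32-byte hash so the sketch stays stdlib-only; the real
// implementation uses Blake3.
fn demo_hash(data: &[u8]) -> Hash32 {
    let mut out = [0u8; 32];
    for (i, chunk) in out.chunks_mut(8).enumerate() {
        let mut h = DefaultHasher::new();
        h.write_u64(i as u64);
        h.write(data);
        chunk.copy_from_slice(&h.finish().to_be_bytes());
    }
    out
}

// H1-A composition: combined = H(value_hash || primary_root || secondary_root).
fn combine_hash_three(value_hash: &Hash32, primary_root: &Hash32, secondary_root: &Hash32) -> Hash32 {
    let mut buf = Vec::with_capacity(96);
    buf.extend_from_slice(value_hash);
    buf.extend_from_slice(primary_root);
    buf.extend_from_slice(secondary_root);
    demo_hash(&buf)
}

fn main() {
    let value_hash = demo_hash(b"cidx element bytes");
    // Empty cidx: both child roots are NULL_HASH.
    let empty = combine_hash_three(&value_hash, &NULL_HASH, &NULL_HASH);
    // Populated cidx: any change to either child root changes the commitment.
    let populated = combine_hash_three(&value_hash, &demo_hash(b"primary root"), &demo_hash(b"secondary root"));
    assert_ne!(empty, populated);
}
```

Because all three inputs flow into one digest, tampering with the secondary root hash alone (as the tampering tests in this thread do) breaks the chain up to the GroveDB root.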
Codecov patch coverage was 85.99% on f647299. Targeting batch/mod.rs cidx propagation paths (the largest remaining gap at 119 patch-uncovered lines).

- batch_cidx_safe_subset_overwrite_with_write_under_cidx_rejected
  Covers batch/mod.rs:2218-2226 — the cidx-overwrite + descendant-write consistency check that scans ops_by_qualified_paths for any path under a cidx being safe-subset-overwritten.
- batch_insert_if_not_exists_for_existing_cidx_errors
  Covers batch/mod.rs:2370-2377 — InsertIfNotExists for a cidx at a key where one already exists, with validate_insertion_does_not_override enabled.
- batch_insert_multiple_items_into_same_cidx_primary_propagates_correctly
  Covers batch/mod.rs:3252-3267 (the ReplaceTreeRootKey -> ReplaceCountIndexedTreeRootKeys upgrade path) and 3539-3545 (fresh ReplaceCountIndexedTreeRootKeys op creation). Multiple insert_or_replace ops under the same cidx primary force iterative propagation visits.
- apply_partial_batch_with_cidx_mirror_secondary_open
  Covers batch/mod.rs:4902-4912 and 4987-4997 — closure invocations that open the cidx secondary by primary_path during partial-batch processing (both the initial apply_body and the continue_partial_apply_body phase via add-on ops).

Full cidx suite: 170/170 passing. Full grovedb lib: 1681/1681.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov patch was 87.00% on 737ae07 (423 missing). Target: 90%.

- verify_grovedb_flags_short_secondary_key_with_sentinel
  Covers lib.rs:1422-1427 — verify_grovedb's cidx walk emits a __cidx_secondary_malformed_key__ sentinel for any secondary key shorter than the 8-byte count prefix.
- batch_apply_two_inserts_into_cidx_propagation_visits_cidx_level
  Exercises the batch propagation visitor coalescing ops at the cidx primary level when two distinct inserts share that level. Hits the upgrade-existing-op branch in batch/mod.rs:3252-3267.
- batch_apply_with_multiple_inserts_descending_count
  Same shape with 4 inserts to keep the propagation queue cycling through the cidx level.

Full cidx suite: 173/173. Full grovedb lib: 1684/1684.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov patch was 87.16% on 3a9c8fb (still under 90%). Target the remaining V1 proof verifier error branches:

- v1_verify_rejects_cidx_subquery_proof_with_tampered_secondary_root
  Covers proof/verify.rs:593-602 — V1 cidx layer hash mismatch. Tampers the secondary_root attestation prefix in the cidx_bytes so the recomputed combined hash diverges from the parent's recorded value_hash.
- v1_verify_rejects_cidx_proof_bytes_under_non_cidx_parent
  Covers proof/verify.rs:726-731 — when a V1 envelope places a ProofBytes::CountIndexedTree(_) layer under a parent element that is NOT a cidx, the verifier rejects with the "non-cidx parent element" error. The tampered envelope swaps a Merk proof into a CountIndexedTree shape (with a 32-byte zero prefix so the short-bytes check passes first).

Full cidx suite: 175/175. Full grovedb lib: 1686/1686.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…aths

Codecov patch was 87.62% on 28af6db (403 missing). Adding broader-coverage tests:

- prove_count_indexed_top_k_for_provable_cidx_round_trips
  Covers proof/count_indexed.rs:380 (ProvableCountIndexedTree arm of read_count_indexed_secondary_root_key_for_proof).
- verify_count_indexed_top_k_with_layer_proof_replaced_by_garbage
  Covers execute_single_key_proof's error path (proof/count_indexed.rs:638-642) — a layer proof replaced with bytes that can't be decoded as a valid Merk proof.
- cidx_batch_pipeline_exercises_propagation_and_query
  6-op batch + top-k + prove + verify round-trip; exercises many batch cidx propagation paths in one test.
- cidx_batch_with_provable_cidx_propagation_round_trip
  Provable cidx variant of the batch pipeline; exercises Provable arms in batch/mod.rs propagation logic.
- cidx_count_range_with_intermediate_count_filter
  count_indexed_count_range with concrete (lo, hi) bounds that filter out entries on both ends.

Full cidx suite: 180/180. Full grovedb lib: 1691/1691.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
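The count-range behavior these tests exercise (inclusive (lo, hi) band, the unbounded-upper hi=u64::MAX branch, and the lo>hi early return) falls out of the secondary key layout: with an 8-byte big-endian count prefix, bytewise key order equals count order, so a count band is one contiguous range scan. A hedged stdlib sketch using a `BTreeMap` as a stand-in for the ordered secondary Merk (all function names are illustrative, not GroveDB's API):

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

// count_be || primary_key: the big-endian prefix makes bytewise order == count order.
fn secondary_key(count: u64, primary_key: &[u8]) -> Vec<u8> {
    let mut k = count.to_be_bytes().to_vec();
    k.extend_from_slice(primary_key);
    k
}

// Inclusive [lo, hi] count band expressed as a single bytewise range scan.
fn count_range(sec: &BTreeMap<Vec<u8>, ()>, lo: u64, hi: u64) -> Vec<Vec<u8>> {
    if lo > hi {
        return Vec::new(); // mirrors the lo > hi early return
    }
    let lower = Bound::Included(lo.to_be_bytes().to_vec());
    let upper = match hi.checked_add(1) {
        // Concrete upper bound: exclude everything from count hi+1 upward.
        Some(next) => Bound::Excluded(next.to_be_bytes().to_vec()),
        // hi == u64::MAX: the unbounded-upper branch.
        None => Bound::Unbounded,
    };
    sec.range((lower, upper)).map(|(k, _)| k.clone()).collect()
}

fn main() {
    let mut sec = BTreeMap::new();
    sec.insert(secondary_key(1, b"a"), ());
    sec.insert(secondary_key(5, b"b"), ());
    sec.insert(secondary_key(10, b"c"), ());
    assert_eq!(count_range(&sec, 2, 10).len(), 2); // counts 5 and 10
    assert_eq!(count_range(&sec, 0, u64::MAX).len(), 3); // unbounded upper
    assert!(count_range(&sec, 6, 3).is_empty()); // lo > hi
}
```

The exclusive upper bound at `(hi + 1).to_be_bytes()` works because any secondary key with count hi+1 starts with exactly those 8 bytes and therefore sorts at or after the bound, while every key with count hi sorts strictly before it.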
Previous codecov was 87.62% (28af6db). Target merk/src/element/insert.rs (was 79.16%, 25 patch-uncovered lines).

- insert_count_indexed_subtree_rejects_non_cidx_element
  Covers L685-694 — the function must reject any element whose underlying type is not CountIndexedTree / ProvableCountIndexedTree. Tested with both Element::new_item and Element::empty_tree.
- insert_count_indexed_subtree_rejects_non_counted_wrapper_into_normal_tree
  Covers L696-700 — non-counted wrapper elements may only be inserted into count-bearing trees; a NonCounted-wrapped cidx into a NormalTree merk must be rejected with InvalidInputError.
- insert_count_indexed_subtree_into_batch_operations_rejects_non_cidx_element
  Covers L768-777 — the batch-operations variant has the same non-cidx reject path. Also verifies the batch_ops vec is not modified on rejection.
- insert_count_indexed_subtree_into_batch_operations_replace_variant_for_cidx
  Covers L790-799 — the is_replace=true branch which queues an Op::ReplaceLayeredCountIndexedReference (vs the Put variant at 802-811 which was already covered).

Full merk lib: 429/429. Full grovedb lib: 1691/1691.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous codecov was 87.80% (ac19b2b). Target the remaining merk patch-uncovered chunks:

merk/src/element/tree_type.rs (was 50% / 11 missing):
- tree_type_extensions_cover_count_indexed_tree_arms
  Exercises every ElementTreeTypeExtensions method on a CountIndexedTree: tree_type, maybe_tree_type, root_key_and_tree_type, root_key_and_tree_type_owned, tree_flags_and_type, tree_feature_type.
- tree_type_extensions_cover_provable_count_indexed_tree_arms
  Mirror for ProvableCountIndexedTree, exercising tree_feature_type's ProvableCountedMerkNode arm.
- tree_type_extensions_look_through_non_counted_wrapping_cidx
  Verifies wrapper look-through for cidx via NonCounted.

merk/src/tree/ops.rs (was 72.34% / 13 missing):
- op_debug_formatters_cover_put_layered_count_indexed_reference
  Covers L107-112 (fmt::Debug for Op::PutLayeredCountIndexedReference).
- op_debug_formatters_cover_replace_layered_count_indexed_reference
  Covers L113-122 (fmt::Debug for Op::ReplaceLayeredCountIndexedReference).

Full merk lib: 434/434. Full grovedb lib: 1691/1691.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… entry not mirroring to secondary

While writing coverage tests I noticed test ordering flakiness with batch+cidx interactions. Pinned down by writing a focused repro and running it 8 times under parallelism: ~60% of parallel runs fail with top_k returning [(77, "new"), (42, "old")] — the deleted "old" entry still appears in the secondary mirror.

Verified:
- The primary delete DOES succeed (db.get returns PathKeyNotFound).
- The bug is in the secondary mirror update path: when a regular batch QualifiedGroveDbOp::delete_op (NOT delete_from_count_indexed_tree) removes a cidx primary entry, the secondary mirror entry at (count_be ‖ key) is sometimes left behind.

Repros (both marked #[ignore] for now, run with --ignored):
- repro_batch_delete_op_on_cidx_primary_entry_should_mirror_to_secondary
  Fails ~60% of runs under --test-threads=4.
- repro_batch_insert_or_replace_overwriting_cidx_entry_should_drop_old_mirror
  Passes 8/8; the insert_or_replace path is reliable.

Order-dependence of the failure suggests the bug is in how the batch propagation pipeline orders cidx primary delete + insert ops within the same batch.

Full lib test suite (excluding ignored): 1691/1691 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…branch

Codecov on 1d90d2b reached 89.30% (368 missing) — just 0.7pp from the 90% target. Target the verify.rs:509-538 chunk (cidx subquery with add_parent_tree_on_subquery=true).

- v1_proof_subquery_into_cidx_with_add_parent_tree_returns_cidx_and_children
  Covers proof/verify.rs:524-538 — the should_add_parent_tree_at_path branch inside the cidx subquery descent. The query has add_parent_tree_on_subquery: true so the verifier emits the cidx element + its children.
- v1_proof_subquery_into_cidx_with_limit_one_terminates_early
  Covers proof/verify.rs:518-523 — limit decrement + break at zero with the add_parent_tree path active.

Full grovedb lib: 1693/1693 (3 ignored = repro bugs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Actionable comments posted: 1
🧹 Nitpick comments (2)
grovedb/src/operations/proof/count_indexed.rs (1)
232-233: ⚡ Quick win
Wrap Merk failures with operation-specific context.
These new `.map_err(Error::MerkError)` sites drop which proof-building step failed, which makes cidx proof generation failures much harder to diagnose and doesn't follow the repo's Rust error-wrapping rule.

Suggested pattern:
- Element::get(&parent_merk, key.as_slice(), true, grove_version)
-     .map_err(Error::MerkError)
+ Element::get(&parent_merk, key.as_slice(), true, grove_version).map_err(|e| {
+     Error::CorruptedData(format!("reading intermediate cidx element for proof: {}", e))
+ })

Apply the same pattern to the other Merk fetch / root-hash / prove calls in this file.
As per coding guidelines, "Wrap errors with context using `.map_err(|e| Error::CorruptedData(format!("context: {}", e)))` pattern in Rust source files."

Also applies to: 261-263, 276-277, 302-304, 337-338, 375-376
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grovedb/src/operations/proof/count_indexed.rs` around lines 232 - 233, The Element::get map_err usage in count_indexed proof building drops step-specific context; replace instances like Element::get(&parent_merk, key.as_slice(), true, grove_version).map_err(Error::MerkError) with a context-wrapping closure such as .map_err(|e| Error::CorruptedData(format!("failed to fetch element for cidx proof (parent/step X): {}", e))) and apply the same pattern to all other Merk fetch/root-hash/prove calls in this file (the other occurrences flagged around lines 261-263, 276-277, 302-304, 337-338, 375-376) so each error includes which proof-building step (e.g., parent fetch, child fetch, root hash, prove) failed and the underlying Merk error.
merk/src/element/insert.rs (1)
731-747: ⚡ Quick win
Add cidx-specific context to the corruption wrapper.
`Error::CorruptedData(e.to_string())` loses the operation context here, which makes cidx insert failures indistinguishable from the other insert paths during corruption triage. Wrap the error with a message that names `insert_count_indexed_subtree`.

As per coding guidelines, "Wrap errors with context using `.map_err(|e| Error::CorruptedData(format!("context: {}", e)))` pattern in Rust source files".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@merk/src/element/insert.rs` around lines 731 - 747, The corruption error from the call to merk.apply_with_specialized_costs is currently mapped with Error::CorruptedData(e.to_string()) which loses context; change the final .map_err to wrap the error with a descriptive message mentioning insert_count_indexed_subtree (e.g. .map_err(|e| Error::CorruptedData(format!("insert_count_indexed_subtree: {}", e)))) so failures from merk.apply_with_specialized_costs (the closure using Self::specialized_costs_for_key_value and Element::value_defined_cost_for_serialized_value) include the cidx-specific context.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@merk/src/element/insert.rs`:
- Around line 768-813: The code only checks element kind but not the NonCounted
wrapper; update insert_count_indexed_subtree_into_batch_operations to mirror
insert_count_indexed_subtree() by rejecting a NonCounted(CountIndexedTree)
paired with a non-count-bearing feature_type: inspect self.underlying() for the
NonCounted(CountIndexedTree(..)) variant and if found, check feature_type's
countedness (the same predicate used in insert_count_indexed_subtree()); if the
feature_type is not count-bearing return Err(Error::InvalidInputError(...)). Do
this prior to serializing and before pushing the Op entry so the invalid
combination is never added to batch_operations.
---
Nitpick comments:
In `@grovedb/src/operations/proof/count_indexed.rs`:
- Around line 232-233: The Element::get map_err usage in count_indexed proof
building drops step-specific context; replace instances like
Element::get(&parent_merk, key.as_slice(), true,
grove_version).map_err(Error::MerkError) with a context-wrapping closure such as
.map_err(|e| Error::CorruptedData(format!("failed to fetch element for cidx
proof (parent/step X): {}", e))) and apply the same pattern to all other Merk
fetch/root-hash/prove calls in this file (the other occurrences flagged around
lines 261-263, 276-277, 302-304, 337-338, 375-376) so each error includes which
proof-building step (e.g., parent fetch, child fetch, root hash, prove) failed
and the underlying Merk error.
In `@merk/src/element/insert.rs`:
- Around line 731-747: The corruption error from the call to
merk.apply_with_specialized_costs is currently mapped with
Error::CorruptedData(e.to_string()) which loses context; change the final
.map_err to wrap the error with a descriptive message mentioning
insert_count_indexed_subtree (e.g. .map_err(|e|
Error::CorruptedData(format!("insert_count_indexed_subtree: {}", e)))) so
failures from merk.apply_with_specialized_costs (the closure using
Self::specialized_costs_for_key_value and
Element::value_defined_cost_for_serialized_value) include the cidx-specific
context.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 643ffbad-6822-4474-bb98-05c35f006a76
📒 Files selected for processing (10)
- grovedb/benches/cidx_benchmark.rs
- grovedb/src/lib.rs
- grovedb/src/operations/count_indexed_tree.rs
- grovedb/src/operations/delete/mod.rs
- grovedb/src/operations/get/query.rs
- grovedb/src/operations/proof/count_indexed.rs
- grovedb/src/tests/count_indexed_tree_tests.rs
- merk/src/element/insert.rs
- merk/src/element/tree_type.rs
- merk/src/tree/ops.rs
🚧 Files skipped from review as they are similar to previous changes (7)
- merk/src/tree/ops.rs
- grovedb/src/operations/delete/mod.rs
- grovedb/src/operations/get/query.rs
- grovedb/benches/cidx_benchmark.rs
- merk/src/element/tree_type.rs
- grovedb/src/lib.rs
- grovedb/src/operations/count_indexed_tree.rs
…ntext

Addresses CodeRabbit review on dc765ed.

1. Major — Mirror the non-counted destination guard in insert_count_indexed_subtree_into_batch_operations. The direct insert_count_indexed_subtree path (merk/element/insert.rs:696-700) rejects NonCounted(cidx) paired with a non-count-bearing merk tree_type. The batch variant only validated element kind, so a caller could queue a NonCounted(cidx) op with a non-count-bearing feature_type (e.g., BasicMerkNode) and bypass the invariant. The batch variant has no access to the merk's tree_type, so we use feature_type.count().is_some() as the equivalent count-bearing predicate (a node's feature type carries a count iff the node is count-bearing). Without this check, the wrapper invariant could be silently violated in batch flows.
Regression test insert_count_indexed_subtree_into_batch_operations_rejects_non_counted_wrapper_with_non_count_bearing_feature_type verifies the reject + a clean batch_ops vec, plus a sanity success path with CountedMerkNode.

2. Nitpick — Wrap Merk errors with context in proof/count_indexed.rs. Replaced six .map_err(Error::MerkError) sites in proof-building with .map_err(|e| Error::CorruptedData(format!("cidx proof: <step>: {e}"))) so proof-generation failures identify which step (intermediate layer fetch, ancestor secondary root hash, single-key prove, primary root hash, secondary range prove, secondary root key lookup) failed.

3. Nitpick — Add cidx-specific context to the final corruption wrapper in insert_count_indexed_subtree. Changed Error::CorruptedData(e.to_string()) to include an "insert_count_indexed_subtree: " prefix so cidx insert failures are distinguishable from generic insert failures during corruption triage.

Full merk lib: 435/435. Full grovedb lib: 1693/1693.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 90% target was raised from 80% in #660 — a 10pp jump in one step turned out to be too aggressive in practice. PRs that introduce non-trivial defensive code surfaces (e.g., CountIndexedTree in #657 shipped 3,172 patch lines, much of which is `CorruptedData` / `CorruptedCodeExecution` defensive returns unreachable from the public API under valid state) hit the gate at ~89% even after extensive coverage work.

88% is still significantly above the previous 80% floor while leaving ~2pp of breathing room for these defensive-code patterns. Project coverage remains gated by `auto` (no regression vs base, with a 2pp threshold), which is the more important signal.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Found while investigating the flaky repro test from dc765ed.

Root cause: in execute_ops_on_path (batch/mod.rs:3040-3094), the cidx_pre_state HashMap was iterated to compute mirror deltas, then the deltas Vec was iterated to apply each one to the secondary merk. The mirror operations for different keys are independent in theory, but a merk-level interaction (delete-after-insert on a count-bearing Merk via separate Element::insert/Element::delete calls) sometimes leaves the deleted entry in place. When HashMap iteration happened to produce insert-then-delete order, the secondary kept the stale entry.

Reproducer pattern (committed as batch_delete_op_on_cidx_primary_entry_mirrors_to_secondary): pre-populate a cidx, then apply_batch with [delete_op("old"), insert_or_replace_op("new")]. ~60% of parallel runs failed; top_k returned both entries.

Fix: sort the deltas Vec deterministically before applying — pure deletes first (delta is `(Some(_), None)`), then everything else, with a secondary sort by key. This both makes batch behavior deterministic and avoids the merk-level delete-after-insert order issue entirely.

Two repro tests are now no longer #[ignore] and pass reliably 8/8 under --test-threads=4:
- batch_delete_op_on_cidx_primary_entry_mirrors_to_secondary
- batch_insert_or_replace_overwriting_cidx_entry_drops_old_mirror

The underlying merk delete-after-insert quirk is worth investigating separately — for now this PR ensures cidx batch flows are not exposed to it.

Full grovedb lib: 1695/1695 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
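The deterministic ordering described in this fix can be sketched as follows. This is a hedged illustration with hypothetical types (`Delta` here is a `(key, pre_count, post_count)` tuple; the real code works over Merk state), but the comparator implements exactly the stated rule: pure deletes (`(Some(_), None)`) first, then everything else, with a secondary sort by key.

```rust
// (key, pre-apply count, post-apply count); None on the right = pure delete.
type Delta = (Vec<u8>, Option<u64>, Option<u64>);

// Deletes first, then key order: makes the mirror application order
// deterministic regardless of HashMap iteration order.
fn sort_deltas(deltas: &mut Vec<Delta>) {
    deltas.sort_by(|a, b| {
        let is_delete = |d: &Delta| d.1.is_some() && d.2.is_none();
        // Compare inverted so `true` (delete) sorts before `false`.
        is_delete(b)
            .cmp(&is_delete(a))
            .then_with(|| a.0.cmp(&b.0))
    });
}

fn main() {
    // Mirrors the repro: delete "old" (count 42) and insert "new" (count 77)
    // in one batch; the delete must be applied to the secondary first.
    let mut deltas: Vec<Delta> = vec![
        (b"new".to_vec(), None, Some(77)),
        (b"old".to_vec(), Some(42), None),
    ];
    sort_deltas(&mut deltas);
    assert_eq!(deltas[0].0, b"old".to_vec());
    assert_eq!(deltas[1].0, b"new".to_vec());
}
```

Applying deletes before inserts sidesteps the merk-level delete-after-insert interaction entirely, which is why the two repro tests become reliable once the order is pinned.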
The cidx-specific Display + type-introspection tests were inline at the end of element/mod.rs. Move them to element/cidx_tests.rs to group cidx assertions in a discoverable file and keep mod.rs focused on the Element enum + impls. Behavior unchanged.

Tests still run as element::cidx_tests::*:
- count_indexed_tree_display_renders_fields
- provable_count_indexed_tree_display_renders_fields
- count_indexed_tree_helpers_report_count_and_type

Full grovedb-element lib: 67/67 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
clippy::redundant_locals flagged the `let mut deltas = deltas;` at batch/mod.rs:3092 introduced in 70336c9. The original `deltas` is already mutable; the rebind was a leftover from an earlier iteration where I considered shadowing with a different binding. Remove the no-op rebind; the subsequent sort_by call mutates in place as before.

Lint clean: `cargo clippy -p grovedb --lib -- -D warnings` passes. Tests unchanged: 1695/1695.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…egateIndexedTreeRootKeys
The op variant is aggregate-agnostic: every field
(`primary_hash`, `primary_root_key`, `primary_aggregate_data`,
`secondary_hash`, `secondary_root_key`) describes the H1-A two-Merk
propagation pattern, and `primary_aggregate_data` is an
`AggregateData` enum that already supports Count, ProvableCount,
Sum, BigSum, etc.
The CountIndexedTree-named version implied this op was count-specific
when in fact it's a generic propagation shape that the planned
SumIndexedTree (and future aggregate-indexed variants) will reuse
unchanged.
Renamed variant + docstring; pure refactor. No behavior change.
Touched: batch/mod.rs (definition + 11 use sites), batch_structure.rs,
batch/estimated_costs/{average,worst}_case_costs.rs, and 5 doc-string
mentions in count_indexed_tree_tests.rs.
Full grovedb lib: 1695/1695. `cargo clippy -p grovedb --lib -- -D warnings`
clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
♻️ Duplicate comments (1)
grovedb/src/batch/mod.rs (1)
3241-3304: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift
Freshly inserted cidx trees still can’t be populated in the same batch.
If this level is a cidx primary and the parent already has an `Insert*`/`Replace` op for that key, the occupied-entry conversion below still has no `Element::CountIndexedTree` / `Element::ProvableCountIndexedTree` branch. `ReplaceAggregateIndexedTreeRootKeys` only works for an element that already exists on disk, so `insert empty cidx + mutate descendants` still has no valid propagation path. Please either reject that combination up front or add an insert-style aggregate-indexed propagation op that carries the new element’s flags/root keys.

Also applies to: 3328-3520
🧹 Nitpick comments (1)
grovedb/src/batch/mod.rs (1)
2031-2040: ⚡ Quick win
Add context to the new cidx Merk error paths.
These new pre/post-state reads and secondary-root capture currently collapse failures into raw `Error::MerkError`, which makes cidx corruption/debugging much harder here. Please wrap them with contextual `Error::CorruptedData(...)` messages instead.

As per coding guidelines, "Wrap errors with context using `.map_err(|e| Error::CorruptedData(format!("context: {}", e)))` pattern in Rust source files".

Also applies to: 3058-3067, 3123-3128
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@grovedb/src/batch/mod.rs` around lines 2031 - 2040, The Merk errors in the cidx pre/post-state reads collapse to raw Error::MerkError; change the map_err(Error::MerkError) calls in the shown get() call (and the similar sites at the other ranges) to wrap with contextual CorruptedData messages, e.g. map_err(|e| Error::CorruptedData(format!("reading cidx pre/post-state (key ...): {}", e))). Update the call site in the block containing the get() invocation (the code using maybe_bytes and key_bytes) and the analogous call sites referenced (around the regions corresponding to the 3058-3067 and 3123-3128 diffs) so all Merk failures become Error::CorruptedData with a brief context string identifying the operation (pre-state read, post-state read, or secondary-root capture).
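The wrapping pattern the guideline asks for can be sketched against toy stand-ins (the names `MerkError`, `merk_get`, and `read_pre_state_count` here are illustrative, not GroveDB's actual items):

```rust
use std::fmt;

// Toy stand-in for the Merk layer's error type.
#[derive(Debug)]
struct MerkError(String);

impl fmt::Display for MerkError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}

// Toy stand-in for GroveDB's error enum.
#[derive(Debug)]
enum Error {
    CorruptedData(String),
}

// Stub standing in for a Merk read that can fail.
fn merk_get(_key: &[u8]) -> Result<Option<u64>, MerkError> {
    Err(MerkError("storage backend unavailable".into()))
}

// Before: `merk_get(key).map_err(Error::MerkError)` loses all context.
// After: name the step and the affected key in the wrapped message.
fn read_pre_state_count(key: &[u8]) -> Result<Option<u64>, Error> {
    merk_get(key).map_err(|e| {
        Error::CorruptedData(format!("reading cidx pre-state (key {:?}): {}", key, e))
    })
}
```

The resulting message identifies both the operation and the key, which is exactly the diagnostic information the raw `MerkError` drops.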
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1e37a4e3-2cd8-435e-a155-59be1f3e1d24
📒 Files selected for processing (10)
- .codecov.yml
- grovedb-element/src/element/cidx_tests.rs
- grovedb-element/src/element/mod.rs
- grovedb/src/batch/batch_structure.rs
- grovedb/src/batch/estimated_costs/average_case_costs.rs
- grovedb/src/batch/estimated_costs/worst_case_costs.rs
- grovedb/src/batch/mod.rs
- grovedb/src/operations/proof/count_indexed.rs
- grovedb/src/tests/count_indexed_tree_tests.rs
- merk/src/element/insert.rs
✅ Files skipped from review due to trivial changes (1)
- .codecov.yml
🚧 Files skipped from review as they are similar to previous changes (3)
- grovedb/src/batch/batch_structure.rs
- merk/src/element/insert.rs
- grovedb/src/operations/proof/count_indexed.rs
`mod.rs` grew ~920 net lines for the CountIndexedTree batch propagation work; extracting the two self-contained cidx-specific blocks into a sibling submodule keeps `mod.rs` focused on the generic batch pipeline and isolates the cidx propagation pattern (which the planned SumIndexedTree and other aggregate-indexed shapes will reuse).

Moved into new `grovedb/src/batch/cidx.rs`:

- `capture_cidx_pre_state(primary_merk, ops_at_path_by_key, grove_version) -> CostResult<HashMap<Vec<u8>, Option<u64>>, Error>` — reads each affected key's pre-apply `count_value` and enforces the 247-byte cidx primary key ceiling.
- `apply_cidx_secondary_mirror_post_apply(primary_merk, pre, secondary_merk, grove_version) -> CostResult<(CryptoHash, Option<Vec<u8>>), Error>` — builds deterministic (deletes-first) deltas from the pre-state plus the post-apply primary state and applies `mirror_to_secondary_for_batch` to the cidx secondary. Returns the secondary's post-mirror state for the bubble-up's H1-A composition.

`execute_ops_on_path` now calls the two helpers around the merk apply boundary. `mod.rs` is net -103 lines; `cidx.rs` is 192 lines (docstrings + imports + struct overhead). No behavior change. Full grovedb lib: 1695/1695. Lib clippy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
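The pre-state capture idea can be sketched with stand-in types (the real helper reads `count_value` from the primary Merk and returns a `CostResult`; everything below is a simplified illustration, including the map used to stand in for the Merk):

```rust
use std::collections::BTreeMap;

// Key-length ceiling the real helper enforces for cidx primary keys.
const MAX_CIDX_PRIMARY_KEY_LEN: usize = 247;

// Stand-in for the primary Merk's state: key -> current count, if present.
type PrimaryState = BTreeMap<Vec<u8>, u64>;

/// For each key the batch will touch, record its pre-apply count
/// (`None` if absent), rejecting over-long keys up front.
fn capture_pre_state(
    primary: &PrimaryState,
    affected_keys: &[Vec<u8>],
) -> Result<BTreeMap<Vec<u8>, Option<u64>>, String> {
    let mut pre = BTreeMap::new();
    for key in affected_keys {
        if key.len() > MAX_CIDX_PRIMARY_KEY_LEN {
            return Err(format!("cidx primary key too long: {} bytes", key.len()));
        }
        pre.insert(key.clone(), primary.get(key).copied());
    }
    Ok(pre)
}
```

After the merk apply, diffing this pre-state against the post-apply state yields the mirror deltas for the secondary.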
Addresses CodeRabbit review on bb390d5.

1. Major — a batch that both inserts a fresh CountIndexedTree / ProvableCountIndexedTree element and mutates a descendant of that cidx in the same batch had no valid propagation path. The bubble-up reaches a conversion match in `apply_batch_structure` that has no cidx arm; previously this surfaced as a confusing `MerkError(PathKeyNotFound)` mid-batch as the secondary-merk closure tried to read the cidx element from a parent merk that doesn't yet contain it. Two defenses added:
   - Preflight `reject_freshly_inserted_cidx_with_descendants(ops)` scans ops for any `Insert*`/`Replace`/`Patch` op whose element is a cidx, collects those paths, and rejects any other op whose effective target (path + key) is strictly under one of those paths. Wired into both `apply_batch_with_element_flags_update` and `apply_partial_batch_with_element_flags_update` right after the consistency check.
   - Defense-in-depth cidx arm in the conversion match at `apply_batch_structure`'s occupied-entry path: if the propagation somehow reaches that point with a cidx element, return a clear `NotSupported` message rather than falling through to "insertion of element under a non tree".

   Both paths produce a clear, actionable error pointing callers to either split into two batches or use `db.insert_into_count_indexed_tree`. Regression test `batch_inserting_and_populating_fresh_cidx_in_same_batch_is_rejected` confirms the preflight fires with the expected `NotSupported`.
2. Nitpick — wrap the three remaining `.map_err(Error::MerkError)` sites in `grovedb/src/batch/cidx.rs` with contextual `Error::CorruptedData(format!(...))` messages that identify the step (pre-state read, post-state read, secondary root hash capture) and include the affected key for diagnostics.

Full grovedb lib: 1696/1696. Lib clippy clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
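The preflight guard's shape can be sketched over plain path vectors. `OpTarget` and its fields are hypothetical simplifications of the real batch op types; only the scan-then-reject structure mirrors the described helper:

```rust
/// Hypothetical op target: path to the parent subtree plus the key written there.
struct OpTarget {
    path: Vec<Vec<u8>>,
    key: Vec<u8>,
    inserts_cidx: bool, // does this op insert a fresh cidx element?
}

/// Reject any batch that creates a cidx primary and also writes
/// strictly under it in the same batch.
fn preflight_reject(ops: &[OpTarget]) -> Result<(), String> {
    // Full paths of every freshly inserted cidx element (path + key).
    let cidx_paths: Vec<Vec<Vec<u8>>> = ops
        .iter()
        .filter(|op| op.inserts_cidx)
        .map(|op| {
            let mut p = op.path.clone();
            p.push(op.key.clone());
            p
        })
        .collect();
    for op in ops.iter().filter(|op| !op.inserts_cidx) {
        for cidx in &cidx_paths {
            // Strictly under: the op's parent path starts with the cidx path.
            if op.path.len() >= cidx.len() && op.path[..cidx.len()] == cidx[..] {
                return Err(
                    "batch inserts a fresh cidx and writes under it; split into two batches"
                        .to_string(),
                );
            }
        }
    }
    Ok(())
}
```

Running this before `apply_batch_structure` means the confusing mid-batch `PathKeyNotFound` never gets a chance to fire.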
Rename the cidx-shorthand file `batch/cidx.rs` to the more descriptive `batch/count_indexed_tree.rs` (consistent with `operations/count_indexed_tree.rs` and `operations/proof/count_indexed.rs`) and extract two more cidx-specific chunks from the increasingly oversized `batch/mod.rs`:

- `reject_freshly_inserted_cidx_with_descendants` (55 lines): preflight guard that rejects batches creating a cidx primary AND writing under it in the same batch. Moved verbatim into the new file.
- `inspect_cidx_overwrite` (newly extracted, replaces a 120-line inline branch): classifies an `op_could_overwrite` insert when tree-override protection is OFF, returning `Some(cidx_path)` to schedule for post-apply cleanup on safe-subset cidx overwrites, `None` for non-cidx, and `Err(NotSupported)` / `Err(InvalidBatchOperation)` for ambiguous overwrites and same-batch descendant writes.

`mod.rs` shrinks by ~190 lines; the new file is ~395 lines and is the single source of truth for cidx propagation, preflight, and overwrite classification in the batch pipeline. The pre/post-state helpers also switch from `HashMap` to `BTreeMap` for deterministic iteration (the explicit pure-deletes-first sort is retained, but the input source is now key-sorted by construction).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
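The deterministic-delta idea — `BTreeMap` input gives key order, and an explicit pass still puts pure deletes first — can be sketched with simplified stand-in types (the real helper emits Merk ops, not this toy `Delta` enum):

```rust
use std::collections::BTreeMap;

// Toy delta kind; stands in for the real mirror ops.
#[derive(Debug, PartialEq)]
enum Delta {
    Delete(Vec<u8>, u64), // key, old count
    Upsert(Vec<u8>, u64), // key, new count
}

/// Diff pre- and post-apply states into a deterministic op list:
/// pure deletes first, then upserts, each group in key order
/// (key order falls out of iterating the BTreeMap).
fn secondary_deltas(
    pre: &BTreeMap<Vec<u8>, Option<u64>>,
    post: &BTreeMap<Vec<u8>, Option<u64>>,
) -> Vec<Delta> {
    let mut deletes = Vec::new();
    let mut upserts = Vec::new();
    for (key, before) in pre {
        match (before, post.get(key).copied().flatten()) {
            (Some(old), None) => deletes.push(Delta::Delete(key.clone(), *old)),
            (_, Some(new)) => upserts.push(Delta::Upsert(key.clone(), new)),
            (None, None) => {} // touched but never materialized
        }
    }
    deletes.into_iter().chain(upserts).collect()
}
```

Because both the grouping and the within-group order are fixed, replaying the same batch always applies the same mirror sequence to the secondary.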
Resolves conflicts between this PR (CountIndexedTree / cidx) and develop's PR #666 (`Element::NotCountedOrSummed` wrapper). Both PRs add new Element variants and new arms to the same match statements throughout the element / merk / debugger layers; the resolution is to union both sets.

**Critical: on-disk discriminant collision fixed.** Develop assigns `NOT_COUNTED_OR_SUMMED_WRAPPER_DISCRIMINANT = 17`; this PR had `ElementType::CountIndexedTree = 17`. Same on-disk byte, two incompatible meanings. Resolution: shift cidx discriminants by 1 and re-derive the NonCounted twin bytes:

- `ElementType::CountIndexedTree` 17 → 18
- `ElementType::ProvableCountIndexedTree` 18 → 19
- `ElementType::NonCountedCountIndexedTree` 145 → 146 (`0x80 | 18`)
- `ElementType::NonCountedProvableCountIndexedTree` 146 → 147 (`0x80 | 19`)

The `Element` enum variant order is updated to match: `NotCountedOrSummed` keeps variant index 17, cidx variants move to 18 and 19. The bincode auto-generated tag bytes stay aligned with `ElementType`.

Updated:
- `try_from`, `from_serialized_value`, related doc-comments
- assertion / range-check tests for the new wrapper slot at 145
- `is_non_counted` doc range `[128, 146]` → `[128, 147]`

Other conflict resolutions union both sets of match arms in:
- `grovedb-element/element/helpers.rs` (get/set flags)
- `grovedb-element/element/mod.rs` (Element enum, ElementShadow serde)
- `merk/element/costs.rs` (get_specialized_cost)
- `merk/element/tree_type.rs` (6 wrapper-arm extensions)
- `grovedb/debugger.rs` (element_to_grovedbg)

And one non-exhaustive-match fix flagged by rustc post-merge:
- `grovedb/operations/count_indexed_tree.rs:438` — add `NotCountedOrSummed(_)` to the wrapper-unreachable arm.

Verified:
- `cargo check -p grovedb` clean
- `cargo test --workspace --lib` 2949 tests pass, 0 failures
- `cargo fmt --all` applied

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
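The twin-byte relationship above is mechanical: the NonCounted variant is the plain discriminant with the high bit set. A sketch of the re-derivation rule (constant names here mirror the commit's values, not the crate's actual identifiers):

```rust
// Plain (counted) cidx discriminants after the shift described above.
const COUNT_INDEXED_TREE: u8 = 18;
const PROVABLE_COUNT_INDEXED_TREE: u8 = 19;

// The NonCounted twin sets the high bit on the plain discriminant.
const NON_COUNTED_BIT: u8 = 0x80;

const fn non_counted_twin(plain: u8) -> u8 {
    NON_COUNTED_BIT | plain
}
```

This is why shifting the plain bytes 17/18 to 18/19 forces the twins from 145/146 to 146/147, and why the wrapper taking byte 17 also claims the twin slot at 145.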
Summary
Adds `Element::CountIndexedTree` and `Element::ProvableCountIndexedTree` — element types that pair a `CountTree`-shaped primary Merk with a count-ordered secondary Merk. The pairing turns "top-k by count" and "count in [a, b]" queries from O(n) into O(log n + k) while preserving GroveDB's standard proof semantics. Each element points at two child Merks; the parent Merk binds both into its `value_hash` via H1-A composition: `combine_hash_three(value_hash(elem_bytes), primary_root_hash, secondary_root_hash)`. The secondary uses `ProvableCountedMerkNode`s so the existing `AggregateCountOnRange` machinery applies natively, and lives at the derived prefix `Blake3(primary_prefix ‖ 0x01)`.

Design doc
`docs/book/src/count-indexed-tree.md` — full design and shipped-API reference. All major decisions ratified inline (H1-A, S2-B, V1-A, Q1-A, S1-A, W1, C1).

Public API surface
Direct (non-batch) APIs:
- `db.insert_into_count_indexed_tree(...)`, `db.delete_from_count_indexed_tree(...)` — dedicated APIs that mirror to the secondary and chain H1-A
- `db.count_indexed_top_k(...)`, `db.count_indexed_count_range(...)` — read APIs in count order
- `db.reconcile_count_indexed_tree_secondary(...)` — rebuild the secondary from the primary on demand

Proofs:
- `db.prove_count_indexed_top_k(...)` / `GroveDb::verify_count_indexed_top_k(...)` — dedicated proof shape for top-k
- `db.prove_count_indexed_query(...)` / `GroveDb::verify_count_indexed_query(...)` — accepts an arbitrary `MerkQuery` over the secondary's `(count_be ‖ key)` keyspace
- `prove_query` / `verify_query` descend cidx via `combine_hash_three`-aware proof emission

Batch path:
- `apply_batch` / `apply_partial_batch` support cidx primary item-level mutations end-to-end
- `DeleteTree` and item mutations inside a cidx primary all work
- cidx bubble-up goes through the new `GroveOp::ReplaceCountIndexedTreeRootKeys` variant

Integrity:
`verify_grovedb` walks every cidx primary and asserts (a) chain integrity via H1-A and (b) content consistency between the primary's `count_value` and the secondary's entries.

Test coverage
~85 cidx-focused tests in `grovedb/src/tests/count_indexed_tree_tests.rs`, organized by scenario:
- nested trees (cidx → tree → cidx)
- `count_indexed_top_k` / `count_indexed_count_range` correctness
- `db.insert` rejects a cidx primary target
- `verify_grovedb` content-consistency check via deliberate corruption

All 1579 grovedb tests pass (debug + release).
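Several of these scenarios hinge on the secondary's key layout: a big-endian count prefix followed by the primary key makes lexicographic Merk order equal count order. A minimal sketch of that encoding (function name hypothetical; the 8-byte big-endian prefix matches the `(count_be ‖ key)` keyspace described above):

```rust
/// Secondary-merk key: 8-byte big-endian count, then the primary key.
/// Lexicographic byte order on these keys sorts by count first,
/// then breaks ties by primary key.
fn secondary_key(count: u64, primary_key: &[u8]) -> Vec<u8> {
    let mut k = Vec::with_capacity(8 + primary_key.len());
    k.extend_from_slice(&count.to_be_bytes());
    k.extend_from_slice(primary_key);
    k
}
```

Big-endian is essential here: little-endian count bytes would not sort numerically, and top-k would degenerate back to a full scan.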
Audit findings + fixes
During development I audited the PR and found a real silent-corruption bug, plus closed two related gaps that the audit exposed:
1. Nested-cidx batch bubble-up missed the outer's secondary mirror (a8bb34fb). The `mutates` match in `execute_ops_on_path` listed every variant except `ReplaceCountIndexedTreeRootKeys` (the new variant introduced for cidx bubble-up). When inner cidx → outer cidx propagation fired, the outer's pre-state HashMap stayed empty for the inner's key, the post-apply mirror walked an empty list, and the outer's secondary stayed stale while H1-A still passed. Subsequent commit fb71003a refactored the `mutates` check into a method `GroveOp::can_mutate_child_count` with an exhaustive `match` (no wildcard), so adding any future `GroveOp` variant fails to compile until classified — a type-system guard against this bug class.
2. `verify_grovedb` only checked chain integrity, not content consistency (dd90e6b5). The H1-A walk verified the cidx element's `value_hash` matches the two Merks' root hashes via `combine_hash_three`, but never validated the secondary's contents against the primary's `count_value`. That's exactly the gap the nested-cidx bug exploited — the secondary's content was stale but its root hash was internally consistent. Added a content-consistency pass that walks both Merks' raw storage and reports mismatches via sentinel paths (`__cidx_primary_orphan__`, `__cidx_secondary_orphan__`, `__cidx_count_mismatch__`, `__cidx_secondary_malformed_key__`). Three deliberate-corruption tests confirm each drift class is caught.
3. `delete_from_count_indexed_tree` didn't clean up tree-entry child storage (4f1d7305). Found by the new property-based stress test in iteration < 100. Same class as the storage-orphan bugs fixed earlier in `db.delete` and batch `DeleteTree`. Fix: after removing the cidx entry from the primary, if the deleted entry was a tree, run `find_subtrees` + `clear` on each subtree's storage, and clear the secondary namespace for nested cidx entries.

Known limitations + workarounds
- Overwriting an existing cidx element inside a batch is rejected as ambiguous (`primary_root_key` / `secondary_root_key` could be intended to reuse the old storage or refer to fresh storage; neither resolution is safe to assume). Workaround: `DeleteTree` the cidx in one batch (which now cleans both namespaces), then re-create it in a follow-up batch. Both batches are clean today.
- Generic `db.insert()` into a cidx primary target is rejected. Use `db.insert_into_count_indexed_tree(...)`. The dedicated API handles the dual-Merk write; the generic insert path has no secondary-mirror hook and would produce silent drift.
- V0 `prove_query` / `verify_query` do not support cidx descents. V0 is a frozen wire format. The V1 generic path and the dedicated `prove_count_indexed_*` entry points both work.

Follow-up items (not blocking this PR)
- `BigSumIndexedTree` / `ProvableSumTree` / `ProvableSumIndexedTree` (parallel features, separate PRs)
- performance comparison against plain `CountTree` for representative workloads

Backwards compatibility / on-disk format
The earlier `NotSummed` change (2498b97f) involved re-shuffling the variant order to maintain this invariant — documented in that commit.

How to review
The commit history is the design narrative. Recommended reading order:
1. Design doc (`docs/book/src/count-indexed-tree.md`)
2. Element plumbing (`grovedb-element/`, `grovedb-query/`, `merk/src/`)
3. Direct operations (`grovedb/src/operations/count_indexed_tree.rs`)
4. Dedicated proofs (`grovedb/src/operations/proof/count_indexed.rs`)
5. Generic proof path (`grovedb/src/operations/proof/generate.rs` + `verify.rs`)
6. `grovedb/src/lib.rs`
7. Batch path (`grovedb/src/batch/mod.rs`)
8. `verify_grovedb` content-consistency check (`grovedb/src/lib.rs`)
9. Tests (`grovedb/src/tests/count_indexed_tree_tests.rs`)

🤖 Generated with Claude Code
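For reviewers tracing the hash plumbing, the H1-A three-input combine has a simple shape: hash the concatenation of the element's value hash and the two child roots. A stand-in sketch, generic over the hash function so it carries no dependencies — the real code fixes `h` to Blake3:

```rust
type CryptoHash = [u8; 32];

/// H1-A sketch: combined = H(value_hash ‖ primary_root ‖ secondary_root).
/// `h` stands in for Blake3 here so the sketch is dependency-free.
fn combine_hash_three<H: Fn(&[u8]) -> CryptoHash>(
    h: &H,
    value_hash: &CryptoHash,
    primary_root: &CryptoHash,
    secondary_root: &CryptoHash,
) -> CryptoHash {
    let mut preimage = [0u8; 96];
    preimage[..32].copy_from_slice(value_hash);
    preimage[32..64].copy_from_slice(primary_root);
    preimage[64..].copy_from_slice(secondary_root);
    h(&preimage)
}
```

Because all three inputs sit in the preimage, a change to either child Merk's root (or to the element bytes) changes the parent's `value_hash`, which is what lets a secondary range proof chain up to the GroveDB root.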
A note on test ordering
A few commits in the history carry "to clear codecov 80%" framing in their messages. To be honest, those tests were discovered (via coverage reports flagging untested paths) rather than written ahead of the corresponding implementation. The content of each is still a real functional test — delete behavior, proof round-trips, integrity checks, nested propagation — and they remain in the test suite on their merits. Future contributions should aim to write the test first when possible.
Summary by CodeRabbit