Problem
When a chain source produces an update for LocalChain, the update is typically seeded from local's existing checkpoints and shares a long suffix with local — only the tip (and, in reorg cases, a short prefix near the tip) differs. The current merge walks every checkpoint in the update (unless if the underlying pointer matches) to find the point of agreement, even though the bulk of the update is usually redundant with what local already has.
We want an O(1) signal, usable at each step of the merge walk, that tells us "the suffix from this checkpoint back matches local" so we can terminate early.
Proposal
Add a cumulative_xor: [u8; 32] field to each CheckPoint, defined as the XOR of this checkpoint's block hash with all block hashes below it back to the chain's base. Maintained incrementally on insert() as parent.cumulative_xor ^ self.hash.
During merge, when walking update and local in lockstep at matching heights, if update_cp.cumulative_xor == local_cp.cumulative_xor, the suffixes agree and the walk can terminate.
Behavior
- When update and local share a suffix with matching heights (the expected case for updates seeded from local), the walk terminates as soon as cumulative XORs match.
- When update and local sample at different heights — e.g., the update originated from a differently-structured source — the XORs will not match even if the underlying blockchains agree, and the merge falls through to the full walk. This is a performance miss, not a correctness issue.
- The field is an early-termination hint. The merge remains structurally correct by walk; XOR only determines when the walk can stop.
Testing
- Cumulative XOR correctly computed on construction and maintained across
insert().
- Equal suffixes (same hashes at same heights) produce equal cumulative XOR at the top of the suffix.
- No-reorg case: update's parent checkpoint matches local's tip — walk terminates immediately after the new tip.
- Reorg case: divergence at tip, suffix below agrees — walk terminates at the first matching height below the reorg.
- Differently-sampled chains: XORs differ, merge falls through to full walk, result is correct.
- Edge cases: empty chain, single-block chain, update whose base differs from local's base.
Benchmarks
Numbers for the common case (frequent sync, long shared suffix) and the cold-catchup case (short shared suffix) to quantify the win and the per-insert overhead against baseline.
Open questions
- Under
CheckPoint<H> generalization: XOR block hashes in both the lightweight and verified paths so fingerprints are comparable across representations, or keep semantics path-specific?
- Is the 32-byte-per-checkpoint memory cost acceptable across all
bdk_core usage, or should this be opt-in via a wrapper type?
Problem
When a chain source produces an update for
LocalChain, the update is typically seeded from local's existing checkpoints and shares a long suffix with local — only the tip (and, in reorg cases, a short prefix near the tip) differs. The current merge walks every checkpoint in the update (unless if the underlying pointer matches) to find the point of agreement, even though the bulk of the update is usually redundant with what local already has.We want an O(1) signal, usable at each step of the merge walk, that tells us "the suffix from this checkpoint back matches local" so we can terminate early.
Proposal
Add a
cumulative_xor: [u8; 32]field to eachCheckPoint, defined as the XOR of this checkpoint's block hash with all block hashes below it back to the chain's base. Maintained incrementally oninsert()asparent.cumulative_xor ^ self.hash.During merge, when walking update and local in lockstep at matching heights, if
update_cp.cumulative_xor == local_cp.cumulative_xor, the suffixes agree and the walk can terminate.Behavior
Testing
insert().Benchmarks
Numbers for the common case (frequent sync, long shared suffix) and the cold-catchup case (short shared suffix) to quantify the win and the per-insert overhead against baseline.
Open questions
CheckPoint<H>generalization: XOR block hashes in both the lightweight and verified paths so fingerprints are comparable across representations, or keep semantics path-specific?bdk_coreusage, or should this be opt-in via a wrapper type?