
Conversation


@freshair18 freshair18 commented Jun 27, 2025

Addresses #679

IBD Type determination
A new IBD type is added, currently called pruning_catchup. Disregarding some subtleties, this IBD type is triggered when all the following are fulfilled:

  1. the syncer's and node's pruning points do not match,
  2. the node knows the header of the syncer's pp and can tell it is in the future of its own pp,
  3. the node does not have the block body of the syncer's pp - if it does have that block body, vanilla syncing can carry on as normal and the node will prune on its own in due time.

Conveniently, negotiate_missing_syncer_chain_segment allows for an easy way to derive the syncer's current pruning point hash.
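For concreteness, here is a minimal sketch of that decision. All names here (IbdType, determine_ibd_type, the closure parameters) are hypothetical and simplified; they do not reflect the actual rusty-kaspa code.

```rust
// Hypothetical, simplified sketch of the pruning_catchup trigger.
type Hash = u64; // stand-in for the real hash type

#[derive(Debug, PartialEq)]
enum IbdType {
    Vanilla,
    WithHeadersProof,
    PruningCatchup,
}

fn determine_ibd_type(
    own_pp: Hash,
    syncer_pp: Hash,
    // Do we know the header of the syncer's pruning point?
    header_known: impl Fn(Hash) -> bool,
    // Is that header in the future of our own pruning point?
    in_future_of_own_pp: impl Fn(Hash) -> bool,
    // Do we already have its block body?
    body_known: impl Fn(Hash) -> bool,
) -> IbdType {
    if syncer_pp == own_pp {
        // Pruning points match: no catch-up question arises.
        return IbdType::Vanilla;
    }
    if !header_known(syncer_pp) || !in_future_of_own_pp(syncer_pp) {
        // We cannot place the syncer's pruning point above ours (simplified).
        return IbdType::WithHeadersProof;
    }
    if !body_known(syncer_pp) {
        // Header known and ahead of our pruning point, but its body is
        // missing: a catch-up is required.
        return IbdType::PruningCatchup;
    }
    // We already hold the body, so vanilla syncing can carry on as normal
    // and the node will prune on its own in due time.
    IbdType::Vanilla
}
```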

Validation Before Movement
Before any sensitive and irreversible step, the node first downloads and validates headers from the syncer up to its declared sink. "Destructive" changes only occur when:

  1. the syncer's pp is a valid pruning sample (it satisfies the blue_score requirements to be a pp),
  2. there are sufficiently many headers built on top of it; specifically, the validated blue_score of the syncer's sink header is greater than the pp's blue_score plus pruning_depth,
  3. the syncer's pruning point is on the selected chain from the syncer's sink, and any pruning points declared on headers along that path are consistent with those already known (see the sketch below).
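As a rough illustration of these three checks, here is a minimal sketch under hypothetical helper names and simplified signatures; it is not the actual rusty-kaspa implementation.

```rust
// Simplified sketch of the validation gate; all helper names are hypothetical.
type Hash = u64;

fn can_move_pruning_point(
    syncer_pp: Hash,
    syncer_sink: Hash,
    pruning_depth: u64,
    blue_score_of: impl Fn(Hash) -> u64,
    is_valid_pruning_sample: impl Fn(Hash) -> bool,
    pp_on_selected_chain_of_sink: impl Fn(Hash, Hash) -> bool,
    declared_pps_consistent_along_path: impl Fn(Hash, Hash) -> bool,
) -> bool {
    // 1) The syncer's pruning point must satisfy the blue_score requirements
    //    of a pruning sample.
    if !is_valid_pruning_sample(syncer_pp) {
        return false;
    }
    // 2) Sufficiently many headers must be built on top of it: the validated
    //    sink header must be more than pruning_depth above it in blue score.
    if blue_score_of(syncer_sink) <= blue_score_of(syncer_pp) + pruning_depth {
        return false;
    }
    // 3) The pruning point must lie on the selected chain from the syncer's
    //    sink, with the pruning points declared along that path consistent
    //    with those already known to the node.
    pp_on_selected_chain_of_sink(syncer_pp, syncer_sink)
        && declared_pps_consistent_along_path(syncer_pp, syncer_sink)
}
```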

Transitional States During Catchup
Updating to a new pruning point conceptually consists of three stages (a schematic sketch follows the list):

  1. Updating various stores, most prominently the pruning_point store, but also the virtual store, past_pruning_points, pruning samples, selected chain store, and body_tips store. All of those can be updated in a single batch (ignoring pruning samples, for which it does not matter). I will refer to this stage as the "pruning point movement".
  2. Downloading the new pruning utxo set from a peer, and verifying it matches the header
  3. Downloading the block bodies of the new pruning point and its anticone - these blocks should only undergo trusted validation as their parents will forever miss block bodies. Hence they require special attention.
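Strung together, the flow looks roughly as follows; all helper names here are hypothetical stubs standing in for the real code paths.

```rust
// Schematic only: the three conceptual stages in order, with stubbed helpers.
type Hash = u64;

#[derive(Debug)]
struct SyncError;

// Stage 1 stub: batch-update pruning_point, virtual, past_pruning_points,
// selected chain and body_tips stores (the "pruning point movement").
fn move_pruning_point_in_batch(_new_pp: Hash) -> Result<(), SyncError> {
    Ok(())
}

// Stage 2 stub: fetch the pruning UTXO set from a peer and verify it
// matches the new pruning point's header.
fn download_and_verify_pruning_utxoset(_new_pp: Hash) -> Result<(), SyncError> {
    Ok(())
}

// Stage 3 stub: fetch the bodies of the new pruning point and its anticone
// and run trusted validation on them (their parents will forever miss bodies).
fn download_and_trust_validate_anticone(_new_pp: Hash) -> Result<(), SyncError> {
    Ok(())
}

fn pruning_catchup(new_pp: Hash) -> Result<(), SyncError> {
    move_pruning_point_in_batch(new_pp)?;
    download_and_verify_pruning_utxoset(new_pp)?;
    download_and_trust_validate_anticone(new_pp)?;
    Ok(())
}
```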

During IBD_with_headers_proof (as it was previously), these three stages are performed atomically, using a "discardable" staging consensus, which either goes through all of them and only then is committed, or else the current consensus remains active.

Unlike an IBD with headers proof, pruning_catchup inherently consists of building on the information of the current consensus rather than starting from scratch.

The current implementation hence introduces transitional states, with corresponding "flags", for the intermediary cases where the pruning point movement occurred but a new pruning utxo set is yet to be downloaded, and/or the anticone's block bodies have not all gone through verification. The required anticone in particular is maintained by computing and storing it already during the pp movement, computed in relation to the syncer's sink. (In theory this maintained set could be shrunk on the fly as more bodies are synced, but at the moment it is maintained in an all-or-nothing manner, since sending already-validated blocks to validation causes no harm and is fast enough.)
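One way to picture these flags (purely illustrative; these are not the real field names or storage layout):

```rust
// Illustrative flags for the transitional states; not the actual fields.
use std::collections::HashSet;

type Hash = u64;

#[derive(Default)]
struct CatchupState {
    /// Set after the pruning point movement, cleared once the matching
    /// pruning UTXO set has been downloaded and verified.
    awaiting_pruning_utxoset: bool,
    /// The new pruning point's anticone (computed relative to the syncer's
    /// sink during the movement); kept whole until all bodies are validated.
    anticone_awaiting_bodies: HashSet<Hash>,
}

impl CatchupState {
    /// True while the consensus is in any transitional state.
    fn is_transitional(&self) -> bool {
        self.awaiting_pruning_utxoset || !self.anticone_awaiting_bodies.is_empty()
    }
}
```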

Given that these intermediary states are easy to recognize, they can simply be handled in future syncs. The transitional states cannot be abused, given the standard security assumption of an honest majority at every pruning period: since we synced sufficiently many headers on top of the pruning point, we know the synced DAG on top of it represents the honest network, hence its pruning point corresponds to a valid pruning utxo set, and the blocks in the anticone must have had block bodies - or the honest network would have "rejected" this DAG (more precisely, the pruning point would not have been on its selected chain). It is worth remarking that the same assumption was used previously when choosing to commit a staging consensus before all synced blocks underwent validation.

  • Decoupling the utxo download from the pruning point movement also allows sync_with_headers_proof to commit prior to downloading the utxo set, greatly improving the experience of the many users who disconnect during the long UTXO download and currently have to start syncing from scratch.

Transitional States Security
Pruning: generally, pruning is not activated unless a virtual task is completed, and hence would not be called while in the limbo state of a missing utxo set. To be on the safe side, it is confirmed that we are not in a transitional state before attempting to naturally advance the pruning_utxo_set. This could perhaps be turned into an assert.
Block Relay: a check is added so that if the consensus is in a transitional state, the relayed block is immediately routed to IBD.
HandleRelayBlockRequests: the node will ignore requests to send over its sink while it is in a transitional state, to avoid log clutter and disconnections due to a potential missing-block error.

For simplicity both transitional states are checked in all the above, though at times a distinction could be made between them.

A final sidenote: advancing the pruning_utxo_set (and pruning in general) is also prevented until the virtual is sufficiently above the pruning point. This ultimately stems from technicalities and may be modified in the future, but it seems fine to prevent pruning while block IBD is still taking place.
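The guards above amount to something like the following sketch; the boolean and helper names are stand-ins, not the actual code.

```rust
// Hypothetical guard logic mirroring the checks described above.
enum RelayAction {
    /// Process the relayed block normally.
    Process,
    /// The consensus is mid-catchup: hand control to IBD instead.
    TriggerIbd,
}

fn should_advance_pruning_utxoset(
    in_transitional_state: bool,
    virtual_blue_score: u64,
    pp_blue_score: u64,
    pruning_depth: u64,
) -> bool {
    // Never advance the pruning UTXO set (or prune) mid-catchup, and also
    // wait until virtual is sufficiently above the pruning point.
    !in_transitional_state && virtual_blue_score > pp_blue_score + pruning_depth
}

fn on_block_relay(in_transitional_state: bool) -> RelayAction {
    // A relayed block received mid-catchup is routed straight to IBD.
    if in_transitional_state {
        RelayAction::TriggerIbd
    } else {
        RelayAction::Process
    }
}

fn should_serve_sink_request(in_transitional_state: bool) -> bool {
    // HandleRelayBlockRequests: ignore requests for our sink while in a
    // transitional state to avoid log clutter and missing-block disconnects.
    !in_transitional_state
}
```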


freshair18 commented Jun 27, 2025

Open Issues

(a) Even if catch-up is possible, it might still be worthwhile to attempt to sync from another peer that has not yet pruned and has maintained the missing block-body segment, rather than downloading an entire fresh utxo set. A more robust implementation could check with various peers and only initiate catch-up once no other choice remains. The same can even apply to the syncer as is: while its pruning point has changed, its retention point/root may lag behind (intentionally, or even just due to slow pruning), and it would still be able to send the relevant data if the syncee could recognize this scenario.

(b) The transitional states possibly break some implicit assumptions made by the node and by clients about the status of an active consensus, which demand further thought on how best to address them.

namely:
(1) that it has a "valid" pruning_utxo_set corresponding to a known hash. Some clients possibly rely on this. The contents of this field are currently cleared to make way for a new utxo set corresponding to the new pp. This in particular could be circumvented by instead maintaining a "staging" pruning utxo set (several extra GB, only during IBD), and only committing it once it is downloaded and validated. It can still be argued that clients may rely on the pruning utxo set having a "valid body path" from the utxo set position to the virtual. I find this unlikely, but dare make no absolute claim on the way people use the node.
(2) a more acute issue is that clients and the node may rely on the sink's body being available at any time (and, to a far lesser extent, the same could theoretically apply to any block in the pp's anticone). This assumption is broken here, and is not easy to fix with the current code structure.

Possible solutions include cloning the consensus, which is expensive in both space and time and relatively complex. A variant of this could perhaps allow consensus instances to share most but not all of the stores, and thus only clone the very limited amount of storage touched by the transitional state (namely, stores affected by body validation).
A suggestion by @michaelsutton is to "restage" the consensus when it is in an unstable state, and replace it with a default consensus in the meanwhile, which, despite being empty, does fulfill the invariants. This would require keeping the staging consensus across crashes and disconnects of the node, which should eventually be a goal regardless, as it would allow fresh syncs that fail during headers sync to avoid starting from scratch. This discussion also naturally leads to KIP7 and KIP8.

freshair18 and others added 4 commits July 1, 2025 17:35
* Remove temporary dust prevention mechanism

* Disable uninlined_format_args lint

* Apply workspace lints to all crates

* clippy

freshair18 commented Jul 9, 2025

Addendum to the above comment on open issues: in hindsight I came to realize that (b1) probably also applies to the virtual utxo set, which can end up in limbo if the node is reset at very specific times. This seems more acute than the pruning_utxo_set case, but I believe it can be solved in a similar manner to that described above.


freshair18 commented Jul 15, 2025

(c) Following a "catchup", a lot of old data can usually be pruned right away. For simplicity and safety's sake, the current implementation will not start pruning until a full pruning_depth of block data has accumulated on top of the new pruning point. While superficially reasonable, this prevents even the pruning of old data from several pruning depths ago. A user repeatedly "catching up" at the worst of times (exactly before pruning would trigger, every time) could find themselves with unexpectedly large storage, though it is challenging to imagine anything too extreme.
I would say that in this context it is still far better to err on the safe side than to try to optimize for this scenario.
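Expressed as a condition (illustrative only; not the actual code):

```rust
// Illustrative: after a catch-up, pruning stays disabled until a full pruning
// depth of block data has accumulated on top of the new pruning point.
fn pruning_enabled_after_catchup(virtual_blue_score: u64, new_pp_blue_score: u64, pruning_depth: u64) -> bool {
    virtual_blue_score >= new_pp_blue_score + pruning_depth
}
```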

coderofstuff previously approved these changes Nov 8, 2025
@coderofstuff coderofstuff left a comment


Disembodied Anticone must be changed in another PR, before a release

@coderofstuff coderofstuff left a comment


@someone235 @freshair18 up to you if you want to fix these comments here or in a different PR after this is merged

Comment on lines 543 to 545
// Verify that the new pruning point can be safely imported
// and return all new pruning point on path to it that needs to be updated in consensus
fn get_and_verify_novel_pruning_points(&self, new_pruning_point: Hash, syncer_sink: Hash) -> ConsensusResult<VecDeque<Hash>> {
Collaborator

Update the function docs to use /// for rust standard.
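For illustration, the suggested /// form might look like this (a fragment only, with the existing comment lightly reworded as Rust doc comments; signature unchanged):

```rust
/// Verifies that the new pruning point can be safely imported and returns all
/// new pruning points on the path to it that need to be updated in consensus.
fn get_and_verify_novel_pruning_points(&self, new_pruning_point: Hash, syncer_sink: Hash) -> ConsensusResult<VecDeque<Hash>> {
```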

}
info!("Setting {new_pruning_point} as the pruning point");
// 4) The pruning points declared on headers on that path must be consistent with those already known by the node:
let pruning_point_read = self.pruning_point_store.upgradable_read();
Collaborator

Why does this need to be an upgradable_read as opposed to read()?


// Verify that the new pruning point can be safely imported
// and return all new pruning point on path to it that needs to be updated in consensus
fn get_and_verify_novel_pruning_points(&self, new_pruning_point: Hash, syncer_sink: Hash) -> ConsensusResult<VecDeque<Hash>> {
Collaborator

Can you split this into two parts:

  • is_pruning_point_importable - checks whether the new pruning point can be imported
  • get_path_to_pruning_point - gets the pp path to the new pruning point (if possible)?

Collaborator Author

Tell me what you think of it, but it is a tad unnatural to me: verifying the new pruning point consists of getting these new pruning points and checking that they form a coherent "chain", so getting them is in itself part of the verification.

Collaborator

Hmm, ok, let me explain what I'm thinking. The function signature takes in a pruning point (presumably a new one). The name get_and_verify_novel_pruning_points appears to indicate that you are attempting to verify some set of pruning points. It's inconsistent with the input, since you're just passing in one. So if someone were reading this function signature, you are left to wonder "what pruning points are you referring to when you're just passing in one?"

If the intent is as you describe, can I propose a rename of this to get_and_verify_path_to_new_pruning_point. From this name, you can tell there's a reference to a new pruning point and you know you'll be receiving a path to that new pruning point.



coderofstuff commented Nov 9, 2025

How did you conduct tests for this feature? As in, what is the setup so I can run a similar test locally?

someone235
someone235 previously approved these changes Nov 11, 2025
coderofstuff
coderofstuff previously approved these changes Nov 12, 2025
@freshair18 freshair18 dismissed stale reviews from coderofstuff and someone235 via 2ea9727 November 13, 2025 02:23
@someone235 someone235 merged commit 7282223 into kaspanet:master Nov 13, 2025
6 checks passed