
Conversation


@freshair18 freshair18 commented Jun 27, 2025

Addresses #679

IBD Type determination
A new IBD type is added, currently called pruning_catchup. Disregarding some subtleties, this IBD type is triggered when all the following are fulfilled:

  1. the syncer's and node's pruning points do not match,
  2. the node knows the header of the syncer's pp and can tell it is in the future of its own pp,
  3. the node does not have the block body of the syncer's pp - if it does have that block body, vanilla syncing can carry on as normal and the node will prune on its own in due time.

Conveniently, negotiate_missing_syncer_chain_segment allows for an easy way to derive the syncer's current pruning point hash.
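For concreteness, here is a minimal sketch of that decision. All names here (IbdType, determine_ibd_type, the closure parameters) are hypothetical and simplified; they do not reflect the actual rusty-kaspa code.

```rust
// Hypothetical, simplified sketch of the pruning_catchup trigger.
type Hash = u64; // stand-in for the real hash type

#[derive(Debug, PartialEq)]
enum IbdType {
    Vanilla,
    WithHeadersProof,
    PruningCatchup,
}

fn determine_ibd_type(
    own_pp: Hash,
    syncer_pp: Hash,
    // Do we know the header of the syncer's pruning point?
    header_known: impl Fn(Hash) -> bool,
    // Is that header in the future of our own pruning point?
    in_future_of_own_pp: impl Fn(Hash) -> bool,
    // Do we already have its block body?
    body_known: impl Fn(Hash) -> bool,
) -> IbdType {
    if syncer_pp == own_pp {
        // Pruning points match: no catch-up question arises.
        return IbdType::Vanilla;
    }
    if !header_known(syncer_pp) || !in_future_of_own_pp(syncer_pp) {
        // We cannot place the syncer's pruning point above ours (simplified).
        return IbdType::WithHeadersProof;
    }
    if !body_known(syncer_pp) {
        // Header known and ahead of our pruning point, but its body is
        // missing: a catch-up is required.
        return IbdType::PruningCatchup;
    }
    // We already hold the body, so vanilla syncing can carry on as normal
    // and the node will prune on its own in due time.
    IbdType::Vanilla
}
```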

Validation Before Movement
Before any sensitive and irreversible step, the node first downloads and validates headers from the syncer up to its declared sink. "Destructive" changes only occur when:

  1. the syncer's pp is a valid pruning sample (it satisfies the blue_score requirements to be a pp),
  2. there are sufficiently many headers built on top of it; specifically, the validated blue_score of the syncer's sink header is greater than the pp's blue_score plus pruning_depth,
  3. the syncer's pruning point is on the selected chain from the syncer's sink, and any pruning points declared on headers along that path are consistent with those already known (see the sketch below).
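As a rough illustration of these three checks, here is a minimal sketch under hypothetical helper names and simplified signatures; it is not the actual rusty-kaspa implementation.

```rust
// Simplified sketch of the validation gate; all helper names are hypothetical.
type Hash = u64;

fn can_move_pruning_point(
    syncer_pp: Hash,
    syncer_sink: Hash,
    pruning_depth: u64,
    blue_score_of: impl Fn(Hash) -> u64,
    is_valid_pruning_sample: impl Fn(Hash) -> bool,
    pp_on_selected_chain_of_sink: impl Fn(Hash, Hash) -> bool,
    declared_pps_consistent_along_path: impl Fn(Hash, Hash) -> bool,
) -> bool {
    // 1) The syncer's pruning point must satisfy the blue_score requirements
    //    of a pruning sample.
    if !is_valid_pruning_sample(syncer_pp) {
        return false;
    }
    // 2) Sufficiently many headers must be built on top of it: the validated
    //    sink header must be more than pruning_depth above it in blue score.
    if blue_score_of(syncer_sink) <= blue_score_of(syncer_pp) + pruning_depth {
        return false;
    }
    // 3) The pruning point must lie on the selected chain from the syncer's
    //    sink, with the pruning points declared along that path consistent
    //    with those already known to the node.
    pp_on_selected_chain_of_sink(syncer_pp, syncer_sink)
        && declared_pps_consistent_along_path(syncer_pp, syncer_sink)
}
```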

Transitional States During Catchup
Updating to a new pruning point conceptually consists of three stages (a schematic sketch follows the list):

  1. Updating various stores, most prominently the pruning_point store, but also the virtual store, past_pruning_points, pruning samples, selected chain store, and body_tips store. All of those can be updated in a single batch (ignoring pruning samples, for which it does not matter). I will refer to this stage as the "pruning point movement".
  2. Downloading the new pruning utxo set from a peer, and verifying it matches the header
  3. Downloading the block bodies of the new pruning point and its anticone - these blocks should only undergo trusted validation as their parents will forever miss block bodies. Hence they require special attention.
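Strung together, the flow looks roughly as follows; all helper names here are hypothetical stubs standing in for the real code paths.

```rust
// Schematic only: the three conceptual stages in order, with stubbed helpers.
type Hash = u64;

#[derive(Debug)]
struct SyncError;

// Stage 1 stub: batch-update pruning_point, virtual, past_pruning_points,
// selected chain and body_tips stores (the "pruning point movement").
fn move_pruning_point_in_batch(_new_pp: Hash) -> Result<(), SyncError> {
    Ok(())
}

// Stage 2 stub: fetch the pruning UTXO set from a peer and verify it
// matches the new pruning point's header.
fn download_and_verify_pruning_utxoset(_new_pp: Hash) -> Result<(), SyncError> {
    Ok(())
}

// Stage 3 stub: fetch the bodies of the new pruning point and its anticone
// and run trusted validation on them (their parents will forever miss bodies).
fn download_and_trust_validate_anticone(_new_pp: Hash) -> Result<(), SyncError> {
    Ok(())
}

fn pruning_catchup(new_pp: Hash) -> Result<(), SyncError> {
    move_pruning_point_in_batch(new_pp)?;
    download_and_verify_pruning_utxoset(new_pp)?;
    download_and_trust_validate_anticone(new_pp)?;
    Ok(())
}
```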

During IBD_with_headers_proof (as it was previously), these three stages are performed atomically, using a "discardable" staging consensus, which either goes through all of them and only then is committed, or else the current consensus remains active.

Unlike an IBD with headers proof, pruning_catchup inherently consists of building on the information of the current consensus rather than starting from scratch.

The current implementation hence introduces transitional states, with corresponding "flags", for the intermediary cases where the pruning point movement occurred but a new pruning utxo set is yet to be downloaded, and/or the anticone's block bodies have not all gone through verification. The required anticone in particular is maintained by computing and storing it already during the pp movement, computed in relation to the syncer's sink. (In theory this maintained set could be shrunk on the fly as more bodies are synced, but at the moment it is maintained in an all-or-nothing manner, since sending already-validated blocks to validation causes no harm and is fast enough.)
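One way to picture these flags (purely illustrative; these are not the real field names or storage layout):

```rust
// Illustrative flags for the transitional states; not the actual fields.
use std::collections::HashSet;

type Hash = u64;

#[derive(Default)]
struct CatchupState {
    /// Set after the pruning point movement, cleared once the matching
    /// pruning UTXO set has been downloaded and verified.
    awaiting_pruning_utxoset: bool,
    /// The new pruning point's anticone (computed relative to the syncer's
    /// sink during the movement); kept whole until all bodies are validated.
    anticone_awaiting_bodies: HashSet<Hash>,
}

impl CatchupState {
    /// True while the consensus is in any transitional state.
    fn is_transitional(&self) -> bool {
        self.awaiting_pruning_utxoset || !self.anticone_awaiting_bodies.is_empty()
    }
}
```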

Given that these intermediary states are easy to recognize, they can simply be handled in future syncs. The transitional states cannot be abused, given the standard security assumption of an honest majority at every pruning period: since we synced sufficiently many headers on top of the pruning point, we know the synced DAG on top of it represents the honest network, hence its pruning point corresponds to a valid pruning utxo set, and the blocks in the anticone must have had block bodies - or the honest network would have "rejected" this DAG (more precisely, the pruning point would not have been on its selected chain). It is worth remarking that the same assumption was used previously when choosing to commit a staging consensus before all synced blocks underwent validation.

  • Decoupling the utxo download from the pruning point movement also allows sync_with_headers_proof to commit prior to downloading the utxo set, greatly improving the experience of the many users who disconnect during the long UTXO download and currently have to start syncing from scratch.

Transitional States Security
Pruning: generally, pruning is not activated unless a virtual task is completed, and hence would not be called while in the limbo state of a missing utxo set. To be on the safe side, it is confirmed that we are not in a transitional state before attempting to naturally advance the pruning_utxo_set. This could perhaps be turned into an assert.
Block Relay: a check is added so that if the consensus is in a transitional state, the relayed block is immediately routed to IBD.
HandleRelayBlockRequests: the node will ignore requests to send over its sink while it is in a transitional state, to avoid log clutter and disconnections due to a potential missing-block error.

For simplicity both transitional states are checked in all the above, though at times a distinction could be made between them.

A final sidenote: advancing the pruning_utxo_set (and pruning in general) is also prevented until the virtual is sufficiently above the pruning point. This ultimately stems from technicalities and may be modified in the future, but it seems fine to prevent pruning while block IBD is still taking place.
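The guards above amount to something like the following sketch; the boolean and helper names are stand-ins, not the actual code.

```rust
// Hypothetical guard logic mirroring the checks described above.
enum RelayAction {
    /// Process the relayed block normally.
    Process,
    /// The consensus is mid-catchup: hand control to IBD instead.
    TriggerIbd,
}

fn should_advance_pruning_utxoset(
    in_transitional_state: bool,
    virtual_blue_score: u64,
    pp_blue_score: u64,
    pruning_depth: u64,
) -> bool {
    // Never advance the pruning UTXO set (or prune) mid-catchup, and also
    // wait until virtual is sufficiently above the pruning point.
    !in_transitional_state && virtual_blue_score > pp_blue_score + pruning_depth
}

fn on_block_relay(in_transitional_state: bool) -> RelayAction {
    // A relayed block received mid-catchup is routed straight to IBD.
    if in_transitional_state {
        RelayAction::TriggerIbd
    } else {
        RelayAction::Process
    }
}

fn should_serve_sink_request(in_transitional_state: bool) -> bool {
    // HandleRelayBlockRequests: ignore requests for our sink while in a
    // transitional state to avoid log clutter and missing-block disconnects.
    !in_transitional_state
}
```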


freshair18 commented Jun 27, 2025

Open Issues

(a) Even if catch-up is possible, it might still be worthwhile to attempt to sync from another peer that has not yet pruned and has maintained the missing block-body segment, rather than downloading an entire fresh utxo set. A more robust implementation could check with various peers and only initiate catch-up once no other choice remains. The same can even apply to the syncer as is: while its pruning point has changed, its retention point/root may lag behind (intentionally, or even just due to slow pruning), and it would still be able to send the relevant data if the syncee could recognize this scenario.

(b) The transitional states possibly break some implicit assumptions made by the node and by clients about the status of an active consensus, which demand further thought on how best to address them.

namely:
(1) that it has a "valid" pruning_utxo_set corresponding to a known hash. Some clients possibly rely on this. The contents of this field are currently cleared to make way for a new utxo set corresponding to the new pp. This in particular could be circumvented by instead maintaining a "staging" pruning utxo set (several extra GB, only during IBD), and only committing it once it is downloaded and validated. It can still be argued that clients may rely on the pruning utxo set having a "valid body path" from the utxo set position to the virtual. I find this unlikely, but dare make no absolute claim on the way people use the node.
(2) a more acute issue is that clients and the node may rely on the sink's body being available at any time (and, to a far lesser extent, the same could theoretically apply to any block in the pp's anticone). This assumption is broken here, and is not easy to fix with the current code structure.

Possible solutions include cloning the consensus, which is expensive in both space and time and relatively complex. A variant of this could perhaps allow consensus instances to share most but not all of the stores, and thus only clone the very limited amount of storage touched by the transitional state (namely, stores affected by body validation).
A suggestion by @michaelsutton is to "restage" the consensus when it is in an unstable state, and replace it with a default consensus in the meanwhile, which, despite being empty, does fulfill the invariants. This would require keeping the staging consensus across crashes and disconnects of the node, which should eventually be a goal regardless, as it would allow fresh syncs that fail during headers sync to avoid starting from scratch. This discussion also naturally leads to KIP7 and KIP8.

freshair18 and others added 4 commits July 1, 2025 17:35
* Remove temporary dust prevention mechanism

* Disable uninlined_format_args lint

* Apply workspace lints to all crates

* clippy

freshair18 commented Jul 9, 2025

Addendum to the above comment on open issues: in hindsight I came to realize that (b1) probably also applies to the virtual utxo set, which can end up in limbo if the node is reset at very specific times. This seems more acute than the pruning_utxo_set case, but I believe it can be solved in a similar manner to that described above.


freshair18 commented Jul 15, 2025

(c) Following a "catchup", a lot of old data can usually be pruned right away. For simplicity and safety's sake, the current implementation will not start pruning until a full pruning_depth of block data has accumulated on top of the new pruning point. While superficially reasonable, this prevents even the pruning of old data from several pruning depths ago. A user repeatedly "catching up" at the worst of times (exactly before pruning would trigger, every time) could find themselves with unexpectedly large storage, though it is challenging to imagine anything too extreme.
I would say that in this context it is still far better to err on the safe side than to try to optimize for this scenario.
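Expressed as a condition (illustrative only; not the actual code):

```rust
// Illustrative: after a catch-up, pruning stays disabled until a full pruning
// depth of block data has accumulated on top of the new pruning point.
fn pruning_enabled_after_catchup(virtual_blue_score: u64, new_pp_blue_score: u64, pruning_depth: u64) -> bool {
    virtual_blue_score >= new_pp_blue_score + pruning_depth
}
```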

coderofstuff previously approved these changes Nov 8, 2025
@coderofstuff coderofstuff left a comment


Disembodied Anticone must be changed in another PR, before a release

@coderofstuff coderofstuff left a comment


@someone235 @freshair18 up to you if you want to fix these comments here or in a different PR after this is merged

Comment on lines 543 to 545
// Verify that the new pruning point can be safely imported
// and return all new pruning point on path to it that needs to be updated in consensus
fn get_and_verify_novel_pruning_points(&self, new_pruning_point: Hash, syncer_sink: Hash) -> ConsensusResult<VecDeque<Hash>> {
Collaborator

Update the function docs to use /// for rust standard.
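For illustration, the suggested /// form might look like this (a fragment only, with the existing comment lightly reworded as Rust doc comments; signature unchanged):

```rust
/// Verifies that the new pruning point can be safely imported and returns all
/// new pruning points on the path to it that need to be updated in consensus.
fn get_and_verify_novel_pruning_points(&self, new_pruning_point: Hash, syncer_sink: Hash) -> ConsensusResult<VecDeque<Hash>> {
```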

}
info!("Setting {new_pruning_point} as the pruning point");
// 4) The pruning points declared on headers on that path must be consistent with those already known by the node:
let pruning_point_read = self.pruning_point_store.upgradable_read();
Collaborator

Why does this need to be an upgradable_read as opposed to read()?


// Verify that the new pruning point can be safely imported
// and return all new pruning point on path to it that needs to be updated in consensus
fn get_and_verify_novel_pruning_points(&self, new_pruning_point: Hash, syncer_sink: Hash) -> ConsensusResult<VecDeque<Hash>> {
Collaborator

Can you split this into two parts:

  • is_pruning_point_importable - checks whether the new pruning point can be imported
  • get_path_to_pruning_point - gets the pp path to the new pruning point (if possible)?

Collaborator Author

Tell me what you think of it, but it is a tad unnatural to me: verifying the new pruning point consists of getting these new pruning points and checking that they form a coherent "chain", so getting them is in itself part of the verification.

Collaborator

Hmm, ok, let me explain what I'm thinking. The function signature takes in a pruning point (presumably a new one). The name get_and_verify_novel_pruning_points appears to indicate that you are attempting to verify some set of pruning points. It's inconsistent with the input, since you're just passing in one. So if someone were reading this function signature, you are left to wonder "what pruning points are you referring to when you're just passing in one?"

If the intent is as you describe, can I propose a rename of this to get_and_verify_path_to_new_pruning_point. From this name, you can tell there's a reference to a new pruning point and you know you'll be receiving a path to that new pruning point.



coderofstuff commented Nov 9, 2025

How did you conduct tests for this feature? As in, what is the setup so I can run a similar test locally?

someone235
someone235 previously approved these changes Nov 11, 2025
coderofstuff
coderofstuff previously approved these changes Nov 12, 2025
@freshair18 freshair18 dismissed stale reviews from coderofstuff and someone235 via 2ea9727 November 13, 2025 02:23
@someone235 someone235 merged commit 7282223 into kaspanet:master Nov 13, 2025
6 checks passed