WIP: Harden deferred notebook import/replay to avoid sync stalls and retry thrash #409
blakebyrnes wants to merge 1 commit into main
Conversation
Pull request overview
This PR hardens deferred notebook import and replay logic to prevent sync stalls and retry thrashing during transient notary/archive unavailability. The changes introduce bounded timeouts, header-only fallback behavior when queues are saturated, and separate import-safe vs. background notebook verification modes.
Changes:
- Adds import-safe notebook verification with bounded timeouts and no side effects
- Changes deferred queue saturation behavior from `MissingState` to header-only import
- Introduces bounded replay scanning to prevent long-running queue processing
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| node/consensus/src/test.rs | Adds comprehensive test coverage for deferred import, queue saturation, and replay recovery scenarios |
| node/consensus/src/pending_import_replay.rs | Implements pending import replay queue with persistence, bounded scanning, and header-only fallback logic |
| node/consensus/src/notary_client.rs | Refactors notebook audit verification to support bounded timeouts and import-safe mode without notary updates |
| node/consensus/src/mock_importer.rs | Provides test infrastructure including mock chain client and importer factory functions |
| node/consensus/src/lib.rs | Moves test module to separate file and adds pending_import_replay module |
| node/consensus/src/import_queue.rs | Refactors import logic with pre-import gates, deferred import handling, and background replay task |
| node/consensus/src/error.rs | Adds new error variants for deferred audits and pending import constraints, plus retryable error detection |
Force-pushed from 7afce34 to 968e9a5
Add a persisted pending-import replay queue and wire import-gating for deferred full-body execution.

- persist/recover pending full imports via aux storage
- replay deferred imports when parent state and notebook prerequisites are ready
- keep header-only imports bounded with queue capacity and explicit queue-full behavior
- split consensus test scaffolding into dedicated mock and test modules
- harden notary audit catch-up to defer on unavailable audits instead of misclassifying them as an invalid digest

fix(consensus): harden deferred notebook import and replay

Block import had multiple failure modes when notebook audits were temporarily unavailable: import-time verification could pull in notary update/connect work while the import lock was held, deferred queue saturation returned `MissingState` and could trigger retry churn, and deferred reimport short-circuits could skip justification/finality upgrades.

This change adds an import-safe notebook verification path with a bounded wait and no inline notary updates, treats deferred queue saturation as a header-only import (instead of `MissingState`), preserves justifications on deferred header split, and allows justified/finalized upgrades to reimport. Replay scanning is also bounded per pass to avoid long blocked scans.

Impact: lower risk of sync stalls and peer-thrashing during transient notary/archive outages, while preserving consensus safety checks and deterministic recovery behavior.

Validation: `cargo test -p argon-node-consensus --lib`, `cargo make fmt`, `cargo make lint`.
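The saturation behavior described in the commit message can be sketched as a small decision function. This is illustrative only, with hypothetical names (`ImportDecision`, `decide_import`, and the `MAX_PENDING_IMPORTS` value), not the PR's actual types: when the deferred queue is full, the importer falls back to a header-only import rather than surfacing `MissingState`, which peers would retry against.

```rust
/// Illustrative sketch of the queue-saturation fallback; names and the
/// capacity value are assumptions, not the PR's real API.
#[derive(Debug, PartialEq)]
enum ImportDecision {
    /// Queue the block for a later full-body replay.
    DeferFullBody,
    /// Queue is saturated: import the header only so sync keeps moving.
    HeaderOnly,
}

const MAX_PENDING_IMPORTS: usize = 4;

fn decide_import(queue_len: usize) -> ImportDecision {
    if queue_len >= MAX_PENDING_IMPORTS {
        // Previously this path returned a MissingState-style error,
        // which caused retry churn; header-only import avoids that.
        ImportDecision::HeaderOnly
    } else {
        ImportDecision::DeferFullBody
    }
}

fn main() {
    assert_eq!(decide_import(1), ImportDecision::DeferFullBody);
    assert_eq!(decide_import(4), ImportDecision::HeaderOnly);
    println!("saturation fallback behaves as described");
}
```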
Force-pushed from 968e9a5 to b335547
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.
```rust
if pending_imports.len() >= MAX_PENDING_IMPORTS {
    warn!(
        block_hash = ?pending_import.hash,
        number = ?pending_import.block.header.number(),
        queue_len = pending_imports.len(),
        "Pending replay queue is full while requeueing deferred import; keeping deferred import"
    );
}
pending_imports.push(pending_import);
sort_pending_imports(&mut pending_imports);
self.persist_snapshot(&pending_imports);
```
`requeue_pending_import` logs when the queue is at/above `MAX_PENDING_IMPORTS`, but it still pushes the entry anyway. This defeats the queue bound and can grow the in-memory and persisted aux payload beyond the intended cap under repeated retries (the warning message also says "queue is full" but the behavior still enqueues). Consider returning early (dropping or deferring persistence) when full, or evicting/compacting before pushing, so `pending_imports.len()` never exceeds `MAX_PENDING_IMPORTS`.
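A minimal sketch of the "return early when full" option the reviewer suggests. The queue is simplified to a `Vec<u64>` of block numbers and the capacity is an assumed value; only the control flow mirrors the snippet above:

```rust
/// Simplified model of the requeue path: enforce the bound by
/// returning early instead of pushing past MAX_PENDING_IMPORTS.
const MAX_PENDING_IMPORTS: usize = 4;

fn requeue_pending_import(pending_imports: &mut Vec<u64>, pending_import: u64) -> bool {
    if pending_imports.len() >= MAX_PENDING_IMPORTS {
        // Queue is genuinely full: drop the entry so the in-memory and
        // persisted payload never exceed the cap.
        eprintln!("pending replay queue full; dropping deferred import {pending_import}");
        return false;
    }
    pending_imports.push(pending_import);
    pending_imports.sort_unstable(); // stand-in for sort_pending_imports
    true
}

fn main() {
    let mut queue = Vec::new();
    for block in 0..6u64 {
        requeue_pending_import(&mut queue, block);
    }
    // Two of the six requeues are rejected; the bound holds.
    assert_eq!(queue.len(), MAX_PENDING_IMPORTS);
    println!("queue bounded at {}", queue.len());
}
```

With this shape the persisted snapshot is only written on a successful push, which also matches the reviewer's "deferring persistence" variant.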
```rust
let mut replayed = 0usize;
while let Some((pending_import, replay_context)) = self
    .pending_full_import_queue
    .dequeue_ready_for_replay(&self.notary_client)
    .await
{
    replayed = replayed.saturating_add(1);
    let mut replay_retry_block =
```
`replayed` is incremented in `replay_pending_full_imports` but never used. If the project treats warnings as errors (e.g., via clippy/lint settings), this will fail the build. Either remove it or use it (e.g., for a debug/info log or metrics) to justify keeping the counter.
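One way to resolve this, sketched with a synchronous stand-in for the async queue (the function name matches the PR; `MAX_REPLAY_PER_PASS` and the `Vec` queue are assumptions): keep the counter, use it to bound the per-pass scan, and emit it in a log so it is no longer dead code.

```rust
/// Stand-in for the replay loop: the counter both bounds the pass and
/// feeds a log line, so dead-code lints stay quiet under deny(warnings).
const MAX_REPLAY_PER_PASS: usize = 8;

fn replay_pending_full_imports(queue: &mut Vec<u64>) -> usize {
    let mut replayed = 0usize;
    while replayed < MAX_REPLAY_PER_PASS {
        let Some(_block) = queue.pop() else { break };
        replayed = replayed.saturating_add(1);
    }
    // Using the counter in a log (or a metric) justifies keeping it.
    eprintln!("replayed {replayed} deferred imports this pass");
    replayed
}

fn main() {
    let mut queue: Vec<u64> = (0..20).collect();
    let n = replay_pending_full_imports(&mut queue);
    assert_eq!(n, MAX_REPLAY_PER_PASS);
    assert_eq!(queue.len(), 12); // the rest wait for the next pass
}
```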
```rust
sp_blockchain::Info {
    finalized_hash: fin.1,
    finalized_number: fin.0,
    finalized_state: None,
    best_hash: best.1,
    best_number: best.0,
    block_gap: None,
    genesis_hash: best.1,
    number_leaves: 0,
}
```
In the `MemChain` test client, `info().genesis_hash` is set to `best.1`, which changes as best changes. `genesis_hash` is expected to be stable and refer to the chain's genesis block; using the current best hash can break any logic that relies on `client.info().genesis_hash` (e.g., compute puzzle key block derivation). Store the genesis hash once in `MemChain` and return that here instead.
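A minimal sketch of the suggested fix, with simplified types (`Hash` here is a toy newtype, not the Substrate hash type): capture the genesis hash at construction and return it unchanged, regardless of how `best` advances.

```rust
/// Toy hash type standing in for the real block hash.
#[derive(Clone, Copy, PartialEq, Debug)]
struct Hash(u64);

/// Simplified MemChain: genesis_hash is fixed at construction,
/// while best is free to advance.
struct MemChain {
    genesis_hash: Hash,
    best: (u64, Hash),
}

impl MemChain {
    fn new(genesis_hash: Hash) -> Self {
        Self { genesis_hash, best: (0, genesis_hash) }
    }

    fn set_best(&mut self, number: u64, hash: Hash) {
        self.best = (number, hash);
    }

    /// Stable: always the hash captured at construction, never best.1.
    fn genesis_hash(&self) -> Hash {
        self.genesis_hash
    }
}

fn main() {
    let mut chain = MemChain::new(Hash(0));
    chain.set_best(5, Hash(42));
    // genesis_hash no longer tracks the moving best hash.
    assert_eq!(chain.genesis_hash(), Hash(0));
    assert_ne!(chain.genesis_hash(), chain.best.1);
}
```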
Why

When notebook audits are temporarily unavailable, import was mixing safety checks with network recovery work in ways that could hurt liveness. In particular, the previous path could hold import on notary connection/update work and could emit `MissingState` under deferred-queue saturation, which can lead to retry churn.

What changed

- Import-safe notebook verification with a bounded wait and no inline notary updates
- Deferred queue saturation now produces a header-only import instead of `MissingState`
- Replay scanning is bounded per pass

Edge cases handled

- Justifications are preserved on deferred header split
- Justified/finalized upgrades are allowed to reimport

Impact

Lower risk of sync stalls and peer-thrashing during transient notary/archive outages, while preserving consensus safety checks and deterministic recovery behavior.

Validation

- `cargo test -p argon-node-consensus --lib`
- `cargo make fmt`
- `cargo make lint`