This repository was archived by the owner on Jan 20, 2026. It is now read-only.
Replay events during restart to avoid tx missing #211
Merged
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #211      +/-   ##
==========================================
+ Coverage   57.92%   58.14%   +0.21%
==========================================
  Files         249      249
  Lines       33918    33935      +17
==========================================
+ Hits        19646    19730      +84
+ Misses      12710    12637      -73
- Partials     1562     1568       +6
Kbhat1
approved these changes
Mar 14, 2024
stevenlanders
approved these changes
Mar 14, 2024
philipsu522
approved these changes
Mar 14, 2024
yzang2019
added a commit
that referenced
this pull request
Mar 18, 2024
udpatil
pushed a commit
that referenced
this pull request
Mar 27, 2024
* reformat logs to use simple concatenation with separators (#207)
* Use write-lock in (*TxPriorityQueue).ReapMax funcs (#209)

The ReapMaxBytesMaxGas and ReapMaxTxs funcs in TxPriorityQueue claim

> Transactions returned are not removed from the mempool transaction
> store or indexes.

However, they use a priority queue to accomplish the claim

> Transaction are retrieved in priority order.

This is accomplished by popping all items out of the heap and then pushing them back in sequentially. A copy of the heap cannot be obtained otherwise. Both functions hold only a read-lock (RLock) while doing this. As a result, multiple ReapMax executions can start in parallel, each popping items out of the same priority queue.

In practice, this can be abused by executing the `unconfirmed_txs` RPC call repeatedly. Based on our observations, running it multiple times per millisecond results in multiple threads picking it up at the same time. Such a scenario can be produced via the WebSocket interface by spamming `unconfirmed_txs` calls there.

The resulting behavior is a `Panic in WSJSONRPC handler` when a queue item unexpectedly disappears during `mempool.(*TxPriorityQueue).Swap` (`runtime error: index out of range [0] with length 0`).

This can additionally lead to a `CONSENSUS FAILURE!!!` if the race condition occurs while `internal/consensus.(*State).finalizeCommit` tries to call `mempool.(*TxPriorityQueue).RemoveTx` after ReapMax has already removed all elements from the underlying heap (`runtime error: index out of range [-1]`).

This commit switches the lock type to a write-lock (Lock) to ensure no parallel modifications take place. It additionally updates the tests to run the func calls in parallel, so as to prevent regressions (in case someone wants to downgrade the locks without considering the implications of the underlying heap usage).

* Fix root dir for tendermint reindex command (#210)
* Replay events during restart to avoid tx missing (#211)

---------

Co-authored-by: Denys S <150304777+dssei@users.noreply.github.com>
Co-authored-by: Valters Jansons <sigv@users.noreply.github.com>
Co-authored-by: Yiming Zang <50607998+yzang2019@users.noreply.github.com>
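The locking problem described in the #209 commit message above can be illustrated with a small, self-contained sketch. It is not the Tendermint code: the `tx`, `txHeap`, and `queue` types and the `reapInOrder` method are hypothetical stand-ins, and only the general pattern is taken from the commit message (pop everything off a `container/heap`, push it back, and hold the write-lock while doing so because the heap is temporarily mutated).

```go
package main

import (
	"container/heap"
	"fmt"
	"sync"
)

// tx is a hypothetical stand-in for a mempool transaction wrapper.
type tx struct {
	id       int
	priority int64
}

// txHeap implements heap.Interface as a max-heap on priority.
type txHeap []*tx

func (h txHeap) Len() int            { return len(h) }
func (h txHeap) Less(i, j int) bool  { return h[i].priority > h[j].priority }
func (h txHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *txHeap) Push(x interface{}) { *h = append(*h, x.(*tx)) }
func (h *txHeap) Pop() interface{} {
	old := *h
	n := len(old)
	item := old[n-1]
	old[n-1] = nil
	*h = old[:n-1]
	return item
}

// queue guards the heap with an RWMutex (hypothetical type).
type queue struct {
	mtx sync.RWMutex
	txs txHeap
}

func (q *queue) push(t *tx) {
	q.mtx.Lock()
	defer q.mtx.Unlock()
	heap.Push(&q.txs, t)
}

// reapInOrder returns all txs in priority order and leaves the queue
// contents unchanged afterwards. It pops every item and pushes it back,
// so the heap is mutated in between: a write-lock is required. With
// RLock, two concurrent callers could pop from the same heap at once.
func (q *queue) reapInOrder() []*tx {
	q.mtx.Lock()
	defer q.mtx.Unlock()

	out := make([]*tx, 0, q.txs.Len())
	for q.txs.Len() > 0 {
		out = append(out, heap.Pop(&q.txs).(*tx))
	}
	for _, t := range out {
		heap.Push(&q.txs, t)
	}
	return out
}

// size only reads the heap, so a read-lock is sufficient here.
func (q *queue) size() int {
	q.mtx.RLock()
	defer q.mtx.RUnlock()
	return q.txs.Len()
}

func main() {
	q := &queue{}
	for i := 0; i < 100; i++ {
		q.push(&tx{id: i, priority: int64(i % 10)})
	}

	// Hammer reapInOrder from several goroutines, similar in spirit to
	// spamming an RPC that reaps the mempool.
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 100; j++ {
				_ = q.reapInOrder()
			}
		}()
	}
	wg.Wait()

	fmt.Println(q.size()) // prints 100
}
```

Swapping `Lock`/`Unlock` for `RLock`/`RUnlock` inside `reapInOrder` in this sketch reintroduces the data race; running it with `go run -race` is one way to observe that.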
yzang2019
added a commit
that referenced
this pull request
Mar 26, 2025
* Reduce noisy tendermint logs
* Suppress logs for all entering

Use peermanager scores for blocksync peers and don't error out on block mismatch (#162)
* Use peermanager scores for blocksync peers
* Add debug
* Randomize
* debug
* use state to filter
* debug
* debug
* debug
* debug
* add comments
* don't err
* revert timeout
* Add missing param
* Remove flaky test
* fix nil
* debug
* debug
* debug
* debug

---------

Co-authored-by: Yiming Zang <50607998+yzang2019@users.noreply.github.com>

Perf: Increase buffer size for pubsub server to boost performance (#167)
* Increase buffer size for pubsub server
* Add more timeout for test failure
* Add more timeout
* Fix test split scripts
* Fix test split
* Fix unit test
* Unit test
* Unit test

[P2P] Optimize block pool requester retry and peer pick up logic (#170)
* P2P Improvements: Fix block sync reactor and block pool retry logic

Fix block sync auto restart not working as expected (#175)

Fix edge case for blocksync (#178)

Replay events during restart to avoid tx missing (#211)

Add a new config to speed up block sync (#244)
* Never switch to consensus due to timeout
* Add blocksync peer config to speed up block sync rate
* Fix config parsing
* Add some logs

feat: Exclude unconditional peers when connection limit checking (#245)

Improve Peer Score Algorithm (#248)
* feat: improve peer scoring algo
* debug
* debug
* more debug
* debug TryDiaNext
* remove log
* fix score type
* revert block sync logic
* revert block sync logic
* revert block sync logic
* Add block request log
* Add apply block latency
* add processEpeerEvent log back
* update unit test
* update unit test

---------

Co-authored-by: yzang2019 <zymfrank@gmail.com>
Kbhat1
pushed a commit
that referenced
this pull request
Mar 30, 2025
Kbhat1
pushed a commit
that referenced
this pull request
Apr 16, 2025
yzang2019
added a commit
that referenced
this pull request
Apr 21, 2025
* Reduce noisy tendermint logs
* Suppress logs for all entering

Use peermanager scores for blocksync peers and don't error out on block mismatch (#162)
* Use peermanager scores for blocksync peers
* Add debug
* Randomize
* debug
* use state to filter
* debug
* debug
* debug
* debug
* add comments
* don't err
* revert timeout
* Add missing param
* Remove flaky test
* fix nil
* debug
* debug
* debug
* debug

---------

Co-authored-by: Yiming Zang <50607998+yzang2019@users.noreply.github.com>

Perf: Increase buffer size for pubsub server to boost performance (#167)
* Increase buffer size for pubsub server
* Add more timeout for test failure
* Add more timeout
* Fix test split scripts
* Fix test split
* Fix unit test
* Unit test
* Unit test

[P2P] Optimize block pool requester retry and peer pick up logic (#170)
* P2P Improvements: Fix block sync reactor and block pool retry logic

Fix block sync auto restart not working as expected (#175)

Fix edge case for blocksync (#178)

Replay events during restart to avoid tx missing (#211)

Add a new config to speed up block sync (#244)
* Never switch to consensus due to timeout
* Add blocksync peer config to speed up block sync rate
* Fix config parsing
* Add some logs

feat: Exclude unconditional peers when connection limit checking (#245)

Improve Peer Score Algorithm (#248)
* feat: improve peer scoring algo
* debug
* debug
* more debug
* debug TryDiaNext
* remove log
* fix score type
* revert block sync logic
* revert block sync logic
* revert block sync logic
* Add block request log
* Add apply block latency
* add processEpeerEvent log back
* update unit test
* update unit test

---------

Co-authored-by: yzang2019 <zymfrank@gmail.com>

Add logs for blocksync
Add log to debug replay
Fix
Fix setBlock true false
Add more log for dialing
Add more logs
Add logs for missing height
Fix
Fix log
Fix log
Fix log
Fix bpr pool goroutine
Fix bpr pool goroutine
Fix goroutine leak
Fix goroutine leak
yzang2019
added a commit
that referenced
this pull request
Apr 21, 2025
yzang2019
added a commit
that referenced
this pull request
May 14, 2025
yzang2019
added a commit
that referenced
this pull request
May 14, 2025
Kbhat1
pushed a commit
that referenced
this pull request
May 30, 2025
Kbhat1
pushed a commit
that referenced
this pull request
Jun 9, 2025
Kbhat1
pushed a commit
that referenced
this pull request
Jun 13, 2025
Kbhat1
pushed a commit
that referenced
this pull request
Jun 20, 2025
Kbhat1
pushed a commit
that referenced
this pull request
Jun 30, 2025
Kbhat1
pushed a commit
that referenced
this pull request
Jul 7, 2025
Kbhat1
pushed a commit
that referenced
this pull request
Aug 15, 2025
Kbhat1
pushed a commit
that referenced
this pull request
Aug 23, 2025
Kbhat1
pushed a commit
that referenced
this pull request
Oct 22, 2025
Describe your changes and provide context
Problem:
This is an edge case: a node shutdown can be triggered during ApplyBlock, and the ApplyBlock function performs a few major steps:
The process can go down during any of these 5 steps. If it goes down between steps 4 and 5, the events are not fired and the txs are not indexed correctly, even though the block is successfully committed. After the node restarts, block replay does not replay or re-fire those events, because there is no block left to replay.
Solution:
It is hard to make sure events always fire correctly during shutdown: event publishing is an async process, so there is no real way to guarantee that shutdown waits until all events are published, delivered to subscribers, and processed.
So instead of fixing the shutdown logic, we choose to reindex the events after a node restart, during the replay/recover stage. This PR adds a function that replays the events even if there is no block to replay, which ensures the events are always published correctly regardless of when the shutdown happens.
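To make the idea concrete, here is a minimal toy sketch of the failure window and the recovery step. It is not the code from this PR: `Store`, `ApplyBlock`, `Recover`, `fireEvents`, and the `crashBeforeEvents` flag are hypothetical names invented for illustration, and event publishing is made synchronous and idempotent to keep the example short (the real event bus is async).

```go
package main

import "fmt"

// Event is a hypothetical stand-in for a per-tx event published after a block.
type Event struct {
	Height int64
	TxHash string
}

// Store is a toy model of state that survives a restart.
type Store struct {
	Blocks  map[int64][]string // committed txs per height
	Height  int64              // last committed height
	Indexed map[string]int64   // tx index, filled by the event subscriber
}

func NewStore() *Store {
	return &Store{Blocks: map[int64][]string{}, Indexed: map[string]int64{}}
}

// publish stands in for the event bus plus the tx indexer subscribed to it.
// Indexing is idempotent here, so publishing the same event twice is harmless.
func (s *Store) publish(e Event) { s.Indexed[e.TxHash] = e.Height }

// fireEvents publishes one event per tx of an already committed block.
func (s *Store) fireEvents(height int64) {
	for _, tx := range s.Blocks[height] {
		s.publish(Event{Height: height, TxHash: tx})
	}
}

// ApplyBlock commits a block and then fires its events.
// crashBeforeEvents simulates the process going down after the commit
// but before the events are fired.
func (s *Store) ApplyBlock(height int64, txs []string, crashBeforeEvents bool) {
	s.Blocks[height] = txs
	s.Height = height // the block is committed at this point
	if crashBeforeEvents {
		return // simulated shutdown: events never fire
	}
	s.fireEvents(height)
}

// Recover runs on restart. appHeight is the height the application reports.
// Blocks above appHeight are replayed; when replayEvents is set, the events
// of the last committed block are re-fired even if no block needs replaying.
func (s *Store) Recover(appHeight int64, replayEvents bool) {
	for h := appHeight + 1; h <= s.Height; h++ {
		s.fireEvents(h) // block replay (execution itself omitted)
	}
	if replayEvents && s.Height > 0 {
		s.fireEvents(s.Height)
	}
}

func main() {
	s := NewStore()
	s.ApplyBlock(1, []string{"txA"}, false)
	s.ApplyBlock(2, []string{"txB"}, true) // shutdown after commit

	s.Recover(2, false) // nothing to replay, so txB stays unindexed
	_, ok := s.Indexed["txB"]
	fmt.Println("indexed without event replay:", ok)

	s.Recover(2, true) // events for the last block are replayed
	_, ok = s.Indexed["txB"]
	fmt.Println("indexed with event replay:", ok)
}
```

The sketch relies on the subscriber tolerating duplicate events, since a block whose events did fire before shutdown will have them fired again on restart.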
Testing performed to validate your change
Tested on an atlantic-2 archive node: we manually added a sleep between each step to reproduce the ungraceful-shutdown bug and confirmed that this PR fixes the edge case.