Use BestBlock for chain state serialization (and somewhat parallelize init)
#4266
Draft
TheBlueMatt wants to merge 22 commits into lightningdevkit:main from TheBlueMatt:2025-10-reorg-blocksource-swap
+1,173 −762
Conversation
Now that it has the same MSRV as everything else in the workspace, it doesn't need to live on its own.
Now that our MSRV is above 1.68 we can use the `pin!` macro to avoid having to `Box` various futures, avoiding some allocations, especially in `lightning-net-tokio`, where the boxing happened in a tight loop.
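Roughly, the change looks like this (a minimal sketch, not the actual `lightning-net-tokio` code):

```rust
use std::pin::pin;

// Stand-in for an async read done repeatedly in a connection loop.
async fn read_one_message() -> usize {
    42
}

async fn handle_connection() {
    // Before (older MSRV): heap-allocate just to get a pinned future.
    // let fut = Box::pin(read_one_message());

    // After (MSRV >= 1.68): pin the future on the stack instead, saving an
    // allocation on every iteration of a tight read loop.
    let fut = pin!(read_one_message());
    let _msg = fut.await;
}
```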
Now that our MSRV is 1.75, we can return `impl Trait` from trait methods. Here we use this to clean up `KVStore` methods, dropping the `Pin<Box<dyn ...>>` we had to use to have trait methods return a concrete type. Sadly, there are two places where we can't drop a `Box::pin` until we switch to edition 2024.
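For illustration, the shape of the change (using a simplified store trait, not LDK's real `KVStore` signatures):

```rust
use std::future::Future;
use std::pin::Pin;

// Before 1.75: trait methods could not use `impl Trait` in return position, so
// async work had to be handed back as a boxed, type-erased future.
pub trait StoreBoxed {
    fn read(&self, key: &str) -> Pin<Box<dyn Future<Output = Result<Vec<u8>, ()>> + Send + '_>>;
}

// With MSRV 1.75: return-position `impl Trait` in traits (RPITIT) lets each
// implementor return its own concrete future, avoiding the allocation and the
// dynamic dispatch.
pub trait StoreRpitit {
    fn read(&self, key: &str) -> impl Future<Output = Result<Vec<u8>, ()>> + Send;
}

pub struct InMemoryStore;

impl StoreRpitit for InMemoryStore {
    fn read(&self, key: &str) -> impl Future<Output = Result<Vec<u8>, ()>> + Send {
        let value = key.as_bytes().to_vec();
        async move { Ok(value) }
    }
}
```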
Now that our MSRV is 1.75, we can return `impl Trait` from trait methods. Here we use this to clean up `lightning-block-sync` trait methods, dropping the `Pin<Box<dyn ...>>` we had to use to have trait methods return a concrete type.
Now that our MSRV is 1.75, we can return `impl Trait` from trait methods. Here we use this to clean up `lightning` crate trait methods, dropping the `Pin<Box<dyn ...>>`/`AsyncResult` we had to use to have trait methods return a concrete type.
Now that we have an MSRV that supports returning `impl Trait` in trait methods, we can use it to avoid the `Box<dyn ...>` we had spewed all over our BOLT 11 invoice serialization.
Reading `ChannelMonitor`s on startup is one of the slowest parts of LDK initialization. Now that we have an async `KVStore`, there's no need for it to remain a serial bottleneck; we can simply parallelize their loading, which we do here. Sadly, because Rust futures are pretty unergonomic, we have to add some `unsafe {}` here, but arguing it's fine is relatively straightforward.
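Conceptually, the parallel load looks something like the sketch below, using placeholder `Store`/`Monitor` types in place of the async `KVStore` and `ChannelMonitor` and the `futures` crate's `join_all`; the lifetime gymnastics that force the `unsafe {}` mentioned above are elided:

```rust
use futures::future::join_all;

struct Store;
struct Monitor;

impl Store {
    // Placeholder async read; the real store returns serialized monitor bytes.
    async fn read(&self, key: &str) -> Vec<u8> {
        key.as_bytes().to_vec()
    }
}

fn deserialize_monitor(_bytes: &[u8]) -> Monitor {
    Monitor
}

/// Build one read-then-deserialize future per monitor key and await them all
/// concurrently rather than one at a time.
async fn load_monitors(store: &Store, keys: &[&str]) -> Vec<Monitor> {
    let futs = keys.iter().map(|key| async move {
        let bytes = store.read(key).await;
        deserialize_monitor(&bytes)
    });
    join_all(futs).await
}
```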
`tokio::spawn` can be used either to spawn a forever-running background task or to spawn a task which gets `poll`ed independently and eventually returns a result which the call site wants. In LDK, we have only ever needed the first, and thus didn't bother defining a return type for `FutureSpawner::spawn`. However, in the next commit we'll start using `FutureSpawner` in a context where we actually do want the spawned future's result. Thus, here, we add a result output to `FutureSpawner::spawn`, mirroring the `tokio::spawn` API.
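A hypothetical trait with that shape, plus a tokio-backed implementation (illustrative only, not LDK's exact `FutureSpawner` definition):

```rust
use std::future::Future;

pub trait FutureSpawnerLike {
    // `spawn` now returns a future resolving to the spawned task's output,
    // mirroring the join handle returned by `tokio::spawn`.
    fn spawn<T: Send + 'static>(
        &self,
        future: impl Future<Output = T> + Send + 'static,
    ) -> impl Future<Output = T> + Send;
}

pub struct TokioSpawner;

impl FutureSpawnerLike for TokioSpawner {
    fn spawn<T: Send + 'static>(
        &self,
        future: impl Future<Output = T> + Send + 'static,
    ) -> impl Future<Output = T> + Send {
        let handle = tokio::spawn(future);
        // Propagate panics from the spawned task for simplicity.
        async move { handle.await.expect("spawned task panicked") }
    }
}
```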
`MonitorUpdatingPersister::read_all_channel_monitors_with_updates` was made to do the IO operations in parallel in a previous commit; however, in practice this doesn't provide material parallelism for large routing nodes. Because deserializing `ChannelMonitor`s is the bulk of the work (when IO operations are sufficiently fast), we end up blocked on single-threaded work nearly the entire time. Here, we add an alternative option - a new `read_all_channel_monitors_with_updates_parallel` method which uses the `FutureSpawner` to run the deserialization operations in parallel.
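Using a spawner of that shape, the fan-out might look like this sketch (the real `read_all_channel_monitors_with_updates_parallel` signature differs):

```rust
use std::future::Future;

// Same hypothetical spawner shape as in the previous sketch.
pub trait FutureSpawnerLike {
    fn spawn<T: Send + 'static>(
        &self,
        future: impl Future<Output = T> + Send + 'static,
    ) -> impl Future<Output = T> + Send;
}

struct Monitor;

// Placeholder for the CPU-heavy `ChannelMonitor` deserialization.
fn deserialize_monitor(_bytes: &[u8]) -> Monitor {
    Monitor
}

/// Spawn one deserialization task per serialized monitor so the work runs on
/// the runtime's worker threads in parallel, then await all the results.
async fn deserialize_all<S: FutureSpawnerLike>(spawner: &S, blobs: Vec<Vec<u8>>) -> Vec<Monitor> {
    let handles: Vec<_> = blobs
        .into_iter()
        .map(|bytes| spawner.spawn(async move { deserialize_monitor(&bytes) }))
        .collect();
    let mut monitors = Vec::with_capacity(handles.len());
    for handle in handles {
        monitors.push(handle.await);
    }
    monitors
}
```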
When reading `ChannelMonitor`s from a `MonitorUpdatingPersister` on startup, we have to make sure to load any `ChannelMonitorUpdate`s and re-apply them as well. For users of async persistence who don't have any `ChannelMonitorUpdate`s (e.g. because they set `maximum_pending_updates` to 0 or, in the future, because we avoid persisting updates for small `ChannelMonitor`s), this means two round-trips to the storage backend: one to load the `ChannelMonitor` and one to try to read the next `ChannelMonitorUpdate` only to have it fail. Instead, here, we use `KVStore::list` to fetch the list of stored `ChannelMonitorUpdate`s, which for async `KVStore` users allows us to parallelize fetching the update list and loading the `ChannelMonitor` itself. We then know exactly when to stop reading `ChannelMonitorUpdate`s, including reading none if there are none to read. This also avoids relying on `KVStore::read` correctly returning `NotFound` in order to discover when to stop reading `ChannelMonitorUpdate`s.
When reading `ChannelMonitor`s from a `MonitorUpdatingPersister` on startup, we have to make sure to load any `ChannelMonitorUpdate`s and re-apply them as well. Now that we know which `ChannelMonitorUpdate`s to load from `list`ing the entries from the `KVStore`, we can parallelize the reads themselves, which we do here. Now, loading all `ChannelMonitor`s from an async `KVStore` requires only three full RTTs: one to list the set of `ChannelMonitor`s, one to both fetch the `ChannelMonitor` and list the set of `ChannelMonitorUpdate`s, and one to fetch all the `ChannelMonitorUpdate`s (with the last one skipped when there are no `ChannelMonitorUpdate`s to read).
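Put together, the per-monitor round trips look roughly like this (placeholder store and key layout, using the `futures` crate's `join`/`join_all`):

```rust
use futures::future::{join, join_all};

struct Store;

impl Store {
    // Placeholder reads/lists; the real async `KVStore` namespaces differ.
    async fn read(&self, namespace: &str, key: &str) -> Vec<u8> {
        format!("{namespace}/{key}").into_bytes()
    }
    async fn list(&self, namespace: &str) -> Vec<String> {
        vec![format!("{namespace}/update-1"), format!("{namespace}/update-2")]
    }
}

// RTT 1 (not shown): list the set of monitors.
// RTT 2: fetch the monitor bytes and list its updates concurrently.
// RTT 3: fetch every listed update concurrently (skipped if there are none).
async fn load_monitor_with_updates(
    store: &Store,
    monitor_key: &str,
) -> (Vec<u8>, Vec<Vec<u8>>) {
    let (monitor_bytes, update_keys) =
        join(store.read("monitors", monitor_key), store.list(monitor_key)).await;

    let updates = join_all(update_keys.iter().map(|key| store.read(monitor_key, key))).await;

    (monitor_bytes, updates)
}
```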
On restart, LDK expects the chain to be replayed starting from where it was when objects were last serialized. This is fine in the normal case, but if there was a reorg and the node which we were syncing from either resynced or was changed, the last block that we were synced as of might no longer be available. As a result, it becomes impossible to figure out where the fork point is, and thus to replay the chain. Luckily, changing the block source during a reorg isn't exactly common, but we shouldn't end up with a bricked node.

To address this, `lightning-block-sync` allows the user to pass in a `Cache` which can be used to cache recent blocks and thus allow for reorg handling in this case. However, serialization for, and a reasonable default implementation of, a `Cache` were never built.

Instead, here, we start taking a different approach. To avoid developers having to persist yet another object, we move `BestBlock` to storing some number of recent block hashes. This allows us to find the fork point with just the serialized state. In conjunction with 403dc1a (which allows us to disconnect blocks without having the stored header), this should allow us to replay chain state after a reorg even if we no longer have access to the top few blocks of the old chain tip.

While we only really need to store `ANTI_REORG_DELAY` blocks (as we generally assume that any deeper reorg won't happen and thus we don't guarantee we handle it correctly), it's nice to store a few more to be able to handle more than a six-block reorg. While other parts of the codebase may not be entirely robust against such a reorg if the confirmed transactions change out from under us, it's entirely possible (and, indeed, common) for reorgs to contain nearly identical transactions.
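An illustrative shape for such a record (field names and the retention count are hypothetical, not LDK's exact layout):

```rust
// Only ANTI_REORG_DELAY (6) hashes are strictly needed, but keeping a few
// extra is cheap and handles deeper reorgs.
const RECENT_BLOCKS_TO_TRACK: usize = 12;

#[derive(Clone)]
pub struct BestBlockSketch {
    /// Hash of the current chain tip.
    pub block_hash: [u8; 32],
    /// Height of the current chain tip.
    pub height: u32,
    /// Hashes of the blocks immediately below the tip, newest first.
    pub previous_blocks: Vec<[u8; 32]>,
}

impl BestBlockSketch {
    /// Advance to a newly connected block, pushing the old tip onto the
    /// recent-hash list and trimming the list to a fixed size.
    pub fn advance(&mut self, new_hash: [u8; 32]) {
        self.previous_blocks.insert(0, self.block_hash);
        self.previous_blocks.truncate(RECENT_BLOCKS_TO_TRACK);
        self.block_hash = new_hash;
        self.height += 1;
    }
}
```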
The deserialization of `ChannelMonitor`, `ChannelManager`, and `OutputSweeper` is implemented for a `(BlockHash, ...)` pair rather than on the object itself. This ensures developers are pushed to think about initial chain sync after deserialization and provides the latest chain sync state conveniently at deserialization time. In the previous commit we started storing additional recent block hashes in `BestBlock` for use during initial sync, to ensure we can handle reorgs while offline if the chain source loses the reorged-out blocks. Here, we move the deserialization routines to be on a `(BestBlock, ...)` pair instead of `(BlockHash, ...)`, providing access to those recent block hashes at deserialization time.
In 403dc1a we converted the `Listen` disconnect semantics to only pass the fork point, rather than each block being disconnected. To keep that patch small, we did not, however, update the semantics of `lightning-block-sync`'s `Cache`. Here we go ahead and do so, dropping `ChainDifference::disconnected_blocks` as well, as it's no longer needed.
On restart, LDK expects the chain to be replayed starting from where it was when objects were last serialized. This is fine in the normal case, but if there was a reorg and the node which we were syncing from either resynced or was changed, the last block that we were synced as of might no longer be available. As a result, it becomes impossible to figure out where the fork point is, and thus to replay the chain. Luckily, changing the block source during a reorg isn't exactly common, but we shouldn't end up with a bricked node.

To address this, `lightning-block-sync` allows the user to pass in a `Cache` which can be used to cache recent blocks and thus allow for reorg handling in this case. However, serialization for, and a reasonable default implementation of, a `Cache` were never built.

Instead, here, we start taking a different approach. To avoid developers having to persist yet another object, we move `BestBlock` to storing some number of recent block hashes. This allows us to find the fork point with just the serialized state. In a previous commit, we moved deserialization of various structs to return the `BestBlock` rather than a `BlockHash`. Here we move to actually using it, taking a `BestBlock` in place of a `BlockHash` in `init::synchronize_listeners` and walking the `previous_blocks` list to find the fork point rather than relying on the `Cache`.
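The fork-point search then reduces to walking those hashes, as in this sketch (the `is_in_best_chain` query is a stand-in for whatever the block source exposes):

```rust
struct ChainSource;

impl ChainSource {
    // Stand-in query: does the current best chain still contain this block?
    async fn is_in_best_chain(&self, _hash: &[u8; 32]) -> bool {
        true
    }
}

/// Walk the serialized tip plus `previous_blocks` (newest first) and return
/// the first hash the block source still considers part of the best chain.
/// Everything above it was reorged out and must be disconnected and replayed.
async fn find_fork_point(
    chain: &ChainSource,
    tip: [u8; 32],
    previous_blocks: &[[u8; 32]],
) -> Option<[u8; 32]> {
    for hash in std::iter::once(&tip).chain(previous_blocks.iter()) {
        if chain.is_in_best_chain(hash).await {
            return Some(*hash);
        }
    }
    // The reorg was deeper than the hashes we kept; the caller has to fall
    // back to a resync from an older point.
    None
}
```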
In the previous commit, we moved to relying on `BestBlock::previous_blocks` to find the fork point in `lightning-block-sync`'s `init::synchronize_listeners`. Here we now drop the `Cache` parameter as we no longer rely on it. Because we now have no reason to want a persistent `Cache`, we remove the trait from the public interface. However, to keep disconnections reliable, we return the `UnboundedCache` we built up during initial sync from `init::synchronize_listeners`, which we expect developers to pass to `SpvClient::new`.
In the previous commit we moved to hard-coding `UnboundedCache` in the `lightning-block-sync` interface. This is great, except that it's an unbounded cache that can use arbitrary amounts of memory (though never really all that much - it's just headers that come in while we're running). Here we simply limit its size, and while we're at it give it a more generic `HeaderCache` name.
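A bounded cache of that sort might look like the following (the real `HeaderCache` may use a different layout and eviction policy):

```rust
use std::collections::{HashMap, VecDeque};

// Illustrative cap; enough for any reorg we care about plus plenty of headroom.
const MAX_CACHED_HEADERS: usize = 1024;

pub struct BoundedHeaderCache<H> {
    headers: HashMap<[u8; 32], H>,
    insertion_order: VecDeque<[u8; 32]>,
}

impl<H> BoundedHeaderCache<H> {
    pub fn new() -> Self {
        Self { headers: HashMap::new(), insertion_order: VecDeque::new() }
    }

    /// Insert a header, evicting the oldest entries once the cap is hit so
    /// the cache cannot grow without bound while the client runs.
    pub fn insert(&mut self, hash: [u8; 32], header: H) {
        if self.headers.insert(hash, header).is_none() {
            self.insertion_order.push_back(hash);
        }
        while self.insertion_order.len() > MAX_CACHED_HEADERS {
            if let Some(evicted) = self.insertion_order.pop_front() {
                self.headers.remove(&evicted);
            }
        }
    }

    pub fn get(&self, hash: &[u8; 32]) -> Option<&H> {
        self.headers.get(hash)
    }
}
```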
In the next commit we'll fetch blocks during initial connection in parallel, which requires a multi-future poller. Here we add a symlink to the existing `lightning` `async_poll.rs` file, making it available in `lightning-block-sync`.
In `init::synchronize_listeners` we may end up spending a decent chunk of our time just fetching block data. Here we parallelize that step across up to 36 blocks at a time. On my node with bitcoind on localhost, the impact of this is somewhat muted by block deserialization being the bulk of the work; however, a networked bitcoind would likely change that. Even still, fetching a batch of 36 blocks in parallel happens on my node in ~615ms vs ~815ms in serial.
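Schematically (placeholder block-source types; 36 mirrors the batch size above):

```rust
use futures::future::join_all;

const BLOCK_FETCH_BATCH_SIZE: usize = 36;

struct BlockSource;
struct Block;

impl BlockSource {
    // Stand-in for an async block fetch from bitcoind or another source.
    async fn get_block(&self, _hash: &[u8; 32]) -> Block {
        Block
    }
}

/// Fetch blocks in fixed-size batches, issuing the requests within each batch
/// concurrently rather than waiting for one block before requesting the next.
async fn fetch_blocks(source: &BlockSource, hashes: &[[u8; 32]]) -> Vec<Block> {
    let mut blocks = Vec::with_capacity(hashes.len());
    for batch in hashes.chunks(BLOCK_FETCH_BATCH_SIZE) {
        let fetches = batch.iter().map(|hash| source.get_block(hash));
        blocks.extend(join_all(fetches).await);
    }
    blocks
}
```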
In `lightning-block-sync`'s `init::synchronize_listeners`, we may have many listeners we want to do a chain diff on. When doing so, we should make sure we utilize our header cache rather than querying our chain source for every header we need for each listener. Here we do so, inserting into the cache as we do chain diffs. On my node with a bitcoind on localhost, this brings the calculate-differences step of `init::synchronize_listeners` from ~500ms to under 150ms.
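The cache-first lookup amounts to something like this (hypothetical types and method names):

```rust
use std::collections::HashMap;

struct ChainSource;

#[derive(Clone)]
struct Header;

impl ChainSource {
    // Stand-in for a header fetch from the chain source.
    async fn get_header(&self, _hash: &[u8; 32]) -> Header {
        Header
    }
}

/// Return the header for `hash`, consulting the cache before hitting the
/// chain source and caching anything we had to fetch. With many listeners,
/// repeated header fetches collapse into a single query per header.
async fn header_cached(
    cache: &mut HashMap<[u8; 32], Header>,
    source: &ChainSource,
    hash: [u8; 32],
) -> Header {
    if let Some(header) = cache.get(&hash) {
        return header.clone();
    }
    let header = source.get_header(&hash).await;
    cache.insert(hash, header.clone());
    header
}
```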
On restart, LDK expects the chain to be replayed starting from where it was when objects were last serialized. This is fine in the normal case, but if there was a reorg and the node which we were syncing from either resynced or was changed, the last block that we were synced as of might no longer be available. As a result, it becomes impossible to figure out where the fork point is, and thus to replay the chain.

Luckily, changing the block source during a reorg isn't exactly common, but we shouldn't end up with a bricked node.

To address this, `lightning-block-sync` allows the user to pass in a `Cache` which can be used to cache recent blocks and thus allow for reorg handling in this case. However, serialization for, and a reasonable default implementation of, a `Cache` were never built.

Instead, here, we start taking a different approach. To avoid developers having to persist yet another object, we move `BestBlock` to storing some number of recent block hashes. This allows us to find the fork point with just the serialized state. In conjunction with 403dc1a (which allows us to disconnect blocks without having the stored header), this should allow us to replay chain state after a reorg even if we no longer have access to the top few blocks of the old chain tip.

While we only really need to store `ANTI_REORG_DELAY` blocks (as we generally assume that any deeper reorg won't happen and thus we don't guarantee we handle it correctly), it's nice to store a few more to be able to handle more than a six-block reorg. While other parts of the codebase may not be entirely robust against such a reorg if the confirmed transactions change out from under us, it's entirely possible (and, indeed, common) for reorgs to contain nearly identical transactions.

While we're here, we also parallelize a few parts of the initial sync and utilize the cache to speed it up. As such, this is based on #4147, which is based on #4175.