Skip to content

Restructure electrum/esplora sync logic#461

Merged
notmandatory merged 14 commits intobitcoindevkit:masterfrom
LLFourn:sync_pipeline
Nov 25, 2021
Merged

Restructure electrum/esplora sync logic#461
notmandatory merged 14 commits intobitcoindevkit:masterfrom
LLFourn:sync_pipeline

Conversation

@LLFourn
Copy link
Copy Markdown
Collaborator

@LLFourn LLFourn commented Oct 28, 2021

Description

This PR does dependency inversion on the previous sync logic for electrum and esplora captured in the trait ElectrumLikeSync. This means that the sync logic does not reference the blockchain at all. Instead the blockchain asks the sync logic (in script_sync.rs) what it needs to continue the sync and tries to retrieve it.

The initial purpose of doing this is to remove invocations of maybe_await in the abstract sync logic in preparation for completely removing maybe_await in the future. The other major benefit is it gives a lot more freedom for the esplora logic to use the rich data from the responses to complete the sync with less HTTP requests than it did previously.

List of changes

  • sync logic moved to script_sync.rs and ElectrumLikeSync is gone.
  • esplora makes one http request per sync address. This means it makes half the number of http requests for a fully synced wallet and N*M less requests for a wallet which has N new transactions with M unique input transactions.
  • electrum and esplora save less raw transactions in the database. Electrum still requests input transactions for each of its transactions to calculate the fee but it does not save them to the database anymore.
  • The ureq and reqwest blockchain configuration is now unified into the same struct. This is the only API change. read_timeout and write_timeout have been removed in favor of a single timeout option which is set in both ureq and reqwest.
  • ureq now does concurrent (parallel) requests using threads.
  • An previously unnoticed bug has been fixed where by sending a lot of double spending transactions to the same address you could trick a bdk Esplora wallet into thinking it had a lot of unconfirmed coins. This is because esplora doesn't delete double spent transactions from its indexes immediately (not sure if this is a bug or a feature). A blockchain test is added for this.
  • BONUS: The second commit in this PR fixes the feerate calculation for esplora and adds a test (the previous algorithm didn't work at all). I could have made a separate PR but since I was touching this file a lot I decided to fix it here.

Notes to the reviewers

  • The most important thing to review is the the logic in script_sync.rs is sound.
  • Look at the two commits separately.
  • I think CI is failing because of MSRV problems again!
  • It would be cool to measure how much sync time is improved for your existing wallets/projects. For gun the speed improvements for modest but it is at least hammering the esplora server much less.
  • I noticed the performance of reqwest in blocking is much worse in this patch than previously. This is because somehow reqwest is not re-using the connection for each request in this new code. I have no idea why. The plan is to get rid of the blocking reqwest implementation in a follow up PR.

Checklists

All Submissions:

  • I've signed all my commits

@LLFourn LLFourn force-pushed the sync_pipeline branch 10 times, most recently from 7a488e5 to 0ebf789 Compare October 29, 2021 06:41
Copy link
Copy Markdown
Member

@RCasatta RCasatta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me!

I did some tests with the test wallet I used in #166 and performance are much better in esplora;

  • before first:16s following:10s
  • now: 6s

electrum performance is similar to master, just slightly less now I think because of the HashSet containing less than chunk_size element

Comment thread src/blockchain/utils.rs
Comment thread src/blockchain/electrum.rs Outdated
Comment thread src/blockchain/script_sync.rs Outdated
Comment thread src/blockchain/electrum.rs Outdated
Comment thread src/blockchain/electrum.rs Outdated
Comment thread src/blockchain/script_sync.rs
"{}/scripthash/{}/txs/chain/{}",
self.url, script_hash, last_seen
),
None => format!("{}/scripthash/{}/txs", self.url, script_hash),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This esplora endpoint is very informative, the downside is that we always ask for full information even we may already have it. I wonder if we can use the endpoint without /txs to have a faster response from the server and request the data only if we recognize it has changed. The first sync will be slower, but the following should be faster and use less bandwith

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting point. I don't think we have a better endpoint for finding txs assoiciated with an address? I think there's certainly a lot of room for avoiding this when not doing a full sync. So for the approach I'd say we introduce a partial sync. For the case of esplora there a bunch of other things you can do like asking about outspend to see if something has been spent or using /scripthash/:hash/utxo to see if an address has new utxos.

Another idea is you could even have a mode where a wallet doesn't keep track of all of its tx history but only keeps track of current utxos.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we have a better endpoint for finding txs associated with an address?

I was imagining GET /scripthash/:hash could give hints if a call to GET /scripthash/:hash/txs is needed and be faster cause it doesn't contain txs information.

However, I think this and the different sync types you are suggesting can be investigated in other PRs since this is already a big improvement in comparison to what we have

Copy link
Copy Markdown
Contributor

@rajarshimaitra rajarshimaitra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concept + tACK bfd96a5

Overall agreed with the idea of simplifying syncing interface. The current structure of sync logic seems very methodical and easy to extend into other blockchain type. Ran the blockchain tests manually, to see everything is working ok.

Couldn't do sync bench-marking with current master. Will do and report back.

Below are some conceptual questions and nits.

Thanks for working on this, it is a lot of work, and I learned a ton reviewing this.

Comment thread src/blockchain/script_sync.rs Outdated
Comment thread src/blockchain/script_sync.rs
Comment thread src/blockchain/script_sync.rs
Comment thread src/blockchain/utils.rs
Comment thread src/blockchain/electrum.rs
Comment thread src/blockchain/esplora/reqwest.rs Outdated
Comment on lines +179 to +175
debug!("found tx_details for {}", txid);
assert_eq!(tx.txid(), *txid);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should assert before the log? Also is there any generic guaranties that requirement and satisfaction lists will always line up? Maybe we should specifically handle that somewhere?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should assert before the log?

I don't think we need to micromanage the placement of this.

Also is there any generic guaranties that requirement and satisfaction lists will always line up? Maybe we should specifically handle that somewhere?

I'm not sure there is anything you can do about this. It's a programmer error if they don't line up (hence the assert).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm that makes sense. It depends on the client's of sync logic. Also was wondering is there any specific reason for putting the assert only in TxReq? It isn't there for other reqs.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you can see another spot where we can do an assert to check consistency let's do it. It's possible I've missed one.

Comment thread src/blockchain/esplora/reqwest.rs
Comment thread src/blockchain/esplora/reqwest.rs
Comment thread src/blockchain/esplora/reqwest.rs
Copy link
Copy Markdown
Contributor

@rajarshimaitra rajarshimaitra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ReACK 77266c4

rajarshimaitra added a commit to rajarshimaitra/bdk that referenced this pull request Nov 8, 2021
This updates the rpc backend sync logic to use the script_sync pattern
added in bitcoindevkit#461.

A new stop_gap parameter is added to the rpc blockchain config, which
determines termination of sync loop.
rajarshimaitra added a commit to rajarshimaitra/bdk that referenced this pull request Nov 8, 2021
This updates the rpc backend sync logic to use the script_sync pattern
added in bitcoindevkit#461.

A new stop_gap parameter is added to the rpc blockchain config, which
determines termination of sync loop.
RCasatta
RCasatta previously approved these changes Nov 9, 2021
Copy link
Copy Markdown
Member

@RCasatta RCasatta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it needs a rebase because of the conflict in CHANGELOG, other than that
ACK 77266c4

@RCasatta RCasatta dismissed their stale review November 10, 2021 08:53

I temporary remove approval before discussing for fee optionality

@LLFourn
Copy link
Copy Markdown
Collaborator Author

LLFourn commented Nov 23, 2021

Just resolved conflicts with master.

@rajarshimaitra
Copy link
Copy Markdown
Contributor

Hey @LLFourn , any update on the fee optionality? Can we make fees optional in transactional details and have the script-sync not complain when all of the inputs are not found? This will allow to remove --txindex if we try to port this into RPC blockchain.

@LLFourn
Copy link
Copy Markdown
Collaborator Author

LLFourn commented Nov 23, 2021

@rajarshimaitra @RCasatta it seems odd to do that in this PR since the absent TxOuts path is not being tested or used here?

Copy link
Copy Markdown
Contributor

@rajarshimaitra rajarshimaitra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Retested ACK a630685

@rajarshimaitra @RCasatta it seems odd to do that in this PR since the absent TxOuts path is not being tested or used here?

Yes I think that makes sense. Probably what we want is some way to allow unavailable TxOuts. The only difference would be that fees for those Tx would be absent. We will not miss any utxos because those inputs not found get_transaction() are non wallet utxos, so we don't need to keep track of them.

Currently having a missing input causes failure. I think that needs some further investigating and can be done on a future PR.

For now this is good to get in.

@RCasatta
Copy link
Copy Markdown
Member

RCasatta commented Nov 24, 2021

@rajarshimaitra @RCasatta it seems odd to do that in this PR since the absent TxOuts path is not being tested or used here?

I see from #466 that this logic doesn't fit perfectly with RPC sync, if this cannot be used there, then I do agree it's unnecessary to handle absent TxOuts here

utack a630685

@notmandatory
Copy link
Copy Markdown
Member

notmandatory commented Nov 24, 2021

@LLFourn I'm planning to merge this PR today, thanks for all the work that went into it! and for addressing @RCasatta and @rajarshimaitra 's review comments. But since this is such a big change I'm going to push it to the 0.15.0 release (0.14.0 is going out today or tomorrow) so it has some time in the master branch before we release it.

@notmandatory notmandatory merged commit b2ac4a0 into bitcoindevkit:master Nov 25, 2021
@LLFourn LLFourn deleted the sync_pipeline branch November 25, 2021 04:18
notmandatory added a commit to bitcoindevkit/rust-esplora-client that referenced this pull request Aug 2, 2022
bbe1183 Make stop_gap a parameter to EsploraBlockchainConfig::new (LLFourn)
0f0a01a742448fa9db67752722fed7d1c028117b s/vin/vout/ (LLFourn)
1a64fd9c9595566366cfe96cc20d8ba2f4da1bd3 Delete src/blockchain/utils.rs (LLFourn)
21ac1eb Fix comments (LLFourn)
d39401162fb72a9a02999908ac0347685d8861a5 Less intermediary data states in sync (LLFourn)
dfb63d389b2804cb55da60de0d8fcad985d50e5c s/observed_txs/finished_txs/g (LLFourn)
626542f Make variable names consistent (LLFourn)
5eadf5ccf9722aa331008304221e981d01bca1c1 Add some logging to script_sync (LLFourn)
aaad560a91872318890208c4b3d5cb73a63029a8 Always get up to chunk_size heights to request headers for (LLFourn)
b3a8cf4 Don't request conftime during tx request (LLFourn)
808d7d8463c56cbcda0bd55a5eba422cf5080ce7 Update changelog (LLFourn)
f1fd218 Fix feerate calculation for esplora (LLFourn)
3f25389 Invert dependencies in electrum sync (LLFourn)

Pull request description:

  ## Description

  This PR does dependency inversion on the previous sync logic for electrum and esplora captured in the trait `ElectrumLikeSync`. This means that the sync logic does not reference the blockchain at all. Instead the blockchain asks the sync logic (in `script_sync.rs`) what it needs to continue the sync and tries to retrieve it.

  The initial purpose of doing this is to remove invocations of `maybe_await` in the abstract sync logic in preparation for completely removing `maybe_await` in the future. The other major benefit is it gives a lot more freedom for the esplora logic to use the rich data from the responses to complete the sync with less HTTP requests than it did previously.

  ## List of changes

  - sync logic moved to `script_sync.rs` and `ElectrumLikeSync` is gone.
  - esplora makes one http request per sync address. This means it makes half the number of http requests for a fully synced wallet and N*M less requests for a wallet which has N new transactions with M unique input transactions.
  - electrum and esplora save less raw transactions in the database. Electrum still requests input transactions for each of its transactions to calculate the fee but it does not save them to the database anymore.
  - The ureq and reqwest blockchain configuration is now unified into the same struct. This is the only API change. `read_timeout` and `write_timeout` have been removed in favor of a single `timeout` option which is set in both ureq and reqwest.
  - ureq now does concurrent (parallel) requests using threads.
  - An previously unnoticed bug has been fixed where by sending a lot of double spending transactions to the same address you could trick a bdk Esplora wallet into thinking it had a lot of unconfirmed coins. This is because esplora doesn't delete double spent transactions from its indexes immediately (not sure if this is a bug or a feature). A blockchain test is added for this.
  - BONUS: The second commit in this PR fixes the feerate calculation for esplora and adds a test (the previous algorithm didn't work at all). I could have made a separate PR but since I was touching this file a lot I decided to fix it here.

  ## Notes to the reviewers

  - The most important thing to review is the the logic in `script_sync.rs` is sound.
  - Look at the two commits separately.
  - I think CI is failing because of MSRV problems again!
  - It would be cool to measure how much sync time is improved for your existing wallets/projects. For `gun` the speed improvements for modest but it is at least hammering the esplora server much less.
  - I noticed the performance of reqwest in blocking is much worse in this patch than previously. This is because somehow reqwest is not re-using the connection for each request in this new code. I have no idea why. The plan is to get rid of the blocking reqwest implementation in a follow up PR.

  ### Checklists

  #### All Submissions:

  * [x] I've signed all my commits

ACKs for top commit:
  rajarshimaitra:
    Retested ACK bitcoindevkit/bdk@983ea29

Tree-SHA512: de74981e9d1f80758a9f20a3314ed7381c6b7c635f7ede80b177651fe2f9e9468064fae26bf80d4254098accfacfe50326ae0968e915186e13313f05bf77990b
chrono-raven6i90 added a commit to chrono-raven6i90/rust-esplora-client that referenced this pull request Oct 28, 2025
bbe1183b17cc92e3268ad29e4c9cabb8d1161f99 Make stop_gap a parameter to EsploraBlockchainConfig::new (LLFourn)
0f0a01a742448fa9db67752722fed7d1c028117b s/vin/vout/ (LLFourn)
1a64fd9c9595566366cfe96cc20d8ba2f4da1bd3 Delete src/blockchain/utils.rs (LLFourn)
21ac1ebd0de29287813ee425b463b2ab8b546267 Fix comments (LLFourn)
d39401162fb72a9a02999908ac0347685d8861a5 Less intermediary data states in sync (LLFourn)
dfb63d389b2804cb55da60de0d8fcad985d50e5c s/observed_txs/finished_txs/g (LLFourn)
626542fae624f247d71d7025e004de6da5702847 Make variable names consistent (LLFourn)
5eadf5ccf9722aa331008304221e981d01bca1c1 Add some logging to script_sync (LLFourn)
aaad560a91872318890208c4b3d5cb73a63029a8 Always get up to chunk_size heights to request headers for (LLFourn)
b3a8cf49624f0e419812c6b55de401a01562d763 Don't request conftime during tx request (LLFourn)
808d7d8463c56cbcda0bd55a5eba422cf5080ce7 Update changelog (LLFourn)
f1fd218493eaf89d982b3014244d9e039563cbb3 Fix feerate calculation for esplora (LLFourn)
3f25389eebd890f1a84197ecba1d8f24d3b269c5 Invert dependencies in electrum sync (LLFourn)

Pull request description:

  ## Description

  This PR does dependency inversion on the previous sync logic for electrum and esplora captured in the trait `ElectrumLikeSync`. This means that the sync logic does not reference the blockchain at all. Instead the blockchain asks the sync logic (in `script_sync.rs`) what it needs to continue the sync and tries to retrieve it.

  The initial purpose of doing this is to remove invocations of `maybe_await` in the abstract sync logic in preparation for completely removing `maybe_await` in the future. The other major benefit is it gives a lot more freedom for the esplora logic to use the rich data from the responses to complete the sync with less HTTP requests than it did previously.

  ## List of changes

  - sync logic moved to `script_sync.rs` and `ElectrumLikeSync` is gone.
  - esplora makes one http request per sync address. This means it makes half the number of http requests for a fully synced wallet and N*M less requests for a wallet which has N new transactions with M unique input transactions.
  - electrum and esplora save less raw transactions in the database. Electrum still requests input transactions for each of its transactions to calculate the fee but it does not save them to the database anymore.
  - The ureq and reqwest blockchain configuration is now unified into the same struct. This is the only API change. `read_timeout` and `write_timeout` have been removed in favor of a single `timeout` option which is set in both ureq and reqwest.
  - ureq now does concurrent (parallel) requests using threads.
  - An previously unnoticed bug has been fixed where by sending a lot of double spending transactions to the same address you could trick a bdk Esplora wallet into thinking it had a lot of unconfirmed coins. This is because esplora doesn't delete double spent transactions from its indexes immediately (not sure if this is a bug or a feature). A blockchain test is added for this.
  - BONUS: The second commit in this PR fixes the feerate calculation for esplora and adds a test (the previous algorithm didn't work at all). I could have made a separate PR but since I was touching this file a lot I decided to fix it here.

  ## Notes to the reviewers

  - The most important thing to review is the the logic in `script_sync.rs` is sound.
  - Look at the two commits separately.
  - I think CI is failing because of MSRV problems again!
  - It would be cool to measure how much sync time is improved for your existing wallets/projects. For `gun` the speed improvements for modest but it is at least hammering the esplora server much less.
  - I noticed the performance of reqwest in blocking is much worse in this patch than previously. This is because somehow reqwest is not re-using the connection for each request in this new code. I have no idea why. The plan is to get rid of the blocking reqwest implementation in a follow up PR.

  ### Checklists

  #### All Submissions:

  * [x] I've signed all my commits

ACKs for top commit:
  rajarshimaitra:
    Retested ACK bitcoindevkit/bdk@983ea29

Tree-SHA512: de74981e9d1f80758a9f20a3314ed7381c6b7c635f7ede80b177651fe2f9e9468064fae26bf80d4254098accfacfe50326ae0968e915186e13313f05bf77990b
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants