Restructure electrum/esplora sync logic#461
Restructure electrum/esplora sync logic#461notmandatory merged 14 commits intobitcoindevkit:masterfrom
Conversation
7a488e5 to
0ebf789
Compare
RCasatta
left a comment
There was a problem hiding this comment.
Looks great to me!
I did some tests with the test wallet I used in #166 and performance are much better in esplora;
- before first:16s following:10s
- now: 6s
electrum performance is similar to master, just slightly less now I think because of the HashSet containing less than chunk_size element
| "{}/scripthash/{}/txs/chain/{}", | ||
| self.url, script_hash, last_seen | ||
| ), | ||
| None => format!("{}/scripthash/{}/txs", self.url, script_hash), |
There was a problem hiding this comment.
This esplora endpoint is very informative, the downside is that we always ask for full information even we may already have it. I wonder if we can use the endpoint without /txs to have a faster response from the server and request the data only if we recognize it has changed. The first sync will be slower, but the following should be faster and use less bandwith
There was a problem hiding this comment.
Interesting point. I don't think we have a better endpoint for finding txs assoiciated with an address? I think there's certainly a lot of room for avoiding this when not doing a full sync. So for the approach I'd say we introduce a partial sync. For the case of esplora there a bunch of other things you can do like asking about outspend to see if something has been spent or using /scripthash/:hash/utxo to see if an address has new utxos.
Another idea is you could even have a mode where a wallet doesn't keep track of all of its tx history but only keeps track of current utxos.
There was a problem hiding this comment.
I don't think we have a better endpoint for finding txs associated with an address?
I was imagining GET /scripthash/:hash could give hints if a call to GET /scripthash/:hash/txs is needed and be faster cause it doesn't contain txs information.
However, I think this and the different sync types you are suggesting can be investigated in other PRs since this is already a big improvement in comparison to what we have
cd72748 to
52a360f
Compare
52a360f to
bfd96a5
Compare
rajarshimaitra
left a comment
There was a problem hiding this comment.
Concept + tACK bfd96a5
Overall agreed with the idea of simplifying syncing interface. The current structure of sync logic seems very methodical and easy to extend into other blockchain type. Ran the blockchain tests manually, to see everything is working ok.
Couldn't do sync bench-marking with current master. Will do and report back.
Below are some conceptual questions and nits.
Thanks for working on this, it is a lot of work, and I learned a ton reviewing this.
| debug!("found tx_details for {}", txid); | ||
| assert_eq!(tx.txid(), *txid); |
There was a problem hiding this comment.
Maybe we should assert before the log? Also is there any generic guaranties that requirement and satisfaction lists will always line up? Maybe we should specifically handle that somewhere?
There was a problem hiding this comment.
Maybe we should assert before the log?
I don't think we need to micromanage the placement of this.
Also is there any generic guaranties that
requirementandsatisfactionlists will always line up? Maybe we should specifically handle that somewhere?
I'm not sure there is anything you can do about this. It's a programmer error if they don't line up (hence the assert).
There was a problem hiding this comment.
Hmm that makes sense. It depends on the client's of sync logic. Also was wondering is there any specific reason for putting the assert only in TxReq? It isn't there for other reqs.
There was a problem hiding this comment.
If you can see another spot where we can do an assert to check consistency let's do it. It's possible I've missed one.
bc48723 to
77266c4
Compare
This updates the rpc backend sync logic to use the script_sync pattern added in bitcoindevkit#461. A new stop_gap parameter is added to the rpc blockchain config, which determines termination of sync loop.
This updates the rpc backend sync logic to use the script_sync pattern added in bitcoindevkit#461. A new stop_gap parameter is added to the rpc blockchain config, which determines termination of sync loop.
Blockchain calls sync logic rather than the other way around. Sync logic is captured in script_sync.rs.
Use BTrees to store ordered sets rather than HashSets -> VecDequeue
77266c4 to
9c57708
Compare
I temporary remove approval before discussing for fee optionality
e7ca55e to
2ba6175
Compare
2ba6175 to
a630685
Compare
|
Just resolved conflicts with master. |
|
Hey @LLFourn , any update on the fee optionality? Can we make fees optional in transactional details and have the script-sync not complain when all of the inputs are not found? This will allow to remove |
|
@rajarshimaitra @RCasatta it seems odd to do that in this PR since the absent |
rajarshimaitra
left a comment
There was a problem hiding this comment.
Retested ACK a630685
@rajarshimaitra @RCasatta it seems odd to do that in this PR since the absent TxOuts path is not being tested or used here?
Yes I think that makes sense. Probably what we want is some way to allow unavailable TxOuts. The only difference would be that fees for those Tx would be absent. We will not miss any utxos because those inputs not found get_transaction() are non wallet utxos, so we don't need to keep track of them.
Currently having a missing input causes failure. I think that needs some further investigating and can be done on a future PR.
For now this is good to get in.
I see from #466 that this logic doesn't fit perfectly with RPC sync, if this cannot be used there, then I do agree it's unnecessary to handle absent TxOuts here utack a630685 |
|
@LLFourn I'm planning to merge this PR today, thanks for all the work that went into it! and for addressing @RCasatta and @rajarshimaitra 's review comments. But since this is such a big change I'm going to push it to the 0.15.0 release (0.14.0 is going out today or tomorrow) so it has some time in the |
bbe1183 Make stop_gap a parameter to EsploraBlockchainConfig::new (LLFourn) 0f0a01a742448fa9db67752722fed7d1c028117b s/vin/vout/ (LLFourn) 1a64fd9c9595566366cfe96cc20d8ba2f4da1bd3 Delete src/blockchain/utils.rs (LLFourn) 21ac1eb Fix comments (LLFourn) d39401162fb72a9a02999908ac0347685d8861a5 Less intermediary data states in sync (LLFourn) dfb63d389b2804cb55da60de0d8fcad985d50e5c s/observed_txs/finished_txs/g (LLFourn) 626542f Make variable names consistent (LLFourn) 5eadf5ccf9722aa331008304221e981d01bca1c1 Add some logging to script_sync (LLFourn) aaad560a91872318890208c4b3d5cb73a63029a8 Always get up to chunk_size heights to request headers for (LLFourn) b3a8cf4 Don't request conftime during tx request (LLFourn) 808d7d8463c56cbcda0bd55a5eba422cf5080ce7 Update changelog (LLFourn) f1fd218 Fix feerate calculation for esplora (LLFourn) 3f25389 Invert dependencies in electrum sync (LLFourn) Pull request description: ## Description This PR does dependency inversion on the previous sync logic for electrum and esplora captured in the trait `ElectrumLikeSync`. This means that the sync logic does not reference the blockchain at all. Instead the blockchain asks the sync logic (in `script_sync.rs`) what it needs to continue the sync and tries to retrieve it. The initial purpose of doing this is to remove invocations of `maybe_await` in the abstract sync logic in preparation for completely removing `maybe_await` in the future. The other major benefit is it gives a lot more freedom for the esplora logic to use the rich data from the responses to complete the sync with less HTTP requests than it did previously. ## List of changes - sync logic moved to `script_sync.rs` and `ElectrumLikeSync` is gone. - esplora makes one http request per sync address. This means it makes half the number of http requests for a fully synced wallet and N*M less requests for a wallet which has N new transactions with M unique input transactions. - electrum and esplora save less raw transactions in the database. Electrum still requests input transactions for each of its transactions to calculate the fee but it does not save them to the database anymore. - The ureq and reqwest blockchain configuration is now unified into the same struct. This is the only API change. `read_timeout` and `write_timeout` have been removed in favor of a single `timeout` option which is set in both ureq and reqwest. - ureq now does concurrent (parallel) requests using threads. - An previously unnoticed bug has been fixed where by sending a lot of double spending transactions to the same address you could trick a bdk Esplora wallet into thinking it had a lot of unconfirmed coins. This is because esplora doesn't delete double spent transactions from its indexes immediately (not sure if this is a bug or a feature). A blockchain test is added for this. - BONUS: The second commit in this PR fixes the feerate calculation for esplora and adds a test (the previous algorithm didn't work at all). I could have made a separate PR but since I was touching this file a lot I decided to fix it here. ## Notes to the reviewers - The most important thing to review is the the logic in `script_sync.rs` is sound. - Look at the two commits separately. - I think CI is failing because of MSRV problems again! - It would be cool to measure how much sync time is improved for your existing wallets/projects. For `gun` the speed improvements for modest but it is at least hammering the esplora server much less. - I noticed the performance of reqwest in blocking is much worse in this patch than previously. This is because somehow reqwest is not re-using the connection for each request in this new code. I have no idea why. The plan is to get rid of the blocking reqwest implementation in a follow up PR. ### Checklists #### All Submissions: * [x] I've signed all my commits ACKs for top commit: rajarshimaitra: Retested ACK bitcoindevkit/bdk@983ea29 Tree-SHA512: de74981e9d1f80758a9f20a3314ed7381c6b7c635f7ede80b177651fe2f9e9468064fae26bf80d4254098accfacfe50326ae0968e915186e13313f05bf77990b
bbe1183b17cc92e3268ad29e4c9cabb8d1161f99 Make stop_gap a parameter to EsploraBlockchainConfig::new (LLFourn)
0f0a01a742448fa9db67752722fed7d1c028117b s/vin/vout/ (LLFourn)
1a64fd9c9595566366cfe96cc20d8ba2f4da1bd3 Delete src/blockchain/utils.rs (LLFourn)
21ac1ebd0de29287813ee425b463b2ab8b546267 Fix comments (LLFourn)
d39401162fb72a9a02999908ac0347685d8861a5 Less intermediary data states in sync (LLFourn)
dfb63d389b2804cb55da60de0d8fcad985d50e5c s/observed_txs/finished_txs/g (LLFourn)
626542fae624f247d71d7025e004de6da5702847 Make variable names consistent (LLFourn)
5eadf5ccf9722aa331008304221e981d01bca1c1 Add some logging to script_sync (LLFourn)
aaad560a91872318890208c4b3d5cb73a63029a8 Always get up to chunk_size heights to request headers for (LLFourn)
b3a8cf49624f0e419812c6b55de401a01562d763 Don't request conftime during tx request (LLFourn)
808d7d8463c56cbcda0bd55a5eba422cf5080ce7 Update changelog (LLFourn)
f1fd218493eaf89d982b3014244d9e039563cbb3 Fix feerate calculation for esplora (LLFourn)
3f25389eebd890f1a84197ecba1d8f24d3b269c5 Invert dependencies in electrum sync (LLFourn)
Pull request description:
## Description
This PR does dependency inversion on the previous sync logic for electrum and esplora captured in the trait `ElectrumLikeSync`. This means that the sync logic does not reference the blockchain at all. Instead the blockchain asks the sync logic (in `script_sync.rs`) what it needs to continue the sync and tries to retrieve it.
The initial purpose of doing this is to remove invocations of `maybe_await` in the abstract sync logic in preparation for completely removing `maybe_await` in the future. The other major benefit is it gives a lot more freedom for the esplora logic to use the rich data from the responses to complete the sync with less HTTP requests than it did previously.
## List of changes
- sync logic moved to `script_sync.rs` and `ElectrumLikeSync` is gone.
- esplora makes one http request per sync address. This means it makes half the number of http requests for a fully synced wallet and N*M less requests for a wallet which has N new transactions with M unique input transactions.
- electrum and esplora save less raw transactions in the database. Electrum still requests input transactions for each of its transactions to calculate the fee but it does not save them to the database anymore.
- The ureq and reqwest blockchain configuration is now unified into the same struct. This is the only API change. `read_timeout` and `write_timeout` have been removed in favor of a single `timeout` option which is set in both ureq and reqwest.
- ureq now does concurrent (parallel) requests using threads.
- An previously unnoticed bug has been fixed where by sending a lot of double spending transactions to the same address you could trick a bdk Esplora wallet into thinking it had a lot of unconfirmed coins. This is because esplora doesn't delete double spent transactions from its indexes immediately (not sure if this is a bug or a feature). A blockchain test is added for this.
- BONUS: The second commit in this PR fixes the feerate calculation for esplora and adds a test (the previous algorithm didn't work at all). I could have made a separate PR but since I was touching this file a lot I decided to fix it here.
## Notes to the reviewers
- The most important thing to review is the the logic in `script_sync.rs` is sound.
- Look at the two commits separately.
- I think CI is failing because of MSRV problems again!
- It would be cool to measure how much sync time is improved for your existing wallets/projects. For `gun` the speed improvements for modest but it is at least hammering the esplora server much less.
- I noticed the performance of reqwest in blocking is much worse in this patch than previously. This is because somehow reqwest is not re-using the connection for each request in this new code. I have no idea why. The plan is to get rid of the blocking reqwest implementation in a follow up PR.
### Checklists
#### All Submissions:
* [x] I've signed all my commits
ACKs for top commit:
rajarshimaitra:
Retested ACK bitcoindevkit/bdk@983ea29
Tree-SHA512: de74981e9d1f80758a9f20a3314ed7381c6b7c635f7ede80b177651fe2f9e9468064fae26bf80d4254098accfacfe50326ae0968e915186e13313f05bf77990b
Description
This PR does dependency inversion on the previous sync logic for electrum and esplora captured in the trait
ElectrumLikeSync. This means that the sync logic does not reference the blockchain at all. Instead the blockchain asks the sync logic (inscript_sync.rs) what it needs to continue the sync and tries to retrieve it.The initial purpose of doing this is to remove invocations of
maybe_awaitin the abstract sync logic in preparation for completely removingmaybe_awaitin the future. The other major benefit is it gives a lot more freedom for the esplora logic to use the rich data from the responses to complete the sync with less HTTP requests than it did previously.List of changes
script_sync.rsandElectrumLikeSyncis gone.read_timeoutandwrite_timeouthave been removed in favor of a singletimeoutoption which is set in both ureq and reqwest.Notes to the reviewers
script_sync.rsis sound.gunthe speed improvements for modest but it is at least hammering the esplora server much less.Checklists
All Submissions: