Skip to content

Use the new script_sync logic for RPC backend#466

Closed
rajarshimaitra wants to merge 19 commits intobitcoindevkit:masterfrom
rajarshimaitra:rpc-script-sync
Closed

Use the new script_sync logic for RPC backend#466
rajarshimaitra wants to merge 19 commits intobitcoindevkit:masterfrom
rajarshimaitra:rpc-script-sync

Conversation

@rajarshimaitra
Copy link
Copy Markdown
Contributor

@rajarshimaitra rajarshimaitra commented Nov 8, 2021

Description

Recently in #461 a new syncing framework was introduced, for esplora and electrum backends. Among many other optimization improvements, script_sync can also provide easy extension into other kind of backend. All the database managements are handled by script_sync, and the blockchain backend just have to to satisfy sync requests with appropriate satisfiers.

In this PR I attempted to extend the same framework for RPC backend. Which I hope will allow us to optimize the sync logic in RPC even further.

Notes to the reviewers

This is a WIP PR as there are 2 blockchain test failures which I am still investigating. Opening this up for early reviews and comments. I am sure there are still many room for improvements. Also any eyes on debugging the test failures would be very helpful.

Test error log

running 19 tests
test blockchain::rpc::bdk_blockchain_tests::test_sync_before_and_after_receive ... ok
test blockchain::rpc::bdk_blockchain_tests::test_sync_multiple_outputs_same_tx ... ok
test blockchain::rpc::bdk_blockchain_tests::test_sync_bump_fee_basic ... ok
test blockchain::rpc::bdk_blockchain_tests::test_sync_receive_multi ... ok
test blockchain::rpc::bdk_blockchain_tests::test_sync_receive_rbf_replaced ... ok
test blockchain::rpc::bdk_blockchain_tests::test_sync_after_send ... ok
test blockchain::rpc::bdk_blockchain_tests::test_sync_receive_coinbase ... ok
test blockchain::rpc::bdk_blockchain_tests::test_sync_address_reuse ... ok
test blockchain::rpc::bdk_blockchain_tests::test_sync_reorg_block ... ok
test blockchain::rpc::bdk_blockchain_tests::test_sync_bump_fee_remove_change ... FAILED
test blockchain::rpc::bdk_blockchain_tests::test_sync_bump_fee_add_input_simple ... ok
test blockchain::rpc::bdk_blockchain_tests::test_sync_bump_fee_add_input_no_change ... FAILED
test blockchain::rpc::bdk_blockchain_tests::test_sync_double_receive ... ok
test blockchain::rpc::bdk_blockchain_tests::test_sync_outgoing_from_scratch ... ok
test blockchain::rpc::bdk_blockchain_tests::test_sync_long_change_chain ... ok
test blockchain::rpc::bdk_blockchain_tests::test_sync_simple ... ok
test blockchain::rpc::bdk_blockchain_tests::test_sync_stop_gap_20 ... ok
test blockchain::rpc::bdk_blockchain_tests::test_update_confirmation_time_after_generate ... ok
test blockchain::rpc::bdk_blockchain_tests::test_sync_many_sends_to_a_single_address ... ok

failures:

---- blockchain::rpc::bdk_blockchain_tests::test_sync_bump_fee_remove_change stdout ----
thread 'blockchain::rpc::bdk_blockchain_tests::test_sync_bump_fee_remove_change' panicked at 'assertion failed: `(left == right)`
  left: `50000`,
 right: `0`: incorrect balance after change removal', src/blockchain/rpc.rs:495:1
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- blockchain::rpc::bdk_blockchain_tests::test_sync_bump_fee_add_input_no_change stdout ----
TransactionDetails {
    transaction: None,
    txid: 910b20a56e49c75a9b4d934f3256095d33c695463524a6a4f1ffefa4c76e0483,
    received: 0,
    sent: 75000,
    fee: Some(
        26000,
    ),
    confirmation_time: None,
    verified: true,
}
thread 'blockchain::rpc::bdk_blockchain_tests::test_sync_bump_fee_add_input_no_change' panicked at 'assertion failed: `(left == right)`
  left: `75000`,
 right: `0`: incorrect balance after add input', src/blockchain/rpc.rs:495:1

This PR build on #461, so only the last 2 commits needs to be reviewed.

A new optional stop_gap field, is added in RpcBlockchain, which determines the sync loop termination. The default value is 20.

in blockchain_tests::TestClient, the bitcoind instance is created with txindex=1, which is required to call get_raw_transactions() in core RPC. I would like to do it without txindex as before. But not all previous outputs are found by get_transaction() and some tests were failing due to this.

Checklists

All Submissions:

  • I've signed all my commits
  • I followed the contribution guidelines
  • I ran cargo fmt and cargo clippy before committing

This updates the rpc backend sync logic to use the script_sync pattern
added in bitcoindevkit#461.

A new stop_gap parameter is added to the rpc blockchain config, which
determines termination of sync loop.
BitcoinD is used with txindex flag for blockchain tests to support
get_raw_transaction rpc calls.
@RCasatta
Copy link
Copy Markdown
Member

RCasatta commented Nov 9, 2021

in blockchain_tests::TestClient, the bitcoind instance is created with txindex=1, which is required to call get_raw_transactions() in core RPC. I would like to do it without txindex as before. But not all previous outputs are found by get_transaction() and some tests were failing due to this.

I think this is an issue, at the moment rpc client doesn't need the bitcoin node to have the txindex enabled, this means it may happen receiving tx have unknown fee because not all the prevouts are known, but I think it's an acceptable tradeoff.

@LLFourn Is it possible to "partially satisfy" Request::Tx by skipping unknown prevouts or there are other implications (apart from obviously not knowing the tx fee)

Comment thread src/blockchain/rpc.rs
Comment thread src/blockchain/rpc.rs Outdated
.iter()
.filter(|tx_result| {
// Filter out txs related to the script_pubkey
if let Some(address) = &tx_result.detail.address {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can rely on this to filter txs, what happens for example if we have a tx with 2 owned outputs ?
I am not sure what's the best way to handle this since at this stage we haven't the raw tx, but I think we should operate on raw tx to check the presence of script_pubkey

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in that case it will just match for both the script_pubkeys. listtransactions() will give an entry corresponding to each owned addresses. So the satisfier will have the same txid for both the scripts. In fact this is happening already in some of the tests, and I guess the underlying db update logic already handles that.

Saying that I do think the current filter logic is inadequate, and because of that the tests are failing. see #466 (comment)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

listtransactions() will give an entry corresponding to each owned addresses.

I didn't know about this, should be ok then

Comment thread src/blockchain/rpc.rs
related_txs
})
.collect::<Vec<_>>();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I launched cargo test --features test-rpc -- test_sync_bump_fee_remove_change -- --nocapture
with a dbg!(&satisfier) here and noticed the bumped tx is not filtered out (causing the test to fail)

Copy link
Copy Markdown
Contributor Author

@rajarshimaitra rajarshimaitra Nov 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. The reason its happening is, in case of a tx without change address, the listtransactions() will only have an entry with the recipient's address. Which is an out of wallet address for those two test failure cases, and thus not fetched by the current simple filter.

I think this has to be fixed first anyway. My current plan is to get the difference between listtransactions() and related_txs sets, and apply prev_out fetch on the ones in the set difference. Match with the address in their prev_outs, and add them to appropriate list in the satisfier.

Hopefully that can be done with get_transaction(), because the sought after inputs should always be a "owned" utxo, so its tx should be found. Also this way it will reduce too many get_transaction() call, we only fetch for tx with "owned inputs + not owned outputs". "owned output" txs will have already been filtered in the first address match try.

I hope this will work and at least will fix the test failures. Also looking for more thoughts or suggestions..

Copy link
Copy Markdown
Collaborator

@LLFourn LLFourn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RCasatta the TxOuts are used to calculate the fee and also how much the wallet was sent by checking if we own the script_pubkey. I'm not sure what the impact of this will be or how accurate fee and sent were before on the rpc backend.

@rajarshimaitra I can't concept ACK this since I'm not really experienced with the existing rpc sync logic. Can you elaborate what was the disadvantages of the previous approach and how this is improving it? In my mind bitcoind is keeping track of what transactions are associated with the scripts pubkeys owned by the wallet. When you filter through all the transactions again and again looking for things associated with the script it is duplicating the effort that bitcoind has already done.

It's possible you could do something similar to script_sync but without the ScriptReq part. Instead you start wtih DescriptorReq which is satisfied by getting all the txids associated with that descriptor and then it moves onto the TxReq stage. That seems like it would be more natural and would remove the strange stop_gap where there shouldn't be one (if I understand correctly).

Comment thread src/blockchain/rpc.rs
.map(|txin| {
// check if the prev_out is in db
if let Some(txout) =
db.get_previous_output(&txin.previous_output)?
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think blockchain logic should not do this. Rather when you pass a None to Option<Vec<TxOut>> when you satisfy the request the TxReq can look it up in the database to see if it can salvage it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah that's pretty neat. But what if the prev_out is not in the db? Will that cause any problem? This filter is added here because to satisfy the TxReq, I need to know weather I have to fetch for the prev_out, which is costly, so I don't wanna do that for all the inputs. Thus the filter.

As @RCasatta mentioned here #466 (comment), if it's possible to partially satisfy the TxReq without all the prev_outs, then the txindex requirement problem will also get solved.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right my thinking was just wrong.

@rajarshimaitra
Copy link
Copy Markdown
Contributor Author

rajarshimaitra commented Nov 10, 2021

Thanks @LLFourn for the look.

Can you elaborate what was the disadvantages of the previous approach and how this is improving it?

Currently the sync logic is fetching the tx list (listunspents(), listtransactions()) from core, and updating the database with all relevant info one by one. So yes, the blockcdata for the wallet is maintained by the core, and the sync logic only fetches them and adds them to the db appropriately.

I did this experiment just to see weather a generic db update logic like script_sync makes sense for any other blockchain. Theoretically that should work and in my mind I saw few benefits of doing that.

  • Having a unified db update framework will make creating own custom blockchain backend for lib users easier. The db details are abstracted away, and all they need to do is satisfy the requests. This decouples the blockchain interface details with db updates, and reduces complexity from what I feel is the most complex part of the bdk lib, where flexibility is also allowed by design.

  • Currently all the blockchian sync logic are independent and they were written in their own ways. This makes learning about the sync process little harder. Having a uniform structure, can make groking it easier for new devs. And we want that, because we want them to use their own blockchains, not just our defaults.

  • By decoupling the db logic with sync logic we can better understand the optimization opportunities, if exists. We just have to minimize the fetch calls and cache as much as possible in the loop. Because the loops are structured, to me it feels much easier to reason about what needs to be cached and when.

But its entirely possible that trying to do the syncs in a self similar structure is actually less optimized. And I tried this out just to find that. 😄

It's possible you could do something similar to script_sync but without the ScriptReq part. Instead you start wtih DescriptorReq which is satisfied by getting all the txids associated with that descriptor and then it moves onto the TxReq stage.

That's interesting. Though I don't want to build another custom sync logic just for RPC because that will break all those "unifying" points above.

That seems like it would be more natural and would remove the strange stop_gap where there shouldn't be one (if I understand correctly).

I probably didn't get this one. As far as I understand the stop_gap is required by script_sync to know when to stop producing script requests. So somehow that needs to be communicated. This wasn't required before because RPC wasn't doing a script based sync. It was scanning the whole list of listtransactions() and listunspents() and stopped when the list ended.

@RCasatta
Copy link
Copy Markdown
Member

RCasatta commented Nov 10, 2021

I'm not sure what the impact of this will be or how accurate fee and sent were before on the rpc backend.

As you can see from the optionality of the field fee in TransactionDetails and the relative comment, the fee could be missing with the RPC backend (as it can be missing in a bitcoin full node without txindex)

(Now that I read the comment on the fee field, I think the "offline" part is wrong. An always online bitcoin full node without txindex doesn't know the fee of a received wallet transaction because it doesn't know its previous outputs)

@LLFourn
Copy link
Copy Markdown
Collaborator

LLFourn commented Nov 11, 2021

Thanks for the detailed explanation @rajarshimaitra.

That's interesting. Though I don't want to build another custom sync logic just for RPC because that will break all those "unifying" points above.

I don't think it'd have to be that bad. Keeping the decoupled structure while re-using most of the logic would be an improvement and achieve the properties you were interested in.

That seems like it would be more natural and would remove the strange stop_gap where there shouldn't be one (if I understand correctly).

I probably didn't get this one. As far as I understand the stop_gap is required by script_sync to know when to stop producing script requests. So somehow that needs to be communicated. This wasn't required before because RPC wasn't doing a script based sync. It was scanning the whole list of listtransactions() and listunspents() and stopped when the list ended.

Right this is my point. The fact that unnecessary parameters are showing up indicates to me that this solution doesn't quite fit. I think you can come up with a more precise solution by re-using most of the logic but removing the ScriptReq part of it.

the stop_gap parameter doesn't have a any logical existence in the RPC
sync logic and it's just a config variable for `start_sync()`. Removed
from RPC blockchain config.
Script request handling have been refactored from primarily looping
over the requested scripts, to primarily loop over the
listtransactions() result. By doing so we can better handle when to make
the costly input fetching, if the transaction doesn't have an owned
output.

Additional cache is added to store the fetched inputs, which are then
used in the Tx request handling, reducing rpc get_transaction() calls.

This fixes the current test failure of RBF replacement without change.
@rajarshimaitra
Copy link
Copy Markdown
Contributor Author

rajarshimaitra commented Nov 11, 2021

Made few refactoring changes. Instead of trying to create something novel, I tried one last time to work with existing script requests.

  • Refactored the script request handling to properly handle txs without an owned output. In that case we will now also fetch the inputs and look for matches there. This fixes the test failures.
  • Added extra cache to make the best out of the above input queries. The stored inputs are used in Tx request handling to reduce rpc calls. Though it might not be significant, because I expect occurrence of txs with unowned outputs will not be that frequent for normal scenarios.
  • Removed the stop_gap parameter, as it doesn't have any logical significance in rpc sync logic. Used the default 20 value as a script_sync config variable.
  • I tried removing get_raw_transaction() call, and use None value for non-wallet inputs. But that currently fails the script_sync() logic. cc @RCasatta @LLFourn. So kept it that way for now.

I think this has at least reached to the point I had in my goal at the start. We can use the existing script_sync for RPC. Weather its an improvement or not is the real question.

So I think I will drag it out of draft now. There might be more potential improvements, but this is now ready for reviews.

@rajarshimaitra rajarshimaitra marked this pull request as ready for review November 11, 2021 17:09
@rajarshimaitra rajarshimaitra changed the title [WIP] Use the new script_sync logic for RPC backend Use the new script_sync logic for RPC backend Nov 14, 2021
@rajarshimaitra
Copy link
Copy Markdown
Contributor Author

rajarshimaitra commented Nov 14, 2021

One good news. With bitcoin/bitcoin#23319, input fetching can be done a lot faster..

@notmandatory
Copy link
Copy Markdown
Member

Now that #501 is in you'll need to rebase on master to fix the CI issue.

@rajarshimaitra
Copy link
Copy Markdown
Contributor Author

rajarshimaitra commented Dec 18, 2021

Documenting few Bitcoin core wallet PRs that we will benefit from in RPC sync:

@LLFourn LLFourn self-assigned this Jan 27, 2022
@rajarshimaitra rajarshimaitra marked this pull request as draft February 9, 2022 13:29
@evanlinjin
Copy link
Copy Markdown
Member

Damn I wish I saw this PR before working on the RpcBlockchain bug fixes/reimplantation

@danielabrozzoni
Copy link
Copy Markdown
Member

Hey, we are in the process of releasing BDK 1.0, which will under the hood work quite differently from the current BDK. For this reason, I'm closing all the PRs that don't really apply anymore. If you think this is a mistake, feel free to rebase on master and re-open!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants