
fix(spv): apply InstantSend locks to self-broadcast transactions#815

Merged: lklimek merged 8 commits into v1.0-dev from fix/spv-islock-self-broadcast on Apr 8, 2026

@lklimek lklimek commented Apr 1, 2026

Summary

Imagine you send Dash from within the app. The transaction broadcasts successfully and your balance shows the outgoing amount as "unconfirmed". On the Dash network, InstantSend locks this transaction within seconds — but the app never notices. Your funds stay stuck as "unconfirmed" indefinitely, and the receiving wallet can't spend them until a block is mined (~2.5 minutes).

Now IS locks are applied immediately, making funds spendable within seconds of broadcast.

Root cause

Self-broadcast transactions follow a different path than transactions received from peers:

Peer transactions:
  MempoolManager::handle_tx() → wallet sees tx
  MempoolManager::mark_instant_send() → wallet marks UTXOs as IS-locked ✓

Self-broadcast transactions:
  notify_wallet_after_broadcast() → wallet sees tx (bypasses MempoolManager)
  IS lock arrives → MempoolManager::mark_instant_send() → tx NOT in mempool
    → stored as "pending IS lock" → never matched ✗
  → balance.spendable() stays 0

The notify_wallet_after_broadcast() workaround (for upstream rust-dashcore#487) feeds the tx directly to WalletManager::process_mempool_transaction(), bypassing the MempoolManager. When the IS lock arrives, the MempoolManager doesn't know about the tx, so the lock is stored as "pending" and never applied.

Evidence from trace logs: broadcast txids a867b813, 4363a2d0, 55344098, f03aec1a were notified to the wallet but never appeared as "Marked mempool tx" — their IS locks were silently dropped.
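
The two paths above can be reproduced with a toy model. This is only a sketch: `ToyMempool`, `ToyWallet`, and the `u32` txids are hypothetical stand-ins, not the real MempoolManager or WalletManager types, and the only behaviour modelled is the "pending IS lock" parking described in this PR.

```rust
use std::collections::HashSet;

/// Toy transaction id; the real code uses a 32-byte txid.
type Txid = u32;

/// Hypothetical stand-in for the mempool side of the SPV client.
#[derive(Default)]
struct ToyMempool {
    known: HashSet<Txid>,
    pending_locks: HashSet<Txid>,
}

/// Hypothetical stand-in for the wallet side.
#[derive(Default)]
struct ToyWallet {
    seen: HashSet<Txid>,
    locked: HashSet<Txid>,
}

impl ToyWallet {
    /// A tx counts as spendable once the wallet has seen it and it is IS-locked.
    fn is_spendable(&self, txid: Txid) -> bool {
        self.seen.contains(&txid) && self.locked.contains(&txid)
    }
}

impl ToyMempool {
    /// Peer path: the mempool records the tx and forwards it to the wallet.
    fn handle_tx(&mut self, wallet: &mut ToyWallet, txid: Txid) {
        self.known.insert(txid);
        wallet.seen.insert(txid);
    }

    /// An IS lock reaches the wallet only if the mempool already tracks the tx.
    fn mark_instant_send(&mut self, wallet: &mut ToyWallet, txid: Txid) {
        if self.known.contains(&txid) {
            wallet.locked.insert(txid);
        } else {
            // Parked as "pending"; nothing in this model ever matches it later.
            self.pending_locks.insert(txid);
        }
    }
}

/// Self-broadcast path: the wallet is told directly, the mempool is not.
fn notify_wallet_after_broadcast(wallet: &mut ToyWallet, txid: Txid) {
    wallet.seen.insert(txid);
}
```

Calling `handle_tx` followed by `mark_instant_send` makes the tx spendable, while `notify_wallet_after_broadcast` followed by `mark_instant_send` leaves the lock in `pending_locks` and the tx unspendable.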

Fix

Add a wallet reference to SpvEventHandler and apply IS locks directly on the WalletManager when InstantLockReceived fires:

if let SyncEvent::InstantLockReceived { instant_lock, .. } = event {
    let txid = instant_lock.txid;
    let wallet = Arc::clone(&self.wallet);
    tokio::spawn(async move {
        let mut wm = wallet.write().await;
        wm.process_instant_send_lock(txid);
    });
}

For MempoolManager-tracked txs this is a harmless no-op — the WalletManager deduplicates via its instant_send_locks HashSet.
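
Why a second application is harmless can be sketched with a set-based guard. The struct layout and the `bool` return value below are assumptions made for illustration; only the names `process_instant_send_lock`, `process_mempool_transaction`, and `instant_send_locks` come from this PR, and their real signatures may differ.

```rust
use std::collections::{HashMap, HashSet};

/// Toy transaction id; the real code uses a 32-byte txid.
type Txid = u32;

/// Hypothetical, simplified wallet state.
#[derive(Default)]
struct ToyWalletManager {
    /// Unconfirmed amount per transaction (assumed layout).
    unconfirmed: HashMap<Txid, u64>,
    /// Txids whose IS lock has already been recorded.
    instant_send_locks: HashSet<Txid>,
    spendable: u64,
}

impl ToyWalletManager {
    fn process_mempool_transaction(&mut self, txid: Txid, amount: u64) {
        self.unconfirmed.insert(txid, amount);
    }

    /// Returns true only the first time a lock is recorded for `txid`.
    fn process_instant_send_lock(&mut self, txid: Txid) -> bool {
        // HashSet::insert returns false when the value was already present,
        // so a duplicate lock exits here without touching the balance.
        if !self.instant_send_locks.insert(txid) {
            return false;
        }
        if let Some(amount) = self.unconfirmed.remove(&txid) {
            self.spendable += amount;
        }
        true
    }
}
```

With this shape, the event-handler path and the MempoolManager path can both deliver the same lock and the amount is counted once.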

Test plan

  • cargo clippy --all-features --all-targets -- -D warnings passes
  • Backend E2E: wait_for_spendable_balance succeeds without waiting for block confirmation
  • Broadcast tx → IS lock received → spendable() reflects locked amount within seconds
  • Normal peer-received txs still work (no double-counting from dedup)

🤖 Co-authored by Claudius the Magnificent AI Agent

Self-broadcast transactions bypass the MempoolManager (fed directly
to WalletManager via notify_wallet_after_broadcast). When the IS lock
arrives from the network, the MempoolManager doesn't know about the
tx, so it stores the lock as "pending" — never matched, never applied.
Result: the tx stays unconfirmed and balance.spendable() returns 0.

Fix: in SpvEventHandler::on_sync_event(InstantLockReceived), apply
the IS lock directly on the WalletManager via process_instant_send_lock().
For MempoolManager-tracked txs this is a harmless no-op — the
WalletManager deduplicates via its instant_send_locks HashSet.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai Bot commented Apr 1, 2026

Warning

Rate limit exceeded

@lklimek has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 6 minutes and 3 seconds before requesting another review.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 31047dd8-1e95-4df2-9a3c-86c13af63f6b

📥 Commits

Reviewing files that changed from the base of the PR and between 4cea4fe and 4179528.

📒 Files selected for processing (5)
  • src/backend_task/core/mod.rs
  • src/spv/manager.rs
  • tests/backend-e2e/framework/cleanup.rs
  • tests/backend-e2e/framework/harness.rs
  • tests/backend-e2e/tx_is_ours.rs

lklimek and others added 2 commits April 1, 2026 17:36
Both notify_wallet_after_broadcast and the EventHandler IS lock
workaround exist because upstream broadcast doesn't call handle_tx
on the MempoolManager. Added TODO linking them so they can be
removed together when the upstream fix lands.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot AI left a comment

Pull request overview

This PR fixes a gap in SPV balance finality handling where InstantSend (IS) locks weren’t being applied to self-broadcast transactions (which bypass the MempoolManager via notify_wallet_after_broadcast()), causing outgoing funds to remain “unconfirmed” until mined.

Changes:

  • Add a WalletManager reference to SpvEventHandler.
  • On SyncEvent::InstantLockReceived, apply the IS lock directly to the WalletManager via an async task.

Comment thread src/spv/manager.rs
Comment on lines +286 to +290
let wallet = Arc::clone(&self.wallet);
tokio::spawn(async move {
    let mut wm = wallet.write().await;
    wm.process_instant_send_lock(txid);
});

Copilot AI Apr 1, 2026

The InstantLockReceived handler applies the lock in a detached task, but the reconcile signal for this event is sent immediately afterward (outside the spawned task). If the wallet write lock is contended, reconcile_spv_wallets() can run before process_instant_send_lock() executes, leaving balances unchanged until some later unrelated reconcile trigger. Consider sending an additional reconcile signal after the lock is applied (e.g., clone reconcile_tx into the task and try_send once process_instant_send_lock returns), or otherwise ensure reconcile cannot run before the lock application completes.

Suggested change
-let wallet = Arc::clone(&self.wallet);
-tokio::spawn(async move {
-    let mut wm = wallet.write().await;
-    wm.process_instant_send_lock(txid);
-});
+let mut wm = self.wallet.write().await;
+wm.process_instant_send_lock(txid);
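
The other option mentioned in the comment, signalling reconcile only after the lock has been applied, might look roughly like the sketch below. It uses `std::thread` and `std::sync::mpsc` as stand-ins for the tokio task and channel so that it runs without dependencies; `apply_lock_then_signal`, the bounded `reconcile_tx` channel, and the `HashSet` wallet are hypothetical.

```rust
use std::collections::HashSet;
use std::sync::mpsc::{sync_channel, SyncSender};
use std::sync::{Arc, RwLock};
use std::thread;

/// Toy transaction id; the real code uses a 32-byte txid.
type Txid = u32;

/// Applies the lock on a background thread, then signals reconcile.
/// `reconcile_tx` is a hypothetical bounded channel standing in for the
/// reconcile signal discussed in the review comment.
fn apply_lock_then_signal(
    wallet: Arc<RwLock<HashSet<Txid>>>,
    reconcile_tx: SyncSender<()>,
    txid: Txid,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        {
            let mut locks = wallet.write().expect("wallet lock poisoned");
            locks.insert(txid);
        } // write guard dropped here, before the signal is sent
        // A full channel means a reconcile is already queued and will run
        // after this point, so dropping the extra signal loses nothing.
        let _ = reconcile_tx.try_send(());
    })
}
```

Because the signal is sent from inside the task, any reconcile it triggers observes the applied lock, regardless of how long the write lock was contended.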

Comment thread src/spv/manager.rs
Comment on lines +266 to +287
// TODO(workaround): Remove once dashpay/rust-dashcore#487 is fixed.
//
// Apply InstantSend locks directly on the WalletManager.
//
// Self-broadcast transactions bypass the MempoolManager (they are fed
// directly to WalletManager via notify_wallet_after_broadcast — see
// the other workaround in spawn_request_handler). When the IS lock
// arrives from the network, the MempoolManager doesn't know about
// the tx and stores it as a "pending IS lock" that is never matched.
// Applying the lock here ensures self-broadcast txs transition from
// unconfirmed to spendable.
//
// Once upstream broadcast calls handle_tx() on the MempoolManager,
// both workarounds (notify_wallet_after_broadcast and this) can be
// removed — the normal MempoolManager pipeline will handle everything.
//
// For MempoolManager-tracked txs this is a harmless no-op — the
// WalletManager deduplicates via its instant_send_locks HashSet.
if let SyncEvent::InstantLockReceived { instant_lock, .. } = event {
    let txid = instant_lock.txid;
    let wallet = Arc::clone(&self.wallet);
    tokio::spawn(async move {

Copilot AI Apr 1, 2026

This adds important new behavior (applying InstantSend locks to self-broadcast transactions) but there’s no regression test coverage in src/spv/tests.rs for InstantLockReceived or for the notify_wallet_after_broadcast path. Adding a focused test that simulates a broadcast-notified tx and then an InstantLockReceived event (asserting spendable/unconfirmed transitions) would help prevent this from silently regressing.

With 14+ orphaned wallets from previous runs, the 10s per-wallet
spendable balance wait added 2+ minutes to test startup. Most
orphaned wallets have 0 spendable balance anyway (IS locks never
arrived), so the wait is wasted.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

lklimek commented Apr 1, 2026

Testing Progress & Analysis

IS Lock Fix Validation

The EventHandler IS lock path works correctly. Trace logs confirm:

INFO  EventHandler: applying IS lock directly on WalletManager txid=d7f42673...
DEBUG EventHandler: process_instant_send_lock completed txid=d7f42673...
INFO  EventHandler: applying IS lock directly on WalletManager txid=4d182f8700...
DEBUG EventHandler: process_instant_send_lock completed txid=4d182f8700...
  • 2 of 6 broadcast transactions received IS locks from the testnet
  • Both were correctly applied via our new SpvEventHandler::on_sync_event path
  • identity_withdraw test passed thanks to this fix (it requires spendable funds)
  • The remaining 4 transactions never received IS locks from the network at all — this is a testnet reliability issue, not a code bug

Test Results (parallel run with trace logging)

| Test | Result | Notes |
|---|---|---|
| cleanup_only | | Init panic: cleanup timeout (pre-existing) |
| spv_wallet | | Init panic cascading from cleanup_only |
| fetch_contract | | No funds needed |
| identity_withdraw | | IS lock fix enabled this; funds became spendable |
| tx_is_ours | | No IS lock received for this tx |
| register_dpns | | No IS lock received for funding tx |
| send_funds | | No IS lock received for funding tx |
| identity_create | | No IS lock received for funding tx |

Root Cause of Remaining Failures

The testnet only produced IS locks for 2 of 6 broadcast transactions. The other 4 never received InstantLockReceived events. This is confirmed by:

Related Issues Found During Investigation

  1. rust-dashcore#487 (open): SPV broadcast_transaction does not notify the local wallet manager, so the balance stays stale until the next block. Our fix works alongside the existing notify_wallet_after_broadcast workaround.
  2. rust-dashcore#616 (new, filed today): the SPV request processor sends getcfheaders/getqrinfo to peers that don't support them, causing 20s retry loops and sync stalls.

Additional Fix in This Branch

  • Reduced cleanup sweep timeout from 10s to 1s per orphaned wallet (14 wallets × 10s = 2+ min wasted on test startup). Commit c13831e8.

Resumption Notes

To continue work on this PR:

  1. The IS lock fix (SpvEventHandler::on_sync_event → process_instant_send_lock) is confirmed working
  2. Debug logging (EventHandler: applying IS lock...) can be removed or kept — it's useful for production diagnostics
  3. The cleanup timeout fix (c13831e8) should be pushed
  4. Single-threaded test runs (--test-threads=1) are more reliable for IS lock testing — parallel runs compete for framework wallet UTXOs
  5. Test reliability ultimately depends on testnet IS lock delivery, which is inconsistent

🤖 Co-authored by Claudius the Magnificent AI Agent

lklimek and others added 2 commits April 1, 2026 18:13
The init sequence waited 180s for spendable balance BEFORE waiting
for SPV to reach Running state. Wallet balances are only available
after compact filter sync completes, so the balance check always
timed out on the first attempt, wasting 3+ minutes per retry.

Swapped the order: wait for SPV Running first (up to 300s), then
check spendable balance (30s — should be near-instant after sync).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
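
The swapped init order can be sketched as two sequential waits, each with its own budget. This is a minimal illustration, not the harness code: `wait_until`, `init_sequence`, `InitError`, and the 5 ms poll interval are hypothetical, and the closures stand in for whatever checks the real wait_for_spv_running and wait_for_spendable_balance perform.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Polls `ready` until it returns true or `timeout` elapses.
fn wait_until(timeout: Duration, poll: Duration, mut ready: impl FnMut() -> bool) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if ready() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        thread::sleep(poll);
    }
}

#[derive(Debug, PartialEq)]
enum InitError {
    SpvNotRunning,
    NoSpendableBalance,
}

/// Waits for sync first and only then for balance, so the balance budget is
/// not consumed while the wallet cannot yet know its balance.
fn init_sequence(
    spv_running: impl FnMut() -> bool,
    has_spendable: impl FnMut() -> bool,
    spv_timeout: Duration,
    balance_timeout: Duration,
) -> Result<(), InitError> {
    let poll = Duration::from_millis(5);
    if !wait_until(spv_timeout, poll, spv_running) {
        return Err(InitError::SpvNotRunning);
    }
    if !wait_until(balance_timeout, poll, has_spendable) {
        return Err(InitError::NoSpendableBalance);
    }
    Ok(())
}
```

In the commit above the budgets would be 300s for the sync wait and 30s for the balance wait.
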
send_wallet_payment_via_spv() built and signed the transaction under
the WalletManager write lock, then dropped the lock before
broadcasting. Concurrent callers could select the same UTXOs,
creating double-spend transactions that the network rejects (no IS
lock issued for the conflicting tx).

Now calls process_mempool_transaction() while still holding the write
lock, so spent UTXOs are immediately marked and unavailable to
concurrent callers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
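
The idea behind this commit, selecting inputs and marking them spent inside one critical section, can be sketched as follows. The `Utxo` struct, `select_and_reserve`, and the use of `std::sync::Mutex` instead of the async WalletManager write lock are assumptions made to keep the example dependency-free.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Toy UTXO: an id, a value, and a flag set once a pending tx spends it.
#[derive(Clone, Debug)]
struct Utxo {
    id: u32,
    value: u64,
    spent: bool,
}

/// Selects unspent UTXOs covering `amount` and marks them spent before the
/// guard is released. Returns the selected ids, or None if funds are short.
fn select_and_reserve(wallet: &Mutex<Vec<Utxo>>, amount: u64) -> Option<Vec<u32>> {
    let mut utxos = wallet.lock().expect("wallet mutex poisoned");
    let mut picked = Vec::new();
    let mut total = 0u64;
    for (idx, utxo) in utxos.iter().enumerate() {
        if total >= amount {
            break;
        }
        if !utxo.spent {
            picked.push(idx);
            total += utxo.value;
        }
    }
    if total < amount {
        return None;
    }
    // Marking happens under the same guard as selection. This is the role
    // the commit message assigns to process_mempool_transaction().
    for &idx in &picked {
        utxos[idx].spent = true;
    }
    let ids: Vec<u32> = picked.iter().map(|&idx| utxos[idx].id).collect();
    Some(ids)
}
```

Two threads calling `select_and_reserve` on the same wallet can never receive overlapping ids, because the second caller only sees the UTXOs the first one left unmarked.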

lklimek commented Apr 1, 2026

Update: 7/8 E2E tests passing, 4x faster

After all fixes in this PR, parallel E2E test results improved dramatically:

| Metric | Before | After |
|---|---|---|
| Tests passed | 2-3 of 8 | 7 of 8 |
| Runtime | 644s | 163s |
| IS locks applied | 2 | 13 |
| Data directory locked errors | 2+ | 0 |
| Init panics/retries | 2-3 | 0 |

Fixes in this PR

  1. IS lock for self-broadcast txs (87e79167) — SpvEventHandler::on_sync_event(InstantLockReceived) applies IS locks directly on WalletManager, bypassing the broken MempoolManager path
  2. UTXO double-spend prevention (d6ce0ece) — process_mempool_transaction() called inside the write lock before releasing, so concurrent callers can't select the same UTXOs
  3. Init wait order (7251b9ee) — wait_for_spv_running before wait_for_spendable_balance (was reversed, causing 3+ min wasted per retry)
  4. Cleanup timeout (c13831e8) — reduced from 10s to 1s per orphaned wallet

Remaining failure

tx_is_ours fails with "Timed out waiting for total balance" — wallet B never sees the funds. This is a bloom filter coverage issue (wallet B's address not in the filter when the tx was broadcast), not an IS lock issue. Separate investigation needed.

🤖 Co-authored by Claudius the Magnificent AI Agent

tx_is_ours test sends from wallet A to wallet B, but B's bloom
filter may not have propagated to peers yet. Peers don't relay the
tx back through B's filter, so B never sees it. Adding a 2s delay
after wallet creation gives the bloom filter time to reach peers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

lklimek commented Apr 1, 2026

10-Run Stability Analysis

Ran E2E tests 10 times sequentially (run 1 with wiped SPV data, runs 2-10 reusing existing data).

Results

| Run | Pass | Time | Failed |
|---|---|---|---|
| 1 (fresh SPV) | 5/8 | 345s | identity_create, send_funds, tx_is_ours |
| 2 | 7/8 | 309s | tx_is_ours |
| 3 | 7/8 | 366s | tx_is_ours |
| 4 | 7/8 | 453s | tx_is_ours |
| 5 | 7/8 | 554s | tx_is_ours |
| 6 | 6/7 | 928s | tx_is_ours |
| 7 | killed | | CPU pegged at 987% |
| 8 | 6/8 | 466s | cleanup_only, tx_is_ours |
| 9 | 7/8 | 162s | tx_is_ours |
| 10 | 3/8 | 165s | 5 tests (degraded) |

Findings

Stable (runs 2-5): 7/8 consistently. Only tx_is_ours fails — every single run.

tx_is_ours root cause: The test sends A→B, but wallet B's bloom filter hasn't propagated to peers yet. The tx is never relayed back through B's filter, so B never sees it. Fixed in c2179284 — added 2s bloom filter propagation delay before broadcasting.

Runtime degradation: 309s → 366s → 453s → 554s → 928s across runs 2-6. Caused by orphaned test wallets accumulating in the persistent DB. Each run adds ~6 wallets that cleanup can't sweep (0 spendable balance). By run 6, ~30 orphaned wallets are loaded into SPV on every init, each with ~185 monitored addresses. The reconciliation loop clones AddressInfo for every address every 300ms tick across 12 worker threads → CPU saturation.
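
A rough back-of-envelope for that load, assuming each monitored address is cloned exactly once per tick. The wallet and address counts in this comment are approximate, and the 12 worker threads are not modelled; `clones_per_second` is a hypothetical helper.

```rust
/// Estimated AddressInfo clones per second for a polling reconcile loop,
/// assuming one clone per monitored address per tick.
fn clones_per_second(wallets: u64, addresses_per_wallet: u64, tick_ms: u64) -> u64 {
    wallets * addresses_per_wallet * 1000 / tick_ms
}
```

With the figures above, `clones_per_second(30, 185, 300)` gives 18,500 clones per second, versus 3,700 for the ~6 wallets a single run adds.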

Fresh SPV wipe (run 1): 3 extra failures from cold compact filter sync. Expected — initial sync takes longer, some tests time out.

Fixes in this PR (cumulative)

| Commit | Fix | Impact |
|---|---|---|
| 87e79167 | IS lock for self-broadcast txs | Funds become spendable in seconds |
| d6ce0ece | UTXO double-spend prevention (lock held during process_mempool_tx) | Concurrent payments don't conflict |
| 7251b9ee | Init wait order (SPV sync before balance check) | Eliminates 6+ min of failed init retries |
| c13831e8 | Cleanup timeout 10s→1s | Saves ~2 min on startup |
| c2179284 | Bloom filter delay in tx_is_ours | Should fix the 10/10 failure |

Open issues

  1. Orphaned wallet accumulation — cleanup should delete zero-balance wallets from the DB, not just attempt to sweep. Without this, runtime degrades over repeated test runs.
  2. rust-dashcore#487: upstream SPV broadcast_transaction does not notify the local wallet manager, so the balance stays stale until the next block (our workaround works but should be removed once this is fixed).
  3. rust-dashcore#616: the SPV request processor sends getcfheaders/getqrinfo to peers that don't support them; peers without compact filter support cause 20s retry loops.

Resumption notes

  • tx_is_ours bloom filter fix (c2179284) needs validation — rerun tests to confirm
  • Orphaned wallet cleanup improvement is a follow-up task (not blocking this PR)
  • All other tests stable at 7/8 when run within ~5 runs of a fresh DB

🤖 Co-authored by Claudius the Magnificent AI Agent

Orphaned test wallets with 0 total balance were skipped during
cleanup and accumulated across runs (~10MB + 185 monitored addresses
each). By run 6, ~30 orphaned wallets caused reconciliation to
saturate all 12 CPU cores (987% CPU, 928s runtime).

Now deletes wallets with 0 total balance via remove_wallet(). Only
wallets with unconfirmed-but-unspendable funds are kept for future
cleanup attempts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
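
The resulting cleanup policy can be written as a small decision function. This is a sketch under assumptions: `CleanupAction`, `Balance`, and `cleanup_action` are hypothetical names, and the `Sweep` branch is inferred from the earlier comments about sweeping spendable funds rather than stated in this commit message.

```rust
#[derive(Debug, PartialEq)]
enum CleanupAction {
    /// Spendable funds exist: sweep them (inferred from earlier comments).
    Sweep,
    /// Total balance is zero: remove the wallet from the DB.
    Delete,
    /// Funds exist but are not spendable yet: retry on a later run.
    Keep,
}

/// Simplified balance snapshot in duffs (assumed layout).
struct Balance {
    total: u64,
    spendable: u64,
}

fn cleanup_action(balance: &Balance) -> CleanupAction {
    if balance.total == 0 {
        CleanupAction::Delete
    } else if balance.spendable > 0 {
        CleanupAction::Sweep
    } else {
        CleanupAction::Keep
    }
}
```

Only the `Keep` case leaves a wallet behind, which bounds how many orphaned wallets can accumulate between runs.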

lklimek commented Apr 1, 2026

Final Validation: 2 Runs with Clean Slate

Wiped SPV data + DB, ran tests twice (run 1 fresh sync, run 2 reusing SPV data).

Results

| Run | Pass | Time | IS Locks | Panics | Lock Errors | Failed |
|---|---|---|---|---|---|---|
| 1 (fresh) | 7/8 | 395s | 13 | 0 | 0 | tx_is_ours |
| 2 (reuse) | 6/8 | 202s | 11 | 0 | 0 | send_funds, tx_is_ours |

What's fixed

  • Zero panics, zero lock errors across both runs
  • No runtime degradation — run 2 (202s) is 2x faster than run 1 (395s) thanks to SPV data reuse
  • Orphaned wallet cleanup works — run 2 deleted 1 empty wallet. No accumulation.
  • 13 IS locks applied in run 1, 11 in run 2 — consistent
  • 7/8 tests pass on fresh SPV (was 2-3/8 before this PR)

What's still flaky

  • tx_is_ours — fails both runs. The 2s bloom filter delay isn't sufficient. Wallet B's address needs to be in the bloom filter BEFORE A→B broadcasts, but filter propagation timing is unreliable. Needs a deeper fix (trigger explicit filter rebuild + wait for peer acknowledgment). This is a pre-existing test design issue, not introduced by this PR.
  • send_funds — failed once in run 2 (testnet timing). Passes in run 1 and in previous 10-run analysis.
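
One shape the "wait for peer acknowledgment" idea could take is a bounded wait on a channel instead of a fixed sleep. Everything here is hypothetical: `FilterAck`, `wait_for_filter_acks`, and the assumption that some component can report when a peer has the updated filter. How such an acknowledgment would actually be obtained from peers is outside this sketch.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Hypothetical notification that a peer has loaded the updated filter.
struct FilterAck {
    peer_id: u32,
}

/// Blocks until `needed` acknowledgments arrive or the overall `timeout`
/// elapses. Returns the acknowledging peer ids in arrival order.
fn wait_for_filter_acks(
    rx: &Receiver<FilterAck>,
    needed: usize,
    timeout: Duration,
) -> Result<Vec<u32>, RecvTimeoutError> {
    let deadline = Instant::now() + timeout;
    let mut peers = Vec::with_capacity(needed);
    while peers.len() < needed {
        // Shrink the per-receive budget so the total wait stays bounded.
        let remaining = deadline.saturating_duration_since(Instant::now());
        let ack = rx.recv_timeout(remaining)?;
        peers.push(ack.peer_id);
    }
    Ok(peers)
}
```

Compared with a fixed 2s delay, this returns as soon as the condition holds and fails explicitly when it does not, which would make a timing problem in tx_is_ours visible as a timeout rather than a missing balance.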

All commits in this PR

| Commit | Fix |
|---|---|
| 87e79167 | IS lock for self-broadcast txs via EventHandler |
| e995ed5e | Link workarounds to upstream rust-dashcore#487 |
| d6ce0ece | UTXO double-spend prevention (process_mempool_tx under lock) |
| 7251b9ee | Init wait order fix (SPV sync before balance check) |
| c13831e8 | Cleanup timeout 10s → 1s |
| c2179284 | Bloom filter delay in tx_is_ours (partial fix) |
| 41795286 | Delete empty orphaned wallets from DB |

🤖 Co-authored by Claudius the Magnificent AI Agent

@lklimek lklimek marked this pull request as ready for review April 8, 2026 12:44
@lklimek lklimek merged commit f4d3b3b into v1.0-dev Apr 8, 2026
6 checks passed

thepastaclaw commented Apr 8, 2026

Review Gate

Commit: 41795286

  • Debounce: 9643m ago (need 30m)

  • CI checks: checks still running (2 pending)

  • CodeRabbit review: comment found

  • Off-peak hours: peak window (5am-11am PT) — currently 05:45 AM PT Wednesday

  • Run review now (check to override)
