Refs #42
PR: feat(cli): wire scanner → router → builder → simulator pipeline (feat/17-cli-e2e-pipeline)
Commit: latest on feat/17-cli-e2e-pipeline
File: crates/charon-cli/src/main.rs (adapter_ws and price_cache_ws construction)
Problem:
PR #42 consolidates four WS connections into fewer shared Arc<RootProvider> handles. BlockListener has its own reconnect loop (PR #32); the shared provider does not.
If the WebSocket connection underlying the shared provider drops (network interruption, BSC node restart, Cloudflare WS timeout):
- adapter.fetch_positions() returns RPC errors every block.
- router.route() (Aave PoolDataProvider query) fails.
- simulator.simulate() (eth_call) fails.
- price_cache.refresh() fails.
The pipeline logs RPC errors every 3 seconds indefinitely. There is no backoff, no provider reconnect attempt, and no supervisor to restart the provider. Because BlockListener reconnects and keeps sending block events, the drain loop stays active, but every pipeline stage fails on every tick. With no Prometheus metrics yet (PR #50 pending), no alert fires.
Impact: Any network interruption leaves the bot in a permanent degraded state that requires a manual restart. From the operator's perspective, the only signal is warn-level logs.
Fix:
- Wrap provider creation in a reconnect-aware factory matching BlockListener's backoff pattern.
- Or: on consecutive RPC failures (e.g. 3 in a row within one block interval), trigger a controlled shutdown so the Docker restart policy restarts the process cleanly.
- Document the reconnect strategy in a follow-up issue if deferring.
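A minimal std-only sketch of the two options above, under stated assumptions: `RpcFailureTracker`, `Verdict` and `backoff_delay` are hypothetical names, not existing charon-cli code, and the 500 ms base / 10 s cap are placeholders rather than the values BlockListener actually uses (PR #32). The tracker counts consecutive failures regardless of timing, which simplifies the "3 in a row within one block interval" rule. A real integration would feed it the results of fetch_positions / route / simulate / refresh and, on `Shutdown`, exit with a non-zero status so a restart policy such as `on-failure` brings the container back.

```rust
use std::time::Duration;

/// Hypothetical exponential backoff: base * 2^attempt, capped at `max`.
/// Placeholder values only; not BlockListener's real schedule.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // 1 << attempt, saturating to u32::MAX once the shift would overflow.
    let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
    // If the multiplication overflows Duration, fall back to the cap.
    base.checked_mul(factor).map_or(max, |d| d.min(max))
}

/// Outcome of recording one RPC result.
#[derive(Debug, PartialEq)]
enum Verdict {
    Healthy,
    Degraded(u32), // number of consecutive failures so far
    Shutdown,      // threshold reached: caller should exit non-zero
}

/// Hypothetical circuit breaker counting consecutive RPC failures.
struct RpcFailureTracker {
    consecutive: u32,
    threshold: u32,
}

impl RpcFailureTracker {
    fn new(threshold: u32) -> Self {
        Self { consecutive: 0, threshold }
    }

    /// Any success resets the streak; each failure extends it.
    fn record<T, E>(&mut self, result: &Result<T, E>) -> Verdict {
        match result {
            Ok(_) => {
                self.consecutive = 0;
                Verdict::Healthy
            }
            Err(_) => {
                self.consecutive += 1;
                if self.consecutive >= self.threshold {
                    Verdict::Shutdown
                } else {
                    Verdict::Degraded(self.consecutive)
                }
            }
        }
    }
}

fn main() {
    let mut tracker = RpcFailureTracker::new(3);
    // Simulated RPC outcomes: one success, then the shared WS connection drops.
    let outcomes: [Result<(), &str>; 4] =
        [Ok(()), Err("ws closed"), Err("ws closed"), Err("ws closed")];
    for (tick, outcome) in outcomes.iter().enumerate() {
        match tracker.record(outcome) {
            Verdict::Shutdown => {
                // In the real binary this would be a graceful shutdown + non-zero exit.
                println!("tick {tick}: failure threshold reached, shutting down");
                break;
            }
            v => println!("tick {tick}: {v:?}"),
        }
    }
    // Delays a reconnect-aware provider factory could sleep between attempts.
    for attempt in 0..6 {
        let d = backoff_delay(attempt, Duration::from_millis(500), Duration::from_secs(10));
        println!("reconnect attempt {attempt}: wait {d:?}");
    }
}
```

Resetting the counter on any success keeps isolated transient errors from accumulating toward a shutdown; only a sustained outage, such as a dropped WS connection, trips the threshold.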