Refs #54
Location
deploy/grafana/charon.json (no alerting section present)
Problem
The dashboard has 9 visualization panels but zero alerting rules. For a production liquidation bot, silent failure is the primary operational risk. The following conditions should generate alerts but currently have no coverage:
- Bot down: charon_scanner_blocks_total does not increase over 60s, so the bot has stopped scanning.
- Queue depth spike: charon_executor_queue_depth > threshold, indicating a possible stall in the executor.
- High simulation failure rate: sum(rate(charon_executor_simulations_total{result="failure"}[5m])) / sum(rate(charon_executor_simulations_total[5m])) > 0.5, indicating a contract or RPC issue. The sum() on both sides is required: without it, PromQL matches series on identical label sets, so the result="failure" series is divided by itself and the ratio is always 1.
- No opportunities queued in 1h: increase(charon_executor_opportunities_queued_total[1h]) == 0, indicating a broken scanner or health check.
- High drop rate: rate(charon_executor_opportunities_dropped_total[5m]) / rate(charon_executor_opportunities_queued_total[5m]) > 0.9, indicating an upstream pipeline issue.
Impact
Without alerts, the operator must watch the dashboard continuously to detect bot failure. A stopped bot silently misses liquidation opportunities. Given the financial stakes (Venus liquidations), undetected downtime has direct cost.
Suggested Fix
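Add Grafana alerting rules covering at minimum the first three conditions above (bot down, queue depth spike, high simulation failure rate). Grafana unified alerting rules are provisioned separately from dashboard JSON, so this likely means a rule provisioning file next to deploy/grafana/charon.json rather than a new section inside it. Alternatively, ship a Prometheus alerting rules YAML file at deploy/grafana/alerts.yaml alongside the dashboard JSON.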
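A minimal sketch of what the Prometheus rules file could look like for the bot-down, queue-depth, and simulation-failure conditions. The metric names come from this issue; the group name, alert names, `for` durations, severity labels, and the queue depth threshold of 100 are placeholders, not values taken from the codebase. The bot-down rule also assumes the scrape interval is well under 60s, since increase() needs at least two samples inside the window.

```yaml
# deploy/grafana/alerts.yaml (path proposed in this issue; contents are illustrative)
groups:
  - name: charon            # hypothetical group name
    rules:
      - alert: CharonBotDown
        # Fires when no new blocks were scanned in the last minute.
        # absent() covers the case where the series disappears entirely,
        # which a plain "== 0" comparison would not catch.
        expr: |
          increase(charon_scanner_blocks_total[1m]) == 0
            or absent(charon_scanner_blocks_total)
        for: 1m             # assumed hold time
        labels:
          severity: critical
        annotations:
          summary: Charon scanner has stopped processing blocks

      - alert: CharonQueueDepthHigh
        # 100 is a placeholder; the issue leaves the threshold unspecified.
        expr: charon_executor_queue_depth > 100
        for: 5m             # assumed hold time
        labels:
          severity: warning
        annotations:
          summary: Charon executor queue depth is above threshold

      - alert: CharonSimulationFailureRateHigh
        # sum() on both sides so failures are compared against all results,
        # not against the same result="failure" series.
        expr: |
          sum(rate(charon_executor_simulations_total{result="failure"}[5m]))
            / sum(rate(charon_executor_simulations_total[5m])) > 0.5
        for: 10m            # assumed hold time
        labels:
          severity: warning
        annotations:
          summary: More than half of Charon simulations are failing
```

If the file is adopted, running `promtool check rules deploy/grafana/alerts.yaml` would validate the syntax before it is loaded into Prometheus.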