feat(monitoring): comprehensive Prometheus monitoring stack v2#2014

Merged
Scottcjn merged 5 commits into Scottcjn:main from kuanglaodi2-sudo:feat/rustchain-prometheus-exporter-v2
Mar 31, 2026

Conversation

@kuanglaodi2-sudo
Contributor

Summary

This PR adds a comprehensive Prometheus monitoring stack for the RustChain node with the following additions:

New Files

  • tools/monitoring/Dockerfile.exporter - Docker container for the RustChain Prometheus Exporter
  • tools/monitoring/prometheus.yml - Prometheus scrape configuration
  • tools/monitoring/grafana_dashboard.json - Pre-built Grafana dashboard with 10 panels
  • tools/monitoring/README.md - Installation and usage guide

Enhanced: prometheus_exporter.py

Added 5 new metrics:

  • rustchain_api_requests_total (Counter) - total API requests by endpoint and HTTP status
  • rustchain_scrape_duration_seconds (Gauge) - time taken for each scrape cycle
  • rustchain_epoch_block_time_avg (Gauge) - average block time in current epoch
  • rustchain_miner_antiquity_distribution (Histogram) - distribution of miner antiquity scores
  • rustchain_tx_pool_size (Gauge) - pending transaction pool size

Added a _scrape_transactions() method that queries the /tx/pool endpoint.
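
The five metrics listed above could be declared roughly as follows. This is a minimal sketch, not the code from the PR: it assumes the prometheus_client library, the Python variable names and the private registry are hypothetical, the label names follow the "endpoint and HTTP status" description, and the histogram buckets are the ones quoted in the review further down.

```python
from prometheus_client import CollectorRegistry, Counter, Gauge, Histogram

# A private registry (an assumption for this sketch) avoids clashing with
# metrics already registered on the default global registry.
registry = CollectorRegistry()

api_requests_total = Counter(
    "rustchain_api_requests_total",
    "Total API requests by endpoint and HTTP status",
    ["endpoint", "status"],
    registry=registry,
)
scrape_duration_seconds = Gauge(
    "rustchain_scrape_duration_seconds",
    "Time taken for each scrape cycle",
    registry=registry,
)
epoch_block_time_avg = Gauge(
    "rustchain_epoch_block_time_avg",
    "Average block time in current epoch",
    registry=registry,
)
miner_antiquity_distribution = Histogram(
    "rustchain_miner_antiquity_distribution",
    "Distribution of miner antiquity scores",
    buckets=[0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0, 15.0, 30.0],
    registry=registry,
)
tx_pool_size = Gauge(
    "rustchain_tx_pool_size",
    "Pending transaction pool size",
    registry=registry,
)

# Example updates: one successful /tx/pool request and one miner score.
api_requests_total.labels(endpoint="/tx/pool", status="200").inc()
miner_antiquity_distribution.observe(0.3)
tx_pool_size.set(12)
```

A Counter suits the request total because it only ever increases, while the Gauges hold values that can move in either direction between scrapes.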

Grafana Dashboard Panels

  1. Node Health (up/down gauge)
  2. Current Epoch and Slot
  3. Active Miners count
  4. RTC Supply
  5. Epoch Pot
  6. API Response Time (by endpoint)
  7. Scrape Errors (error type breakdown)
  8. API Requests Total (by endpoint)
  9. Scrape Duration
  10. Miner Antiquity Distribution

Bounty


Auto-submitted for bounty #2000

@github-actions
Contributor

Welcome to RustChain! Thanks for your first pull request.

Before we review, please make sure:

  • Your PR has a BCOS-L1 or BCOS-L2 label
  • New code files include an SPDX license header
  • You've tested your changes against the live node

Bounty tiers: Micro (1-10 RTC) | Standard (20-50) | Major (75-100) | Critical (100-150)

A maintainer will review your PR soon. Thanks for contributing!

@github-actions github-actions bot added the documentation (Improvements or additions to documentation), BCOS-L1 (Beacon Certified Open Source tier BCOS-L1, required for non-doc PRs), and size/M (PR: 51-200 lines) labels Mar 30, 2026
Contributor

@geldbert geldbert left a comment

Comprehensive Code Review

PR Summary

Title: feat(monitoring): comprehensive Prometheus monitoring stack v2
Changes: +836/-20 lines across 5 files

Architecture Review

Components Added:

  1. Dockerfile.exporter - Container for Prometheus exporter
  2. prometheus.yml - Prometheus scrape config
  3. grafana_dashboard.json - Pre-built dashboard
  4. README.md - Documentation
  5. prometheus_exporter.py - Enhanced with v2 metrics

Code Quality Assessment

Strengths:

  • Complete monitoring stack (Docker + Prometheus + Grafana)
  • Well-documented with clear README and metrics reference
  • TLS verification improved over baseline (pinned cert support)
  • A Histogram is an appropriate choice for the miner antiquity distribution
  • Counter and Gauge types are applied correctly for each metric

Observations:

  1. TLS Handling (lines 152-159):

    • Falls back gracefully when node.tls_config is unavailable
    • Uses pinned cert at ~/.rustchain/node_cert.pem if available
    • Correct: Defaults to system CA bundle if no pinned cert
  2. API Request Tracking:

    • Correctly increments counter for both success (200) and failures
    • Labels include endpoint and status for proper filtering
  3. Scrape Duration (lines 253-257):

    • Uses time.time() delta - appropriate for this use case
    • Gauge type is correct for duration (not Counter)
  4. Miner Antiquity Histogram:

    • Bucket boundaries [0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0, 15.0, 30.0]
    • Reasonable range for antiquity scores (0-30)
  5. Transaction Pool (lines 268-275):

    • Handles both dict and int responses - good defensive coding
    • Uses isinstance(data, dict) check before .get()
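
Three of the observations above (TLS fallback, scrape timing, and transaction pool parsing) can be sketched with the standard library alone. This is an illustration under stated assumptions rather than the exporter's actual code: the function names are hypothetical, and the "size" key in the dict response is a guess at the payload shape, which the review does not specify.

```python
import os
import time

# Pinned certificate location mentioned in the review.
PINNED_CERT = os.path.expanduser("~/.rustchain/node_cert.pem")


def resolve_tls_verify(pinned_cert=PINNED_CERT):
    """Return a value usable as the `verify` argument of an HTTP client.

    Uses the pinned certificate when the file exists, otherwise True,
    which means "verify against the system CA bundle".
    """
    if os.path.isfile(pinned_cert):
        return pinned_cert
    return True


def parse_tx_pool_size(data):
    """Accept either a dict or a bare int from the /tx/pool endpoint.

    The "size" key is an assumption made for this sketch.
    """
    if isinstance(data, dict):
        return data.get("size", 0)
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(data, int) and not isinstance(data, bool):
        return data
    return 0


def timed_scrape(scrape_fn):
    """Run one scrape cycle and return its duration in seconds."""
    start = time.time()
    scrape_fn()
    return time.time() - start
```

The duration returned by timed_scrape would be written to a Gauge rather than a Counter, since each cycle's duration replaces the previous value instead of accumulating.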

Minor Suggestions:

  1. Line 326: Method name start_scrapping should be start_scraping (typo - "scrapping" vs "scraping")

    • Note: This appears to be fixing a typo from the original file
  2. Consider adding requests.adapters.HTTPAdapter with retry logic for transient failures

  3. Dashboard uses hardcoded admin/rustchain123 credentials - recommend documenting that users should change the default password
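
The retry suggestion above might look like the following. This is a sketch only, assuming the requests library with its bundled urllib3; the build_session helper, the retry count, the backoff factor, and the list of retried status codes are illustrative choices, not values taken from the PR.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_session(total_retries=3, backoff_factor=0.5):
    """Return a requests.Session that retries transient failures.

    All parameter values here are assumptions for illustration.
    """
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff_factor,      # delay grows between attempts
        status_forcelist=(502, 503, 504),   # retry on these gateway errors
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

The exporter would then call session.get(url, timeout=10) in place of requests.get, keeping the 10s timeout noted in the security review.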

Security Review

  • No secrets hardcoded (credentials are documented defaults, not actual secrets)
  • TLS verification properly configurable
  • No SQL injection vectors (no database operations)
  • API client uses timeout (10s default)

Documentation Quality

  • README is comprehensive with:
    • Quick start guide
    • Environment variables table
    • Systemd service setup
    • Dashboard import instructions
    • Full metrics reference table
    • Troubleshooting section

Recommendation

Approve - This is a well-structured monitoring stack with appropriate metric types, good documentation, and proper error handling. The v2 metrics add valuable observability for production deployments.

Assessment: Thorough review - substantial code contribution with production-ready monitoring infrastructure.


Code review per Bounty #73

@Scottcjn Scottcjn merged commit 9c80604 into Scottcjn:main Mar 31, 2026
7 of 9 checks passed