[refactor(infra)] Restructure Grafana dashboards and harden production compose #15

Merged
nxdun merged 6 commits into main from nadun/promo-grafa-enchance
May 16, 2026

Conversation

@nxdun
Owner

@nxdun nxdun commented May 13, 2026

Description

Replaces the ytdlp-health and captcha-security Grafana dashboards with three purpose-built dashboards: api-health, security-overview, and domain-services. Updates the production docker-compose.yml to enable Caddy, Prometheus, and Grafana as first-class services, and wires corresponding Terraform variables and cloud-init provisioning steps throughout the infra stack.

Key additions:

  • Add api-health.json, security-overview.json, and domain-services.json Grafana dashboards; remove ytdlp-health.json and captcha-security.json
  • Add metrics middleware (track_http_metrics) recording http_requests_total and http_request_duration_seconds per method/path/status
  • Add Prometheus counters to api_key.rs (auth_api_key_check_total) and rate_limit.rs (rate_limit_checks_total with tier/status labels)
  • Add GitHub contributions cache/API metrics (github_contributions_fetch_total, github_cache_size, github_api_duration_seconds) in contributions.rs
  • Promote Caddy, Prometheus, and Grafana from commented-out stubs to active services in docker-compose.yml with proper env/volume bindings
  • Replace YTDLP_DASHBOARD_URL and CAPTCHA_SECURITY_DASHBOARD_URL Terraform variables with API_HEALTH_DASHBOARD_URL, SECURITY_OVERVIEW_DASHBOARD_URL, and DOMAIN_SERVICES_DASHBOARD_URL across all relevant .tf files and Makefile
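
A minimal sketch of what recording http_requests_total and http_request_duration_seconds per method/path/status amounts to. It uses only the standard library rather than the actual metrics stack in this PR; the HttpMetrics type, the record method, and the bucket bounds are assumptions made for illustration.

```rust
use std::collections::BTreeMap;
use std::time::Duration;

/// Label set for one time series: (method, path, status).
type Labels = (String, String, u16);

/// Number of finite latency buckets (hypothetical choice).
const N: usize = 4;
/// Upper bounds in seconds for the latency buckets (hypothetical values).
const BUCKETS: [f64; N] = [0.05, 0.25, 1.0, 5.0];

#[derive(Default)]
struct HttpMetrics {
    /// Request count per label set.
    requests_total: BTreeMap<Labels, u64>,
    /// Cumulative bucket counts per label set; the last slot is +Inf.
    duration_buckets: BTreeMap<Labels, [u64; N + 1]>,
}

impl HttpMetrics {
    fn record(&mut self, method: &str, path: &str, status: u16, latency: Duration) {
        let key = (method.to_owned(), path.to_owned(), status);
        *self.requests_total.entry(key.clone()).or_insert(0) += 1;

        let secs = latency.as_secs_f64();
        let counts = self.duration_buckets.entry(key).or_insert([0; N + 1]);
        // Histogram buckets are cumulative: an observation counts toward
        // every bucket whose upper bound is at least the observed value.
        for (i, bound) in BUCKETS.iter().enumerate() {
            if secs <= *bound {
                counts[i] += 1;
            }
        }
        counts[N] += 1; // the +Inf bucket counts every observation
    }
}

fn main() {
    let mut metrics = HttpMetrics::default();
    metrics.record("GET", "/health", 200, Duration::from_millis(30));
    metrics.record("GET", "/health", 200, Duration::from_millis(300));

    let key = ("GET".to_owned(), "/health".to_owned(), 200);
    println!("requests: {}", metrics.requests_total[&key]);
    println!("buckets:  {:?}", metrics.duration_buckets[&key]);
}
```

Keying on the matched route rather than the raw URI keeps the number of label combinations bounded, which is presumably why the middleware prefers MatchedPath when it is available.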

Types of changes

  • Refactor

Checklist

  • Updated ChangeLog
  • Terraform plan verified against updated variable definitions
  • New dashboard JSONs validated in a local Grafana instance
  • .env updated with new dashboard URL variable names

Summary by CodeRabbit

  • New Features

    • App now emits HTTP request metrics (rates and latency by route/status), API key validation metrics, rate-limit metrics, and GitHub contributions metrics.
    • Three new Grafana dashboards added: API & System Health, Security & Traffic Control, and Domain Services.
  • Updates

    • Production deployment configuration updated to run dedicated monitoring services (Prometheus, Grafana) and a Caddy frontend; responses no longer include the Server header.
  • Removed

    • Replaced older dashboard visualizations with the new dashboards above.

@coderabbitai

coderabbitai Bot commented May 13, 2026

Warning

Rate limit exceeded

@nxdun has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 56 minutes and 50 seconds before requesting another review.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: c1f7dd0b-537a-4f16-bc6b-60f04526afe1

📥 Commits

Reviewing files that changed from the base of the PR and between a279223 and ca0066b.

📒 Files selected for processing (1)
  • docker-compose.yml
📝 Walkthrough

Adds Prometheus instrumentation and HTTP metrics middleware; instruments API key, rate-limiter, and contribution service; replaces two Grafana dashboards with three new ones; updates Makefile/cloud-init/Terraform to presign and provision new dashboards; and reworks docker-compose for production (adds Prometheus/Grafana/Caddy, WireGuard device caps).

Changes

Application Observability and Infrastructure

| Layer / File(s) | Summary |
| --- | --- |
| Compose + cloud-init + Makefile + Terraform wiring: docker-compose.yml, infra/common/cloud-init.template, Makefile, infra/digitalocean/... | Production compose added caddy, prometheus, grafana, updated warp and app; cloud-init and Makefile now presign/upload new dashboard JSONs and set TF_VAR_*_DASHBOARD_URL; Terraform variables/locals/module inputs updated to accept three new dashboard URLs. |
| Grafana dashboard provisioning: infra/common/grafana/provisioning/dashboards/* | Added api-health.json, security-overview.json, domain-services.json; removed captcha-security.json (and removed prior ytdlp-health.json). |
| HTTP metrics middleware: src/middleware/metrics.rs, src/middleware/mod.rs, src/app.rs | New track_http_metrics middleware records request counts and latency histograms labeled by method/path/status and is wired into the Axum middleware stack. |
| Endpoint/service instrumentation: src/middleware/api_key.rs, src/middleware/rate_limit.rs, src/services/contributions.rs | API key validation increments auth_api_key_check_total; rate limiting increments rate_limit_checks_total with tier and status; contributions service records github_contributions_fetch_total, github_cache_size, and github_api_duration_seconds. |
| Caddy config updates: Caddyfile, Caddyfile.local | Added header -Server to suppress Server response header in production and local Caddy configurations. |
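
To make the tier/status labelling on rate_limit_checks_total concrete, here is a standard-library sketch of a labelled counter and the text a Prometheus scrape would see for it. The RateLimitCounter type and the label values ("free", "allowed", "limited") are hypothetical; the PR summary does not say which values the real code uses.

```rust
use std::collections::BTreeMap;

#[derive(Default)]
struct RateLimitCounter {
    /// (tier, status) -> count. A BTreeMap keeps the rendered order stable.
    counts: BTreeMap<(String, String), u64>,
}

impl RateLimitCounter {
    /// Increment the series identified by this tier/status pair.
    fn increment(&mut self, tier: &str, status: &str) {
        *self
            .counts
            .entry((tier.to_owned(), status.to_owned()))
            .or_insert(0) += 1;
    }

    /// Render the counter in the Prometheus text exposition format.
    fn render(&self) -> String {
        let mut out = String::from("# TYPE rate_limit_checks_total counter\n");
        for ((tier, status), n) in &self.counts {
            out.push_str(&format!(
                "rate_limit_checks_total{{tier=\"{tier}\",status=\"{status}\"}} {n}\n"
            ));
        }
        out
    }
}

fn main() {
    let mut counter = RateLimitCounter::default();
    counter.increment("free", "allowed");
    counter.increment("free", "allowed");
    counter.increment("free", "limited");
    print!("{}", counter.render());
}
```

Each distinct tier/status pair becomes its own time series, so a dashboard panel can plot the rejected share per tier by dividing one series by the sum.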

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main changes: restructuring Grafana dashboards (removing old ones, adding new ones) and hardening the production docker-compose configuration. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@nxdun nxdun changed the title from "[refactor(infra)] Restructure Grafana dashboards and harden production compose" to "[refactor(infra)] Restructure Grafana dashboards and harden production compose @coderabbitai" on May 13, 2026

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (1)
src/middleware/metrics.rs (1)

11-15: ⚖️ Poor tradeoff

Reduce allocations by extracting as borrowed strings.

The current approach allocates method and path as owned Strings early, which are then cloned for the histogram and moved into the counter—resulting in 3 unnecessary allocations per label. Since this middleware executes on every HTTP request, the overhead compounds.

Consider extracting as &str and letting the metric macros handle conversion:

let method = req.method().as_str();

This avoids the upfront allocation and clone.

⚡ Proposed optimization to reduce allocations
-    let method = req.method().to_string();
-    let path = req.extensions().get::<MatchedPath>().map_or_else(
-        || req.uri().path().to_string(),
-        |matched_path| matched_path.as_str().to_string(),
-    );
+    let method = req.method().as_str();
+    let path = req.extensions()
+        .get::<MatchedPath>()
+        .map(|m| m.as_str())
+        .unwrap_or_else(|| req.uri().path());
 
     let start = Instant::now();
     let response = next.run(req).await;
     let latency = start.elapsed().as_secs_f64();
-    let status = response.status().as_u16().to_string();
+    let status = response.status().as_u16();
 
     histogram!(
         "http_request_duration_seconds",
-        "method" => method.clone(),
-        "path" => path.clone(),
-        "status" => status.clone()
+        "method" => method,
+        "path" => path,
+        "status" => status.to_string()
     )
     .record(latency);
 
     counter!(
         "http_requests_total",
         "method" => method,
         "path" => path,
-        "status" => status
+        "status" => status.to_string()
     )
     .increment(1);
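
One caveat worth verifying before applying the diff above: the request is moved into next.run(req), so a &str borrowed from req cannot be used once that call has been made, and the metrics macros (as far as I know) want label values that are 'static or owned. The sketch below models that constraint with the standard library only. Request, run, method_label, and handle are hypothetical stand-ins, not the middleware's real types. It keeps the allocation saving for the method label, since common HTTP method names can be mapped to static strings, and still copies the path once before the request is consumed.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for the HTTP request type.
struct Request {
    method: String,
    path: String,
}

/// Stand-in for `next.run(req)`: consumes the request and returns a status.
fn run(req: Request) -> u16 {
    if req.path == "/health" { 200 } else { 404 }
}

/// Map well-known methods to static strings; allocate only for unusual ones.
fn method_label(method: &str) -> Cow<'static, str> {
    match method {
        "GET" => Cow::Borrowed("GET"),
        "POST" => Cow::Borrowed("POST"),
        "PUT" => Cow::Borrowed("PUT"),
        "DELETE" => Cow::Borrowed("DELETE"),
        other => Cow::Owned(other.to_owned()),
    }
}

/// Collect the labels that must outlive the request, then consume it.
fn handle(req: Request) -> (Cow<'static, str>, String, u16) {
    // The returned Cow is 'static, so it does not keep `req` borrowed.
    let method = method_label(&req.method);
    // The path has to be owned because `req` is moved on the next line.
    let path = req.path.clone();
    let status = run(req);
    (method, path, status)
}

fn main() {
    let req = Request { method: "GET".to_owned(), path: "/health".to_owned() };
    let (method, path, status) = handle(req);
    println!("{method} {path} -> {status}");
}
```

If the borrowed variant from the diff does not compile in the real middleware, this is the likely reason, and keeping the owned path while dropping only the redundant clones is a smaller change.
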
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/middleware/metrics.rs` around lines 11 - 15, The code currently eagerly
allocates Strings for method and path causing extra clones when used as labels;
change the extraction to borrow &str from the request: use req.method().as_str()
for method and use req.extensions().get::<MatchedPath>().map_or_else(||
req.uri().path(), |m| m.as_str()) for path so you pass &str into the
histogram/counter macros (allowing the metric macros to convert to owned strings
only when needed) while leaving identifiers method, path, MatchedPath, and req
in place.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docker-compose.yml`:
- Line 37: The volume entry "Caddyfile:/etc/caddy/Caddyfile:ro" is using a named
volume instead of a bind mount; update the docker-compose service's volumes to
bind the local Caddyfile path (e.g., ./Caddyfile or the repo-relative path) to
/etc/caddy/Caddyfile:ro so Docker Compose mounts the actual file into the Caddy
container; locate the volumes list for the Caddy service (the line containing
"Caddyfile:/etc/caddy/Caddyfile:ro") and replace the left-hand side with the
correct host path (for example ./Caddyfile) to fix the mount.
- Line 48: Prometheus mount path is incorrect and PRODUCTION_DOMAIN is missing:
update the Prometheus volume mapping in the docker-compose service to point to
the actual file by removing the extra "prometheus/" segment in the source path
(the volume line referencing the Prometheus config under the Prometheus
service), and add the environment variable
PRODUCTION_DOMAIN=REPLACE_WITH_PRODUCTION_DOMAIN to .env.example (and ensure it
is set in .env) so the Grafana service can expand ${PRODUCTION_DOMAIN}; adjust
the docker-compose Grafana service environment if needed to reference
PRODUCTION_DOMAIN.
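
The named-volume finding follows from how the Compose short volume syntax reads the part before the first colon. A simplified sketch of that rule, assuming POSIX-style paths only (Windows drive letters and the long mount syntax are ignored); classify_source and VolumeSource are hypothetical names for illustration:

```rust
#[derive(Debug, PartialEq)]
enum VolumeSource {
    /// Source is a host path, so the host file or directory is mounted.
    BindMount,
    /// Source is a bare name, so Compose treats it as a named volume.
    NamedVolume,
}

/// Classify the source of a short-syntax entry such as "SRC:TARGET[:MODE]".
/// Returns None when there is no source part (an anonymous volume).
fn classify_source(spec: &str) -> Option<VolumeSource> {
    let (source, _rest) = spec.split_once(':')?;
    let looks_like_path =
        source.starts_with('/') || source.starts_with('.') || source.starts_with('~');
    Some(if looks_like_path {
        VolumeSource::BindMount
    } else {
        VolumeSource::NamedVolume
    })
}

fn main() {
    for spec in [
        "Caddyfile:/etc/caddy/Caddyfile:ro",
        "./Caddyfile:/etc/caddy/Caddyfile:ro",
    ] {
        println!("{spec} -> {:?}", classify_source(spec));
    }
}
```

Under this rule the entry flagged above is read as a volume named Caddyfile, while the ./Caddyfile form mounts the file from the project directory.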

In `@Makefile`:
- Line 191: The Makefile contains inconsistent escaping in the tr -d command:
change the double-escaped backslash sequences ("tr -d '\\r'") to the
single-escaped form ("tr -d '\r'") so they match the earlier usages; update each
occurrence that constructs TF_VAR_API_HEALTH_DASHBOARD_URL (and the other two
identical presign/tr pipelines) to use tr -d '\r' to ensure consistent shell
behavior when stripping CR characters from the aws s3 presign output.
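
The escaping difference may be more than cosmetic. Assuming the recipe hands tr the single-quoted argument unchanged (the Makefile quoting is not shown here, so this is an assumption), '\\r' is read by tr as an escaped backslash followed by a literal r, so it would delete backslashes and every letter r instead of carriage returns. A small sketch that mimics both character sets; delete_chars and the example URL are hypothetical:

```rust
/// Mimic `tr -d SET` for a set given as explicit characters.
fn delete_chars(input: &str, set: &[char]) -> String {
    input.chars().filter(|c| !set.contains(c)).collect()
}

fn main() {
    // Hypothetical presign output ending in a carriage return.
    let url = "https://example.com/dashboards/api-health.json\r";

    // tr -d '\r': the set is the single carriage-return character.
    let intended = delete_chars(url, &['\r']);

    // tr -d '\\r': the set is a backslash plus the letter 'r'.
    let mangled = delete_chars(url, &['\\', 'r']);

    println!("{intended:?}");
    println!("{mangled:?}");
}
```

In the second case the trailing carriage return survives and the path loses its r characters, which would produce a dashboard URL that no longer resolves.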

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: e4233bec-6076-4c40-8289-2fcdbe9814fa

📥 Commits

Reviewing files that changed from the base of the PR and between 161b6a9 and d407487.

📒 Files selected for processing (18)
  • Makefile
  • docker-compose.yml
  • infra/common/cloud-init.template
  • infra/common/grafana/provisioning/dashboards/api-health.json
  • infra/common/grafana/provisioning/dashboards/captcha-security.json
  • infra/common/grafana/provisioning/dashboards/domain-services.json
  • infra/common/grafana/provisioning/dashboards/security-overview.json
  • infra/common/grafana/provisioning/dashboards/ytdlp-health.json
  • infra/digitalocean/accounts/naduns-team/main.tf
  • infra/digitalocean/accounts/naduns-team/variables.tf
  • infra/digitalocean/components/locals.tf
  • infra/digitalocean/components/variables.tf
  • src/app.rs
  • src/middleware/api_key.rs
  • src/middleware/metrics.rs
  • src/middleware/mod.rs
  • src/middleware/rate_limit.rs
  • src/services/contributions.rs
💤 Files with no reviewable changes (2)
  • infra/common/grafana/provisioning/dashboards/ytdlp-health.json
  • infra/common/grafana/provisioning/dashboards/captcha-security.json

Comment thread docker-compose.yml
Comment thread docker-compose.yml Outdated
Comment thread Makefile
@nxdun nxdun changed the title from "[refactor(infra)] Restructure Grafana dashboards and harden production compose @coderabbitai" to "[refactor(infra)] Restructure Grafana dashboards and harden production compose" on May 13, 2026

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (1)
Caddyfile (1)

10-12: ⚡ Quick win

Consider adding header -Server for consistency.

Caddyfile.local includes header -Server before its redirect directive (line 15), but the production :80 redirect block omits it. For consistency and defense-in-depth, consider adding it here as well.

♻️ Suggested addition
 :80 {
+    header -Server
     redir https://${PRODUCTION_DOMAIN}{uri}
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Caddyfile` around lines 10 - 12, The :80 redirect block in the Caddyfile
currently only performs redirection (the block containing ":80 { redir
https://${PRODUCTION_DOMAIN}{uri} }"); add the same header removal used in
Caddyfile.local by inserting "header -Server" into this :80 block before the
redir directive so the Server header is stripped for consistency and
defense-in-depth.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 909aae08-a97e-48be-ab49-301ab53010a2

📥 Commits

Reviewing files that changed from the base of the PR and between d407487 and a279223.

📒 Files selected for processing (3)
  • Caddyfile
  • Caddyfile.local
  • infra/common/cloud-init.template
🚧 Files skipped from review as they are similar to previous changes (1)
  • infra/common/cloud-init.template

@nxdun nxdun merged commit 4c89628 into main May 16, 2026
2 checks passed