feat(api): add Prometheus metrics endpoint #63
Conversation
Implements #28: Prometheus Metrics API

New /metrics endpoint exposing:
- nexusgate_completions_total (counter by model, status, api_format, api_key)
- nexusgate_embeddings_total (counter by model, status, api_key)
- nexusgate_tokens_prompt_total, nexusgate_tokens_completion_total (counters)
- nexusgate_completion_duration_seconds, nexusgate_completion_ttft_seconds (histograms)
- nexusgate_embedding_duration_seconds (histogram)
- nexusgate_active_api_keys, nexusgate_active_providers, nexusgate_active_models (gauges)
- nexusgate_info (build info gauge)

Also includes:
- Optional Prometheus + Grafana monitoring stack via docker-compose.monitoring.yaml
- Pre-configured Grafana dashboard with request rates, latency, tokens, errors
- Updated quick-start.sh with optional monitoring installation
- Integration tests for metrics endpoint

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary of Changes

Hello @pescn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the observability of the LLM gateway by integrating a comprehensive Prometheus and Grafana monitoring solution. It provides a dedicated API endpoint to expose key operational metrics, backed by new database queries for data aggregation. The addition of an optional monitoring stack and an updated quick-start script simplifies deployment and allows users to gain immediate insights into the system's performance, usage, and health through pre-built dashboards.
Note: CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Adds a /metrics Prometheus metrics endpoint, a metrics-generation service (with Redis caching), database aggregation and histogram queries, a rate-limit rejection counter, a monitoring stack (Prometheus + Grafana) configuration, install-script support, and a standalone Python test script.
Sequence Diagram(s)

sequenceDiagram
participant Client as HTTP Client
participant API as /metrics Endpoint
participant Service as Metrics Service
participant Redis as Redis Cache
participant DB as Database
participant Formatter as Formatter
rect rgba(200,200,255,0.5)
Client->>API: GET /metrics
end
API->>Service: call generatePrometheusMetrics()
Service->>Redis: GET cached metrics
alt cache hit
Redis-->>Service: cached metrics string
else cache miss
par Parallel DB queries
Service->>DB: getCompletionMetricsByModelAndStatus()
Service->>DB: getEmbeddingMetricsByModelAndStatus()
Service->>DB: getCompletionDurationHistogram()
Service->>DB: getCompletionTTFTHistogram()
Service->>DB: getEmbeddingDurationHistogram()
Service->>DB: getActiveEntityCounts()
Service->>DB: getApiKeyRateLimitConfig()
end
DB-->>Service: aggregated rows
Service->>Formatter: render counters/gauges/histograms (ms→s, escape labels)
Formatter-->>Service: metrics text
Service->>Redis: SET cache (METRICS_CACHE_TTL_SECONDS)
end
Service-->>API: metrics text
API-->>Client: 200 + text/plain
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~70 minutes
🚥 Pre-merge checks: ✅ 5 passed
Code Review
This pull request introduces a new Prometheus metrics endpoint and integrates it with an optional Grafana monitoring stack. The changes include new database queries for metrics aggregation, a service to format metrics in Prometheus exposition format, updates to the main application to expose the endpoint, and a comprehensive update to the quick-start.sh script for optional monitoring setup. The Grafana dashboard configuration is also included, providing good visualization for the new metrics. The Python integration tests for the metrics endpoint are a valuable addition, ensuring the endpoint's correctness and adherence to the Prometheus format. Some minor code redundancies were identified, and a potential security concern regarding API key exposure in metrics was noted, suggesting anonymization.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@backend/src/api/metrics.ts`:
- Around line 4-18: The /metrics endpoint (metricsApi) currently exposes
Prometheus metrics including an api_key/api_key_id label from
generatePrometheusMetrics, which leaks key identifiers; change the endpoint to
enforce an optional Bearer token: read a METRICS_BEARER_TOKEN (or similar) env
var and if it is set, validate the incoming Authorization header ("Bearer
<token>") and return 401 when missing/invalid; if the env var is unset keep
current public behavior. Additionally modify generatePrometheusMetrics (or add a
parameter like redactApiKeyLabels) to strip or omit any api_key/api_key_id
labels from the output so key identifiers are never emitted even when metrics
are public, and ensure metricsApi uses the redaction option when calling
generatePrometheusMetrics.
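The instruction above can be sketched in TypeScript. The `METRICS_BEARER_TOKEN` handling, the helper names, and the label shape are illustrative assumptions, not the repository's actual code:

```typescript
// Sketch: optional bearer-token gate for /metrics. If no token is configured,
// the endpoint stays public (current behavior); if one is set, the
// Authorization header must match "Bearer <token>".
function isMetricsRequestAuthorized(
  authorizationHeader: string | undefined,
  configuredToken: string | undefined,
): boolean {
  if (!configuredToken) return true; // env var unset: keep public behavior
  if (!authorizationHeader) return false; // caller should return 401
  const [scheme, token] = authorizationHeader.split(" ");
  return scheme === "Bearer" && token === configuredToken;
}

// Sketch: drop api_key / api_key_id labels so key identifiers are never
// emitted, even when the metrics endpoint is public.
function redactApiKeyLabels(
  labels: Record<string, string>,
): Record<string, string> {
  const { api_key, api_key_id, ...rest } = labels;
  return rest;
}
```

With this shape, the route handler would check `isMetricsRequestAuthorized` first and pass each label set through `redactApiKeyLabels` before serialization.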
In `@backend/src/index.ts`:
- Around line 155-157: The current guard only checks path === "/metrics" so a
request to "/metrics/" falls through to the SPA; update the conditional that
checks the request path (the expression using path.startsWith("/api") ||
path.startsWith("/v1") || path === "/metrics") to also match the trailing-slash
variant (e.g., add path === "/metrics/" or equivalent) so requests to both
"/metrics" and "/metrics/" return the 404 via status(404).
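A minimal sketch of the widened guard, assuming the path-matching expression quoted above (illustrative, not the actual `backend/src/index.ts`):

```typescript
// Sketch: treat "/metrics" and "/metrics/" alike so a scraper configured with
// a trailing slash gets the intended 404 instead of the SPA's HTML page.
function isReservedPath(path: string): boolean {
  return (
    path.startsWith("/api") ||
    path.startsWith("/v1") ||
    path === "/metrics" ||
    path === "/metrics/"
  );
}
```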
🧹 Nitpick comments (3)
scripts/quick-start.sh (1)
456-459: Reading configuration from .env may hit edge cases

When the ENABLE_MONITORING value in the .env file contains special characters or spaces, the grep | cut approach may not parse it correctly.

♻️ Suggested more robust parsing:

```diff
 # Read monitoring configuration from the existing .env
 if [ -f ".env" ]; then
-    ENABLE_MONITORING=$(grep "ENABLE_MONITORING=" .env 2>/dev/null | cut -d '=' -f2 | tr -d ' ' || echo "false")
+    ENABLE_MONITORING=$(grep "^ENABLE_MONITORING=" .env 2>/dev/null | cut -d '=' -f2- | tr -d ' "'"'"' || echo "false")
 fi
```

python_test_code/test_metrics.py (1)
238-241: Remove unnecessary f-string prefixes

These strings contain no placeholders, so the f prefix is not needed. A static analysis tool flagged this issue.

♻️ Suggested change:

```diff
-    print(f"  Has _bucket: Yes")
-    print(f"  Has _sum: Yes")
-    print(f"  Has _count: Yes")
-    print(f"  Has +Inf bucket: Yes")
+    print("  Has _bucket: Yes")
+    print("  Has _sum: Yes")
+    print("  Has _count: Yes")
+    print("  Has +Inf bucket: Yes")
```

docker-compose.monitoring.yaml (1)
6-6: Use pinned image version tags instead of :latest

The :latest tag can lead to unpredictable behavior changes in production. Pin fixed version tags to ensure reproducible deployments.

♻️ Suggested change:

```diff
-    image: "prom/prometheus:latest"
+    image: "prom/prometheus:v3.0.1"
```

```diff
-    image: "grafana/grafana:latest"
+    image: "grafana/grafana:11.4.0"
```

Also applies to: line 23
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
- .gitignore
- backend/src/api/metrics.ts
- backend/src/db/index.ts
- backend/src/index.ts
- backend/src/services/prometheus.ts
- docker-compose.monitoring.yaml
- grafana/provisioning/dashboards/dashboards.yml
- grafana/provisioning/dashboards/json/nexusgate-dashboard.json
- grafana/provisioning/datasources/prometheus.yml
- prometheus/prometheus.yml
- python_test_code/test_metrics.py
- scripts/quick-start.sh
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-24T18:23:42.635Z
Learnt from: pescn
Repo: EM-GeekLab/NexusGate PR: 59
File: backend/src/api/v1/responses.ts:566-577
Timestamp: 2026-01-24T18:23:42.635Z
Learning: When using Elysia with the apiKeyPlugin, if a route option has checkApiKey: true, the apiKeyRecord parameter is guaranteed to be non-null. Do not add explicit null checks or non-null assertions for apiKeyRecord in such routes; rely on this contract and avoid unnecessary guards to improve readability. If there is any doubt about the guarantee in a specific place, add a runtime assertion at the boundary where the plugin is configured, but otherwise omit null checks for apiKeyRecord in backend code under backend/.
Applied to files:
- backend/src/api/metrics.ts
- backend/src/services/prometheus.ts
- backend/src/db/index.ts
- backend/src/index.ts
🧬 Code graph analysis (3)
backend/src/api/metrics.ts (1)
backend/src/services/prometheus.ts (1)
generatePrometheusMetrics(105-299)
backend/src/services/prometheus.ts (2)
backend/src/db/index.ts (7)
- LATENCY_BUCKETS_MS (1412-1412)
- getCompletionMetricsByModelAndStatus (1359-1383)
- getEmbeddingMetricsByModelAndStatus (1389-1409)
- getCompletionDurationHistogram (1418-1436)
- getCompletionTTFTHistogram (1442-1460)
- getEmbeddingDurationHistogram (1466-1484)
- getActiveEntityCounts (1489-1528)

backend/src/utils/config.ts (1)
COMMIT_SHA(104-104)
backend/src/index.ts (1)
backend/src/api/metrics.ts (1)
metricsApi(9-26)
🪛 Ruff (0.14.13)
python_test_code/test_metrics.py
238-238: f-string without any placeholders
Remove extraneous f prefix
(F541)
239-239: f-string without any placeholders
Remove extraneous f prefix
(F541)
240-240: f-string without any placeholders
Remove extraneous f prefix
(F541)
241-241: f-string without any placeholders
Remove extraneous f prefix
(F541)
290-290: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Code Quality Check
🔇 Additional comments (22)
.gitignore (1)
32-32: The metrics test file is now tracked in version control

This ensures the new integration tests land in the repository together with the PR and are covered by CI.

backend/src/index.ts (1)

19-19: The metrics route is correctly mounted on the main application

The import and registration sites are sensible, and the route-level integration is clean.

Also applies to: 209-209
grafana/provisioning/dashboards/dashboards.yml (1)
1-16: Dashboard auto-provisioning configuration is clear

The directory path matches the JSON dashboard location; the setup is sound.

grafana/provisioning/datasources/prometheus.yml (1)

1-15: Prometheus datasource provisioning works as expected

The default datasource and scrape-interval settings are reasonable.

prometheus/prometheus.yml (1)

1-19: Prometheus scrape configuration matches the monitoring stack

Both the NexusGate job and the self-monitoring job are covered. OK.

docker-compose.monitoring.yaml (1)
4-41: The monitoring stack configuration is LGTM overall! The Docker Compose overlay is well structured:

- Prometheus is configured with a suitable 15-day data retention period
- The lifecycle API is enabled, making hot-reloading of configuration easy
- Grafana correctly mounts the provisioning directory for automatic configuration
- Ports are configured via environment variables, providing flexibility
backend/src/services/prometheus.ts (5)
19-24: Prometheus label escaping is implemented correctly!

The escapeLabelValue function correctly escapes the three special characters the Prometheus format requires: backslash, double quote, and newline.
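A minimal sketch of that escaping, for readers unfamiliar with the exposition format (an illustration, not the project's actual `escapeLabelValue`):

```typescript
// Escape the three characters the Prometheus text format requires inside
// label values: backslash first (so later escapes aren't double-escaped),
// then double quote, then newline.
function escapeLabelValue(value: string): string {
  return value
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n");
}
```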
29-37: LGTM!

The formatLabels function handles label formatting correctly, including the logic to skip null/undefined/empty-string values.
82-100: Histogram formatting is implemented correctly! The function emits every component a Prometheus histogram requires:

- cumulative counts per bucket (_bucket)
- the +Inf bucket (total count)
- _sum and _count
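Sketched in TypeScript, the exposition shape described above looks roughly like this (the names and the pre-cumulated bucket input are assumptions, not the project's actual code):

```typescript
interface HistogramData {
  // Bucket counts are assumed to be cumulative already, as Prometheus expects.
  buckets: { le: number; cumulativeCount: number }[];
  sum: number;
  count: number;
}

// Emit one _bucket line per boundary, a +Inf bucket equal to the total count,
// then the _sum and _count series.
function formatHistogram(name: string, h: HistogramData): string {
  const lines: string[] = [];
  for (const b of h.buckets) {
    lines.push(`${name}_bucket{le="${b.le}"} ${b.cumulativeCount}`);
  }
  lines.push(`${name}_bucket{le="+Inf"} ${h.count}`);
  lines.push(`${name}_sum ${h.sum}`);
  lines.push(`${name}_count ${h.count}`);
  return lines.join("\n");
}
```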
105-121: Metrics data is fetched in parallel; performance is good!

Using Promise.all to fetch all metrics data in parallel is the right approach, avoiding the latency accumulation of serial requests.
304-335: The histogram data parsing logic is correct! The function correctly handles:

- converting bucket boundaries from milliseconds to seconds
- converting the sum from milliseconds to seconds
- using bucket_inf as the fallback value for the count

grafana/provisioning/dashboards/json/nexusgate-dashboard.json (1)
1-1887: The Grafana dashboard configuration is comprehensive and well structured! The dashboard includes the key panels needed to monitor an LLM gateway:

- overview statistics (request counts, success rate, active resources)
- request rate and throughput (grouped by model and status)
- latency distribution (P50/P95/P99 and TTFT)
- token usage
- error rate and cache hit rate
- API format distribution

The Prometheus queries match the metric names exposed by the backend.
scripts/quick-start.sh (3)
70-105: The monitoring prompt flow is clear and friendly!

The ask_monitoring function provides a clear description that helps users understand what the monitoring components do and what resources they consume, and offers a sensible default option.

157-205: The monitoring config download logic is complete! It correctly creates the required directory structure and downloads all necessary configuration files:

- docker-compose.monitoring.yaml
- prometheus/prometheus.yml
- the Grafana provisioning files

It also preserves the idempotent "skip if already present" behavior.

479-485: The multi-file Docker Compose startup command is correct!

Using -f docker-compose.yaml -f docker-compose.monitoring.yaml correctly layers the monitoring stack on top of the base deployment.

backend/src/db/index.ts (4)
1359-1383: The completion metrics query is implemented correctly!

The SQL query correctly aggregates data grouped by model, status, api_format, and api_key_id, and uses COALESCE to handle NULL values.

1411-1412: The latency bucket boundaries are sensible!

LATENCY_BUCKETS_MS covers the range from 100ms to 120s, which suits the typical latency distribution of LLM requests.
1418-1436: The histogram queries build dynamic SQL with sql.raw

Although sql.raw generally requires caution to prevent SQL injection, LATENCY_BUCKETS_MS here is a hard-coded constant array of numbers, so this is safe. If LATENCY_BUCKETS_MS ever becomes configurable, input validation will need to be added.
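To illustrate why interpolating a hard-coded numeric constant is safe, a hypothetical sketch (the `duration_ms` column name and the bucket values are assumptions, not the actual schema or code):

```typescript
// A fixed, module-level numeric constant: every interpolated value is a
// number under our control, so building SQL fragments by raw interpolation
// carries no injection risk here. If this ever becomes user-configurable,
// validate each entry (e.g. Number.isFinite) before interpolating.
const LATENCY_BUCKETS_MS = [100, 500, 1000, 5000, 120000]; // illustrative subset

// One cumulative-count fragment per bucket boundary.
const bucketCases = LATENCY_BUCKETS_MS.map(
  (ms) => `COUNT(*) FILTER (WHERE duration_ms <= ${ms}) AS bucket_${ms}`,
).join(",\n  ");
```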
1489-1528: The active entity count queries are implemented well!

Running the four independent queries in parallel with Promise.all is efficient. Each query correctly filters out deleted/revoked entities.

python_test_code/test_metrics.py (3)
30-80: The Prometheus metrics parser is implemented correctly!

The parse_prometheus_metrics function correctly parses metric lines with and without labels, and handles # HELP and # TYPE comment lines.

108-149: The tests cover both required and optional metrics!

The tests correctly distinguish metrics that must be present (such as nexusgate_info) from optional ones (such as nexusgate_completions_total), which may be absent when there is no data.

282-293: Exception handling in the test runner

Although catching a blanket Exception is generally discouraged, it is reasonable in a test scenario: it ensures every test runs and reports a result instead of a single failure aborting the whole suite.
…d caching

- Add METRICS_CACHE_TTL_SECONDS config for Redis-based metrics caching
- Replace api_key_id with api_key_comment label for meaningful aggregation
- Add API key rate limit metrics (rpm/tpm usage and limits)
- Add rate limit rejection counter (429 errors by api_key and limit_type)
- Add Redis hash operations (hincrby, hgetall) for rejection tracking
- Add error handling with fallback metrics on failure
- Add security documentation for public metrics endpoint
- Update Grafana dashboard with:
  - Error Type Distribution pie chart
  - Rate Limit Rejections timeseries panel
  - API Key Rate Limit Usage gauge panel

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
/gemini review
Code Review
This pull request introduces a comprehensive Prometheus metrics endpoint, which is a fantastic addition for observability. The implementation is well-structured, including caching, fallback metrics on error, and a full monitoring stack with Docker Compose, Prometheus, and a detailed Grafana dashboard. The code is generally of high quality. I've identified one bug in the metrics generation logic and a potential performance bottleneck that could be addressed to make it even more robust. Overall, great work on this feature.
Address CodeRabbit review: requests to /metrics/ were falling through to SPA routing, returning HTML instead of 404. This caused Prometheus scraping errors when using trailing slash URLs. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Fix colon parsing in apiKeyComment by using pop() for limit type
- Optimize rate limit status fetching with Promise.all for parallel execution

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Addressing Review Comments

Fixed in commit 9eae9aa:
Already Addressed:
No Change Needed:
- Remove duplicate COUNT(*) in histogram queries (use single total_count)
- Move bucketCases SQL fragments to module scope (computed once at load)
- Add comment explaining why SUM is used instead of AVG for Prometheus histograms

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Prometheus scrapers may access /metrics/ with trailing slash, which would previously fall through to SPA routing and return HTML instead of metrics. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replace 4 parallel queries with a single query using subqueries. This reduces database round-trips from 4 to 1. Addresses review comment from @koitococo Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary
- GET /metrics endpoint returning Prometheus exposition format

Metrics Exposed

- nexusgate_completions_total
- nexusgate_embeddings_total
- nexusgate_tokens_prompt_total
- nexusgate_tokens_completion_total
- nexusgate_tokens_embedding_total
- nexusgate_completion_duration_seconds
- nexusgate_completion_ttft_seconds
- nexusgate_embedding_duration_seconds
- nexusgate_active_api_keys
- nexusgate_active_providers
- nexusgate_active_models
- nexusgate_info

Files Added/Modified

- backend/src/api/metrics.ts - Route definition
- backend/src/services/prometheus.ts - Prometheus format serialization
- backend/src/db/index.ts - Added 6 query functions for metrics aggregation
- backend/src/index.ts - Registered /metrics route
- docker-compose.monitoring.yaml - Docker Compose override for Prometheus + Grafana
- grafana/provisioning/ - Grafana datasource and dashboard provisioning
- prometheus/prometheus.yml - Prometheus scrape configuration
- scripts/quick-start.sh - Added optional monitoring installation
- python_test_code/test_metrics.py - Integration tests

Test plan

- bun run build
- bun run check
- bun run lint
- curl http://localhost:3000/metrics

Usage
Closes #28
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features

Docs/Configuration

Tests

Chores