Skip to content

feat(dashboard): add optional Grafana dashboard embedding on overview page#65

Merged
pescn merged 13 commits into
mainfrom
feat/grafana-dashboard-embedding
Jan 31, 2026
Merged

feat(dashboard): add optional Grafana dashboard embedding on overview page#65
pescn merged 13 commits into
mainfrom
feat/grafana-dashboard-embedding

Conversation

@pescn
Copy link
Copy Markdown
Contributor

@pescn pescn commented Jan 28, 2026

Summary

  • Add admin settings API (/api/admin/settings/:key) for storing configuration values
  • Allow users to configure a Grafana dashboard URL via the settings API
  • When configured, a toggle appears on the overview page to switch between built-in Recharts charts and an embedded Grafana iframe
  • Include a pre-built Grafana dashboard JSON (grafana/nexusgate-overview.json) for easy import

Test plan

  • Without Grafana URL configured: overview page shows built-in charts only, no toggle visible
  • Set Grafana URL via API:
    curl -X PUT -H "Authorization: Bearer $ADMIN_SECRET" \
      -H "Content-Type: application/json" \
      -d '{"value":"https://grafana.example.com/d/nexusgate-overview"}' \
      http://localhost:3000/api/admin/settings/grafana_dashboard_url
  • Refresh overview page: toggle appears in header
  • Switching to Grafana shows iframe, switching back shows Recharts
  • TimeRangeSelect hidden in Grafana mode, visible in built-in mode
  • Summary cards visible in both modes

🤖 Generated with Claude Code

Summary by CodeRabbit

  • 新功能

    • 概览页新增 Grafana 仪表板集成、视图切换(内置/Grafana)、前端钩子与嵌入组件;新增本地化键
    • 管理界面新增设置与仪表板的查看/更新/删除接口,支持环境变量覆盖与旧配置自动迁移
    • 导出并上报按模型/提供方/密钥的成本指标供监控
  • Bug 修复

    • 修复仪表盘成功率计算可能除以零的问题
  • 杂项

    • 更新启动脚本与监控配置,允许匿名访问与嵌入显示,新增示例与预置 Grafana 仪表盘配置

✏️ Tip: You can customize this high-level summary in your review settings.

… page

Add an admin settings API and allow users to configure a Grafana dashboard
URL. When configured, a toggle appears on the overview page to switch between
built-in Recharts charts and an embedded Grafana iframe. Includes a pre-built
Grafana dashboard JSON for easy import.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@gemini-code-assist
Copy link
Copy Markdown

Summary of Changes

Hello @pescn, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the dashboard's flexibility by allowing administrators to integrate external Grafana dashboards directly into the overview page. It provides a new backend API for storing configuration values, a frontend mechanism to fetch and display the Grafana dashboard via an iframe, and a user interface toggle to switch between the native charts and the embedded Grafana view. This enables users to leverage their existing Grafana setups for monitoring alongside the application's built-in analytics.

Highlights

  • Admin Settings API: Introduced a new backend API (/api/admin/settings/:key) for managing key-value configuration settings, allowing for GET, PUT, and DELETE operations.
  • Grafana Dashboard Embedding: Added the capability to embed a custom Grafana dashboard directly onto the overview page, providing an alternative visualization option.
  • View Mode Toggle: Implemented a user interface toggle on the overview page, enabling users to switch seamlessly between the existing built-in charts and the new embedded Grafana dashboard view.
  • Pre-built Grafana Dashboard: Included a default Grafana dashboard configuration file (grafana/nexusgate-overview.json) to facilitate easy import and setup for common NexusGate metrics.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces an excellent new feature for embedding a Grafana dashboard on the overview page. The implementation is solid, covering both backend API additions for settings and frontend components for rendering and toggling the view. I've identified a few areas for improvement, mainly around robustness and security, such as adding more specific validation for API inputs, improving error handling in the data fetching hook, and making the Grafana embedding component more resilient to invalid URLs. Addressing these points will make the feature more stable and secure.

Comment thread frontend/src/hooks/use-settings.ts Outdated
Comment thread frontend/src/pages/overview/grafana-embed.tsx Outdated
Comment thread backend/src/api/admin/settings.ts
Comment thread frontend/src/pages/overview/grafana-embed.tsx Outdated
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jan 28, 2026

📝 Walkthrough

Walkthrough

新增 /admin/settings 与 /admin/dashboards 路由组并挂载;后端增加 Grafana 仪表板 schema、存取、迁移与 env 覆盖逻辑;Prometheus 指标加入成本相关 counters;前端新增 Grafana Hook、嵌入组件与视图切换,更新路由查询 schema、本地化词条及监控部署配置。

Changes

Cohort / File(s) Summary
后端 Admin 路由
backend/src/api/admin/index.ts, backend/src/api/admin/settings.ts, backend/src/api/admin/dashboards.ts
注册并挂载 adminSettingsadminDashboardssettings 提供设置项 CRUD;dashboards 提供 GET/PUT/DELETE,含输入校验、env 覆盖检测、409 与 400 错误语义。
后端 配置与工具
backend/src/utils/config.ts, backend/src/utils/dashboards.ts
新增 grafana 仪表板 zod schema/types、GRAFANA_DASHBOARDS env 解析;实现 get/set/clear、env 覆盖检测(EnvOverrideError)、以及 legacy->新格式的迁移逻辑。
后端 数据与指标
backend/src/db/index.ts, backend/src/services/prometheus.ts
新增 getCompletionCostMetrics(),并在 Prometheus 指标生成中按 model/provider/api_key_comment 汇总并输出成本 counters(prompt/completion/total USD)。
前端 Hook 与组件
frontend/src/hooks/use-settings.ts, frontend/src/pages/overview/grafana-embed.tsx, frontend/src/pages/overview/view-mode-toggle.tsx
新增 useGrafanaDashboards Hook、GrafanaEmbed 组件(iframe 嵌入并处理 kiosk 参数与无效 URL)和 ViewModeToggle 视图切换组件。
前端 路由与页面集成
frontend/src/routes/_dashboard/index.tsx, frontend/src/routes/_dashboard/route.tsx, frontend/src/pages/overview/index.tsx
overviewSearchSchema 新增 view 字段;route 与页面集成 dashboards 数据、添加 view 切换与合并式路由更新;按 view 条件渲染 GrafanaEmbed 或内置图表并调整时间范围合并行为。
国际化
frontend/src/i18n/locales/en-US.json, frontend/src/i18n/locales/zh-CN.json
新增 5 个词条:pages.overview.viewMode.builtinpages.overview.viewMode.grafanapages.overview.viewMode.labelpages.overview.grafana.titlepages.overview.grafana.invalidUrl
Grafana 仪表盘与部署配置
grafana/nexusgate-overview.json, grafana/provisioning/dashboards/json/nexusgate-dashboard.json, docker-compose.monitoring.yaml, scripts/quick-start.sh
新增完整 nexusgate-overview.json 仪表盘;provisioning 仪表盘修复 PromQL(防止除零)、扩展面板;在 docker-compose 中添加匿名与嵌入相关 env;quick-start.sh 添加示例 GRAFANA_DASHBOARDS 与 embed URL。

Sequence Diagram(s)

sequenceDiagram
    rect rgba(245,245,255,0.5)
    participant 用户 as User
    participant 前端 as Frontend
    participant 后端API as API
    participant 存储 as DB
    participant Grafana as Grafana
    end

    用户->>前端: 访问概览页面
    前端->>后端API: GET /admin/dashboards
    后端API->>存储: 读取 DASHBOARDS_KEY / legacy key 或 env
    存储-->>后端API: 返回 dashboards 或 空
    alt env override active
        后端API-->>前端: 返回 { dashboards: from env, envOverride: true }
    else
        后端API-->>前端: 返回 { dashboards, envOverride: false }
    end
    前端->>前端: 渲染 ViewModeToggle (dashboards)
    用户->>前端: 切换 view -> grafana
    前端->>前端: 渲染 GrafanaEmbed(url)
    GrafanaEmbed->>GrafanaEmbed: 确保 URL 包含 ?kiosk=
    GrafanaEmbed->>Grafana: iframe 请求 URL
    Grafana-->>GrafanaEmbed: 返回嵌入内容
    GrafanaEmbed-->>用户: 显示仪表盘
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰✨ 仪表盘上线我来跳,
后端迁移前端嵌入俏,
URL 加 ?kiosk= 不慌张,
成本指标随时来统计,
小兔挥爪监控跑。

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 64.29% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title clearly and concisely summarizes the main change: adding optional Grafana dashboard embedding on the overview page. It directly reflects the primary objective and matches the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/grafana-dashboard-embedding

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
frontend/src/routes/_dashboard/route.tsx (1)

27-64: Grafana 配置被移除时可能“卡死”在 grafana 视图,导致桌面端无法切回或调整时间范围。

当 URL 仍带 view=grafanagrafanaUrl 为空时,Toggle 隐藏且 TimeRangeSelect 被隐藏,用户无法恢复。建议引入 effectiveViewMode,在未配置 Grafana 时强制回退到 builtin

🩹 建议改动
   const grafanaUrl = useGrafanaDashboardUrl()

   const timeRange = range as TimeRange
-  const viewMode = view as ViewMode
+  const viewMode = view as ViewMode
+  const effectiveViewMode: ViewMode = grafanaUrl ? viewMode : 'builtin'
@@
-              {grafanaUrl && <ViewModeToggle value={viewMode} onChange={handleViewModeChange} />}
-              {viewMode !== 'grafana' && <TimeRangeSelect value={timeRange} onChange={handleTimeRangeChange} />}
+              {grafanaUrl && <ViewModeToggle value={effectiveViewMode} onChange={handleViewModeChange} />}
+              {effectiveViewMode !== 'grafana' && (
+                <TimeRangeSelect value={timeRange} onChange={handleTimeRangeChange} />
+              )}
🤖 Fix all issues with AI agents
In `@frontend/src/hooks/use-settings.ts`:
- Around line 7-20: The hook currently casts the settings value to string
without validation, which can let non-string values through and break Grafana
embedding; update grafanaDashboardUrlQueryOptions (and/or the
useGrafanaDashboardUrl return logic) to validate the runtime type: after
fetching data from api.admin.settings[GRAFANA_DASHBOARD_URL_KEY].get(), check
typeof value === 'string' and that value.trim() !== '' before returning it,
otherwise return undefined (or null normalized to undefined) so
useGrafanaDashboardUrl only ever returns a non-empty string or undefined.

In `@frontend/src/pages/overview/grafana-embed.tsx`:
- Around line 9-12: The code constructs embedUrl with new URL(url) which will
throw for invalid/undefined input; wrap the creation in a try/catch (around the
new URL(url) call in grafana-embed.tsx) or validate the input string first, and
if it fails return a safe fallback (e.g., null/placeholder component) or log the
error and skip setting kiosk; ensure any subsequent use of embedUrl is guarded
so an invalid URL does not cause a runtime crash.

In `@grafana/nexusgate-overview.json`:
- Around line 152-158: The Success Rate PromQL expression uses division by
sum(nexusgate_completions_total) and can divide by zero when there are no
completions; update the query that uses the expr value to guard against zero by
applying a safe denominator (e.g., clamp_min on sum(nexusgate_completions_total)
or using an or vector(0) fallback) so the metric
(sum(nexusgate_completions_total{status="completed"}) /
sum(nexusgate_completions_total)) cannot produce NaN or no-data; change the expr
for the target named "A" to use one of these patterns to prevent divide-by-zero.
🧹 Nitpick comments (2)
frontend/src/pages/overview/grafana-embed.tsx (1)

20-26: iframe 的 title 属性建议使用 i18n。

为保持一致性和可访问性,建议将 iframe 的 title 属性也使用国际化翻译,与 CardTitle 保持一致。

♻️ 建议的修改
         <iframe
           src={embedUrl.toString()}
           className="h-[800px] w-full rounded border-0"
           sandbox="allow-scripts allow-same-origin allow-popups allow-forms"
           loading="lazy"
-          title="Grafana Dashboard"
+          title={t('pages.overview.grafana.title')}
         />
frontend/src/pages/overview/view-mode-toggle.tsx (1)

18-33: 建议添加无障碍属性以提升可访问性。

切换按钮缺少 aria-pressed 属性,屏幕阅读器用户无法感知当前选中状态。

♿ 建议的修改
       <Button
         variant="ghost"
         size="xs"
         className={cn(value === 'builtin' && 'bg-accent')}
         onClick={() => onChange('builtin')}
+        aria-pressed={value === 'builtin'}
       >
         {t('pages.overview.viewMode.builtin')}
       </Button>
       <Button
         variant="ghost"
         size="xs"
         className={cn(value === 'grafana' && 'bg-accent')}
         onClick={() => onChange('grafana')}
+        aria-pressed={value === 'grafana'}
       >
         {t('pages.overview.viewMode.grafana')}
       </Button>
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dc8e30c and 797b0b4.

📒 Files selected for processing (12)
  • backend/src/api/admin/index.ts
  • backend/src/api/admin/settings.ts
  • frontend/src/hooks/use-settings.ts
  • frontend/src/i18n/locales/en-US.json
  • frontend/src/i18n/locales/zh-CN.json
  • frontend/src/pages/overview/grafana-embed.tsx
  • frontend/src/pages/overview/index.tsx
  • frontend/src/pages/overview/view-mode-toggle.tsx
  • frontend/src/routeTree.gen.ts
  • frontend/src/routes/_dashboard/index.tsx
  • frontend/src/routes/_dashboard/route.tsx
  • grafana/nexusgate-overview.json
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-24T18:23:42.635Z
Learnt from: pescn
Repo: EM-GeekLab/NexusGate PR: 59
File: backend/src/api/v1/responses.ts:566-577
Timestamp: 2026-01-24T18:23:42.635Z
Learning: When using Elysia with the apiKeyPlugin, if a route option has checkApiKey: true, the apiKeyRecord parameter is guaranteed to be non-null. Do not add explicit null checks or non-null assertions for apiKeyRecord in such routes; rely on this contract and avoid unnecessary guards to improve readability. If there is any doubt about the guarantee in a specific place, add a runtime assertion at the boundary where the plugin is configured, but otherwise omit null checks for apiKeyRecord in backend code under backend/.

Applied to files:

  • backend/src/api/admin/settings.ts
  • backend/src/api/admin/index.ts
🧬 Code graph analysis (5)
frontend/src/hooks/use-settings.ts (1)
frontend/src/lib/api.ts (1)
  • api (21-21)
frontend/src/pages/overview/view-mode-toggle.tsx (2)
frontend/src/components/ui/button.tsx (1)
  • Button (50-50)
frontend/src/lib/utils.ts (1)
  • cn (4-6)
backend/src/api/admin/settings.ts (1)
backend/src/db/index.ts (2)
  • getAllSettings (583-590)
  • upsertSetting (613-625)
backend/src/api/admin/index.ts (1)
backend/src/api/admin/settings.ts (1)
  • adminSettings (4-58)
frontend/src/routes/_dashboard/route.tsx (5)
frontend/src/hooks/use-settings.ts (1)
  • useGrafanaDashboardUrl (18-21)
frontend/src/pages/overview/use-overview-stats.ts (1)
  • TimeRange (6-6)
frontend/src/pages/overview/view-mode-toggle.tsx (2)
  • ViewMode (6-6)
  • ViewModeToggle (13-36)
frontend/src/components/app/app-header.tsx (1)
  • AppHeaderPart (23-29)
frontend/src/pages/overview/time-range-select.tsx (1)
  • TimeRangeSelect (22-39)
🔇 Additional comments (9)
frontend/src/i18n/locales/en-US.json (1)

397-400: LGTM!

新增的国际化键命名规范,与现有的 pages.overview.* 命名模式保持一致,翻译内容清晰准确。

frontend/src/i18n/locales/zh-CN.json (1)

398-401: LGTM!

中文翻译准确,"内置"、"Grafana" 和 "Grafana 仪表盘" 的用词恰当,与文件中现有的术语风格一致。

grafana/nexusgate-overview.json (1)

1-367: Dashboard 整体结构良好。

Grafana 仪表盘配置完善,包含关键指标(请求数、延迟百分位、Token 使用量、成功率等),面板布局合理,为 NexusGate 提供了良好的可观测性支持。

frontend/src/pages/overview/view-mode-toggle.tsx (1)

6-11: LGTM!

ViewMode 类型定义清晰,组件接口设计简洁,导出类型方便其他组件复用。

backend/src/api/admin/index.ts (1)

10-10: LGTM!

adminSettings 模块正确集成到管理路由中,受 checkAdminApiKey 守卫保护,与现有的管理模块挂载模式一致。

Also applies to: 33-33

frontend/src/routes/_dashboard/index.tsx (1)

8-10: 视图参数纳入搜索校验,设计合理。

frontend/src/pages/overview/index.tsx (1)

62-111: Grafana/内置视图切换渲染逻辑清晰,移动端控制也处理到位。

frontend/src/routeTree.gen.ts (1)

7-312: 自动生成文件,无需人工评审。

backend/src/api/admin/settings.ts (1)

4-57: 不存在认证绕过风险。 adminSettings 已通过 .guard({ checkAdminApiKey: true }) 保护,所有设置相关的 GET、PUT、DELETE 操作均需要有效的管理员 API 密钥。

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment thread frontend/src/hooks/use-settings.ts Outdated
Comment thread frontend/src/pages/overview/grafana-embed.tsx Outdated
Comment thread grafana/nexusgate-overview.json
Add support for multiple Grafana dashboards with environment variable
override. When GRAFANA_DASHBOARDS env var is set, it takes precedence
over database settings.

Backend changes:
- Add GRAFANA_DASHBOARDS env var schema in config.ts
- Create dashboard utility (backend/src/utils/dashboards.ts) with
  env override detection and legacy migration
- Add /api/admin/dashboards endpoints (GET/PUT/DELETE)
- Return 409 Conflict when trying to modify while env override active

Frontend changes:
- Update use-settings hook with useGrafanaDashboards()
- Convert ViewModeToggle to multi-option selector
- Support dynamic dashboard list from API
- Change URL param from enum to string for dashboard IDs

Also updates quick-start.sh to auto-configure GRAFANA_DASHBOARDS
when monitoring is enabled.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@frontend/src/routes/_dashboard/route.tsx`:
- Around line 62-66: The mobile users can get stuck in Grafana view because
ViewModeToggle and TimeRangeSelect are hidden on mobile; update the route logic
in route.tsx to detect mobile (use existing viewMode and handleViewModeChange)
and either (A) render a minimal ViewModeToggle on mobile so users can switch
back to 'builtin', or (B) enforce built-in on mobile by checking the query param
(?view=...) during route load and redirecting/resetting viewMode to 'builtin'
when on a mobile device; locate the viewMode, handleViewModeChange,
ViewModeToggle and TimeRangeSelect usage in route.tsx and implement one of these
fixes in that router/component initialization path.

In `@scripts/quick-start.sh`:
- Around line 408-410: The GRAFANA_DASHBOARD_EMBED_URL is hardcoding "localhost"
which breaks remote access; update the script to build the URL from a
configurable host (e.g., use an environment variable like GRAFANA_HOST or
GRAFANA_DASHBOARD_HOST with a fallback to "localhost" when unset) instead of
always using "localhost", and add a short note in the startup/config summary
instructing users to set that env var (or manually edit GRAFANA_DASHBOARDS in
.env) to their public IP or domain for remote iframe embedding to work;
reference the GRAFANA_DASHBOARD_EMBED_URL variable and the .env
GRAFANA_DASHBOARDS setting when making these changes.
🧹 Nitpick comments (5)
frontend/src/pages/overview/view-mode-toggle.tsx (1)

16-37: 建议添加无障碍属性。

切换组件缺少无障碍属性,屏幕阅读器用户可能难以理解其用途。

♻️ 建议的改进
   return (
-    <div className="inline-flex items-center rounded-md border p-0.5">
+    <div className="inline-flex items-center rounded-md border p-0.5" role="group" aria-label={t('pages.overview.viewMode.label')}>
       <Button
         variant="ghost"
         size="xs"
         className={cn(value === 'builtin' && 'bg-accent')}
         onClick={() => onChange('builtin')}
+        aria-pressed={value === 'builtin'}
       >
backend/src/api/admin/dashboards.ts (2)

10-16: 存在重复的 schema 定义。

dashboardSchema (TypeBox) 与 grafanaDashboardsSchema (Zod) 定义了相同的结构。虽然 TypeBox 用于 Elysia 请求验证、Zod 用于额外校验是可行的,但维护两套 schema 可能导致不一致。

可考虑统一使用 Zod schema 并通过 @elysiajs/typebox 转换,或仅依赖 Elysia 内置验证。当前实现可接受,但需注意两者同步更新。


52-60: 基于字符串匹配的错误检测较脆弱。

当前通过 error.message.includes("GRAFANA_DASHBOARDS") 识别环境变量覆盖错误。如果 dashboards.ts 中的错误消息变更,此检测会静默失效。

建议使用自定义错误类或在 setGrafanaDashboards / clearGrafanaDashboards 中抛出带有 code 属性的错误对象,以实现更可靠的错误识别。

♻️ 推荐的改进方案

dashboards.ts 中定义自定义错误类:

export class EnvOverrideError extends Error {
  readonly code = 'ENV_OVERRIDE_ACTIVE';
  constructor() {
    super('Cannot modify dashboards when GRAFANA_DASHBOARDS environment variable is set');
  }
}

然后在此处使用:

-        if (
-          error instanceof Error &&
-          error.message.includes("GRAFANA_DASHBOARDS")
-        ) {
+        if (error instanceof EnvOverrideError) {
frontend/src/hooks/use-settings.ts (1)

18-24: 建议对 API 响应数据进行运行时类型校验。

当前直接将 data 断言为 DashboardsResponse(第 23 行),若后端返回格式异常(如字段缺失或类型不匹配),可能导致运行时错误或 undefined 访问。

建议添加基本的结构验证:

♻️ 推荐的改进方案
 const dashboardsQueryOptions = queryOptions({
   queryKey: ['dashboards'],
   queryFn: async (): Promise<DashboardsResponse> => {
     const { data, error } = await api.admin.dashboards.get()
     if (error) {
       return { dashboards: [], envOverride: false }
     }
-    return data as DashboardsResponse
+    // Basic runtime validation
+    const response = data as DashboardsResponse
+    if (!Array.isArray(response?.dashboards)) {
+      return { dashboards: [], envOverride: false }
+    }
+    return response
   },
   staleTime: 5 * 60 * 1000, // 5 minutes
   retry: false,
 })
backend/src/utils/dashboards.ts (1)

38-53: 迁移逻辑存在轻微的竞态条件,但影响有限。

并发请求可能同时触发旧格式迁移:两个请求都读取到旧值后分别执行 upsertSettingdeleteSetting。由于 upsertSetting 是幂等的,结果数据一致,不会造成数据损坏。

建议添加迁移日志以便观察:

♻️ 可选的改进方案
   if (legacySetting?.value && typeof legacySetting.value === "string") {
+    console.log(`[dashboards] Migrating legacy grafana_dashboard_url to new format`);
     const migrated: GrafanaDashboard[] = [
       {
         id: "grafana",
         label: "Grafana",
         url: legacySetting.value,
       },
     ];
     // Migrate to new format
     await upsertSetting({ key: DASHBOARDS_KEY, value: migrated });
     // Remove legacy key
     await deleteSetting(LEGACY_KEY);
     return migrated;
   }
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 797b0b4 and 334db75.

📒 Files selected for processing (10)
  • backend/src/api/admin/dashboards.ts
  • backend/src/api/admin/index.ts
  • backend/src/utils/config.ts
  • backend/src/utils/dashboards.ts
  • frontend/src/hooks/use-settings.ts
  • frontend/src/pages/overview/index.tsx
  • frontend/src/pages/overview/view-mode-toggle.tsx
  • frontend/src/routes/_dashboard/index.tsx
  • frontend/src/routes/_dashboard/route.tsx
  • scripts/quick-start.sh
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2026-01-24T18:23:42.635Z
Learnt from: pescn
Repo: EM-GeekLab/NexusGate PR: 59
File: backend/src/api/v1/responses.ts:566-577
Timestamp: 2026-01-24T18:23:42.635Z
Learning: When using Elysia with the apiKeyPlugin, if a route option has checkApiKey: true, the apiKeyRecord parameter is guaranteed to be non-null. Do not add explicit null checks or non-null assertions for apiKeyRecord in such routes; rely on this contract and avoid unnecessary guards to improve readability. If there is any doubt about the guarantee in a specific place, add a runtime assertion at the boundary where the plugin is configured, but otherwise omit null checks for apiKeyRecord in backend code under backend/.

Applied to files:

  • backend/src/api/admin/index.ts
  • backend/src/utils/config.ts
  • backend/src/api/admin/dashboards.ts
  • backend/src/utils/dashboards.ts
📚 Learning: 2026-01-24T22:33:16.471Z
Learnt from: pescn
Repo: EM-GeekLab/NexusGate PR: 59
File: backend/src/utils/streaming-context.ts:132-146
Timestamp: 2026-01-24T22:33:16.471Z
Learning: In NexusGate, for ReqId deduplication (specifically in streaming-context.ts and reqIdHandler.ts), when reqIdContext is present, finalizeReqId should use updateCompletion instead of addCompletions. This causes ReqId requests to not write srv_logs entries; errors are captured in the completion record's status (e.g., failed/aborted). This avoids duplicate srv_logs entries during retry scenarios and keeps the audit trail cleaner. Treat the completion record as the source of truth for ReqId request outcomes.

Applied to files:

  • backend/src/utils/config.ts
  • backend/src/utils/dashboards.ts
🧬 Code graph analysis (6)
backend/src/api/admin/index.ts (2)
backend/src/api/admin/settings.ts (1)
  • adminSettings (4-58)
backend/src/api/admin/dashboards.ts (1)
  • adminDashboards (18-100)
backend/src/utils/config.ts (1)
frontend/src/hooks/use-settings.ts (1)
  • GrafanaDashboard (5-9)
backend/src/api/admin/dashboards.ts (2)
backend/src/utils/dashboards.ts (4)
  • getGrafanaDashboards (23-56)
  • isEnvOverrideActive (15-17)
  • setGrafanaDashboards (62-72)
  • clearGrafanaDashboards (78-86)
backend/src/utils/config.ts (1)
  • grafanaDashboardsSchema (161-161)
frontend/src/hooks/use-settings.ts (2)
backend/src/utils/config.ts (1)
  • GrafanaDashboard (162-162)
frontend/src/lib/api.ts (1)
  • api (21-21)
backend/src/utils/dashboards.ts (2)
backend/src/utils/config.ts (3)
  • GRAFANA_DASHBOARDS (164-167)
  • GrafanaDashboard (162-162)
  • grafanaDashboardsSchema (161-161)
backend/src/db/index.ts (1)
  • upsertSetting (613-625)
frontend/src/pages/overview/index.tsx (3)
frontend/src/hooks/use-settings.ts (1)
  • useGrafanaDashboards (29-31)
frontend/src/pages/overview/use-overview-stats.ts (1)
  • TimeRange (6-6)
frontend/src/pages/overview/grafana-embed.tsx (1)
  • GrafanaEmbed (5-30)
🔇 Additional comments (14)
scripts/quick-start.sh (1)

426-428: LGTM!

JSON 配置格式正确,与后端 grafanaDashboardsSchema 定义的结构一致。kiosk 参数是 Grafana 嵌入的最佳实践,可隐藏导航栏。

frontend/src/routes/_dashboard/route.tsx (1)

29-33: LGTM!

状态管理逻辑正确。使用 replace: true 避免了历史记录污染,空数组默认值确保了 dashboards 在加载期间不会导致渲染错误。

frontend/src/routes/_dashboard/index.tsx (1)

8-11: LGTM!

Schema 扩展正确,使用 .catch('builtin') 确保向后兼容性。对于动态的仪表盘 ID,使用 z.string() 而非枚举是合理的设计。

backend/src/utils/config.ts (1)

154-167: LGTM!

Schema 定义清晰,z.string().url() 提供了 URL 格式验证。类型导出与前端 GrafanaDashboard 接口保持一致。使用 zObject 包装 optional() 允许环境变量未设置时返回 undefined

backend/src/api/admin/index.ts (2)

6-14: LGTM!

新模块正确导入并按字母顺序排列。


34-35: LGTM!

adminSettingsadminDashboards 正确注册在 admin guard 内部,所有端点都受到 checkAdminApiKey 保护。路由路径 /settings/dashboards 不会冲突。

backend/src/api/admin/dashboards.ts (1)

20-35: LGTM!

GET 端点实现清晰,正确返回仪表盘列表和环境变量覆盖状态。

frontend/src/hooks/use-settings.ts (1)

33-37: LGTM!

向后兼容的辅助函数实现简洁,使用可选链安全地处理各种边界情况。

frontend/src/pages/overview/index.tsx (3)

62-69: LGTM!

仪表盘选择逻辑实现正确:

  • 从路由参数获取 view,通过 id 匹配查找对应仪表盘
  • showGrafana 布尔值安全地基于 selectedDashboard 存在性派生
  • 数据流清晰,状态管理合理

71-77: LGTM!

handleTimeRangeChange 正确使用更新函数合并搜索状态,确保在更改时间范围时保留 view 参数。


111-113: TypeScript 类型收窄可能不完整。

showGrafana 基于 !!selectedDashboard 派生,但 TypeScript 无法通过中间布尔变量进行类型收窄。如果 TS 编译器未报错,当前代码可正常工作;否则可考虑内联条件检查。

请确认项目构建时此处无类型错误。如有问题,可改为:

-          {showGrafana ? (
-            <GrafanaEmbed url={selectedDashboard.url} />
+          {selectedDashboard ? (
+            <GrafanaEmbed url={selectedDashboard.url} />
backend/src/utils/dashboards.ts (3)

15-17: LGTM!

isEnvOverrideActive 函数逻辑正确:区分 undefined(未设置)和空数组(显式设置为空)的情况。


62-72: LGTM!

setGrafanaDashboards 实现正确:先检查环境变量覆盖状态,再执行数据库写入。错误处理清晰。


78-86: LGTM!

clearGrafanaDashboards 实现与 setGrafanaDashboards 保持一致的模式,逻辑正确。

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment thread frontend/src/routes/_dashboard/route.tsx
Comment thread scripts/quick-start.sh
pescn and others added 2 commits January 28, 2026 20:46
Address code review feedback from CodeRabbit and Gemini:

- Add error handling for invalid URLs in GrafanaEmbed component
  with graceful fallback UI when URL parsing fails
- Add accessibility attributes (aria-pressed, role, aria-label)
  to ViewModeToggle for screen reader support
- Add runtime validation in useGrafanaDashboards hook to handle
  unexpected API response formats
- Fix divide-by-zero in Grafana dashboard PromQL queries for
  Success Rate metric using clamp_min()
- Add missing i18n translations for new UI strings
- Add comment in quick-start.sh noting localhost limitation for
  remote access

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add Grafana environment variables to enable iframe embedding:
- GF_AUTH_ANONYMOUS_ENABLED=true - Allow unauthenticated access
- GF_AUTH_ANONYMOUS_ORG_ROLE=Viewer - Read-only access for anonymous users
- GF_SECURITY_ALLOW_EMBEDDING=true - Allow Grafana to be embedded in iframes

Without these settings, the embedded Grafana dashboard would show a
login page instead of the dashboard content.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@pescn
Copy link
Copy Markdown
Contributor Author

pescn commented Jan 28, 2026

/gemini review

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a great new feature for embedding Grafana dashboards on the overview page. The implementation is comprehensive, covering backend APIs, frontend components, configuration management with backward compatibility, and even a pre-built Grafana dashboard. The code is generally well-written and includes good practices like iframe sandboxing and fixing potential division-by-zero errors in dashboard queries. I've added a couple of suggestions to improve code clarity and robustness in the new backend API.

Comment thread backend/src/api/admin/dashboards.ts Outdated
Comment thread backend/src/api/admin/dashboards.ts Outdated
TanStack Router auto-generated file updated after route changes.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@docker-compose.monitoring.yaml`:
- Around line 30-34: The docker-compose config currently hardcodes
GF_AUTH_ANONYMOUS_ENABLED=true, GF_AUTH_ANONYMOUS_ORG_ROLE=Viewer and
GF_SECURITY_ALLOW_EMBEDDING=true which opens security risks; change these
entries so they are configurable and default to false (or remove the hardcoded
true values) and only enable them when explicit env variables are provided
(e.g., via an external .env or compose variable substitution), ensuring
GF_AUTH_ANONYMOUS_ENABLED and GF_SECURITY_ALLOW_EMBEDDING default to "false" and
GF_AUTH_ANONYMOUS_ORG_ROLE is omitted or set only when anonymous access is
enabled; update any relevant docs or comments near these vars so future
maintainers know to explicitly opt-in before turning them on.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3d49637 and 99871d1.

📒 Files selected for processing (1)
  • docker-compose.monitoring.yaml
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: pescn
Repo: EM-GeekLab/NexusGate PR: 65
File: frontend/src/routes/_dashboard/route.tsx:62-66
Timestamp: 2026-01-28T12:49:25.269Z
Learning: In the NexusGate frontend (frontend/src/routes/_dashboard/route.tsx), the ViewModeToggle component is intentionally hidden on mobile devices because Grafana dashboards don't render well on small screens. The primary use case for Grafana embedding is desktop monitoring dashboards. Mobile users can still access Grafana view via URL parameter if needed, but it's not encouraged.

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment thread docker-compose.monitoring.yaml
pescn and others added 2 commits January 28, 2026 21:10
…dation

- Add custom EnvOverrideError class for type-safe error handling
- Remove redundant Zod validation (Elysia TypeBox already validates)
- Use instanceof check instead of string matching for errors

Addresses PR review feedback about fragile string-based error detection.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add comment explaining that anonymous auth and embedding settings
are for local development convenience. Recommend using Grafana's
Public Dashboard feature for production deployments.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@backend/src/utils/dashboards.ts`:
- Around line 28-39: isEnvOverrideActive currently returns false when
GRAFANA_DASHBOARDS is an explicitly empty array, causing the code to fall back
to DB; change the logic so that any defined GRAFANA_DASHBOARDS value counts as
an override (check !== undefined or typeof !== 'undefined'), and in
getGrafanaDashboards when that override is active return the environment value
with a safe fallback to an empty array (use GRAFANA_DASHBOARDS ?? []) to honor
an explicit []; update the functions isEnvOverrideActive and
getGrafanaDashboards to use these checks and return behavior, referencing the
GRAFANA_DASHBOARDS symbol.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 99871d1 and fa9bcf6.

📒 Files selected for processing (3)
  • backend/src/api/admin/dashboards.ts
  • backend/src/utils/dashboards.ts
  • docker-compose.monitoring.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
  • backend/src/api/admin/dashboards.ts
  • docker-compose.monitoring.yaml
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: pescn
Repo: EM-GeekLab/NexusGate PR: 65
File: frontend/src/routes/_dashboard/route.tsx:62-66
Timestamp: 2026-01-28T12:49:25.269Z
Learning: In the NexusGate frontend (frontend/src/routes/_dashboard/route.tsx), the ViewModeToggle component is intentionally hidden on mobile devices because Grafana dashboards don't render well on small screens. The primary use case for Grafana embedding is desktop monitoring dashboards. Mobile users can still access Grafana view via URL parameter if needed, but it's not encouraged.
Learnt from: pescn
Repo: EM-GeekLab/NexusGate PR: 65
File: docker-compose.monitoring.yaml:30-34
Timestamp: 2026-01-28T13:12:52.161Z
Learning: The docker-compose.monitoring.yaml file in NexusGate is intended for local development and quick-start scenarios where convenience is prioritized. For production deployments, users should use the Helm chart or custom configuration with appropriate security settings. The file intentionally enables GF_AUTH_ANONYMOUS_ENABLED and GF_SECURITY_ALLOW_EMBEDDING for local iframe embedding testing, but production deployments should use Grafana's Public Dashboard feature, auth proxy, or direct Grafana access instead.
📚 Learning: 2026-01-24T18:23:42.635Z
Learnt from: pescn
Repo: EM-GeekLab/NexusGate PR: 59
File: backend/src/api/v1/responses.ts:566-577
Timestamp: 2026-01-24T18:23:42.635Z
Learning: When using Elysia with the apiKeyPlugin, if a route option has checkApiKey: true, the apiKeyRecord parameter is guaranteed to be non-null. Do not add explicit null checks or non-null assertions for apiKeyRecord in such routes; rely on this contract and avoid unnecessary guards to improve readability. If there is any doubt about the guarantee in a specific place, add a runtime assertion at the boundary where the plugin is configured, but otherwise omit null checks for apiKeyRecord in backend code under backend/.

Applied to files:

  • backend/src/utils/dashboards.ts
📚 Learning: 2026-01-24T22:33:16.471Z
Learnt from: pescn
Repo: EM-GeekLab/NexusGate PR: 59
File: backend/src/utils/streaming-context.ts:132-146
Timestamp: 2026-01-24T22:33:16.471Z
Learning: In NexusGate, for ReqId deduplication (specifically in streaming-context.ts and reqIdHandler.ts), when reqIdContext is present, finalizeReqId should use updateCompletion instead of addCompletions. This causes ReqId requests to not write srv_logs entries; errors are captured in the completion record's status (e.g., failed/aborted). This avoids duplicate srv_logs entries during retry scenarios and keeps the audit trail cleaner. Treat the completion record as the source of truth for ReqId request outcomes.

Applied to files:

  • backend/src/utils/dashboards.ts
🧬 Code graph analysis (1)
backend/src/utils/dashboards.ts (3)
backend/src/utils/config.ts (3)
  • GRAFANA_DASHBOARDS (164-167)
  • GrafanaDashboard (162-162)
  • grafanaDashboardsSchema (161-161)
frontend/src/hooks/use-settings.ts (1)
  • GrafanaDashboard (5-9)
backend/src/db/index.ts (1)
  • upsertSetting (613-625)
🔇 Additional comments (4)
backend/src/utils/dashboards.ts (4)

14-21: EnvOverrideError 设计清晰
错误码与信息明确,便于上层区分“环境覆盖”场景。


42-66: 迁移与回退逻辑清晰
新格式优先、兼容旧键并迁移到新结构的流程可读性很好。


75-83: 写入前覆盖校验合理
覆盖开启时禁止写入的处理符合预期。


89-95: 清理逻辑符合预期
覆盖开启时阻止清理、否则删除新键,行为一致。

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment thread backend/src/utils/dashboards.ts Outdated
pescn and others added 2 commits January 28, 2026 21:20
When GRAFANA_DASHBOARDS is set to [], it should disable dashboards
without falling back to DB. Previously required length > 0.

Now isEnvOverrideActive() returns true if env var is defined at all,
allowing ops to explicitly disable dashboards via GRAFANA_DASHBOARDS=[].

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1. Auto-select first Grafana dashboard when dashboards are configured
   - Hide built-in option by default when Grafana dashboards exist
   - Auto-redirect from 'builtin' to first dashboard on load

2. Improve iframe height
   - Use min-h-[calc(100vh-280px)] for responsive height
   - Iframe now fills available viewport space

3. Sync theme with NexusGate
   - Pass &theme=light or &theme=dark to Grafana URL
   - Uses resolvedTheme from next-themes

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@frontend/src/pages/overview/index.tsx`:
- Around line 66-80: The current auto-select useEffect only triggers when view
=== 'builtin', causing inconsistencies if view is missing/invalid; update the
effect in the component that references dashboards, selectedDashboard, view,
useEffect and navigate to also handle a missing/invalid view by checking if
dashboards.length > 0 and (view === 'builtin' || !selectedDashboard) and then
call navigate to set search.view to dashboards[0].id (keep replace: true). Also
include selectedDashboard in the effect dependencies to ensure the fallback runs
when selection becomes undefined.
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3122693 and 62dead7.

📒 Files selected for processing (3)
  • frontend/src/pages/overview/grafana-embed.tsx
  • frontend/src/pages/overview/index.tsx
  • frontend/src/pages/overview/view-mode-toggle.tsx
🚧 Files skipped from review as they are similar to previous changes (1)
  • frontend/src/pages/overview/grafana-embed.tsx
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: pescn
Repo: EM-GeekLab/NexusGate PR: 65
File: frontend/src/routes/_dashboard/route.tsx:62-66
Timestamp: 2026-01-28T12:49:25.269Z
Learning: In the NexusGate frontend (frontend/src/routes/_dashboard/route.tsx), the ViewModeToggle component is intentionally hidden on mobile devices because Grafana dashboards don't render well on small screens. The primary use case for Grafana embedding is desktop monitoring dashboards. Mobile users can still access Grafana view via URL parameter if needed, but it's not encouraged.
Learnt from: pescn
Repo: EM-GeekLab/NexusGate PR: 65
File: docker-compose.monitoring.yaml:30-34
Timestamp: 2026-01-28T13:12:52.161Z
Learning: The docker-compose.monitoring.yaml file in NexusGate is intended for local development and quick-start scenarios where convenience is prioritized. For production deployments, users should use the Helm chart or custom configuration with appropriate security settings. The file intentionally enables GF_AUTH_ANONYMOUS_ENABLED and GF_SECURITY_ALLOW_EMBEDDING for local iframe embedding testing, but production deployments should use Grafana's Public Dashboard feature, auth proxy, or direct Grafana access instead.
📚 Learning: 2026-01-28T12:49:25.269Z
Learnt from: pescn
Repo: EM-GeekLab/NexusGate PR: 65
File: frontend/src/routes/_dashboard/route.tsx:62-66
Timestamp: 2026-01-28T12:49:25.269Z
Learning: In the NexusGate frontend (frontend/src/routes/_dashboard/route.tsx), the ViewModeToggle component is intentionally hidden on mobile devices because Grafana dashboards don't render well on small screens. The primary use case for Grafana embedding is desktop monitoring dashboards. Mobile users can still access Grafana view via URL parameter if needed, but it's not encouraged.

Applied to files:

  • frontend/src/pages/overview/index.tsx
  • frontend/src/pages/overview/view-mode-toggle.tsx
🔇 Additional comments (4)
frontend/src/pages/overview/view-mode-toggle.tsx (1)

14-50: 实现清晰,切换逻辑一致。

默认隐藏内置选项、仅在有必要时展示切换按钮的行为与目标一致,ARIA 标注也到位。

frontend/src/pages/overview/index.tsx (3)

82-88: 合并 search 的方式合理。

保留既有查询参数并只更新 range,能避免切换时间范围时丢失 view


108-113: 移动端仅在内置模式显示时间范围选择器,逻辑清晰。


122-174: Grafana 与内置图表分支渲染结构清楚。

SummaryCards 在两种模式下保持可见,符合需求。

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment thread frontend/src/pages/overview/index.tsx Outdated
Comment on lines +66 to +80
const dashboards = dashboardsData?.dashboards ?? []
const selectedDashboard = dashboards.find((d) => d.id === view)
const timeRange = range as TimeRange
const showGrafana = !!selectedDashboard

// Auto-select first Grafana dashboard when dashboards are configured
useEffect(() => {
if (dashboards.length > 0 && view === 'builtin') {
navigate({
to: '/',
search: (prev) => ({ ...prev, view: dashboards[0].id }),
replace: true,
})
}
}, [dashboards, view, navigate])
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

处理无效/缺失 view 时的自动回落。

当 dashboards 存在但 view 为空或不匹配任何 dashboard 时,selectedDashboardundefinedshowGrafanafalse,但内置选项默认隐藏,容易出现“展示内置图表却没有内置按钮”的不一致。建议在自动选择逻辑中同时处理无效/缺失的 view

🛠️ 建议修复
-  useEffect(() => {
-    if (dashboards.length > 0 && view === 'builtin') {
-      navigate({
-        to: '/',
-        search: (prev) => ({ ...prev, view: dashboards[0].id }),
-        replace: true,
-      })
-    }
-  }, [dashboards, view, navigate])
+  useEffect(() => {
+    if (dashboards.length === 0) return
+    const isBuiltin = view === 'builtin' || !view
+    const isInvalid = !isBuiltin && !selectedDashboard
+    if (isBuiltin || isInvalid) {
+      navigate({
+        to: '/',
+        search: (prev) => ({ ...prev, view: dashboards[0].id }),
+        replace: true,
+      })
+    }
+  }, [dashboards, view, selectedDashboard, navigate])
🤖 Prompt for AI Agents
In `@frontend/src/pages/overview/index.tsx` around lines 66 - 80, The current
auto-select useEffect only triggers when view === 'builtin', causing
inconsistencies if view is missing/invalid; update the effect in the component
that references dashboards, selectedDashboard, view, useEffect and navigate to
also handle a missing/invalid view by checking if dashboards.length > 0 and
(view === 'builtin' || !selectedDashboard) and then call navigate to set
search.view to dashboards[0].id (keep replace: true). Also include
selectedDashboard in the effect dependencies to ensure the fallback runs when
selection becomes undefined.

@pescn pescn requested a review from koitococo January 28, 2026 14:06
Add cost metrics to Prometheus and enhance Grafana dashboard with
advanced analytics panels for Issue #53:

Backend changes:
- Add getCompletionCostMetrics() DB query joining completions with
  models table to calculate costs based on pricing
- Add 3 new Prometheus counters: nexusgate_cost_prompt_usd_total,
  nexusgate_cost_completion_usd_total, nexusgate_cost_total_usd_total
  with labels: model, provider, api_key_comment

Grafana dashboard enhancements:
- Add P75/P90 quantiles to latency panels (Duration & TTFT)
- Add "Cost Analysis" row: Cost Over Time, Cost by Provider/API Key
  pie charts, Cost Breakdown table
- Add "Trend Analysis" row: Request/Cost comparison with previous
  period, Change % stats
- Add "Model Comparison" row: P95 Latency, Error Rate, Cost per 1K
  Tokens bar gauges, Model Performance Matrix table

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@pescn
Copy link
Copy Markdown
Contributor Author

pescn commented Jan 29, 2026

/gemini review

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@backend/src/db/index.ts`:
- Around line 1553-1561: The total_cost_usd aggregation currently requires both
m.input_price and m.output_price to be non-null which makes it inconsistent with
prompt_cost_usd and completion_cost_usd; update the SUM/CASE expression that
defines total_cost_usd so it independently adds the prompt and completion
contributions (use separate CASE terms that check m.input_price != NULL with
c.prompt_tokens and m.output_price != NULL with c.completion_tokens) rather than
gating the entire sum on both prices being present, ensuring total_cost_usd
equals prompt_cost_usd + completion_cost_usd.
🧹 Nitpick comments (4)
grafana/provisioning/dashboards/json/nexusgate-dashboard.json (4)

1367-1376: 建议:Error Rate 查询缺少除零保护

此处的错误率计算 sum(...{status="failed"}) / sum(...) 在某个模型没有请求时可能产生 NaN。建议与 Line 214 的修复保持一致,添加 clamp_min 保护。

♻️ 建议的修复
-          "expr": "sum(rate(nexusgate_completions_total{status=\"failed\"}[$__rate_interval])) by (model) / sum(rate(nexusgate_completions_total[$__rate_interval])) by (model)",
+          "expr": "sum(rate(nexusgate_completions_total{status=\"failed\"}[$__rate_interval])) by (model) / clamp_min(sum(rate(nexusgate_completions_total[$__rate_interval])) by (model), 0.001)",

1455-1464: 建议:Cache Hit Rate 查询缺少除零保护

与上述问题类似,缓存命中率的计算在没有请求时可能产生 NaN。

♻️ 建议的修复
-          "expr": "sum(rate(nexusgate_completions_total{status=\"cache_hit\"}[$__rate_interval])) / sum(rate(nexusgate_completions_total[$__rate_interval]))",
+          "expr": "sum(rate(nexusgate_completions_total{status=\"cache_hit\"}[$__rate_interval])) / clamp_min(sum(rate(nexusgate_completions_total[$__rate_interval])), 0.001)",

3217-3226: 建议:Model Comparison 中的 Error Rate 同样缺少除零保护

此处与 Line 1373 的问题相同,建议添加 clamp_min 保护以保持一致性。

♻️ 建议的修复
-          "expr": "sum(rate(nexusgate_completions_total{status=\"failed\"}[$__rate_interval])) by (model) / sum(rate(nexusgate_completions_total[$__rate_interval])) by (model)",
+          "expr": "sum(rate(nexusgate_completions_total{status=\"failed\"}[$__rate_interval])) by (model) / clamp_min(sum(rate(nexusgate_completions_total[$__rate_interval])) by (model), 0.001)",

3556-3566: 建议:Model Performance Matrix 表格中的 ErrorRate 查询也需要除零保护

为保持仪表盘整体一致性,此处的错误率查询也应添加 clamp_min 保护。

♻️ 建议的修复
-          "expr": "sum(rate(nexusgate_completions_total{status=\"failed\"}[$__rate_interval])) by (model) / sum(rate(nexusgate_completions_total[$__rate_interval])) by (model)",
+          "expr": "sum(rate(nexusgate_completions_total{status=\"failed\"}[$__rate_interval])) by (model) / clamp_min(sum(rate(nexusgate_completions_total[$__rate_interval])) by (model), 0.001)",
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 62dead7 and a8fa60e.

📒 Files selected for processing (3)
  • backend/src/db/index.ts
  • backend/src/services/prometheus.ts
  • grafana/provisioning/dashboards/json/nexusgate-dashboard.json
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2026-01-24T18:23:42.635Z
Learnt from: pescn
Repo: EM-GeekLab/NexusGate PR: 59
File: backend/src/api/v1/responses.ts:566-577
Timestamp: 2026-01-24T18:23:42.635Z
Learning: When using Elysia with the apiKeyPlugin, if a route option has checkApiKey: true, the apiKeyRecord parameter is guaranteed to be non-null. Do not add explicit null checks or non-null assertions for apiKeyRecord in such routes; rely on this contract and avoid unnecessary guards to improve readability. If there is any doubt about the guarantee in a specific place, add a runtime assertion at the boundary where the plugin is configured, but otherwise omit null checks for apiKeyRecord in backend code under backend/.

Applied to files:

  • backend/src/services/prometheus.ts
  • backend/src/db/index.ts
🧬 Code graph analysis (1)
backend/src/services/prometheus.ts (1)
backend/src/db/index.ts (1)
  • getCompletionCostMetrics (1534-1577)
🔇 Additional comments (9)
grafana/provisioning/dashboards/json/nexusgate-dashboard.json (6)

214-214: 除零保护修复,LGTM!

使用 clamp_min(sum(increase(nexusgate_completions_total[$__range])), 1) 保护除数,当没有数据时避免显示 NaN 或 Inf,是一个良好的实践。


743-780: 新增 p75/p90 百分位延迟指标,LGTM!

新增的 p75 和 p90 百分位数能更全面地展示延迟分布,对于监控尾部延迟很有帮助。refId 的更新 (B→D, C→E) 也保持了正确的顺序。


2166-2265: 新增 Cost Analysis 区域,整体设计良好!

成本分析面板配置合理:

  • 使用 currencyUSD 单位格式化货币显示
  • 采用堆叠模式 ("mode": "normal") 展示各模型成本累计
  • 图例配置显示 sum 值便于快速查看总成本

2871-2880: 除零保护使用了 clamp_min(... , 1),LGTM!

Request Change % 计算正确使用了 clamp_min(..., 1) 避免除零错误。


2948-2957: clamp_min 使用 0.001 作为最小值是合理的

对于成本变化百分比,使用 clamp_min(..., 0.001) 而非 1 是合理的,因为成本金额可能很小(小于 $1),使用较小的最小值可以得到更准确的百分比计算。


3080-3092: 新增 Model Comparison 区域,设计合理!

Model Comparison 部分提供了很好的多维度对比视图:

  • P95 延迟、错误率、每千 Token 成本、请求数的横向对比
  • 使用 bargauge 可视化便于快速比较
  • Model Performance Matrix 表格汇总了关键指标

这对于模型选型和性能优化非常有帮助。

backend/src/db/index.ts (1)

1534-1577: 函数整体实现良好

新增的 getCompletionCostMetrics 函数:

  • 正确使用 LEFT JOIN 获取模型定价和 provider 信息
  • 使用 COALESCE 处理 NULL 值
  • 类型断言合理,与文件中其他类似函数保持一致
  • 按 model、provider、api_key_comment 分组适合 Prometheus 标签维度
backend/src/services/prometheus.ts (2)

10-10: 导入和数据获取结构良好

  • 新增导入与现有导入组织方式一致
  • 使用 Promise.all 并行获取 cost metrics,保持性能最优

Also applies to: 176-186


305-361: Cost metrics 实现遵循现有模式

实现亮点:

  • 仅当 cost > 0 时才添加指标,避免无定价模型产生噪声数据
  • 标签维度 (model, provider, api_key_comment) 与 Grafana dashboard 需求匹配
  • 使用 formatCounter 与其他计数器保持一致

代码结构清晰,与文件中其他指标处理逻辑风格统一。

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment thread backend/src/db/index.ts
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This is an excellent pull request that adds a significant and well-implemented feature for Grafana dashboard embedding. The backend changes, including the new settings and dashboards APIs, are robust and include a thoughtful migration path for legacy configurations. The frontend implementation provides a great user experience with features like view mode toggling and theme synchronization. The inclusion of a pre-built Grafana dashboard and the fix for the potential division-by-zero error in the success rate calculation are also highly appreciated. I have one suggestion to further improve the robustness of the frontend data fetching logic.

Comment on lines +23 to +28
// Runtime validation to handle unexpected API responses
const response = data as DashboardsResponse
if (!Array.isArray(response?.dashboards)) {
return { dashboards: [], envOverride: false }
}
return response
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The runtime validation here is a good step towards defensive programming. However, it only checks if response.dashboards is an array, but doesn't validate the shape of the objects within the array. To make this more robust against unexpected API responses, I suggest using a schema validation library like Zod, which is already in use elsewhere in the frontend.

This ensures that the entire data structure matches expectations, preventing potential runtime errors if the API contract changes or returns malformed data. You'll need to add import { z } from 'zod' at the top of the file.

    const { z } = await import('zod')
    const GrafanaDashboardSchema = z.object({
      id: z.string(),
      label: z.string(),
      url: z.string().url(),
    })
    const DashboardsResponseSchema = z.object({
      dashboards: z.array(GrafanaDashboardSchema),
      envOverride: z.boolean(),
    })

    const parsed = DashboardsResponseSchema.safeParse(data)
    if (!parsed.success) {
      console.error('Failed to parse dashboards response:', parsed.error)
      return { dashboards: [], envOverride: false }
    }
    return parsed.data
References
  1. When parsing data from external API responses, prioritize robust, verbose type guarding over concise functional approaches (like filter with type predicates) to defensively handle unexpected data shapes and prevent runtime errors.

- Fix total_cost_usd calculation to independently sum prompt and
  completion costs (previously required both prices to be non-null)
- Add clamp_min(0.001) protection to 4 division queries in Grafana
  dashboard to prevent NaN when no requests exist:
  - Error Rate by Model panel
  - Cache Hit Rate panel
  - Model Comparison Error Rate bar gauge
  - Model Performance Matrix ErrorRate column

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@pescn
Copy link
Copy Markdown
Contributor Author

pescn commented Jan 29, 2026

Thanks @coderabbitai for the thorough review!

Fixed in commit c2f31b3:

  1. total_cost_usd inconsistency (backend/src/db/index.ts): Fixed the SQL query to independently sum prompt and completion costs. Now total_cost_usd = prompt_cost_usd + completion_cost_usd even when only one price is configured.

  2. Divide-by-zero protection (Grafana dashboard): Added clamp_min(..., 0.001) to all 4 identified queries:

    • Error Rate by Model panel (line 1373)
    • Cache Hit Rate panel (line 1461)
    • Model Comparison Error Rate bar gauge (line 3223)
    • Model Performance Matrix ErrorRate (line 3562)

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jan 29, 2026

Tip

For best results, initiate chat on the files or code changes.

@pescn Thank you for addressing those issues! The fixes look great:

SQL query fix - Independently summing prompt and completion costs ensures total_cost_usd is calculated correctly regardless of pricing configuration.

Divide-by-zero protection - Adding clamp_min(..., 0.001) to all four identified queries prevents potential NaN/Infinity values in the Grafana dashboard panels.

Great attention to detail in fixing all four instances! 🎉

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@grafana/provisioning/dashboards/json/nexusgate-dashboard.json`:
- Around line 2526-2558: The joinByField transformation currently uses "model"
as the join key (id: "joinByField", options.byField: "model") which can drop
rows when a model has multiple provider or api_key_comment combinations; update
the transformations to either use a composite key (concatenate
model+provider+api_key_comment as the join field) or replace the joinByField
step with a seriesToRows transformation so each provider/api_key_comment pair is
preserved before the organize transformation (id: "organize") and renames.
🧹 Nitpick comments (1)
grafana/provisioning/dashboards/json/nexusgate-dashboard.json (1)

1743-1754: 建议:API Key Rate Limit Usage 面板也添加除零保护。

此处的除法查询未使用 clamp_min 保护:

100 * nexusgate_api_key_rpm_usage / nexusgate_api_key_rpm_limit
100 * nexusgate_api_key_tpm_usage / nexusgate_api_key_tpm_limit

limit 为 0 或指标缺失时会产生 NaN。为保持一致性,建议添加保护:

♻️ 建议修改
-          "expr": "100 * nexusgate_api_key_rpm_usage / nexusgate_api_key_rpm_limit",
+          "expr": "100 * nexusgate_api_key_rpm_usage / clamp_min(nexusgate_api_key_rpm_limit, 1)",
-          "expr": "100 * nexusgate_api_key_tpm_usage / nexusgate_api_key_tpm_limit",
+          "expr": "100 * nexusgate_api_key_tpm_usage / clamp_min(nexusgate_api_key_tpm_limit, 1)",
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a8fa60e and c2f31b3.

📒 Files selected for processing (2)
  • backend/src/db/index.ts
  • grafana/provisioning/dashboards/json/nexusgate-dashboard.json
🚧 Files skipped from review as they are similar to previous changes (1)
  • backend/src/db/index.ts
🔇 Additional comments (6)
grafana/provisioning/dashboards/json/nexusgate-dashboard.json (6)

214-214: 除零保护实现正确。

使用 clamp_min(sum(increase(nexusgate_completions_total[$__range])), 1) 正确地防止了在没有请求时除以零的情况。对于计数类指标使用下限 1 是合理的选择。


743-780: 新增 p75/p90 分位数指标,增强延迟监控粒度。

新增的百分位数查询结构与现有的 p50/p95/p99 保持一致,refId 命名有序(A→E),面板标题也已正确更新。


1373-1373: 速率类指标的除零保护实现正确。

对于 rate() 计算的指标使用 clamp_min(0.001) 是合理的,因为速率值可能非常小但不为零。这与 PR commit 中描述的修复一致。

Also applies to: 1461-1461


2871-2880: 趋势对比查询的除零保护一致性良好。

  • Request Change % 使用 clamp_min(..., 1) 保护计数类指标
  • Cost Change % 使用 clamp_min(..., 0.001) 保护金额类指标

保护阈值的选择逻辑清晰:计数至少为 1,成本金额可以很小但正数。

Also applies to: 2948-2957


3223-3223: Model Comparison 部分的除零保护实现正确。

  • Error Rate (bargauge): clamp_min(0.001) 用于速率
  • Cost per 1K Tokens: clamp_min(1) 用于 token 计数
  • Model Performance Matrix: clamp_min(0.001) 用于速率

Also applies to: 3283-3283, 3562-3562


2165-2178: 新增的 Cost Analysis、Trend Analysis 和 Model Comparison 部分提供了全面的成本和模型性能监控。

整体结构清晰,面板配置合理:

  • Cost Analysis:成本趋势、供应商/API Key 分布、详细成本明细表
  • Trend Analysis:同比趋势对比,使用 offset $__range 实现周期比较
  • Model Comparison:多维度模型对比(延迟、错误率、成本效率)

这些新增内容与 PR 目标一致,为 Grafana 仪表板嵌入功能提供了丰富的监控视图。

✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.

Comment on lines +2526 to +2558
"transformations": [
{
"id": "joinByField",
"options": {
"byField": "model",
"mode": "outer"
}
},
{
"id": "organize",
"options": {
"excludeByName": {
"Time": true,
"Time 1": true,
"Time 2": true,
"Time 3": true,
"api_key_comment 2": true,
"api_key_comment 3": true,
"provider 2": true,
"provider 3": true
},
"indexByName": {},
"renameByName": {
"Value #Completion": "Completion Cost",
"Value #Prompt": "Prompt Cost",
"Value #Total": "Total Cost",
"api_key_comment 1": "API Key",
"model": "Model",
"provider 1": "Provider"
}
}
}
],
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Cost Breakdown 表格的 join 转换可能导致数据丢失。

当同一 model 存在多个 providerapi_key_comment 组合时,joinByField 使用 model 作为连接字段会导致数据行被覆盖或丢失。建议改用复合键(如 model + provider + api_key_comment)或使用 seriesToRows 转换。

不过,如果当前业务场景中每个 model 只会关联一个 provider 和 api_key,则此实现可以正常工作。

🤖 Prompt for AI Agents
In `@grafana/provisioning/dashboards/json/nexusgate-dashboard.json` around lines
2526 - 2558, The joinByField transformation currently uses "model" as the join
key (id: "joinByField", options.byField: "model") which can drop rows when a
model has multiple provider or api_key_comment combinations; update the
transformations to either use a composite key (concatenate
model+provider+api_key_comment as the join field) or replace the joinByField
step with a seriesToRows transformation so each provider/api_key_comment pair is
preserved before the organize transformation (id: "organize") and renames.

pescn and others added 2 commits January 31, 2026 18:35
…irect

Wrap dashboards array derivation with useMemo to prevent unnecessary
useEffect re-runs caused by new array references on each render.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Retain Grafana conditional wrapper and ViewModeToggle import
from feature branch while incorporating main's formatting changes.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@pescn pescn merged commit f823a58 into main Jan 31, 2026
2 checks passed
@pescn pescn linked an issue Jan 31, 2026 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: Advanced Analytics

1 participant