feat: add Grafana integration and alert system #67
Add a complete alert system with notification channels (webhook, email, Feishu), configurable alert rules (budget, error rate, latency, quota), and alert history tracking. Add Grafana integration for syncing alert rules as Prometheus-based Grafana alerts via the Provisioning API. When Grafana is connected, the built-in alert engine defers to Grafana for evaluation. Restructure frontend navigation: model configuration moves to /models (Providers + Registry sub-nav), system settings moves to /settings (Alerts + Grafana sub-nav) as separate sidebar items. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
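📝 Walkthrough
Adds a complete alerting subsystem: database enums and tables, the Drizzle schema and snapshots, backend CRUD APIs, an alert engine with multi-channel dispatch (Webhook/Email/Feishu), a Grafana sync client and service, and the corresponding frontend pages, routes, hooks, and i18n entries.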
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Engine as AlertEngine
    participant DB as Database
    participant Redis as Redis
    participant Dispatcher as AlertDispatcher
    participant GrafanaSvc as GrafanaSync
    participant HTTP as External HTTP / SMTP
    loop Every 60s
        Engine->>DB: fetch enabled alert rules
        DB-->>Engine: rules list
        loop per rule
            Engine->>Redis: check/set cooldown(ruleId)
            Redis-->>Engine: cooldown status
            Engine->>DB: query metrics / evaluate condition
            DB-->>Engine: metrics/current value
            alt triggered & not cooled down
                Engine->>DB: fetch associated channels
                DB-->>Engine: channel configs
                loop per channel
                    Engine->>Dispatcher: dispatchToChannel(type, config, payload)
                    alt Grafana-managed
                        Dispatcher->>GrafanaSvc: skip local dispatch (Grafana manages)
                        GrafanaSvc-->>Dispatcher: acknowledged
                    else Local dispatch
                        Dispatcher->>HTTP: send webhook/email/feishu
                        HTTP-->>Dispatcher: response
                    end
                    Dispatcher-->>Engine: result(success/failure)
                end
                Engine->>DB: insert alert_history(record)
            end
        end
    end
```
```mermaid
sequenceDiagram
    participant AdminAPI as Admin API
    participant Sync as GrafanaSync
    participant DB as Database
    participant Client as GrafanaClient
    participant Grafana as Grafana API
    AdminAPI->>Sync: syncAllToGrafana()
    Sync->>DB: fetch enabled rules & channels
    DB-->>Sync: items
    loop per item
        Sync->>Client: build payload
        Client->>Grafana: create/update resource
        Grafana-->>Client: response(uid/status)
        Client-->>Sync: result(uid)
        Sync->>DB: update grafana sync fields(uid, timestamp, error?)
    end
    Sync-->>AdminAPI: SyncResult
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Summary of Changes
This pull request significantly enhances the platform's operational monitoring capabilities by introducing a complete alert system and deep integration with Grafana. It empowers administrators to proactively manage system health and performance through configurable alerts and leverages Grafana's robust features for advanced visualization and alerting, streamlining the overall management experience.
Code Review
This pull request introduces a comprehensive alert system and Grafana integration, which is a significant and well-executed feature addition. The backend implementation is robust, with new database schemas, APIs, and services for alerting and Grafana synchronization. The frontend is also updated with new pages and restructured navigation to accommodate these features. My review focuses on improving data integrity in the database schema, enhancing API validation for better type safety, optimizing performance in the alert evaluation engine, and improving TypeScript type usage on the frontend. Overall, this is a great contribution.
Actionable comments posted: 11
🤖 Fix all issues with AI agents
In `@backend/drizzle/0013_flowery_maria_hill.sql`:
- Around line 14-35: The foreign key constraint
alert_history_rule_id_alert_rules_id_fk on table alert_history currently uses ON
DELETE no action which blocks deleting alert_rules rows; update the ALTER TABLE
statement that defines alert_history_rule_id_alert_rules_id_fk to either use ON
DELETE CASCADE to cascade deletes to alert_history, or alter
alert_history.rule_id to be nullable and change the foreign key to ON DELETE SET
NULL so histories are retained but dissociated from deleted alert_rules.
In `@backend/package.json`:
- Around line 28-34: When creating the nodemailer transport
(nodemailer.createTransport / transporter), explicitly set tls.servername to the
SMTP host to enable SNI under Bun (e.g., set tls.servername = the same host used
in host field) so Bun's tls.connect will present the correct SNI; if STARTTLS
(port 587) still fails, try using SMTPS (port 465) as an alternative.
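A minimal sketch of the transport options this suggests, assuming a hypothetical `SmtpSettings` shape (`host`, `port`, `user`, `pass`; the real email channel config may use other field names). The returned plain object is what would be handed to `nodemailer.createTransport`, which is not imported here; the `tls` object is passed through to the TLS socket, where `servername` sets the SNI host name explicitly.

```typescript
// Hypothetical shape of the stored email channel settings (field names assumed).
interface SmtpSettings {
  host: string;
  port: number;
  user: string;
  pass: string;
}

// Build options for nodemailer.createTransport (nodemailer itself is not imported here).
function buildSmtpTransportOptions(cfg: SmtpSettings) {
  return {
    host: cfg.host,
    port: cfg.port,
    // Port 465 is implicit TLS (SMTPS); other ports such as 587 start in
    // plaintext and upgrade via STARTTLS.
    secure: cfg.port === 465,
    auth: { user: cfg.user, pass: cfg.pass },
    // Forwarded to the TLS socket: send the SMTP host as the SNI server name.
    tls: { servername: cfg.host },
  };
}
```

Switching to SMTPS is then only a matter of storing port 465; `secure` follows from the port.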
In `@backend/src/api/admin/grafana.ts`:
- Around line 96-167: The Grafana connection test lacks request timeouts and
when it fails the old datasourceUid from config can be retained; update the
logic in the POST "/connection/test" handler (referencing getGrafanaConnection,
the fetch calls, and upsertSetting with GRAFANA_CONNECTION_KEY) to use
AbortController-based timeouts for the health and datasources fetches (abort
after a reasonable timeout) and ensure in the failure branch you explicitly
clear datasourceUid (e.g., set to undefined/null) when upserting verified:
false/verifiedAt: null so stale datasourceUid is not preserved.
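The failure-branch fix can be expressed as a pure state transition, sketched below. `verified`, `verifiedAt` and `datasourceUid` come from the comment; `apiUrl`/`apiToken` and the `TestOutcome` type are hypothetical stand-ins for whatever the handler actually stores and computes.

```typescript
// Hypothetical persisted connection record; only verified/verifiedAt/datasourceUid are from the review.
interface GrafanaConnection {
  apiUrl: string;
  apiToken: string;
  datasourceUid: string | null;
  verified: boolean;
  verifiedAt: string | null;
}

type TestOutcome = { ok: true; datasourceUid: string } | { ok: false };

// Compute the record to upsert after a connection test.
function applyTestOutcome(prev: GrafanaConnection, outcome: TestOutcome, now: Date): GrafanaConnection {
  if (outcome.ok) {
    return { ...prev, datasourceUid: outcome.datasourceUid, verified: true, verifiedAt: now.toISOString() };
  }
  // On failure, explicitly clear datasourceUid so a stale value cannot survive a failed test.
  return { ...prev, datasourceUid: null, verified: false, verifiedAt: null };
}
```

Using `null` rather than `undefined` keeps the cleared field visible after JSON serialization instead of silently dropping the key.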
In `@backend/src/services/alertDispatcher.ts`:
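- Around line 160-172: dispatchToChannel has no default branch, so an unknown
channelType makes it return silently, which makes problems harder to diagnose.
Add a default branch at the end of switch (channelType) (or after the switch)
that throws an explicit error, e.g. throw new Error(`Unsupported channelType:
${channelType}`), so callers of dispatchToChannel notice and locate the problem
immediately; referenced symbols: dispatchToChannel, channelType, AlertChannelTypeEnumType.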
- Around line 42-66: Replace the raw fetch calls with the project's
timeout-aware helper: import fetchWithTimeout from backend/src/services/failover
and in both dispatchWebhook and dispatchFeishu use fetchWithTimeout instead of
fetch, passing the same request options (method, headers, body) plus a timeout
(use a configured value if present on the channel config, otherwise a sensible
default like 10000 ms), keep the same response.ok/error handling and HMAC header
logic in dispatchWebhook.
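For the missing default branch, an exhaustiveness helper gives both a compile-time and a run-time check. A sketch with a hypothetical `describeDispatch` standing in for `dispatchToChannel` and a local `ChannelType` union standing in for `AlertChannelTypeEnumType`:

```typescript
// Local stand-in for AlertChannelTypeEnumType.
type ChannelType = "webhook" | "email" | "feishu";

// Reached only by values that bypassed the type system (e.g. rows read from the DB).
function assertNever(value: never): never {
  throw new Error(`Unsupported channelType: ${String(value)}`);
}

// Hypothetical stand-in for dispatchToChannel; each case would call the real dispatcher.
function describeDispatch(channelType: ChannelType): string {
  switch (channelType) {
    case "webhook":
      return "POST JSON to the webhook URL";
    case "email":
      return "send mail via SMTP";
    case "feishu":
      return "POST a Feishu bot message";
    default:
      // Compile time: fails to type-check if a new ChannelType lacks a case.
      // Run time: throws instead of silently returning.
      return assertNever(channelType);
  }
}
```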
In `@backend/src/services/alertEngine.ts`:
- Around line 104-152: In evaluateQuota, avoid calling listApiKeys() twice by
fetching the API keys once and reusing them: call listApiKeys() at the start of
evaluateQuota (store result in a local variable) and use that variable both when
condition.apiKeyId is present (to find the single apiKey) and in the "check all
active API keys" branch; update references to apiKeys and remove the second
listApiKeys() call so the function uses the cached result throughout.
In `@backend/src/services/grafanaSync.ts`:
- Around line 91-101: The PromQL in the "latency" case produces invalid
selectors like `{,model="xxx"}` because modelFilter is prefixed with a comma;
update the modelFilter construction in the latency branch (where `const c =
rule.condition as LatencyCondition` and the returned `expr` is built) to follow
the same trailing-comma style used in the `error_rate` implementation (e.g.,
`model="${c.model}",` when c.model exists) so the selector becomes
`{${modelFilter}}` and avoids a leading comma; ensure threshold/forDuration
logic remains unchanged.
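One way to rule out stray commas entirely is to build the selector from a list of optional matchers and join only the ones that are set. A minimal sketch (helper names are hypothetical); it also escapes backslashes, double quotes and newlines in label values, since PromQL double-quoted strings treat those specially.

```typescript
// Escape a label value for use inside a double-quoted PromQL string.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Build `{a="x",b="y"}` from optional matchers. Unset entries are skipped, so no
// leading or trailing comma can appear; returns "" when no matcher is set.
function labelSelector(labels: Record<string, string | undefined>): string {
  const parts = Object.entries(labels)
    .filter((entry): entry is [string, string] => entry[1] !== undefined && entry[1] !== "")
    .map(([name, value]) => `${name}="${escapeLabelValue(value)}"`);
  return parts.length > 0 ? `{${parts.join(",")}}` : "";
}
```

The result is appended directly after a metric name, e.g. `` `some_metric${labelSelector({ model: c.model })}` ``, which yields a bare metric name when no model is configured.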
In `@backend/src/utils/grafanaClient.ts`:
- Around line 61-88: The request<T> method currently uses fetch without a
cancel/timeout mechanism; add an AbortController-based timeout and allow callers
to provide their own signal: create a default timeout (e.g. DEFAULT_TIMEOUT_MS),
inside request<T> create an AbortController, if options.signal is provided wire
it so that caller signal aborts the controller, set a timer that calls
controller.abort() after the timeout, pass controller.signal to fetch, and clear
the timer after fetch completes; ensure the Authorization/headers merge remains
and surface fetch abort errors as normal.
In `@frontend/src/pages/settings/alerts-settings-page.tsx`:
- Around line 1005-1028: The UI for the FormField named "channelIds" currently
renders a single-select Select (in alerts-settings-page.tsx) while the
backend/schema expects a comma-separated string (z.string().min(1)) that is
later split in createMutation; either update the UI to allow multiple selections
(replace the single Select with a multi-select component or checkbox list that
stores a comma-separated string in form.control.field.value) or simplify the
schema/type to a single selection (change the schema from z.string() to a
single-valued type and remove the split logic in createMutation); locate and
modify the FormField render block for "channelIds" and correspondingly update
the schema definition and createMutation handling so the UI and data model
match.
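If the form keeps a comma-separated string behind a multi-select, the conversion in both directions should be explicit. A sketch with hypothetical helpers that tolerate spaces, empty segments, and duplicates:

```typescript
// Parse the comma-separated `channelIds` form value into unique positive integer IDs.
function parseChannelIds(raw: string): number[] {
  const ids = raw
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0)
    .map(Number)
    .filter((n) => Number.isInteger(n) && n > 0);
  return Array.from(new Set(ids));
}

// Inverse: store a multi-select's array value back into the string field.
function formatChannelIds(ids: number[]): string {
  return ids.join(",");
}
```

Storing `number[]` in the form state (with a `z.array(z.number())` schema) would remove the need for these helpers altogether; either way, the UI and the schema must agree on one representation.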
In `@frontend/src/pages/settings/grafana-settings-page.tsx`:
- Around line 199-207: The Test button uses the wrong i18n key
(t('pages.settings.alerts.grafana.Syncing')) causing the raw key to render; in
the GrafanaSettingsPage component update the translation call used in the Button
with isTesting (the onClick that calls testMutation.mutate()) to the consistent
namespace t('pages.settings.grafana.Syncing') so it matches the other keys
(e.g., t('pages.settings.grafana.TestConnection')) and renders the correct
localized label.
In `@frontend/src/routes/settings/grafana.tsx`:
- Around line 12-20: The local grafanaConnectionQueryOptions duplicate should be
removed and the shared implementation from use-settings should be imported and
used instead; replace the local definition of grafanaConnectionQueryOptions in
grafana.tsx with an import of the same-named export from the use-settings
module, ensure the import name matches (grafanaConnectionQueryOptions) and that
any surrounding references (e.g., query usage or types like
GrafanaConnectionResponse) remain unchanged so behavior and typing are
preserved.
🧹 Nitpick comments (13)
frontend/src/hooks/use-settings.ts (1)
70-79: Add `retry: false` for consistency with the other Grafana queries in this file.
`grafanaSyncStatusQueryOptions` has no explicit retry setting, so it falls back to React Query's default of 3 retries. Disabling retries avoids redundant requests when the Grafana backend is failing and matches `grafanaConnectionQueryOptions` and `dashboardsQueryOptions`. ♻️ Suggested change

```diff
 export const grafanaSyncStatusQueryOptions = () =>
   queryOptions({
     queryKey: ['grafanaSyncStatus'],
     queryFn: async () => {
       const { data, error } = await api.admin.grafana.sync.status.get()
       if (error) return null
       return data
     },
     staleTime: 30 * 1000,
+    retry: false,
   })
```

frontend/src/routes/models/route.tsx (1)
74-80: Screen-reader text should be internationalized
The strings in the `sr-only` element and in `SheetHeader` are hard-coded English; use i18n translations for consistency. ♻️ Suggested fix

```diff
 <Button variant="ghost" size="icon" className="h-7 w-7">
   <MenuIcon className="size-4" />
-  <span className="sr-only">Toggle Models Menu</span>
+  <span className="sr-only">{t('routes.models.ToggleMenu')}</span>
 </Button>
 </SheetTrigger>
 <SheetContent side="left" className="w-56 p-0">
   <SheetHeader className="sr-only">
-    <SheetTitle>Models Navigation</SheetTitle>
-    <SheetDescription>Navigate between model pages</SheetDescription>
+    <SheetTitle>{t('routes.models.Navigation')}</SheetTitle>
+    <SheetDescription>{t('routes.models.NavigationDescription')}</SheetDescription>
   </SheetHeader>
```

backend/drizzle/meta/0013_snapshot.json (1)
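77-152: Consider adding indexes to the `alert_history` table
`alert_history` currently defines no indexes. For these common query patterns, consider adding indexes in the schema definition:
- an index on `rule_id`, for looking up history by rule
- an index on `triggered_at`, for time-range queries

Since this is an auto-generated snapshot file, the actual change belongs in `backend/src/db/schema.ts`.

frontend/src/routes/settings/alerts.tsx (1)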
64-74: Avoid the `any[]` casts
Disabling `@typescript-eslint/no-explicit-any` with `eslint-disable` and casting to `any[]` gives up type safety. Define proper types for the API responses, or infer them from the API client. ♻️ Suggested direction

```tsx
// Define or import the proper types
import type { AlertChannel, AlertRule, AlertHistory } from '@/types/alerts'

// Use them in RouteComponent
return (
  <AlertsSettingsPage
    channels={channels as AlertChannel[]}
    rules={rules as AlertRule[]}
    history={history as AlertHistory[]}
    grafanaConnected={grafanaConnected}
    grafanaApiUrl={grafanaApiUrl}
  />
)
```

backend/src/services/alertEngine.ts (2)
155-187: Sequential `await` inside the loop may hurt performance
When checking quotas for all API keys, each key waits on `getRateLimitStatus` in turn. With many API keys this can delay evaluation. Consider `Promise.all` or `Promise.allSettled` to process the keys in parallel. ♻️ Suggested parallel approach

```ts
// Fetch the status of every key in parallel
const statuses = await Promise.all(
  apiKeys.map(async (key) => ({
    key,
    status: await getRateLimitStatus(key.id, {
      rpmLimit: key.rpmLimit,
      tpmLimit: key.tpmLimit,
    }),
  })),
);

let maxUsagePercent = 0;
for (const { status } of statuses) {
  // compute usagePercent...
}
```
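The elided "compute usagePercent" step could look like the sketch below. `ApiKey` and `RateStatus` are hypothetical shapes (the real `getRateLimitStatus` result may differ), and the status fetcher is injected so the aggregation can be exercised without Redis. The key list is passed in once, in line with the earlier fix that removed the duplicate `listApiKeys()` call.

```typescript
// Hypothetical shapes; the real getRateLimitStatus result may differ.
interface ApiKey { id: number; rpmLimit: number | null; tpmLimit: number | null }
interface RateStatus { rpmUsed: number; tpmUsed: number }

// Highest usage percentage across RPM and TPM; a missing or zero limit is ignored.
function usagePercent(key: ApiKey, status: RateStatus): number {
  const ratios: number[] = [];
  if (key.rpmLimit && key.rpmLimit > 0) ratios.push(status.rpmUsed / key.rpmLimit);
  if (key.tpmLimit && key.tpmLimit > 0) ratios.push(status.tpmUsed / key.tpmLimit);
  return ratios.length > 0 ? Math.max(...ratios) * 100 : 0;
}

// Fetch all statuses concurrently, then keep the maximum usage.
async function maxQuotaUsage(
  apiKeys: ApiKey[],
  getStatus: (key: ApiKey) => Promise<RateStatus>,
): Promise<number> {
  const statuses = await Promise.all(
    apiKeys.map(async (key) => ({ key, status: await getStatus(key) })),
  );
  return statuses.reduce((max, { key, status }) => Math.max(max, usagePercent(key, status)), 0);
}
```

`Promise.all` fails fast on the first rejected lookup; `Promise.allSettled` would let one bad key be skipped instead, at the cost of filtering results.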
363-365: `void evaluateAlerts()` can lead to unhandled Promise rejections
When `void` discards the returned Promise, an exception that escapes `evaluateAlerts()` (even though it currently has a try-catch) may not be handled properly. Add a `.catch()` handler or make sure every exception is caught inside the function. ♻️ Suggested improvement

```diff
 intervalId = setInterval(() => {
-  void evaluateAlerts();
+  evaluateAlerts().catch((error) => {
+    logger.error("Uncaught error in alert evaluation", {
+      error: error instanceof Error ? error.message : String(error),
+    });
+  });
 }, ALERT_CHECK_INTERVAL_MS);
```

backend/src/api/admin/alerts.ts (2)
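25-159: Validate config/condition against their types to avoid persisting invalid structures.
Both are currently written to the database as `Unknown`, which makes the later dispatch/sync stages more likely to fail and harder to debug. Validate the structure according to the channel/rule `type`, or add explicit validation before writing. Also applies to: 163-287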
291-309: Bound the offset/limit range for history queries.
A negative or excessively large limit is a performance risk; clamp both to lower/upper bounds when handling the request. 🧩 Example bounds

```diff
-const offset = query.offset ?? 0;
-const limit = query.limit ?? 50;
+const offset = Math.max(0, query.offset ?? 0);
+const limit = Math.min(200, Math.max(1, query.limit ?? 50));
```

backend/drizzle/0013_flowery_maria_hill.sql (1)
14-33: Add indexes to support paginated and filtered history queries.
History queries that filter or sort by `rule_id` and trigger time benefit from indexes. 📈 Example indexes

```sql
CREATE INDEX alert_history_rule_id_idx ON alert_history (rule_id);
CREATE INDEX alert_history_triggered_at_idx ON alert_history (triggered_at DESC);
```

backend/src/services/grafanaSync.ts (1)
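242-291: Suggestion: add cleanup for deleted Grafana rules
`syncRulesToGrafana` currently syncs only enabled rules (`enabledRules`). When a rule is later disabled or deleted, its Grafana counterpart is never removed, which can leave orphaned alert rules in Grafana. Would you like help implementing cleanup logic that deletes Grafana alerts with no matching local rule?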
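The core of such a cleanup is a set difference restricted to rules NexusGate created, so that manually authored Grafana rules are never touched. A sketch: `RemoteRule` is a minimal assumed view of a listed rule, and the `[NexusGate] ` title prefix is an assumption modeled on the contact-point naming used elsewhere in this PR. Each returned UID would then be deleted through Grafana's provisioning API, which exposes a delete-by-UID endpoint for alert rules.

```typescript
// Minimal view of a Grafana alert rule; only the fields used here (assumed).
interface RemoteRule { uid: string; title: string }

// UIDs of rules that look NexusGate-managed but have no enabled local counterpart.
function findOrphanRuleUids(
  remote: RemoteRule[],
  localUids: Iterable<string>,
  prefix = "[NexusGate] ",
): string[] {
  const keep = new Set(localUids);
  return remote
    .filter((r) => r.title.startsWith(prefix) && !keep.has(r.uid))
    .map((r) => r.uid);
}
```

Passing in only the `grafanaUid` values of currently enabled rules makes disabled rules count as orphans too, which matches the behavior described in the comment.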
frontend/src/pages/settings/alerts-settings-page.tsx (2)
399-402: The type assertion can be improved
Asserting with `as any` loses type safety. Consider defining an explicit interface for `syncStatus`. 💡 Suggested explicit SyncStatus response type

```ts
interface GrafanaSyncStatusResponse {
  channels: SyncStatusItem[]
  rules: SyncStatusItem[]
}

const getChannelSyncStatus = (id: number): SyncStatusItem | undefined => {
  return (syncStatus as GrafanaSyncStatusResponse | undefined)?.channels?.find(
    (c) => c.id === id,
  )
}
```
97-113: The form schema does not conditionally validate required fields
`channelSchema` marks every type-specific field (webhook, email, feishu) as `optional()`, so a user can submit a webhook channel with no configuration at all. Use zod's `discriminatedUnion` or `superRefine` to add conditional validation so that the required fields for the selected `type` are checked. ♻️ Example conditional validation with superRefine

```ts
const channelSchema = z.object({
  name: z.string().min(1).max(100),
  type: z.enum(CHANNEL_TYPES),
  webhookUrl: z.string().optional(),
  webhookSecret: z.string().optional(),
  // ... other fields
}).superRefine((data, ctx) => {
  if (data.type === 'webhook' && !data.webhookUrl) {
    ctx.addIssue({
      code: z.ZodIssueCode.custom,
      message: 'Webhook URL is required',
      path: ['webhookUrl'],
    })
  }
  if (data.type === 'email' && !data.emailTo) {
    ctx.addIssue({
      code: z.ZodIssueCode.custom,
      message: 'Email recipients are required',
      path: ['emailTo'],
    })
  }
  // ... feishu validation
})
```

backend/src/db/index.ts (1)
1901-1927: The Grafana sync helpers do not report whether the update happened
`updateAlertRuleGrafanaSync` and `updateAlertChannelGrafanaSync` return `Promise<void>`, not the updated record or the affected row count, so callers cannot confirm that the update succeeded. This causes no problem in the current caller (`grafanaSync.ts`), but returning the update result would be more robust. 💡 Suggested: return the updated record

```diff
 export async function updateAlertRuleGrafanaSync(
   id: number,
   fields: {
     grafanaUid?: string | null;
     grafanaSyncedAt?: Date | null;
     grafanaSyncError?: string | null;
   },
-): Promise<void> {
-  await db
+): Promise<AlertRule | null> {
+  const r = await db
     .update(schema.AlertRulesTable)
     .set(fields)
-    .where(eq(schema.AlertRulesTable.id, id));
+    .where(eq(schema.AlertRulesTable.id, id))
+    .returning();
+  const [first] = r;
+  return first ?? null;
 }
```
Pull request overview
Adds an end-to-end alerting system (channels/rules/history + engine) and integrates with Grafana (connection + syncing alerts/contact points), while restructuring the frontend navigation to separate Models from Settings.
Changes:
- Introduces alert channel/rule/history APIs, DB schema/migrations, and a periodic alert evaluation/dispatch engine.
- Adds Grafana connection management plus sync services/APIs to provision Grafana alert rules/contact points.
- Restructures frontend routes/navigation and adds new Settings pages for Alerts and Grafana.
Reviewed changes
Copilot reviewed 34 out of 35 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| frontend/src/routes/settings/route.tsx | Updates Settings sub-nav to Alerts/Grafana. |
| frontend/src/routes/settings/index.tsx | Redirects /settings/ to /settings/alerts. |
| frontend/src/routes/settings/grafana.tsx | Adds Grafana settings route with data preloading. |
| frontend/src/routes/settings/alerts.tsx | Adds Alerts settings route with data preloading. |
| frontend/src/routes/models/route.tsx | Adds /models layout route with Providers/Registry sub-nav. |
| frontend/src/routes/models/index.tsx | Redirects /models/ to /models/providers. |
| frontend/src/routes/models/providers.tsx | Moves providers route from /settings/providers to /models/providers. |
| frontend/src/routes/models/registry.tsx | Moves registry route from /settings/models to /models/registry. |
| frontend/src/routeTree.gen.ts | Regenerates route tree for new/relocated routes. |
| frontend/src/pages/settings/grafana-settings-page.tsx | Adds Grafana connection + dashboard embed management UI. |
| frontend/src/pages/settings/alerts-settings-page.tsx | Adds Alerts UI (channels/rules/history + Grafana sync status). |
| frontend/src/i18n/locales/en-US.json | Adds/updates strings for new navigation and pages. |
| frontend/src/i18n/locales/zh-CN.json | Adds/updates strings for new navigation and pages. |
| frontend/src/hooks/use-settings.ts | Adds Grafana connection + sync status query helpers/types. |
| frontend/src/hooks/use-copy.tsx | Fixes hook dependency list to include t. |
| frontend/src/components/ui/alert.tsx | Adds new Alert UI primitive. |
| frontend/src/components/app/app-sidebar.tsx | Splits sidebar into separate Models and Settings entries. |
| bun.lock | Adds nodemailer + types to lockfile. |
| backend/src/utils/grafanaClient.ts | Adds Grafana Provisioning API client wrapper. |
| backend/src/services/grafanaSync.ts | Implements Grafana sync logic + PromQL mapping for rule types. |
| backend/src/services/alertEngine.ts | Adds periodic alert evaluation engine + cooldown + history. |
| backend/src/services/alertDispatcher.ts | Adds dispatchers for webhook/email/Feishu (nodemailer). |
| backend/src/index.ts | Starts the alert engine at server startup. |
| backend/src/db/schema.ts | Adds alert-related enums/types/tables + Grafana sync columns. |
| backend/src/db/index.ts | Adds alert CRUD, history queries, and aggregation helpers. |
| backend/src/api/admin/index.ts | Wires new admin routes for alerts and Grafana. |
| backend/src/api/admin/grafana.ts | Adds Grafana connection/test/sync/status endpoints. |
| backend/src/api/admin/alerts.ts | Adds alert channels/rules/history endpoints. |
| backend/src/adapters/upstream/anthropic.ts | Modifies Anthropic request build logic (tool_choice). |
| backend/package.json | Adds nodemailer + types dependencies. |
| backend/drizzle/meta/_journal.json | Records new drizzle migrations. |
| backend/drizzle/meta/0013_snapshot.json | Adds snapshot for initial alert tables/enums migration. |
| backend/drizzle/meta/0014_snapshot.json | Adds snapshot for Grafana sync columns migration. |
| backend/drizzle/0013_flowery_maria_hill.sql | Creates alert enums/tables + FK. |
| backend/drizzle/0014_opposite_dragon_man.sql | Adds Grafana sync columns to alert tables. |
- Fix PromQL label selector syntax (no leading/trailing commas)
- Add fetch timeouts (AbortSignal.timeout) to all external HTTP calls
- Add ON DELETE CASCADE to alert_history FK referencing alert_rules
- Deduplicate grafanaConnectionQueryOptions (reuse from use-settings hook)
- Add default case to dispatchToChannel switch statement
- Fix duplicate listApiKeys() call in evaluateQuota
- Add missing i18n key pages.settings.grafana.Testing
- Fix wrong i18n key reference in grafana-settings-page.tsx
- Clear datasourceUid on connection test failure

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@backend/src/services/alertDispatcher.ts`:
- Around line 86-95: The HTML email template in alertDispatcher.ts directly
injects unescaped payload fields (subject, html using payload.ruleName,
payload.ruleType, payload.message, payload.currentValue, payload.threshold, and
payload.details) creating an XSS risk; fix by creating and using an
HTML-escaping helper (e.g., escapeHtml) and apply it to every dynamic insertion
before building the html string (also escape the result of
JSON.stringify(payload.details) if present), or import a vetted escaper from a
utils module, ensuring all payload.* values are escaped in the template and the
subject is sanitized as well.
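A sketch of the escaping helper and how the template would use it. `AlertPayload` is a hypothetical shape built from the fields the comment names, and the HTML layout is illustrative only. `&` is replaced first so that the entities produced by the later replacements are not double-escaped.

```typescript
// Escape the five HTML-significant characters before interpolating into markup.
function escapeHtml(value: unknown): string {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical payload shape based on the fields named in the review.
interface AlertPayload {
  ruleName: string;
  ruleType: string;
  message: string;
  currentValue: number;
  threshold: number;
  details?: unknown;
}

function renderAlertHtml(p: AlertPayload): string {
  // JSON.stringify output is escaped too: details can contain arbitrary strings.
  const details =
    p.details === undefined ? "" : `<pre>${escapeHtml(JSON.stringify(p.details, null, 2))}</pre>`;
  return [
    `<h2>${escapeHtml(p.ruleName)}</h2>`,
    `<p>Type: ${escapeHtml(p.ruleType)}</p>`,
    `<p>${escapeHtml(p.message)}</p>`,
    `<p>Current: ${escapeHtml(p.currentValue)} / Threshold: ${escapeHtml(p.threshold)}</p>`,
    details,
  ].join("\n");
}
```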
In `@backend/src/services/grafanaSync.ts`:
- Around line 110-125: The quota PromQL built in the quota branch of buildPromQL
(in grafanaSync.ts) doesn't filter by apiKeyId; modify the case "quota" code so
when rule.condition.apiKeyId is present you append a label matcher like
{apiKeyId="<value>"} to each metric identifier used
(nexusgate_api_key_rpm_usage, nexusgate_api_key_rpm_limit,
nexusgate_api_key_tpm_usage, nexusgate_api_key_tpm_limit), otherwise leave the
raw metric names unchanged; ensure the apiKeyId value is properly quoted/escaped
when inserted into the label matcher and keep the rest of the expression and
returned object (expr, threshold, forDuration) unchanged.
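A sketch of the scoped quota expression. The metric names and the `apiKeyId` label come from the comment; the `usage / limit * 100` shape is an assumed example, not necessarily the PR's exact expression. Without a key filter, PromQL's vector division matches series with identical label sets, so the unscoped form still yields one ratio per key.

```typescript
// Build the quota usage expression, scoping every metric to one key when apiKeyId is set.
function quotaExpr(metric: "rpm" | "tpm", apiKeyId?: string | number): string {
  const escaped =
    apiKeyId === undefined
      ? undefined
      : String(apiKeyId).replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
  const selector = escaped === undefined ? "" : `{apiKeyId="${escaped}"}`;
  return `nexusgate_api_key_${metric}_usage${selector} / nexusgate_api_key_${metric}_limit${selector} * 100`;
}
```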
🧹 Nitpick comments (7)
backend/src/db/schema.ts (1)
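35-38: The `AlertChannelConfig` union type lacks a discriminant field.
The `AlertChannelConfig` union has no discriminator, so it can be hard to tell the concrete type at runtime when deserializing from JSON. Consider adding a `type` field to each config type as a discriminator, or relying on the external `AlertChannelTypeEnum` to decide. The current implementation discriminates via the `AlertChannelsTable.type` column, which is a workable design choice.

backend/src/services/alertDispatcher.ts (1)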
77-84: Creating a new SMTP transport on every call may hurt performance.
`dispatchEmail` calls `nodemailer.createTransport` on every invocation. For high-frequency alerting, reuse transport instances or use a connection pool. At the alert system's current volume, though, this is probably not urgent. ♻️ Optional optimization: transport cache

```ts
// Reuse transports through a simple Map-based cache keyed by connection settings
const transportCache = new Map<string, nodemailer.Transporter>();

function getOrCreateTransport(config: EmailChannelConfig) {
  const key = `${config.host}:${config.port}:${config.user}`;
  let transport = transportCache.get(key);
  if (!transport) {
    transport = createTransport({ ... });
    transportCache.set(key, transport);
  }
  return transport;
}
```

backend/src/utils/grafanaClient.ts (1)
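118-122: The list methods ignore pagination, so large Grafana instances may return incomplete data.
`listAlertRules()` and `listContactPoints()` return the API response as-is without handling Grafana API pagination. On Grafana instances with many alert rules and contact points, the full list may not be retrieved. Consider adding pagination support in a later iteration, or documenting this limitation.
Also applies to: 160-164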
backend/src/services/grafanaSync.ts (1)
198-237: `buildContactPoint` lacks a default branch. TypeScript's type checking can ensure that `channel.type` covers every enum value, but a default branch also catches unexpected values at runtime and keeps this function consistent with `dispatchToChannel`. ♻️ Suggested default branch

```diff
 case "feishu": {
   const c = channel.config as FeishuChannelConfig;
   return {
     name: `[NexusGate] ${channel.name}`,
     type: "webhook",
     settings: {
       url: c.webhookUrl,
       httpMethod: "POST",
     },
   };
 }
+default: {
+  const _exhaustive: never = channel.type;
+  throw new Error(`Unsupported channel type: ${channel.type}`);
+}
```

backend/src/api/admin/grafana.ts (1)
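20-26: The `GrafanaConnection` interface is defined in more than one place.
It is defined in both `backend/src/api/admin/grafana.ts` and `backend/src/services/grafanaSync.ts`. Extract it to a shared location (such as `@/db/schema` or a new types file) to keep the definitions consistent.

frontend/src/routes/settings/grafana.tsx (1)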
12-20: Inconsistent query-options definition pattern.
`dashboardsQueryOptions` is defined as a function `() => queryOptions({...})`, while the imported `grafanaConnectionQueryOptions` is a plain `queryOptions({...})` object. This makes the call sites inconsistent:

```ts
queryClient.ensureQueryData(grafanaConnectionQueryOptions) // no parentheses
queryClient.ensureQueryData(dashboardsQueryOptions()) // with parentheses
```

Use the same pattern for both to improve readability.
♻️ Suggested: standardize on the plain-object pattern

```diff
-const dashboardsQueryOptions = () =>
-  queryOptions({
+const dashboardsQueryOptions = queryOptions({
     queryKey: ['dashboards'],
     queryFn: async () => {
       const { data, error } = await api.admin.dashboards.get()
       if (error) throw formatError(error, i18n.t('pages.settings.grafana.FetchError'))
       return data as DashboardsResponse
     },
   })

 // then at the call site:
-queryClient.ensureQueryData(dashboardsQueryOptions())
+queryClient.ensureQueryData(dashboardsQueryOptions)
```

Also applies to: 24-26, 34-35
frontend/src/i18n/locales/en-US.json (1)
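448-449: The key name and its value disagree in number. The key `NotifyChannels` is plural, but the value "Notify Channel" is singular. If the UI supports selecting multiple notification channels, change the value to "Notify Channels" for consistency. Suggested change

```diff
-  "pages.settings.alerts.NotifyChannels": "Notify Channel",
+  "pages.settings.alerts.NotifyChannels": "Notify Channels",
```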
- Add escapeHtml() helper and apply to all dynamic fields in email template
- Add apiKeyId label selector to quota PromQL when a specific key is configured

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@backend/src/services/alertDispatcher.ts`:
- Around line 150-162: The Feishu webhook signature is computed incorrectly:
change the call in alertDispatcher where signature is built (currently const
signature = await computeHmacSha256(config.secret, stringToSign)) to pass the
HMAC key as stringToSign (timestamp + "\n" + secret) and the message as an empty
string, and update computeHmacSha256 to return a Base64-encoded HMAC (not hex)
so the returned value is the Base64 signature Feishu expects; ensure the symbol
names referenced are computeHmacSha256, stringToSign, timestamp, config.secret
and the assignment that adds { timestamp, sign: signature } to body remain
consistent.
In `@backend/src/services/grafanaSync.ts`:
- Around line 372-378: The isGrafanaConnected function currently returns true
when config.verified is true even if datasourceUid is missing, which can cause
the alert engine to skip evaluation while Grafana is unusable; change
isGrafanaConnected to mirror getGrafanaClient’s requirements by checking both
that config.verified is true and that config.datasourceUid (or the exact
property used in getGrafanaClient) is present/non-empty before returning true so
the connectivity check only succeeds when datasourceUid exists and Grafana is
truly reachable.
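A sketch of the tightened predicate. `GrafanaConfig` is a hypothetical minimal view of the stored settings; the point is that "connected" requires both a verified flag and a non-empty datasource UID, the same preconditions a client factory needs.

```typescript
// Hypothetical stored config; only the fields the check needs.
interface GrafanaConfig { verified?: boolean; datasourceUid?: string | null }

// Connected only when verified AND usable (a datasource UID is present).
function isGrafanaConnected(config: GrafanaConfig | null | undefined): boolean {
  return (
    config != null &&
    config.verified === true &&
    typeof config.datasourceUid === "string" &&
    config.datasourceUid.trim() !== ""
  );
}
```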
Per Feishu docs, the signature must use `timestamp\nsecret` as the HMAC key and sign an empty string, then Base64-encode the result. The previous implementation incorrectly used `secret` as key, `timestamp\nsecret` as data, and hex-encoded the output. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
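The signing scheme in the commit message above, as a minimal sketch using Node's `crypto` module (helper names are hypothetical). Because `digest("base64")` produces Base64 directly, no `btoa` call is involved. The body field layout (`timestamp` as a string, `msg_type`, `content.text`) follows Feishu's custom-bot text message format as we understand it; verify against the Feishu docs before relying on it.

```typescript
import { createHmac } from "node:crypto";

// Feishu custom-bot signing: HMAC-SHA256 keyed with `${timestamp}\n${secret}`
// over an EMPTY message, Base64-encoded. `timestamp` is in seconds.
function computeFeishuSign(timestamp: number, secret: string): string {
  const key = `${timestamp}\n${secret}`;
  return createHmac("sha256", key).update("").digest("base64");
}

// Signed text-message body (assumed field layout, see lead-in).
function signedFeishuBody(secret: string, text: string, now = new Date()) {
  const timestamp = Math.floor(now.getTime() / 1000);
  return {
    timestamp: String(timestamp),
    sign: computeFeishuSign(timestamp, secret),
    msg_type: "text",
    content: { text },
  };
}
```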
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@backend/src/services/alertDispatcher.ts`:
- Around line 102-135: The email subject assembled in dispatchEmail (variable
subject using payload.ruleName and payload.ruleType) is vulnerable to header
injection because it may contain CR/LF; sanitize or validate those fields before
building subject by stripping or replacing CR and LF (e.g., remove \r and \n
from payload.ruleName and payload.ruleType or run the full subject through a
validation/sanitizer) and then use the sanitized values when assigning subject
and calling transport.sendMail; ensure the sanitization logic is applied to any
other user-provided pieces used in the subject.
- Around line 50-69: In computeFeishuSignature replace the browser-only btoa
usage with Node-compatible Base64 encoding: take the ArrayBuffer result
(signature), convert to a Uint8Array and call
Buffer.from(uint8Array).toString('base64') so the function works on Node.js
^12.22.0 and ^14.17.0; keep the existing TextEncoder/crypto.subtle importKey and
sign steps and only change the final encoding step to use
Buffer.from(...).toString('base64').
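For the subject-header comment, a minimal sanitizer sketch (hypothetical names; the subject format is illustrative). It replaces CR/LF and other control characters with a space, so user-supplied rule names cannot begin a new header line, and it caps the length.

```typescript
// Replace control characters (including CR/LF) with spaces, collapse runs of
// whitespace, and cap the length, so the value stays on a single header line.
function sanitizeHeaderValue(value: string, maxLength = 200): string {
  return value
    .replace(/[\u0000-\u001f\u007f]+/g, " ")
    .replace(/\s{2,}/g, " ")
    .trim()
    .slice(0, maxLength);
}

// Hypothetical subject format; the real template may differ.
function buildSubject(ruleName: string, ruleType: string): string {
  return sanitizeHeaderValue(`[NexusGate Alert] ${ruleName} (${ruleType})`);
}
```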
- Replace manual Array.from hex encoding with Buffer.from().toString("hex")
- Add SIGINT/SIGTERM handlers to stop alert engine and server on shutdown
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
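The SIGINT/SIGTERM change and the earlier `void evaluateAlerts()` nitpick both concern the scheduling loop. A sketch that combines them, with hypothetical names (`startPeriodic`, `onError`): rejections go to `onError` instead of becoming unhandled, the returned function stops the loop, and a tick is skipped while the previous run is still in progress. That last behavior is a design choice, not something taken from the PR.

```typescript
// Run an async task every `intervalMs`. Failures go to `onError`, and a tick is
// skipped while the previous run is still in flight. Returns a stop function.
function startPeriodic(
  task: () => Promise<void>,
  intervalMs: number,
  onError: (error: unknown) => void,
): () => void {
  let running = false;
  const id = setInterval(() => {
    if (running) return;
    running = true;
    // Promise.resolve().then(...) also captures a synchronous throw from `task`.
    Promise.resolve()
      .then(task)
      .catch(onError)
      .finally(() => {
        running = false;
      });
  }, intervalMs);
  return () => clearInterval(id);
}
```

Wiring it to shutdown might look like `const stop = startPeriodic(evaluateAlerts, ALERT_CHECK_INTERVAL_MS, logError); process.once("SIGTERM", stop);`, where `logError` is a placeholder for the project's logger.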
Summary
- … `/models` (Providers + Registry), system settings at `/settings` (Alerts + Grafana) as separate sidebar items

Backend changes
- … (`/api/admin/alerts`) with channels, rules, history, and toggle endpoints
- … (`/api/admin/grafana`) with connection management, test, and sync endpoints

Frontend changes
- … `/settings` to `/models` with sub-nav (Providers, Registry)
- … `/settings` with sub-nav (Alerts, Grafana)

Test plan
- … `/settings/grafana`, configure Grafana API URL + token, test connection
- … `/settings/alerts`, create channels and alert rules
- … `/models/providers` and `/models/registry` to verify model config pages
- … `bun run check && bun run lint` pass with 0 errors

🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Improvements
Other