feat: Alert System #55

@pescn

Summary

Implement a comprehensive alert system with multiple channels and alert types.

Requirements

Alert Channels

| Channel | Configuration |
| --- | --- |
| Webhook | URL, Headers, Secret |
| Email (SMTP) | Host, Port, User, Password, From |
| Feishu (飞书) | Webhook URL, Secret |
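
The per-channel configuration above could be checked before a channel is saved. The following is a minimal sketch, not a definitive implementation: the snake_case key names and the `missing_config_keys` helper are assumptions, since the issue only lists the fields and does not fix how they are spelled inside `config`.

```python
# Required config keys per channel type, taken from the channel list above.
# The exact key spellings are hypothetical; the issue does not define them.
REQUIRED_KEYS = {
    "webhook": {"url", "headers", "secret"},
    "email": {"host", "port", "user", "password", "from"},
    "feishu": {"webhook_url", "secret"},
}


def missing_config_keys(channel_type: str, config: dict) -> set:
    """Return the required keys that are absent from config.

    Raises ValueError for a channel type that is not listed above.
    """
    if channel_type not in REQUIRED_KEYS:
        raise ValueError(f"unknown channel type: {channel_type!r}")
    return REQUIRED_KEYS[channel_type] - set(config)
```

An empty result means the config carries every field the channel type needs; whether fields such as Headers or Secret should be optional is left open here.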

Alert Types

  • Budget Alert: Triggered when a budget threshold is reached
  • Error Rate Alert: Triggered when the error rate exceeds its threshold
  • Latency Alert: Triggered when P95 latency exceeds its threshold
  • Quota Alert: Triggered when RPM/TPM usage approaches its limit
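
The four alert types could be evaluated along the following lines. This is a sketch under stated assumptions: the keys inside `condition` and `metrics` (`threshold`, `threshold_ms`, `ratio`, `spend`, and so on) are hypothetical, P95 is computed with the nearest-rank method (one of several common definitions), and "approaching limit" is read as usage reaching a configured fraction of the limit.

```python
def p95(latencies_ms):
    """Nearest-rank 95th percentile of a non-empty list of samples."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    # 1-based rank = ceil(0.95 * n), done in integers to avoid float error.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def error_rate(errors, requests):
    """Fraction of failed requests; 0.0 when there was no traffic."""
    return errors / requests if requests else 0.0


def evaluate(rule_type, condition, metrics):
    """Return True when the rule should trigger (hypothetical key names)."""
    if rule_type == "budget":
        return metrics["spend"] >= condition["threshold"]
    if rule_type == "error_rate":
        rate = error_rate(metrics["errors"], metrics["requests"])
        return rate > condition["threshold"]
    if rule_type == "latency":
        return p95(metrics["latencies_ms"]) > condition["threshold_ms"]
    if rule_type == "quota":
        # "Approaching" is interpreted as: used >= ratio * limit.
        return metrics["used"] >= condition["ratio"] * metrics["limit"]
    raise ValueError(f"unknown rule type: {rule_type!r}")
```

Note that with nearest-rank P95, a single slow request among 20 samples does not trip the latency rule, because the 19th-ranked sample is the one compared against the threshold.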

Alert Debounce

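  • Same alert type + same target: at most one notification per interval; the minimum interval defaults to 1 hour and is configurable (cooldown_minutes)
  • Alert aggregation: multiple triggers within a short time window are merged into one notification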
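
The debounce and aggregation rules above might be sketched as follows. The `Debouncer` class and its return shape are hypothetical, state is kept in memory rather than in the database, and aggregation is interpreted as counting the triggers suppressed during the cooldown and folding that count into the next notification; the issue does not specify the merge strategy.

```python
from datetime import datetime, timedelta


class Debouncer:
    """Per (alert_type, target) cooldown with a count of merged triggers."""

    def __init__(self, cooldown=timedelta(hours=1)):
        self.cooldown = cooldown
        self._last_sent = {}   # (alert_type, target) -> datetime of last send
        self._suppressed = {}  # (alert_type, target) -> triggers held back

    def should_send(self, alert_type, target, now):
        """Return (send, merged): whether to notify now, and how many
        earlier suppressed triggers this notification stands in for."""
        key = (alert_type, target)
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown:
            self._suppressed[key] = self._suppressed.get(key, 0) + 1
            return False, 0
        merged = self._suppressed.pop(key, 0)
        self._last_sent[key] = now
        return True, merged
```

For example, a trigger at 12:00 is sent, triggers at 12:10 and 12:30 for the same type and target are suppressed, and a trigger at 13:01 is sent with `merged == 2`. A different target is tracked independently.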

Database Design

CREATE TABLE alert_channels (
  id SERIAL PRIMARY KEY,
  name VARCHAR(100) NOT NULL,
  type VARCHAR(20) NOT NULL,  -- 'webhook', 'email', 'feishu'
  config JSONB NOT NULL,
  enabled BOOLEAN DEFAULT true,
  created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE alert_rules (
  id SERIAL PRIMARY KEY,
  name VARCHAR(100) NOT NULL,
  type VARCHAR(50) NOT NULL,   -- 'budget', 'error_rate', 'latency', 'quota'
  condition JSONB NOT NULL,
  channel_ids INTEGER[],
  cooldown_minutes INTEGER DEFAULT 60,
  enabled BOOLEAN DEFAULT true
);

CREATE TABLE alert_history (
  id SERIAL PRIMARY KEY,
  rule_id INTEGER REFERENCES alert_rules(id),
  triggered_at TIMESTAMP DEFAULT NOW(),
  payload JSONB,
  status VARCHAR(20)           -- 'sent', 'failed', 'suppressed'
);
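
To illustrate how the three tables relate, the sketch below resolves a rule's `channel_ids` to enabled channels, attempts delivery, and builds one `alert_history`-shaped row. It works on plain dictionaries instead of a database connection. The `record_trigger` function, the `send` callback, and the policy that a trigger counts as 'sent' only if every enabled channel succeeds are assumptions, not part of the proposal.

```python
def record_trigger(rule, channels_by_id, send, payload, in_cooldown=False):
    """Build an alert_history-like row for one trigger of an alert rule.

    rule           -- dict mirroring an alert_rules row
    channels_by_id -- {id: dict mirroring an alert_channels row}
    send           -- hypothetical callback (channel, payload) -> bool
    """
    if in_cooldown:
        # Debounced triggers are recorded but never delivered.
        status = "suppressed"
    else:
        targets = [
            channels_by_id[cid]
            for cid in rule["channel_ids"]
            if cid in channels_by_id and channels_by_id[cid]["enabled"]
        ]
        results = [send(channel, payload) for channel in targets]
        # Assumed policy: no enabled channel, or any failure, is 'failed'.
        status = "sent" if results and all(results) else "failed"
    return {"rule_id": rule["id"], "payload": payload, "status": status}
```

Because `channel_ids` is an `INTEGER[]` column without foreign keys, a rule can point at a channel that no longer exists; the sketch skips such ids rather than raising.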


Labels: enhancement