diff --git a/docs/api-reference.md b/docs/api-reference.md
index 255c2803..0f105bec 100644
--- a/docs/api-reference.md
+++ b/docs/api-reference.md
@@ -316,6 +316,17 @@ curl http://localhost:9100/v1/auth/keys
 curl -X DELETE http://localhost:9100/v1/auth/keys/key-abc123
 ```
 
+### Rotate API Key
+
+```bash
+curl -X POST http://localhost:9100/v1/auth/keys/key-abc123/rotate \
+  -H "Authorization: Bearer $AEGIS_AUTH_TOKEN" \
+  -H "Content-Type: application/json" \
+  -d '{"ttlDays": 365}'
+```
+
+Rotates an API key. Admin-only. Optionally set TTL in days. Returns the updated key metadata.
+
 ### Create SSE Token
 
 ```bash
diff --git a/docs/enterprise.md b/docs/enterprise.md
index 8b71117f..6b01bbb9 100644
--- a/docs/enterprise.md
+++ b/docs/enterprise.md
@@ -437,8 +437,52 @@ Returns aggregate health of all active sessions including stalled and idle detec
 ---
 ### Production Alerting
 
-Today, Aegis has no built-in alerting system (issue #1418). Until that ships, you can build alerting on top of the existing observability endpoints.
+Aegis includes an **AlertManager** that tracks failure events and fires webhook
+notifications when configurable thresholds are exceeded.
 
+#### Alert Endpoints
+
+```bash
+# Test webhook configuration (admin/operator only)
+curl -X POST http://localhost:9100/v1/alerts/test \
+  -H "Authorization: Bearer $AEGIS_AUTH_TOKEN"
+
+# Get alert statistics (admin/operator/viewer)
+curl -X GET http://localhost:9100/v1/alerts/stats \
+  -H "Authorization: Bearer $AEGIS_AUTH_TOKEN"
+```
+
+**AlertManager monitors:**
+- Session failures (crashes, unexpected exits)
+- Dead sessions (tmux process gone)
+- Tmux crashes
+- API error rate threshold breaches
+
+**Authorization Requirements:**
+- `POST /v1/alerts/test` — requires `admin` or `operator` role
+- `GET /v1/alerts/stats` — requires `admin`, `operator`, or `viewer` role
+
+**Configuration:**
+
+Set webhook URLs via environment variable:
+```bash
+export AEGIS_ALERT_WEBHOOKS="https://example.com/alerts,https://backup.com/alerts"
+export AEGIS_ALERT_FAILURE_THRESHOLD=5
+export AEGIS_ALERT_COOLDOWN_MS=600000
+```
+
+Or via config.yaml:
+```yaml
+alerting:
+  webhooks:
+    - https://example.com/alerts
+  failureThreshold: 5
+  cooldownMs: 600000
+```
+
+Webhook payloads include severity, event type, session ID, and timestamp.
+
+**Recommended external alerting setup:**
 **Recommended alerting setup:**
 
 ```bash