From 00d42778d7844bc6141462cf5dbe9397178ac1b7 Mon Sep 17 00:00:00 2001
From: Argus
Date: Sat, 11 Apr 2026 22:40:14 +0200
Subject: [PATCH 1/2] docs: update alerting docs (AlertManager) and add key rotation API

---
 docs/api-reference.md |  8 ++++++++
 docs/enterprise.md    | 23 ++++++++++++++++++++++-
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/docs/api-reference.md b/docs/api-reference.md
index 255c2803..0f8a9827 100644
--- a/docs/api-reference.md
+++ b/docs/api-reference.md
@@ -316,6 +316,14 @@ curl http://localhost:9100/v1/auth/keys
 curl -X DELETE http://localhost:9100/v1/auth/keys/key-abc123
 ```
 
+### Rotate API Key
+
+```bash
+curl -X POST http://localhost:9100/v1/auth/keys/key-abc123/rotate -H "Authorization: Bearer $AEGIS_AUTH_TOKEN" -H "Content-Type: application/json" -d '{"expiresAt":"2025-12-31T23:59:59Z"}'
+```
+
+Rotates an API key. Admin-only. Optionally set a new expiry date. Returns the updated key metadata.
+
 ### Create SSE Token
 
 ```bash
diff --git a/docs/enterprise.md b/docs/enterprise.md
index 8b71117f..5e5fc02c 100644
--- a/docs/enterprise.md
+++ b/docs/enterprise.md
@@ -437,8 +437,29 @@ Returns aggregate health of all active sessions including stalled and idle detec
 ---
 
 ### Production Alerting
-Today, Aegis has no built-in alerting system (issue #1418). Until that ships, you can build alerting on top of the existing observability endpoints.
+Aegis includes an **AlertManager** that tracks failure events and fires webhook
+notifications when configurable thresholds are exceeded.
 
+#### Alert Endpoints
+
+```bash
+# Test webhook configuration
+curl -X POST http://localhost:9100/v1/alerts/test \
+  -H "Authorization: Bearer $AEGIS_AUTH_TOKEN" \
+  -H "Content-Type: application/json" \
+  -d '{"webhookUrl":"https://example.com/alerts","secret":"test-secret"}'
+# Get alert statistics
+curl http://localhost:9100/v1/alerts/stats \
+  -H "Authorization: Bearer $AEGIS_AUTH_TOKEN"
+```
+**AlertManager monitors:**
+- Session failures (crashes, unexpected exits)
+- Dead sessions (tmux process gone)
+- Tmux crashes
+
+Configure via `config.yaml` or environment variables. Webhook payloads include severity, event type, session ID, and timestamp.
+
+**Recommended external alerting setup:**
 **Recommended alerting setup:**
 
 ```bash

From 16e7622104492ea4b6e98de88661736dae7dbc58 Mon Sep 17 00:00:00 2001
From: OneStepAt4time
Date: Sun, 12 Apr 2026 00:46:17 +0200
Subject: [PATCH 2/2] fix: correct AlertManager and key rotation API documentation

- Fix API key rotation parameter from expiresAt (ISO timestamp) to ttlDays (integer)
- Remove false request body parameters from /v1/alerts/test endpoint
- Add authorization requirements for both alert endpoints
- Add missing api_error_rate alert type to monitoring list
- Add environment variable and config.yaml configuration examples
- Format cURL examples with proper line breaks and comments
---
 docs/api-reference.md |  7 +++++--
 docs/enterprise.md    | 37 ++++++++++++++++++++++++++++++-------
 2 files changed, 35 insertions(+), 9 deletions(-)

diff --git a/docs/api-reference.md b/docs/api-reference.md
index 0f8a9827..0f105bec 100644
--- a/docs/api-reference.md
+++ b/docs/api-reference.md
@@ -319,10 +319,13 @@ curl -X DELETE http://localhost:9100/v1/auth/keys/key-abc123
 ### Rotate API Key
 
 ```bash
-curl -X POST http://localhost:9100/v1/auth/keys/key-abc123/rotate -H "Authorization: Bearer $AEGIS_AUTH_TOKEN" -H "Content-Type: application/json" -d '{"expiresAt":"2025-12-31T23:59:59Z"}'
+curl -X POST http://localhost:9100/v1/auth/keys/key-abc123/rotate \
+  -H "Authorization: Bearer $AEGIS_AUTH_TOKEN" \
+  -H "Content-Type: application/json" \
+  -d '{"ttlDays": 365}'
 ```
 
-Rotates an API key. Admin-only. Optionally set a new expiry date. Returns the updated key metadata.
+Rotates an API key. Admin-only. Optionally set TTL in days. Returns the updated key metadata.
 
 ### Create SSE Token
 
diff --git a/docs/enterprise.md b/docs/enterprise.md
index 5e5fc02c..6b01bbb9 100644
--- a/docs/enterprise.md
+++ b/docs/enterprise.md
@@ -443,21 +443,44 @@ notifications when configurable thresholds are exceeded.
 #### Alert Endpoints
 
 ```bash
-# Test webhook configuration
+# Test webhook configuration (admin/operator only)
 curl -X POST http://localhost:9100/v1/alerts/test \
-  -H "Authorization: Bearer $AEGIS_AUTH_TOKEN" \
-  -H "Content-Type: application/json" \
-  -d '{"webhookUrl":"https://example.com/alerts","secret":"test-secret"}'
-# Get alert statistics
-curl http://localhost:9100/v1/alerts/stats \
+  -H "Authorization: Bearer $AEGIS_AUTH_TOKEN"
+
+# Get alert statistics (admin/operator/viewer)
+curl -X GET http://localhost:9100/v1/alerts/stats \
   -H "Authorization: Bearer $AEGIS_AUTH_TOKEN"
 ```
+
 **AlertManager monitors:**
 - Session failures (crashes, unexpected exits)
 - Dead sessions (tmux process gone)
 - Tmux crashes
+- API error rate threshold breaches
+
+**Authorization Requirements:**
+- `POST /v1/alerts/test` — requires `admin` or `operator` role
+- `GET /v1/alerts/stats` — requires `admin`, `operator`, or `viewer` role
+
+**Configuration:**
+
+Set webhook URLs via environment variable:
+```bash
+export AEGIS_ALERT_WEBHOOKS="https://example.com/alerts,https://backup.com/alerts"
+export AEGIS_ALERT_FAILURE_THRESHOLD=5
+export AEGIS_ALERT_COOLDOWN_MS=600000
+```
+
+Or via config.yaml:
+```yaml
+alerting:
+  webhooks:
+    - https://example.com/alerts
+  failureThreshold: 5
+  cooldownMs: 600000
+```
 
-Configure via `config.yaml` or environment variables. Webhook payloads include severity, event type, session ID, and timestamp.
+Webhook payloads include severity, event type, session ID, and timestamp.
 
 **Recommended external alerting setup:**
 **Recommended alerting setup:**
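
Note on the corrected rotation parameter: the second patch changes the rotate request body from `expiresAt` (an ISO timestamp) to `ttlDays` (an integer). A minimal sketch of a wrapper that rejects non-integer values before calling the endpoint might look like the following. The function names `build_rotate_body` and `rotate_key` and the `AEGIS_URL` variable are hypothetical and introduced only for illustration; the requirement that `ttlDays` be positive is likewise an assumption. The endpoint path, headers, and `AEGIS_AUTH_TOKEN` are taken from the documented cURL example.

```bash
#!/usr/bin/env bash
# Sketch only: these helpers are hypothetical and not part of Aegis.

# Print the JSON body for the rotate endpoint.
# Fails unless the argument is a positive integer without leading zeros
# (assumption: ttlDays must be positive).
build_rotate_body() {
  local ttl_days="$1"
  case "$ttl_days" in
    ''|*[!0-9]*|0*)
      echo "ttlDays must be a positive integer" >&2
      return 1
      ;;
  esac
  printf '{"ttlDays": %s}\n' "$ttl_days"
}

# Rotate a key by ID with the given TTL in days.
# AEGIS_URL is an assumed variable; it defaults to the localhost
# address used in the documented examples.
rotate_key() {
  local key_id="$1" ttl_days="$2" body
  body="$(build_rotate_body "$ttl_days")" || return 1
  curl -X POST "${AEGIS_URL:-http://localhost:9100}/v1/auth/keys/${key_id}/rotate" \
    -H "Authorization: Bearer ${AEGIS_AUTH_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$body"
}
```

With these definitions, `rotate_key key-abc123 365` would send `{"ttlDays": 365}`, while passing an ISO timestamp such as `2025-12-31T23:59:59Z` fails locally without contacting the server.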