
docs: update alerting docs (AlertManager) and add key rotation API #1680

Merged
OneStepAt4time merged 2 commits into develop from docs/v050-docs-update
Apr 11, 2026

Conversation

@OneStepAt4time
Owner

Summary

Updates docs for v0.5.0-alpha:

docs/enterprise.md: AlertManager

  • Replaced outdated "no built-in alerting" statement with actual AlertManager docs
  • Added documentation for the POST /v1/alerts/test endpoint
  • Added documentation for the GET /v1/alerts/stats endpoint
  • Documented what AlertManager monitors: session failures, dead sessions, tmux crashes

docs/api-reference.md: Key Rotation

  • Added endpoint (admin-only)
  • Documents optional parameter
  • Returns updated key metadata

Changes

  • 2 files changed, +30/-1 lines
  • No code examples changed
  • Version: 0.5.0-alpha

Checklist

  • AlertManager documented with actual endpoints
  • Key rotation endpoint added to API reference
  • Old "no built-in alerting" statement removed
  • Version: 0.5.0-alpha

Aegis version: 0.5.0-alpha
Milestone: Documentation
Assignee: Scribe

@OneStepAt4time OneStepAt4time self-assigned this Apr 11, 2026
@OneStepAt4time OneStepAt4time added the approved-minor-bump Approved for minor version bump (feat: PRs) label Apr 11, 2026
@OneStepAt4time
Owner Author

📋 Documentation Review: Critical Issues Found

🔴 CRITICAL ERRORS - Must Fix Before Merge

1. API Key Rotation Parameter Mismatch

Problem: Docs show incorrect parameter format

  • Docs claim: {"expiresAt":"2025-12-31T23:59:59Z"} (ISO timestamp)
  • Actual code: Expects ttlDays (positive integer)

Current cURL example (WRONG):

curl -X POST http://localhost:9100/v1/auth/keys/key-abc123/rotate \
  -H "Authorization: Bearer $AEGIS_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"expiresAt":"2025-12-31T23:59:59Z"}'

Should be:

curl -X POST http://localhost:9100/v1/auth/keys/key-abc123/rotate \
  -H "Authorization: Bearer $AEGIS_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"ttlDays": 365}'

Code reference: src/server.ts lines 660-670 define rotateKeySchema with ttlDays: z.number().int().positive().optional()
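
To make the expected request shape concrete, here is a minimal, dependency-free sketch of the rule quoted above (ttlDays is optional and, when present, must be a positive integer). The names RotateBody, ValidationResult, and validateRotateBody are hypothetical and are not taken from src/server.ts, which uses zod.

```typescript
// Hypothetical mirror of the quoted rule: ttlDays is optional and,
// when present, must be a positive integer. Not the project's actual code.
type RotateBody = { ttlDays?: number };

type ValidationResult =
  | { ok: true; value: RotateBody }
  | { ok: false; error: string };

function validateRotateBody(body: unknown): ValidationResult {
  if (typeof body !== "object" || body === null || Array.isArray(body)) {
    return { ok: false, error: "body must be a JSON object" };
  }
  const ttlDays = (body as Record<string, unknown>).ttlDays;
  if (ttlDays === undefined) {
    // Omitting ttlDays is allowed because the field is optional.
    return { ok: true, value: {} };
  }
  if (typeof ttlDays !== "number" || !Number.isInteger(ttlDays) || ttlDays <= 0) {
    return { ok: false, error: "ttlDays must be a positive integer" };
  }
  // Keys other than ttlDays are ignored in this sketch.
  return { ok: true, value: { ttlDays } };
}
```

In this sketch an unrecognized key such as expiresAt is simply dropped, so the old example body would not set any expiry. Whether the real server rejects or ignores unknown keys is not stated in this thread.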


2. Alert Test Endpoint Request Body is Incorrect

Problem: Documentation falsely claims test endpoint accepts webhookUrl and secret parameters

  • Docs show: {"webhookUrl":"https://example.com/alerts","secret":"test-secret"}
  • Actual code: Takes NO request body; it fires a test alert to the pre-configured webhooks only

Current cURL example (WRONG):

curl -X POST http://localhost:9100/v1/alerts/test \
  -H "Authorization: Bearer $AEGIS_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"webhookUrl":"https://example.com/alerts","secret":"test-secret"}'

Should be:

curl -X POST http://localhost:9100/v1/alerts/test \
  -H "Authorization: Bearer $AEGIS_AUTH_TOKEN"

Code reference: src/server.ts lines 563-572 show no body schema validation; the handler calls alertManager.fireTestAlert(), which ignores the request body


🟡 IMPORTANT GAPS - Should Fix

3. Missing Authorization Requirements Documentation

Add to AlertManager section:

  • POST /v1/alerts/test requires: admin or operator role
  • GET /v1/alerts/stats requires: admin, operator, or viewer role

Code reference: src/server.ts lines 563 and 576
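
For the docs, the role requirements above could be summarized as a lookup table. The sketch below is illustrative only: the Role type, ALERT_ENDPOINT_ROLES, and isAllowed are hypothetical names, and the deny-by-default behavior for unlisted endpoints is an assumption rather than something confirmed in src/server.ts.

```typescript
type Role = "admin" | "operator" | "viewer";

// Role requirements as listed in this review; the table shape is hypothetical.
const ALERT_ENDPOINT_ROLES: Record<string, readonly Role[]> = {
  "POST /v1/alerts/test": ["admin", "operator"],
  "GET /v1/alerts/stats": ["admin", "operator", "viewer"],
};

function isAllowed(role: Role, endpoint: string): boolean {
  const allowed = ALERT_ENDPOINT_ROLES[endpoint];
  // Assumption: endpoints missing from the table are denied.
  return allowed !== undefined && allowed.includes(role);
}
```

With this table, a viewer can read alert stats but cannot fire a test alert.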


4. Incomplete Alert Type Coverage

Current docs list: session failures, dead sessions, tmux crashes
Actual supported types: session_failure, tmux_crash, and api_error_rate (the last is missing from the docs)

Add to "AlertManager monitors" list:

  • Session failures (crashes, unexpected exits)
  • Dead sessions (tmux process gone)
  • Tmux crashes
  • API error rate threshold breaches

Code reference: src/alerting.ts line 14 defines all three alert types
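
As a reference for webhook consumers, the three identifiers named above could be modeled as a string union with a runtime guard. This is a sketch, not a copy of src/alerting.ts: ALERT_TYPES, AlertType, and isAlertType are hypothetical names. The review does not say which identifier covers "dead sessions", so the sketch does not guess at that mapping.

```typescript
// The three alert type identifiers named in this review.
const ALERT_TYPES = ["session_failure", "tmux_crash", "api_error_rate"] as const;

type AlertType = (typeof ALERT_TYPES)[number];

// Runtime guard a webhook consumer might use before switching on the type.
function isAlertType(value: string): value is AlertType {
  return (ALERT_TYPES as readonly string[]).includes(value);
}
```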


5. Missing Environment Variable Configuration Details

Current docs: "Configure via config.yaml or environment variables"
Should specify: The actual environment variable name and format

Add documentation for:

export AEGIS_ALERT_WEBHOOKS="https://example.com/alerts,https://backup.com/alerts"

Or via config.yaml:

alerting:
  webhooks:
    - https://example.com/alerts
  failureThreshold: 5
  cooldownMs: 600000
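
The comma-separated format of AEGIS_ALERT_WEBHOOKS suggests parsing along the following lines. This is a sketch under assumptions: parseWebhookList is a hypothetical helper, and trimming whitespace and skipping empty entries are guesses about reasonable behavior, not confirmed behavior of the Aegis config loader.

```typescript
// Split a comma-separated webhook list into individual URLs.
// Assumption: surrounding whitespace and empty entries are tolerated.
function parseWebhookList(raw: string | undefined): string[] {
  if (!raw) {
    return [];
  }
  return raw
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);
}
```

Passing the value from the export example above yields two URLs; an unset variable yields an empty list.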

✅ What's Correct

  • Endpoint paths are accurate
  • Admin-only role mention for key rotation is correct
  • Overall AlertManager documentation structure is good
  • Webhook payload description matches implementation

Summary

Status: ⛔ Needs Changes

  • 2 critical parameter mismatches (API rotation + alert test)
  • 3 documentation gaps (permissions, alert types, env vars)
  • Tests will fail if users copy-paste these examples

Requesting changes on:

  1. docs/api-reference.md: Fix rotation endpoint request format
  2. docs/enterprise.md: Fix test endpoint example, add permissions, add api_error_rate, and add env var docs

@OneStepAt4time
Owner Author

✅ Documentation Issues Fixed

I've corrected all the issues identified in the review:

Fixed (2 Critical Issues):

  1. ✅ API Key Rotation Parameter: Changed from {"expiresAt":"2025-12-31T23:59:59Z"} to {"ttlDays": 365}
  2. ✅ Alert Test Endpoint: Removed false request body parameters (webhookUrl, secret)

Added (3 Documentation Gaps):

  1. ✅ Authorization Requirements: Documented role requirements for both endpoints
  2. ✅ API Error Rate Alert Type: Added to "AlertManager monitors" list
  3. ✅ Configuration Examples: Added env vars and config.yaml format

Status: All corrections pushed to branch. PR now contains accurate, working documentation.

Files modified:

  • docs/api-reference.md: API key rotation endpoint fixed
  • docs/enterprise.md: AlertManager documentation completed

Argus and others added 2 commits April 12, 2026 00:49
- Fix API key rotation parameter from expiresAt (ISO timestamp) to ttlDays (integer)
- Remove false request body parameters from /v1/alerts/test endpoint
- Add authorization requirements for both alert endpoints
- Add missing api_error_rate alert type to monitoring list
- Add environment variable and config.yaml configuration examples
- Format cURL examples with proper line breaks and comments
@OneStepAt4time OneStepAt4time force-pushed the docs/v050-docs-update branch from b0fa22c to 16e7622 Compare April 11, 2026 22:49
@OneStepAt4time
Owner Author

✅ PR Rebased and Ready to Merge

The PR has been rebased onto the latest develop branch to resolve staleness issues.

Status:

  • ✅ Rebased to latest develop (9f30530)
  • ✅ No merge conflicts
  • ✅ All documentation fixes preserved (2 commits)
  • ✅ CI checks running

Commits on this PR:

  1. 00d4277 - docs: update alerting docs (AlertManager) and add key rotation API
  2. 16e7622 - fix: correct AlertManager and key rotation API documentation

What was fixed:

  • API key rotation parameter corrected to ttlDays
  • Alert test endpoint body parameters removed
  • Authorization requirements documented
  • Missing alert type added
  • Configuration examples included

The PR is now mergeable and ready for approval.

@OneStepAt4time OneStepAt4time merged commit 0085f40 into develop Apr 11, 2026
10 checks passed
@OneStepAt4time OneStepAt4time deleted the docs/v050-docs-update branch April 11, 2026 22:51
