Skip to content

feat(rook): document operational health probes#764

Merged
yacosta738 merged 6 commits into
mainfrom
feat/rook-operational-health-682
May 3, 2026
Merged

feat(rook): document operational health probes#764
yacosta738 merged 6 commits into
mainfrom
feat/rook-operational-health-682

Conversation

@yacosta738
Copy link
Copy Markdown
Contributor

Related Issues

Fixes #682


Summary

  • Formalizes Rook operational health semantics for /api/health/live and /api/health/ready in the gateway/admin API spec.
  • Adds focused coverage for liveness independence and readiness failures across critical startup dependencies.
  • Documents operator usage for supervised/containerized deployments and registers the new docs page.

Tested Information

  • cargo test health::tests from clients/rook
  • cargo test server::tests:: from clients/rook
  • cargo fmt --check from clients/rook
  • pnpm --dir clients/web/apps/docs build

Documentation Impact

  • Docs updated in:
    • openspec/specs/gateway/spec.md
    • clients/web/apps/docs/src/content/docs/rook/operational-health.md
    • clients/web/apps/docs/astro.config.mjs
  • I verified the documentation matches the current behavior with the docs build and focused Rook health tests.

Breaking Changes

None.


Checklist

  • I have checked that there isn’t already a PR solving the same problem.
  • I have read the Contributing Guidelines.
  • I ensured my code follows the project's style guidelines.
  • I have added or updated tests that prove my fix is effective or that my feature works.
  • I have updated the documentation, or I explained above why no documentation update is needed.
  • I verified the documentation matches the current behavior.
  • I have documented any breaking changes in the Breaking Changes section.
  • I have linked the related issue (if any).

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 3, 2026

Warning

Rate limit exceeded

@yacosta738 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 38 minutes and 13 seconds before requesting another review.

To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: ca578ca8-f4c8-47a4-8ec9-cd9416e0a190

📥 Commits

Reviewing files that changed from the base of the PR and between 4672c6b and fbc8026.

📒 Files selected for processing (3)
  • clients/agent-runtime/src/channels/telegram.rs
  • clients/web/apps/docs/src/content/docs/es/rook/operational-health.md
  • clients/web/apps/docs/src/content/docs/rook/operational-health.md
📝 Walkthrough

Walkthrough

This PR introduces distinct /api/health/live and /api/health/ready endpoints for Rook with liveness and readiness semantics. Readiness returns structured JSON with per-dependency checks and 503 when critical dependencies (config, database, router) are unavailable; liveness remains 200 OK when the process serves requests. Tests, formal specification, and operator documentation are added.

Changes

Health & Readiness Endpoints

Layer / File(s) Summary
Specification & Contracts
openspec/specs/gateway/spec.md
Formal spec defines /api/health/live (liveness-only, always 200 OK), /api/health/ready (readiness with per-check JSON, 503 on critical failure), and /api/health compatibility (plain text, unchanged). Critical dependencies: config, database, router; non-critical: assets.
Core Readiness Logic Tests
clients/rook/src/health.rs
New test cases verify StartupDependencyState::readiness() returns HealthStatus::Fail when config_ready or router_ready is false, validating checks.*.reason strings.
Endpoint Integration Tests
clients/rook/src/server/mod.rs
New test cases verify /api/health/live returns 200 OK regardless of dependency state, /api/health/ready returns 503 with aggregate "fail" for critical failures and 200 with "degraded" for asset-only failures, including per-check reason strings. Assets readiness flag is now asserted in compatibility test.
Operator Runbook
clients/web/apps/docs/src/content/docs/rook/operational-health.md
Comprehensive documentation of liveness/readiness endpoints, response formats, critical vs. non-critical checks, Kubernetes probe examples, and guidance to prefer /ready over legacy /health.
Documentation Navigation
clients/web/apps/docs/astro.config.mjs
Sidebar configuration adds new "Rook" section under "Agent Runtime" with "Operational Health" nav item.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Review focus: verify test assertions align with spec requirements (per-check reason strings, status codes, degraded vs. fail semantics); confirm documentation accurately reflects endpoint behavior and is operator-friendly; ensure spec precision on critical vs. non-critical dependency semantics.

Possibly related PRs

Suggested labels

area:rust, area:docs, area:web

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed Title follows Conventional Commits style with 'feat(rook):' prefix and clearly describes the main change: documenting operational health probes.
Description check ✅ Passed Description is complete with all required sections: Related Issues, Summary, Tested Information, Documentation Impact, Breaking Changes, and checked Checklist items.
Linked Issues check ✅ Passed All acceptance criteria from issue #682 are met: liveness/readiness are distinguishable [openspec/specs/gateway/spec.md], readiness fails on critical dependency unavailability [clients/rook tests], responses are structured and documented [operational-health.md], and tests cover core scenarios [health.rs, server/mod.rs].
Out of Scope Changes check ✅ Passed All changes are directly aligned with #682 objectives: API spec updates, health endpoint tests, documentation, and sidebar registration for the new docs page.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/rook-operational-health-682

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
Review rate limit: 0/1 reviews remaining, refill in 38 minutes and 13 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added the size/m Denotes a medium change size label May 3, 2026
@cloudflare-workers-and-pages
Copy link
Copy Markdown

cloudflare-workers-and-pages Bot commented May 3, 2026

Deploying corvus with  Cloudflare Pages  Cloudflare Pages

Latest commit: fbc8026
Status: ✅  Deploy successful!
Preview URL: https://4fabe6ba.corvus-42x.pages.dev
Branch Preview URL: https://feat-rook-operational-health.corvus-42x.pages.dev

View logs

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@clients/web/apps/docs/src/content/docs/rook/operational-health.md`:
- Around line 1-9: The Spanish locale file for the document titled "Rook
Operational Health" is missing; either add a new Spanish markdown counterpart
named operational-health.md under the docs/es/rook area containing a full
translation of the EN content and matching frontmatter (title, summary, owner,
status, lastReviewed, appliesTo, docType) with appropriate lang metadata, or if
translation will be deferred, create a tracking issue and update the English
frontmatter to mark the ES translation as pending (e.g., add a
translationPending or translations section indicating missing es) so the
pipeline no longer flags this as an untracked gap.
- Around line 1-9: The frontmatter block for the Rook Operational Health doc is
missing the required description field; add a description: "..." entry inside
the YAML frontmatter (between the leading and trailing ---) so metadata
validation passes; update the top-level frontmatter in this file (the block
containing title: Rook Operational Health, summary:, owner:, etc.) by inserting
a descriptive one-line description value that summarizes the page (e.g.,
"Liveness, readiness, and compatibility health endpoints for Rook deployments").

In `@openspec/specs/gateway/spec.md`:
- Around line 3943-3948: Update the health-probe examples to qualify auth
preconditions: change the `/api/health/live` acceptance criteria to state that a
200 OK with body `{ "status": "ok" }` is guaranteed only when inbound auth is
disabled or when the probe supplies a valid bearer token; add an additional
clause that in auth-enabled deployments an unauthenticated probe may receive
`401 Unauthorized` before reaching the health handler. Modify both the
`/api/health/live` and the related `/api/health/ready` blocks (refs: the literal
path strings `/api/health/live` and `/api/health/ready` in the spec) and add a
short note describing expected 401 behavior for auth-enabled deployments so the
contract is not contradictory.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3597132a-e728-48a1-9e84-39c0d6f6be4c

📥 Commits

Reviewing files that changed from the base of the PR and between 8381b6b and 4672c6b.

📒 Files selected for processing (5)
  • clients/rook/src/health.rs
  • clients/rook/src/server/mod.rs
  • clients/web/apps/docs/astro.config.mjs
  • clients/web/apps/docs/src/content/docs/rook/operational-health.md
  • openspec/specs/gateway/spec.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: submit-gradle
  • GitHub Check: sonar
  • GitHub Check: pr-checks
  • GitHub Check: semgrep-cloud-platform/scan
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{md,mdx}

⚙️ CodeRabbit configuration file

**/*.{md,mdx}: Verify technical accuracy and that docs stay aligned with code changes.
For user-facing docs, check EN/ES parity or explicitly note pending translation gaps.

Files:

  • clients/web/apps/docs/src/content/docs/rook/operational-health.md
  • openspec/specs/gateway/spec.md
**/*

⚙️ CodeRabbit configuration file

**/*: Security first, performance second.
Validate input boundaries, auth/authz implications, and secret management.
Look for behavioral regressions, missing tests, and contract breaks across modules.

Files:

  • clients/web/apps/docs/src/content/docs/rook/operational-health.md
  • clients/rook/src/health.rs
  • clients/web/apps/docs/astro.config.mjs
  • openspec/specs/gateway/spec.md
  • clients/rook/src/server/mod.rs
**/*.rs

⚙️ CodeRabbit configuration file

**/*.rs: Focus on Rust idioms, memory safety, and ownership/borrowing correctness.
Flag unnecessary clones, unchecked panics in production paths, and weak error context.
Prioritize unsafe blocks, FFI boundaries, concurrency races, and secret handling.

Files:

  • clients/rook/src/health.rs
  • clients/rook/src/server/mod.rs
🪛 GitHub Actions: Docs Quality
clients/web/apps/docs/src/content/docs/rook/operational-health.md

[error] 1-1: Documentation metadata validation failed: missing required field 'description'.


[error] 1-1: Documentation metadata validation failed: missing locale counterpart 'clients/web/apps/docs/src/content/docs/es/rook/operational-health.md'.

🔇 Additional comments (8)
clients/rook/src/health.rs (1)

147-187: Good coverage addition for critical readiness branches.

These tests correctly lock the Fail status and per-check reason semantics for config/router startup dependency failures.

clients/rook/src/server/mod.rs (1)

930-931: Health endpoint contract tests are strengthened appropriately.

Great additions: explicit liveness independence, per-check readiness reasons, and degraded-assets assertions.

Also applies to: 933-955, 979-983, 985-1041, 1065-1068

clients/web/apps/docs/astro.config.mjs (1)

242-256: Sidebar registration looks clean and consistent.

The new Rook section and localized item wiring are consistent with the existing Starlight sidebar structure.

clients/web/apps/docs/src/content/docs/rook/operational-health.md (5)

16-22: LGTM - endpoint table is accurate.

The three endpoints align with the tested implementation.


24-36: LGTM - liveness semantics are correct.

The independence from dependency checks is the correct pattern for liveness probes.


79-83: LGTM - provider exclusion is the correct design.

Separating operational readiness from provider availability prevents false negatives and aligns with cloud-native patterns.


84-97: LGTM - Kubernetes probe configuration is correct.

The probe paths and structure follow Kubernetes best practices.


99-108: LGTM - compatibility endpoint guidance is clear.

The deprecation guidance appropriately steers users toward the structured readiness endpoint.

Comment thread clients/web/apps/docs/src/content/docs/rook/operational-health.md
Comment on lines 3943 to 3948
- GIVEN the Rook process is running
- WHEN a client requests the liveness endpoint
- THEN the response status MUST be successful
- AND the JSON body MUST indicate a live state
- AND the result MUST NOT depend on database or upstream provider reachability
- WHEN a client requests `GET /api/health/live`
- THEN the response status MUST be `200 OK`
- AND the JSON body MUST equal `{ "status": "ok" }`
- AND the result MUST NOT depend on database, asset, account, or upstream provider reachability

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Clarify auth preconditions for health probes to avoid a contradictory contract.

This section currently reads as unconditional 200 behavior, but /api/* can be gated by inbound auth; unauthenticated probe calls may get 401 before hitting health handlers. Please qualify these scenarios (e.g., auth disabled or valid bearer token provided), and add a 401 probe behavior note for auth-enabled deployments.

Suggested spec wording update
- - WHEN a client requests `GET /api/health/live`
- - THEN the response status MUST be `200 OK`
+ - WHEN a client requests `GET /api/health/live` with valid inbound auth when required
+ - THEN the response status MUST be `200 OK`

- `GET /api/health` MUST continue to return `200 OK` with plain text body `ok` for a running process.
+ `GET /api/health` MUST continue to return `200 OK` with plain text body `ok` for a running process
+ when request authentication succeeds (or when inbound auth is disabled).

+ When inbound auth is enabled and credentials are missing/invalid, health endpoints under `/api/*`
+ MUST return `401 Unauthorized` per the inbound auth contract.

As per coding guidelines **/*.{md,mdx}: Verify technical accuracy and that docs stay aligned with code changes.

Also applies to: 3995-4008

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@openspec/specs/gateway/spec.md` around lines 3943 - 3948, Update the
health-probe examples to qualify auth preconditions: change the
`/api/health/live` acceptance criteria to state that a 200 OK with body `{
"status": "ok" }` is guaranteed only when inbound auth is disabled or when the
probe supplies a valid bearer token; add an additional clause that in
auth-enabled deployments an unauthenticated probe may receive `401 Unauthorized`
before reaching the health handler. Modify both the `/api/health/live` and the
related `/api/health/ready` blocks (refs: the literal path strings
`/api/health/live` and `/api/health/ready` in the spec) and add a short note
describing expected 401 behavior for auth-enabled deployments so the contract is
not contradictory.

yacosta738 and others added 5 commits May 3, 2026 19:07
… path traversal

Fix path traversal vulnerability in Telegram attachment path resolution by rejecting absolute paths.

## Changes
- Added explicit validation to reject absolute paths in `resolve_attachment_path`
- All attachment paths must now be relative to the configured attachment root
- Removed conditional logic that allowed absolute paths to bypass the root directory join

## Why
The previous implementation allowed absolute paths to be used directly as candidates, which could potentially allow path traversal attacks. Although there was a `starts_with` check after canonicalization, rejecting absolute paths early provides defense-in-depth and makes the security boundary explicit. By requiring all paths to be relative, we ensure they are always resolved within the attachment root directory.

## Semgrep Finding Details
The application builds a file path from potentially untrusted data, which can lead to a path traversal vulnerability. An attacker can manipulate the path which the application uses to access files. If the application does not validate user input and sanitize file paths, sensitive files such as configuration or user data can be accessed, potentially creating or overwriting files. To prevent this vulnerability, validate and sanitize any input that is used to create references to file paths. Also, enforce strict file access controls. For example, choose privileges allowing public-facing applications to access only the required files.

Semgrep generated this pull request to fix [a finding](https://semgrep.dev/orgs/yap/findings/772954974) from the detection rule [rust.actix.path-traversal.tainted-path.tainted-path](https://semgrep.dev/r/rust.actix.path-traversal.tainted-path.tainted-path).
fix: reject absolute paths in Telegram attachment resolver to prevent path traversal
@yacosta738 yacosta738 merged commit 133d341 into main May 3, 2026
16 of 17 checks passed
@yacosta738 yacosta738 deleted the feat/rook-operational-health-682 branch May 3, 2026 17:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add readiness/liveness and operational health reporting

1 participant