feat: Persist Rook provider account health state by yacosta738 · Pull Request #759 · dallay/corvus

yacosta738 · 2026-05-02T18:12:22Z

Summary

- add SQLite persistence for provider account health/cooldown state
- wire RookRegistry to use SqliteHealthService for runtime health checks
- preserve missing-row Unknown semantics and recover availability after expired cooldowns

Tests

`cargo test --manifest-path clients/rook/Cargo.toml`

Closes #683

cloudflare-workers-and-pages · 2026-05-02T18:12:26Z

Deploying corvus with Cloudflare Pages

Latest commit:	`f31a3b8`
Status:	✅ Deploy successful!
Preview URL:	https://1bc71370.corvus-42x.pages.dev
Branch Preview URL:	https://feat-rook-health-persistence.corvus-42x.pages.dev

coderabbitai · 2026-05-02T18:12:31Z

📝 Walkthrough

Summary by CodeRabbit

New Features
- Provider account health status and cooldown state now persist across application restarts for improved account management.
Security
- Enhanced path validation with stricter safety checks to prevent malicious inputs and path traversal attempts.

Walkthrough

Two independent changes: (1) security policy hardens path validation to check raw inputs before URL-decoding/dequoting, shifting decoded-path checks to command argument normalization; (2) Rook gains persistent health state via SQLite storage, replacing in-memory tracking with a new database schema, persistence layer, and updated service wiring.

Changes

Security Policy Raw-Path Validation

Layer / File(s)	Summary
Core Validation Logic `clients/agent-runtime/src/security/policy.rs` (lines 595–620)	`is_path_allowed` now validates the raw `path` string directly, rejecting inputs containing `\0`, backslashes, `%`, or `..` components before any URL-decoding or dequoting; only `expand_tilde` operates on the unmodified input.
Command Argument Normalization `clients/agent-runtime/src/security/policy.rs` (lines 822–828)	`normalize_arg_for_path_checks` now URL-decodes the token first, rejects if the decoded result still contains `%`, then dequotes the decoded value; shifts handling of encoded/quoted paths to the command-parsing layer.
Tests & Validation `clients/agent-runtime/src/security/policy.rs` (lines 2210–2231)	Removed test asserting quoted paths are blocked by `is_path_allowed`; added tests confirming raw-path validation (quoted strings are allowed) and command-layer decoding/dequoting (encoded/quoted absolute/traversal paths are blocked during command parsing).
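
To make the command-layer ordering concrete, here is a minimal sketch of "URL-decode first, reject if `%` remains, then dequote". The names `percent_decode`, `strip_matching_quotes`, and `normalize_arg` are hypothetical stand-ins; the real `normalize_arg_for_path_checks` body is not shown in this thread, and treating malformed escapes as a rejection is an assumption.

```rust
/// Hypothetical, simplified percent-decoder (std only).
/// Returns None on a malformed escape such as "%2" or "%zz",
/// or when the decoded bytes are not valid UTF-8.
fn percent_decode(input: &str) -> Option<String> {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' {
            // Need exactly two hex digits after the percent sign.
            let hex = bytes.get(i + 1..i + 3)?;
            let hi = (hex[0] as char).to_digit(16)?;
            let lo = (hex[1] as char).to_digit(16)?;
            out.push((hi * 16 + lo) as u8);
            i += 3;
        } else {
            out.push(bytes[i]);
            i += 1;
        }
    }
    String::from_utf8(out).ok()
}

/// Strips one pair of matching leading/trailing single or double quotes.
fn strip_matching_quotes(s: &str) -> &str {
    let b = s.as_bytes();
    if b.len() >= 2 && (b[0] == b'"' || b[0] == b'\'') && b[b.len() - 1] == b[0] {
        &s[1..s.len() - 1]
    } else {
        s
    }
}

/// Decode, reject anything still containing '%' (e.g. double encoding),
/// then dequote. None means "reject this token" (an assumption).
fn normalize_arg(token: &str) -> Option<String> {
    let decoded = percent_decode(token)?;
    if decoded.contains('%') {
        return None;
    }
    Some(strip_matching_quotes(&decoded).to_string())
}
```

Under this sketch, `normalize_arg("%22%2Fetc%2Fpasswd%22")` yields `/etc/passwd`, which later absolute-path and traversal checks can see, while a double-encoded token such as `%252e%252e` is rejected outright.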

Rook Health Persistence

Layer / File(s)	Summary
Database Schema `clients/rook/migrations/0006_health_persistence.sql`	New `provider_account_health` table persists account status, cooldown windows, consecutive failure counts, and timestamps; includes cascade-delete foreign key to `provider_accounts`.
Migration Runner `clients/rook/src/db/mod.rs` (lines 49–52, 251–265)	Embeds the `0006_health_persistence` migration and integrates it into `SqliteDb::run_migrations`, checking `schema_migrations` and applying the migration if missing.
Persistence Layer `clients/rook/src/db/health.rs`	New module implements helpers (`status_to_db_str`, `parse_optional_rfc3339`, `row_to_health`) and three public async methods on `SqliteDb`: `get_account_health` (retrieves or returns `None`), `upsert_account_health_success` (clears failures/cooldown), and `upsert_account_health_failure` (increments failures, sets cooldown); includes atomicity via `ON CONFLICT` upserts and comprehensive error handling.
Service Implementation `clients/rook/src/services/health.rs` (lines 192–267)	New `SqliteHealthService` implements `HealthService` by reading/writing to `SqliteDb`, falling back to the default health state on missing rows or read errors (the PR description calls this preserving missing-row `Unknown` semantics) and logging warnings on persistence failures; all health methods (`get`, `mark_success`, `mark_failure`, `is_available`, `list_healthy`) are now durable.
Service Wiring `clients/rook/src/registry/mod.rs` (lines 24–83, 124–129)	`RookRegistry::from_db` constructs `SqliteHealthService` instead of `InMemoryHealthService`, and the `health()` accessor now returns `&SqliteHealthService`; updated trait imports accordingly.
Dependency Update `clients/rook/Cargo.toml` (line 71)	`corvus-traits` version updated from `0.1.0` to `0.2.2` (same path dependency).
Integration Tests `clients/rook/src/db/mod.rs` (lines 509–546), `clients/rook/src/services/health.rs` (lines 379–435), `clients/rook/src/registry/mod.rs` (lines 219–258)	In-memory DB tests verify schema columns and migration recording; SQLite service tests confirm missing-row defaulting, cooldown/failure persistence across service instances, cooldown expiry, and concurrent failure increment correctness; registry test verifies health state survives database reopen.
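
The state transitions described for the persistence layer can be sketched with an in-memory model. This is not the PR's SQL: a `HashMap` stands in for the `provider_account_health` table, the field types are assumptions (the helper name `parse_optional_rfc3339` suggests the real columns store RFC 3339 text rather than integers), and setting the status to unhealthy on failure is also an assumption.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum Status {
    Healthy,
    Unhealthy,
}

/// Hypothetical row shape; field names follow the table summary.
#[derive(Debug, Clone, PartialEq)]
struct HealthRow {
    status: Status,
    consecutive_failures: u32,
    cooldown_until: Option<u64>, // unix seconds in this model only
}

/// In-memory stand-in for the health table, keyed by account id.
#[derive(Default)]
struct HealthTable {
    rows: HashMap<u64, HealthRow>,
}

impl HealthTable {
    /// Missing rows come back as None; the caller decides the default.
    fn get(&self, account_id: u64) -> Option<HealthRow> {
        self.rows.get(&account_id).cloned()
    }

    /// Success clears the failure count and any cooldown.
    fn upsert_success(&mut self, account_id: u64) {
        self.rows.insert(
            account_id,
            HealthRow {
                status: Status::Healthy,
                consecutive_failures: 0,
                cooldown_until: None,
            },
        );
    }

    /// Failure inserts a first-failure row or increments the existing one.
    fn upsert_failure(&mut self, account_id: u64, cooldown_until: u64) {
        self.rows
            .entry(account_id)
            .and_modify(|row| {
                row.status = Status::Unhealthy;
                row.consecutive_failures += 1;
                row.cooldown_until = Some(cooldown_until);
            })
            .or_insert(HealthRow {
                status: Status::Unhealthy,
                consecutive_failures: 1,
                cooldown_until: Some(cooldown_until),
            });
    }
}
```

The `entry(...).and_modify(...).or_insert(...)` chain plays the role that an `ON CONFLICT` upsert plays in the database: one operation covers both the insert and the update case.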

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

fix(security): harden path validation and block shell input redirection #357: Modifies path-validation and percent-decoding logic in is_path_allowed/normalize_arg_for_path_checks with conflicting approaches to where URL-decoding occurs.
fix(security): harden SecurityPolicy against quote-based bypasses (SONAR:SEC-001) #758: Alters the order and behavior of URL-decoding and quote-stripping in the same security policy functions, directly affecting validation semantics.
feat(rook): implement shared domain services for accounts, pools, routes, and health #603: Introduced the health service pattern with InMemoryHealthService; this PR replaces that in-memory implementation with a durable persistence layer.

Suggested labels

area:rust, risk:high

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 56.10% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check	❓ Inconclusive	The PR description is brief and covers key objectives but lacks structured detail in several template sections; 'Tested Information' and 'Documentation Impact' are minimal.	Expand 'Tested Information' with specifics (test output, coverage), clarify 'Documentation Impact' explicitly, and add detailed context on API compatibility or breaking changes if any exist.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title follows Conventional Commit style with 'feat:' prefix, is descriptive of the main change (persisting Rook provider account health), and is 48 characters (well under the 72-character limit).
Linked Issues check	✅ Passed	Code changes comprehensively address issue `#683`: SQLite persistence for health/cooldown state, durable storage model, startup recovery, expiry semantics, and API compatibility are all implemented and tested.
Out of Scope Changes check	✅ Passed	All code changes directly support the linked issue: new migration, health DB module, SQLite service implementation, registry wiring, and security policy updates for path validation are all in scope.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@clients/agent-runtime/src/security/policy.rs`:
- Around line 607-620: is_path_allowed currently lets quoted paths (e.g.,
"/etc/passwd" or '../secret') bypass the absolute/traversal checks because
quotes hide components; update is_path_allowed to detect and reject quoted
direct-path inputs by checking for a matching leading and trailing single or
double quote and returning false (or alternatively unquote first and then re-run
the existing validations) before the percent-sign and
Path::new(path).components() checks; modify the logic around the existing
percent check and the call to expand_tilde so quoted strings cannot bypass
absolute-path or ParentDir detection in is_path_allowed.

In `@clients/rook/src/services/health.rs`:
- Around line 247-255: InMemoryHealthService::is_available currently treats
Unhealthy as always unavailable, which diverges from
SqliteHealthService::is_available that only respects cooldowns; change
InMemoryHealthService::is_available to mirror SqliteHealthService by looking up
health.cooldown_until and returning false only while Utc::now() <
cooldown_until, otherwise return true (i.e., do not special-case Unhealthy
status), so tests reflect production recovery semantics.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: fa7f1d76-c672-4270-8e34-cb98627fbd30

📥 Commits

Reviewing files that changed from the base of the PR and between a70ec1e and f31a3b8.

⛔ Files ignored due to path filters (1)

clients/rook/Cargo.lock is excluded by !**/*.lock

📒 Files selected for processing (7)

clients/agent-runtime/src/security/policy.rs
clients/rook/Cargo.toml
clients/rook/migrations/0006_health_persistence.sql
clients/rook/src/db/health.rs
clients/rook/src/db/mod.rs
clients/rook/src/registry/mod.rs
clients/rook/src/services/health.rs

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)

GitHub Check: pr-checks
GitHub Check: submit-gradle
GitHub Check: sonar
GitHub Check: semgrep-cloud-platform/scan
GitHub Check: Cloudflare Pages

🧰 Additional context used

📓 Path-based instructions (6)

**/*.rs

⚙️ CodeRabbit configuration file

**/*.rs: Focus on Rust idioms, memory safety, and ownership/borrowing correctness.
Flag unnecessary clones, unchecked panics in production paths, and weak error context.
Prioritize unsafe blocks, FFI boundaries, concurrency races, and secret handling.

Files:

clients/rook/src/services/health.rs
clients/rook/src/db/mod.rs
clients/rook/src/registry/mod.rs
clients/rook/src/db/health.rs
clients/agent-runtime/src/security/policy.rs

**/*

⚙️ CodeRabbit configuration file

**/*: Security first, performance second.
Validate input boundaries, auth/authz implications, and secret management.
Look for behavioral regressions, missing tests, and contract breaks across modules.

Files:

clients/rook/src/services/health.rs
clients/rook/migrations/0006_health_persistence.sql
clients/rook/src/db/mod.rs
clients/rook/src/registry/mod.rs
clients/rook/src/db/health.rs
clients/agent-runtime/src/security/policy.rs
clients/rook/Cargo.toml

clients/agent-runtime/src/{security,gateway,tools}/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

Treat src/security/, src/gateway/, src/tools/ as high-risk surfaces and never broaden filesystem/network execution scope without explicit policy checks

Files:

clients/agent-runtime/src/security/policy.rs

clients/agent-runtime/src/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

clients/agent-runtime/src/**/*.rs: Never log secrets, tokens, raw credentials, or sensitive payloads in any logging statements
Avoid unnecessary allocations, clones, and blocking operations to maintain performance and efficiency

Files:

clients/agent-runtime/src/security/policy.rs

clients/agent-runtime/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

Run cargo fmt --all -- --check, cargo clippy --all-targets -- -D warnings, and cargo test for code validation, or document which checks were skipped and why

Files:

clients/agent-runtime/src/security/policy.rs

clients/agent-runtime/src/{security,gateway,tools,config}/**/*.rs

📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)

Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable

Files:

clients/agent-runtime/src/security/policy.rs

🔇 Additional comments (15)

clients/rook/Cargo.toml (1)

71-71: LGTM!

Internal dependency version bump with path override intact.

clients/rook/migrations/0006_health_persistence.sql (1)

1-11: LGTM!

Schema aligns with the AccountHealth struct, FK cascade correctly cleans up orphaned health rows, and CREATE TABLE IF NOT EXISTS ensures idempotency.

clients/rook/src/db/mod.rs (3)

10-10: LGTM!

New health module export correctly placed.

49-52: LGTM!

Migration 0006 embedding and conditional application follows the established pattern consistently.

Also applies to: 251-264

509-546: LGTM!

Tests verify both schema columns and migration version recording.

clients/rook/src/registry/mod.rs (2)

27-27: LGTM!

Registry correctly wired to use SqliteHealthService, maintaining API compatibility via the HealthService trait.

Also applies to: 43-43, 82-82, 127-128

219-258: LGTM!

Solid persistence test - creates account, marks failure, closes registry, reopens from same file, and verifies health state (status, failures, cooldown, availability) survived the reopen.

clients/rook/src/services/health.rs (3)

192-206: LGTM!

Clean struct definition and constructor.

207-245: LGTM!

Graceful degradation on read/write errors - logs warnings and falls back to defaults rather than propagating failures that would break routing.

378-435: LGTM!

Good coverage: missing-row semantics, cross-instance persistence, cooldown expiry, and concurrent failure increments. SQLite's write serialization ensures the concurrent test is deterministic.

clients/rook/src/db/health.rs (5)

10-29: LGTM!

Bidirectional status conversion is exhaustive and handles unknown values with a clear error.

44-85: LGTM!

Row parsing with proper error context. The u32::try_from guard handles theoretical overflow gracefully.

87-103: LGTM!

Parameterized query prevents injection. Clean optional row handling.

105-167: LGTM!

Atomic upserts via ON CONFLICT DO UPDATE handle both insert and increment cases correctly. The failure upsert's consecutive_failures + 1 is evaluated atomically by SQLite, avoiding read-modify-write races.

170-237: LGTM!

Tests cover the key scenarios: missing rows, failure round-trips with increment verification, and success clearing state.
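
The read-modify-write concern mentioned above can be illustrated with a small thread-based model. This is only an analogy under stated assumptions: a `Mutex` stands in for the serialization the review attributes to SQLite evaluating `consecutive_failures + 1` inside a single statement, and the function names are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Increment the failure counter while holding the lock, so the read
/// and the write cannot be interleaved with another writer.
fn record_failure(counters: &Mutex<HashMap<u64, u32>>, account_id: u64) {
    let mut guard = counters.lock().expect("counter mutex poisoned");
    *guard.entry(account_id).or_insert(0) += 1;
}

/// Spawn `writers` threads that each record one failure for the same
/// account, then return the final counter value.
fn concurrent_failures(account_id: u64, writers: usize) -> u32 {
    let counters: Arc<Mutex<HashMap<u64, u32>>> = Arc::new(Mutex::new(HashMap::new()));
    let handles: Vec<_> = (0..writers)
        .map(|_| {
            let counters = Arc::clone(&counters);
            thread::spawn(move || record_failure(&counters, account_id))
        })
        .collect();
    for handle in handles {
        handle.join().expect("writer thread panicked");
    }
    let total = counters
        .lock()
        .expect("counter mutex poisoned")
        .get(&account_id)
        .copied()
        .unwrap_or(0);
    total
}
```

Because each increment happens entirely under the lock, `concurrent_failures(7, 16)` returns 16 regardless of thread scheduling, which mirrors the property the concurrent test in this PR is described as checking.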

coderabbitai · 2026-05-02T19:06:15Z

+        // Block percent signs rather than decoding direct path inputs here.
+        if path.contains('%') {
            return false;
        }

        // Block path traversal: check for ".." as a path component
-        if Path::new(&dequoted)
+        if Path::new(path)
            .components()
            .any(|c| matches!(c, std::path::Component::ParentDir))
        {
            return false;
        }

-        let expanded = expand_tilde(&dequoted);
+        let expanded = expand_tilde(path);


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject quoted direct-path inputs here.

is_path_allowed is still the guard used for raw tool parameters in clients/agent-runtime/src/tools/file_read.rs and clients/agent-runtime/src/tools/glob.rs. With dequoting removed, inputs like "/etc/passwd" or "../secret" now pass this check because the quotes hide the absolute/traversal component from Path::components() and the absolute-path check. That weakens the direct-path policy surface, and Lines 2216-2217 lock the regression in.

Either reject quotes here for non-shell path APIs, or normalize every path-taking tool before calling this helper.

Suggested fix

        // Block backslashes (Windows-style separators or escaping)
        if path.contains('\\') {
            return false;
        }

+        // Direct path parameters should never arrive shell-quoted.
+        if path.contains('"') || path.contains('\'') {
+            return false;
+        }
+
        // Block percent signs rather than decoding direct path inputs here.
        if path.contains('%') {
            return false;
        }

As per coding guidelines, "Treat src/security/, src/gateway/, src/tools/ as high-risk surfaces and never broaden filesystem/network execution scope without explicit policy checks" and "Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable."
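
A small std-only sketch shows why quotes hide the traversal and absolute-path components. The function `is_raw_path_allowed` is a hypothetical simplification of `is_path_allowed`: the `reject_quotes` flag exists only to compare behaviour with and without the suggested check, and treating any path starting with `/` as disallowed is an assumption about the policy.

```rust
use std::path::{Component, Path};

/// True if any path component is exactly `..`.
fn has_parent_dir(path: &str) -> bool {
    Path::new(path)
        .components()
        .any(|c| matches!(c, Component::ParentDir))
}

/// Simplified raw-input check (hypothetical; not the real policy code).
fn is_raw_path_allowed(path: &str, reject_quotes: bool) -> bool {
    // Reject NUL bytes, backslashes, and percent signs on the raw input.
    if path.contains('\0') || path.contains('\\') || path.contains('%') {
        return false;
    }
    // The suggested fix: direct path parameters should not be quoted.
    if reject_quotes && (path.contains('"') || path.contains('\'')) {
        return false;
    }
    // Assumed policy: no absolute paths (simplified to a leading slash).
    if path.starts_with('/') {
        return false;
    }
    !has_parent_dir(path)
}
```

With `reject_quotes` set to `false`, the quoted inputs `"../secret"` and `"/etc/passwd"` pass, because the leading quote turns `..` into the ordinary component `"..` and means the string no longer starts with `/`. With the flag set to `true`, both are rejected, while an ordinary relative path such as `src/main.rs` is still accepted.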

coderabbitai · 2026-05-02T19:06:15Z

+    async fn is_available(&self, account_id: AccountId) -> bool {
+        let health = self.get(account_id).await;
+        if let Some(until) = health.cooldown_until {
+            if Utc::now() < until {
+                return false;
+            }
+        }
+        true
+    }


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Behavioral inconsistency: is_available differs from InMemoryHealthService.

InMemoryHealthService.is_available() (lines 168-179) returns false for Unhealthy status regardless of cooldown. SqliteHealthService only checks cooldown expiry.

Per PR objectives ("availability is recovered after cooldowns expire"), this appears intentional. However, tests using InMemoryHealthService won't reflect production recovery semantics.

Align InMemoryHealthService.is_available() to match, or document the divergence explicitly.

Proposed fix to align InMemoryHealthService

impl HealthService for InMemoryHealthService {
    async fn is_available(&self, account_id: AccountId) -> bool {
        let health = self.get(account_id).await;
-        if health.status == HealthStatus::Unhealthy {
-            return false;
-        }
        if let Some(until) = health.cooldown_until {
            if Utc::now() < until {
                return false;
            }
        }
        true
    }
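
The divergence between the two rules can be sketched side by side. This is a std-only model under stated assumptions: `std::time::SystemTime` replaces the `Utc::now()` timestamps used in the diff, the struct is a cut-down stand-in for `AccountHealth`, and both function names are hypothetical.

```rust
use std::time::SystemTime;

#[derive(Debug, Clone, Copy, PartialEq)]
enum HealthStatus {
    Unknown,
    Healthy,
    Unhealthy,
}

/// Cut-down stand-in for the health record (assumed shape).
#[derive(Debug, Clone, Copy)]
struct AccountHealth {
    status: HealthStatus,
    cooldown_until: Option<SystemTime>,
}

/// Cooldown-only rule: unavailable only while a cooldown is still
/// running (the SqliteHealthService behaviour shown in the diff).
fn available_cooldown_only(h: &AccountHealth, now: SystemTime) -> bool {
    !matches!(h.cooldown_until, Some(until) if now < until)
}

/// Status-aware rule: Unhealthy is always unavailable, even after the
/// cooldown has expired (the in-memory behaviour the review describes).
fn available_status_aware(h: &AccountHealth, now: SystemTime) -> bool {
    h.status != HealthStatus::Unhealthy && available_cooldown_only(h, now)
}
```

The two rules differ only for an account that is still marked unhealthy after its cooldown has expired: the cooldown-only rule reports it as available again, while the status-aware rule keeps it unavailable until a success is recorded.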

sonarqubecloud · 2026-05-02T19:23:37Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
96.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

feat(rook): persist provider account health state

0932893

github-actions Bot added the size/l label (denotes a large change size) May 2, 2026

yacosta738 changed the title ~~Persist Rook provider account health state~~ feat: Persist Rook provider account health state May 2, 2026

yacosta738 added 2 commits May 2, 2026 20:24

Merge branch 'main' into feat/rook-health-persistence

84bcca8

fix(security): validate raw paths before normalization

f31a3b8

coderabbitai Bot added area:rust risk:high labels May 2, 2026

coderabbitai Bot reviewed May 2, 2026

yacosta738 merged commit 930700f into main May 2, 2026
17 checks passed

yacosta738 deleted the feat/rook-health-persistence branch May 2, 2026 19:35

coderabbitai Bot mentioned this pull request May 4, 2026

feat(rook): add production readiness operations #768

Merged

8 tasks
