feat: Persist Rook provider account health state#759
Conversation
Deploying corvus with
|
| Latest commit: |
f31a3b8
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://1bc71370.corvus-42x.pages.dev |
| Branch Preview URL: | https://feat-rook-health-persistence.corvus-42x.pages.dev |
📝 WalkthroughSummary by CodeRabbit
WalkthroughTwo independent changes: (1) security policy hardens path validation to check raw inputs before URL-decoding/dequoting, shifting decoded-path checks to command argument normalization; (2) Rook gains persistent health state via SQLite storage, replacing in-memory tracking with a new database schema, persistence layer, and updated service wiring. ChangesSecurity Policy Raw-Path Validation
Rook Health Persistence
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 60 minutes.Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@clients/agent-runtime/src/security/policy.rs`:
- Around line 607-620: is_path_allowed currently lets quoted paths (e.g.,
"/etc/passwd" or '../secret') bypass the absolute/traversal checks because
quotes hide components; update is_path_allowed to detect and reject quoted
direct-path inputs by checking for a matching leading and trailing single or
double quote and returning false (or alternatively unquote first and then re-run
the existing validations) before the percent-sign and
Path::new(path).components() checks; modify the logic around the existing
percent check and the call to expand_tilde so quoted strings cannot bypass
absolute-path or ParentDir detection in is_path_allowed.
In `@clients/rook/src/services/health.rs`:
- Around line 247-255: InMemoryHealthService::is_available currently treats
Unhealthy as always unavailable, which diverges from
SqliteHealthService::is_available that only respects cooldowns; change
InMemoryHealthService::is_available to mirror SqliteHealthService by looking up
health.cooldown_until and returning false only while Utc::now() <
cooldown_until, otherwise return true (i.e., do not special-case Unhealthy
status), so tests reflect production recovery semantics.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: fa7f1d76-c672-4270-8e34-cb98627fbd30
⛔ Files ignored due to path filters (1)
clients/rook/Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (7)
clients/agent-runtime/src/security/policy.rsclients/rook/Cargo.tomlclients/rook/migrations/0006_health_persistence.sqlclients/rook/src/db/health.rsclients/rook/src/db/mod.rsclients/rook/src/registry/mod.rsclients/rook/src/services/health.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: pr-checks
- GitHub Check: submit-gradle
- GitHub Check: sonar
- GitHub Check: semgrep-cloud-platform/scan
- GitHub Check: Cloudflare Pages
🧰 Additional context used
📓 Path-based instructions (6)
**/*.rs
⚙️ CodeRabbit configuration file
**/*.rs: Focus on Rust idioms, memory safety, and ownership/borrowing correctness.
Flag unnecessary clones, unchecked panics in production paths, and weak error context.
Prioritize unsafe blocks, FFI boundaries, concurrency races, and secret handling.
Files:
clients/rook/src/services/health.rsclients/rook/src/db/mod.rsclients/rook/src/registry/mod.rsclients/rook/src/db/health.rsclients/agent-runtime/src/security/policy.rs
**/*
⚙️ CodeRabbit configuration file
**/*: Security first, performance second.
Validate input boundaries, auth/authz implications, and secret management.
Look for behavioral regressions, missing tests, and contract breaks across modules.
Files:
clients/rook/src/services/health.rsclients/rook/migrations/0006_health_persistence.sqlclients/rook/src/db/mod.rsclients/rook/src/registry/mod.rsclients/rook/src/db/health.rsclients/agent-runtime/src/security/policy.rsclients/rook/Cargo.toml
clients/agent-runtime/src/{security,gateway,tools}/**/*.rs
📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)
Treat
src/security/,src/gateway/,src/tools/as high-risk surfaces and never broaden filesystem/network execution scope without explicit policy checks
Files:
clients/agent-runtime/src/security/policy.rs
clients/agent-runtime/src/**/*.rs
📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)
clients/agent-runtime/src/**/*.rs: Never log secrets, tokens, raw credentials, or sensitive payloads in any logging statements
Avoid unnecessary allocations, clones, and blocking operations to maintain performance and efficiency
Files:
clients/agent-runtime/src/security/policy.rs
clients/agent-runtime/**/*.rs
📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)
Run
cargo fmt --all -- --check,cargo clippy --all-targets -- -D warnings, andcargo testfor code validation, or document which checks were skipped and why
Files:
clients/agent-runtime/src/security/policy.rs
clients/agent-runtime/src/{security,gateway,tools,config}/**/*.rs
📄 CodeRabbit inference engine (clients/agent-runtime/AGENTS.md)
Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable
Files:
clients/agent-runtime/src/security/policy.rs
🔇 Additional comments (15)
clients/rook/Cargo.toml (1)
71-71: LGTM!Internal dependency version bump with path override intact.
clients/rook/migrations/0006_health_persistence.sql (1)
1-11: LGTM!Schema aligns with the
AccountHealthstruct, FK cascade correctly cleans up orphaned health rows, andCREATE TABLE IF NOT EXISTSensures idempotency.clients/rook/src/db/mod.rs (3)
10-10: LGTM!New health module export correctly placed.
49-52: LGTM!Migration 0006 embedding and conditional application follows the established pattern consistently.
Also applies to: 251-264
509-546: LGTM!Tests verify both schema columns and migration version recording.
clients/rook/src/registry/mod.rs (2)
27-27: LGTM!Registry correctly wired to use
SqliteHealthService, maintaining API compatibility via theHealthServicetrait.Also applies to: 43-43, 82-82, 127-128
219-258: LGTM!Solid persistence test - creates account, marks failure, closes registry, reopens from same file, and verifies health state (status, failures, cooldown, availability) survived the reopen.
clients/rook/src/services/health.rs (3)
192-206: LGTM!Clean struct definition and constructor.
207-245: LGTM!Graceful degradation on read/write errors - logs warnings and falls back to defaults rather than propagating failures that would break routing.
378-435: LGTM!Good coverage: missing-row semantics, cross-instance persistence, cooldown expiry, and concurrent failure increments. SQLite's write serialization ensures the concurrent test is deterministic.
clients/rook/src/db/health.rs (5)
10-29: LGTM!Bidirectional status conversion is exhaustive and handles unknown values with a clear error.
44-85: LGTM!Row parsing with proper error context. The
u32::try_fromguard handles theoretical overflow gracefully.
87-103: LGTM!Parameterized query prevents injection. Clean optional row handling.
105-167: LGTM!Atomic upserts via
ON CONFLICT DO UPDATEhandle both insert and increment cases correctly. The failure upsert'sconsecutive_failures + 1is evaluated atomically by SQLite, avoiding read-modify-write races.
170-237: LGTM!Tests cover the key scenarios: missing rows, failure round-trips with increment verification, and success clearing state.
| // Block percent signs rather than decoding direct path inputs here. | ||
| if path.contains('%') { | ||
| return false; | ||
| } | ||
|
|
||
| // Block path traversal: check for ".." as a path component | ||
| if Path::new(&dequoted) | ||
| if Path::new(path) | ||
| .components() | ||
| .any(|c| matches!(c, std::path::Component::ParentDir)) | ||
| { | ||
| return false; | ||
| } | ||
|
|
||
| let expanded = expand_tilde(&dequoted); | ||
| let expanded = expand_tilde(path); |
There was a problem hiding this comment.
Reject quoted direct-path inputs here.
is_path_allowed is still the guard used for raw tool parameters in clients/agent-runtime/src/tools/file_read.rs and clients/agent-runtime/src/tools/glob.rs. With dequoting removed, inputs like "/etc/passwd" or "../secret" now pass this check because the quotes hide the absolute/traversal component from Path::components() and the absolute-path check. That weakens the direct-path policy surface, and Lines 2216-2217 lock the regression in.
Either reject quotes here for non-shell path APIs, or normalize every path-taking tool before calling this helper.
Suggested fix
// Block backslashes (Windows-style separators or escaping)
if path.contains('\\') {
return false;
}
+ // Direct path parameters should never arrive shell-quoted.
+ if path.contains('"') || path.contains('\'') {
+ return false;
+ }
+
// Block percent signs rather than decoding direct path inputs here.
if path.contains('%') {
return false;
}As per coding guidelines, "Treat src/security/, src/gateway/, src/tools/ as high-risk surfaces and never broaden filesystem/network execution scope without explicit policy checks" and "Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@clients/agent-runtime/src/security/policy.rs` around lines 607 - 620,
is_path_allowed currently lets quoted paths (e.g., "/etc/passwd" or '../secret')
bypass the absolute/traversal checks because quotes hide components; update
is_path_allowed to detect and reject quoted direct-path inputs by checking for a
matching leading and trailing single or double quote and returning false (or
alternatively unquote first and then re-run the existing validations) before the
percent-sign and Path::new(path).components() checks; modify the logic around
the existing percent check and the call to expand_tilde so quoted strings cannot
bypass absolute-path or ParentDir detection in is_path_allowed.
| async fn is_available(&self, account_id: AccountId) -> bool { | ||
| let health = self.get(account_id).await; | ||
| if let Some(until) = health.cooldown_until { | ||
| if Utc::now() < until { | ||
| return false; | ||
| } | ||
| } | ||
| true | ||
| } |
There was a problem hiding this comment.
Behavioral inconsistency: is_available differs from InMemoryHealthService.
InMemoryHealthService.is_available() (lines 168-179) returns false for Unhealthy status regardless of cooldown. SqliteHealthService only checks cooldown expiry.
Per PR objectives ("availability is recovered after cooldowns expire"), this appears intentional. However, tests using InMemoryHealthService won't reflect production recovery semantics.
Align InMemoryHealthService.is_available() to match, or document the divergence explicitly.
Proposed fix to align InMemoryHealthService
impl HealthService for InMemoryHealthService {
async fn is_available(&self, account_id: AccountId) -> bool {
let health = self.get(account_id).await;
- if health.status == HealthStatus::Unhealthy {
- return false;
- }
if let Some(until) = health.cooldown_until {
if Utc::now() < until {
return false;
}
}
true
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@clients/rook/src/services/health.rs` around lines 247 - 255,
InMemoryHealthService::is_available currently treats Unhealthy as always
unavailable, which diverges from SqliteHealthService::is_available that only
respects cooldowns; change InMemoryHealthService::is_available to mirror
SqliteHealthService by looking up health.cooldown_until and returning false only
while Utc::now() < cooldown_until, otherwise return true (i.e., do not
special-case Unhealthy status), so tests reflect production recovery semantics.
|



Summary
Tests
Closes #683