feat(rook): operational parity – undercover mode, debug diagnostics, multi-provider routing controls#792
Conversation
…multi-provider routing controls
- Add OperationalConfig { undercover, debug_diagnostics } with TOML parsing,
env overlay (ROOK_UNDERCOVER, ROOK_DEBUG_DIAGNOSTICS), partial overlay,
and export view
- Thread OperationalConfig through RookConfig → ServerConfig → GatewayState
and AdminState
- Add emit_gateway_debug / debug_diagnostics_enabled helpers in
gateway/handlers.rs; instrument buffered and streaming execution paths
- Sanitise tracing warn/info with normalize_model_label on all hot paths
- Expose OperationalStatusView { undercover, debug_diagnostics,
redaction_baseline: always_on } nested in OperatorStatusView via /status
- Add regression tests:
operational_config_propagates_to_admin_state (server/mod.rs)
emit_gateway_debug_is_silent_when_debug_diagnostics_disabled (gateway/handlers.rs)
- Update openspec/specs/gateway/spec.md with #538 requirements
Closes #538
|
Warning Rate limit exceeded
To continue reviewing without waiting, purchase usage credits in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (5)
📝 WalkthroughWalkthroughThis PR implements operational control infrastructure for Rook by introducing an ChangesOperational Controls Infrastructure
Sequence DiagramsequenceDiagram
participant Config as Config System
participant State as Runtime State
participant Gateway as Gateway Handler
participant Logger as Structured Log
Config->>Config: Load OperationalConfig from defaults,<br/>files, environment variables
Config->>State: Propagate to ServerConfig,<br/>GatewayState, AdminState
State->>Gateway: expose debug_diagnostics flag
Gateway->>Gateway: check debug_diagnostics_enabled()
alt debug_diagnostics enabled
Gateway->>Logger: emit_gateway_debug(route_resolved,<br/>upstream_attempt_success, etc.)
else debug_diagnostics disabled
Gateway->>Gateway: skip debug logging
end
Gateway->>State: include operational in<br/>OperatorStatusView for admin
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Deploying corvus with
|
| Latest commit: |
6daf45c
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://121f0421.corvus-42x.pages.dev |
| Branch Preview URL: | https://feature-operational-parity-5.corvus-42x.pages.dev |
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
clients/rook/src/config/mod.rs (1)
979-986:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRedact
db_pathwhen undercover mode is active.Line 985 still exports the full filesystem path even when
operational.undercoveris true. The new undercover boundary explicitly treats local paths as sensitive operator output, sorook config exportwill leak deployment layout instead of failing closed.Proposed fix
Self { host: config.host.clone(), port: config.port, enable_tui: config.enable_tui, - db_path: config.db_path.display().to_string(), + db_path: if config.operational.undercover { + "[redacted]".to_string() + } else { + config.db_path.display().to_string() + }, operational: OperationalExportView::from(&config.operational), inbound_auth: InboundAuthExportView {As per coding guidelines,
**/*: Security first, performance second. Validate input boundaries, auth/authz implications, and secret management.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@clients/rook/src/config/mod.rs` around lines 979 - 986, The export currently always includes the full filesystem path from config.db_path in RookConfigExportView::from_config; change it to redact that value when the source config indicates undercover mode by checking config.operational.undercover and returning a safe placeholder (e.g. "<redacted>" or an empty string) instead of config.db_path.display().to_string() so local paths are not leaked; update the RookConfigExportView::from_config function to branch on config.operational.undercover (referencing RookConfig, OperationalExportView::from, and the db_path field).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@clients/rook/src/admin/handlers.rs`:
- Around line 279-283: The OperationalStatusView currently exposes
state.operational.undercover in the /status response but there is no runtime
enforcement; either remove undercover from the status payload or implement
enforcement where redaction happens. To fix: locate OperationalStatusView
construction in handlers.rs (the block setting operational:
OperationalStatusView { undercover: state.operational.undercover, ... }) and
either delete the undercover field and related struct member, or add enforcement
in the redaction/request flow (where requests are redacted or metadata is
filtered) to consult state.operational.undercover and apply the intended
behavior; update the OperationalStatusView definition and any callers/tests
accordingly (search for OperationalStatusView, state.operational, and redaction
code paths) so the status accurately reflects implemented behavior.
In `@clients/rook/src/config/mod.rs`:
- Around line 1061-1067: The Default implementation for OperationalConfig
currently sets undercover: true which violates the new operational config
contract; update the impl Default for OperationalConfig (the default() function)
to set undercover to false (and keep debug_diagnostics as false) so both new
flags default to false, ensuring operator-visible/exported config remains
unchanged for non-opted-in users.
In `@clients/rook/src/gateway/handlers.rs`:
- Around line 1787-1816: The test
emit_gateway_debug_is_silent_when_debug_diagnostics_disabled currently only
asserts debug_diagnostics_enabled and never calls emit_gateway_debug; update the
test to actually exercise emit_gateway_debug by hooking a tracing subscriber (or
test tracing collector) before calling super::emit_gateway_debug(&state, ...),
then assert that no "gateway.debug" event is recorded when
state.operational.debug_diagnostics is false; alternatively, if you prefer the
current assertions only, rename the test to reflect it only checks
debug_diagnostics_enabled and fix the stale inline comment about undercover
default (change or remove the mention of undercover=true) and keep references to
test_state(), OperationalConfig, debug_diagnostics_enabled, and
emit_gateway_debug so reviewers can find the code.
- Around line 571-577: The current emit_gateway_debug call that logs
"upstream_stream_success" fires when the upstream SSE opens but before the
stream completes; change this to emit a neutral event like
"upstream_stream_opened" here (replace the call at the emit_gateway_debug
invocation), then move the terminal "success" vs "failure" emission into the SSE
completion and error handlers where the stream abort is detected (the code path
referenced around the abort at line ~1279) so that a "upstream_stream_success"
is only emitted on clean completion and a corresponding failure event is emitted
on errors/abort; apply the same change to the similar block around lines 590-595
so both openings are logged neutrally and terminal outcomes are recorded from
the completion/error hooks.
In `@clients/rook/src/server/mod.rs`:
- Around line 2692-2720: Replace the current test body in
operational_config_propagates_to_admin_state so it exercises the real
propagation path: create the OperationalConfig, open the in-memory registry
(RookRegistry::open_in_memory), call build_app_with_registry_and_startup_state
(or the app bootstrap helper used in this module) to construct the running app
with that registry and startup state, then call the app's status endpoint (e.g.
GET /api/status) or otherwise retrieve the AdminState from the running app and
assert AdminState.operational.undercover and
AdminState.operational.debug_diagnostics match the original OperationalConfig
values instead of instantiating AdminState directly.
---
Outside diff comments:
In `@clients/rook/src/config/mod.rs`:
- Around line 979-986: The export currently always includes the full filesystem
path from config.db_path in RookConfigExportView::from_config; change it to
redact that value when the source config indicates undercover mode by checking
config.operational.undercover and returning a safe placeholder (e.g.
"<redacted>" or an empty string) instead of config.db_path.display().to_string()
so local paths are not leaked; update the RookConfigExportView::from_config
function to branch on config.operational.undercover (referencing RookConfig,
OperationalExportView::from, and the db_path field).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7f3f9710-2891-46e8-8900-eb5a449128e7
📒 Files selected for processing (9)
clients/rook/src/admin/handlers.rsclients/rook/src/admin/mod.rsclients/rook/src/admin/types.rsclients/rook/src/config/mod.rsclients/rook/src/gateway/handlers.rsclients/rook/src/gateway/mod.rsclients/rook/src/main.rsclients/rook/src/server/mod.rsopenspec/specs/gateway/spec.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: sonar
- GitHub Check: pr-checks
- GitHub Check: semgrep-cloud-platform/scan
- GitHub Check: Cloudflare Pages
🧰 Additional context used
📓 Path-based instructions (3)
**/*.rs
⚙️ CodeRabbit configuration file
**/*.rs: Focus on Rust idioms, memory safety, and ownership/borrowing correctness.
Flag unnecessary clones, unchecked panics in production paths, and weak error context.
Prioritize unsafe blocks, FFI boundaries, concurrency races, and secret handling.
Files:
clients/rook/src/main.rsclients/rook/src/admin/mod.rsclients/rook/src/gateway/mod.rsclients/rook/src/admin/types.rsclients/rook/src/gateway/handlers.rsclients/rook/src/server/mod.rsclients/rook/src/admin/handlers.rsclients/rook/src/config/mod.rs
**/*
⚙️ CodeRabbit configuration file
**/*: Security first, performance second.
Validate input boundaries, auth/authz implications, and secret management.
Look for behavioral regressions, missing tests, and contract breaks across modules.
Files:
clients/rook/src/main.rsclients/rook/src/admin/mod.rsclients/rook/src/gateway/mod.rsclients/rook/src/admin/types.rsclients/rook/src/gateway/handlers.rsopenspec/specs/gateway/spec.mdclients/rook/src/server/mod.rsclients/rook/src/admin/handlers.rsclients/rook/src/config/mod.rs
**/*.{md,mdx}
⚙️ CodeRabbit configuration file
**/*.{md,mdx}: Verify technical accuracy and that docs stay aligned with code changes.
For user-facing docs, check EN/ES parity or explicitly note pending translation gaps.
Files:
openspec/specs/gateway/spec.md
| emit_gateway_debug( | ||
| state, | ||
| "upstream_stream_success", | ||
| &request.model, | ||
| &metric_context, | ||
| Some("success"), | ||
| ); |
There was a problem hiding this comment.
Delay upstream_stream_success until the SSE actually finishes.
This event is emitted as soon as the upstream stream opens, but Line 1279 shows the stream can still abort mid-flight. In that case diagnostics will report success even though the stream never completed, and there is no corresponding terminal failure event for the abort. Emit a neutral “stream opened” event here, then record success/failure from the completion/error hook instead.
As per coding guidelines, **/*: Look for behavioral regressions, missing tests, and contract breaks across modules.
Also applies to: 590-595
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@clients/rook/src/gateway/handlers.rs` around lines 571 - 577, The current
emit_gateway_debug call that logs "upstream_stream_success" fires when the
upstream SSE opens but before the stream completes; change this to emit a
neutral event like "upstream_stream_opened" here (replace the call at the
emit_gateway_debug invocation), then move the terminal "success" vs "failure"
emission into the SSE completion and error handlers where the stream abort is
detected (the code path referenced around the abort at line ~1279) so that a
"upstream_stream_success" is only emitted on clean completion and a
corresponding failure event is emitted on errors/abort; apply the same change to
the similar block around lines 590-595 so both openings are logged neutrally and
terminal outcomes are recorded from the completion/error hooks.
- config: fix OperationalConfig default – undercover now false (was true) - config: redact db_path in export view when undercover mode is active - admin: remove undercover from OperationalStatusView – no enforcement exists; expose only debug_diagnostics and redaction_baseline in /status - gateway: rename upstream_stream_success → upstream_stream_opened at stream open site; success/failure semantics belong at completion, not open - gateway: rename guard-only test to debug_diagnostics_enabled_guard_returns_correct_value; fix stale comment that referenced undercover=true default - server: rewrite operational_config_propagates_to_admin_state to exercise the real ServerConfig → build_app_with_registry → /api/status path - config: update test name and assertion to match new undercover=false default
|



Related Issues
Closes #538
Summary
Implements operational parity for the Rook gateway (#538): undercover/redaction mode, debug diagnostics, and multi-provider routing controls.
What changed and why:
OperationalConfig(config/mod.rs) — new struct{ undercover: bool, debug_diagnostics: bool }with TOML parsing, env overlay (ROOK_UNDERCOVER,ROOK_DEBUG_DIAGNOSTICS), partial overlay support, and a safe export view. Defaults: bothfalse.OperationalConfigthreads throughRookConfig → ServerConfig → GatewayStateandAdminStateso every layer has access without global state.emit_gateway_debug/debug_diagnostics_enabled(gateway/handlers.rs) — safe debug emission helpers that gate ondebug_diagnosticsflag. Instrumented on both buffered and streaming execution paths. All events usenormalize_model_label/normalize_vendor_label— bounded, never raw prompts, bodies, tokens, credentials, or paths./statusendpoint — extendedOperatorStatusViewwith nestedOperationalStatusView { undercover, debug_diagnostics, redaction_baseline: "always_on" }. Redaction baseline is a static string — it cannot be toggled off.openspec/specs/gateway/spec.mdupdated with 🛡️ Operational Parity: Undercover, Multi-Provider, and Debugging #538 requirements as source of truth.Tested Information
cargo test --manifest-path clients/rook/Cargo.toml --libmetrics_route_counts_upstream_success_http_error_and_route_rejected_outcomes) confirmed broken onmainbefore this branch — unrelated to these changes.operational_config_propagates_to_admin_state— verifies config flows fromServerConfigthrough toAdminStateemit_gateway_debug_is_silent_when_debug_diagnostics_disabled— verifies no debug events fire when flag is offcargo checkclean; pre-commit and pre-push hooks pass.Documentation Impact
openspec/specs/gateway/spec.mdupdated with 🛡️ Operational Parity: Undercover, Multi-Provider, and Debugging #538 operational parity requirements./statusresponse is extended (additive, not breaking).Breaking Changes
None. All changes are additive:
false— existing deployments behave identically without config changes.OperatorStatusViewgains a new nestedoperationalfield — existing consumers ignoring unknown fields are unaffected.Checklist