feat(rook): add production observability metrics#713
Conversation
Add bounded Prometheus metrics for Rook request volume, latency, upstream failures, rate-limit outcomes, and idempotency behavior across /api/* and /v1/*. Tighten label safety, preserve /api/metrics as the scrape endpoint, and add verification coverage for production observability paths.
|
Important Review skippedAuto reviews are limited based on label configuration. 🚫 Review skipped — only excluded labels are configured. (2)
Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds production-grade observability and metrics across Rook’s gateway: normalized metric labels, outcome-based counters for rate-limit/upstream/idempotency, routing logical_model exposure, DB test serialization and tempdir-backed helpers, dashboard override serialization, and comprehensive docs/specs/tasks/verification artifacts. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant TransportMW as Transport<br/>Middleware
participant RateLimitMW as Rate-Limit<br/>Middleware
participant GatewayHandler as Gateway<br/>Handler
participant RoutingEngine as Routing<br/>Engine
participant UpstreamService as Upstream<br/>Service
participant Observability as Observability<br/>Registry
Client->>TransportMW: HTTP Request
TransportMW->>RateLimitMW: Forward Request
alt Allow
RateLimitMW->>Observability: rate_limit_outcomes_total<br/>(surface, endpoint, outcome=allow)
RateLimitMW->>GatewayHandler: Forward Request
else Reject
RateLimitMW->>Observability: rate_limit_outcomes_total<br/>(surface, endpoint, outcome=reject)
RateLimitMW->>Client: 429 Response
end
GatewayHandler->>RoutingEngine: Resolve Route
alt Resolved
RoutingEngine-->>GatewayHandler: RoutingDecision (includes logical_model)
GatewayHandler->>UpstreamService: Forward to Provider
alt Upstream Success
UpstreamService-->>GatewayHandler: Success
GatewayHandler->>TransportMW: Response
else Upstream Failure
UpstreamService-->>GatewayHandler: Error
GatewayHandler->>Observability: upstream_failures_total<br/>(vendor, account, model, outcome)
GatewayHandler->>TransportMW: Error Response
end
else Route Rejected
RoutingEngine-->>GatewayHandler: Route Rejected
GatewayHandler->>Observability: upstream_failures_total<br/>(vendor=unrouted, outcome=route_rejected)
GatewayHandler->>TransportMW: Error Response
end
TransportMW->>Observability: request_total / duration (surface, endpoint, status_class)
TransportMW->>Client: Final Response
sequenceDiagram
participant Client
participant GatewayHandler as Gateway Handler<br/>(POST /v1/chat/completions)
participant IdempotencyMW as Idempotency<br/>Middleware
participant Storage as Storage<br/>Backend
participant Observability as Observability<br/>Registry
participant UpstreamService as Upstream<br/>Service
Client->>GatewayHandler: POST with Idempotency-Key
GatewayHandler->>IdempotencyMW: Check Idempotency Key
alt Not Cached
IdempotencyMW->>Storage: Reserve Entry
alt Reservation Success
Storage-->>IdempotencyMW: Reserved
IdempotencyMW->>UpstreamService: Forward Request
UpstreamService-->>IdempotencyMW: Response
IdempotencyMW->>Storage: Store Response
alt Store Success
Storage-->>IdempotencyMW: Stored
IdempotencyMW->>Observability: idempotency_outcomes_total(outcome=pass)
IdempotencyMW->>Client: Response
else Store Failure
Storage-->>IdempotencyMW: Error
IdempotencyMW->>Observability: idempotency_outcomes_total(outcome=unavailable)
IdempotencyMW->>Client: Response + Error
end
else Reservation Failure
Storage-->>IdempotencyMW: Error
IdempotencyMW->>Observability: idempotency_outcomes_total(outcome=unavailable)
IdempotencyMW->>Client: Error Response
end
else Cached
IdempotencyMW->>Storage: Fetch Cached Response
Storage-->>IdempotencyMW: Cached Response
IdempotencyMW->>Observability: idempotency_outcomes_total(outcome=replay)
IdempotencyMW->>Client: Cached Response
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
✅ Contributor ReportUser: @yacosta738
Contributor Report evaluates based on public GitHub activity. Analysis period: 2025-04-28 to 2026-04-28 |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
clients/rook/src/idempotency/middleware.rs (1)
175-183:⚠️ Potential issue | 🟠 MajorKeep idempotency outcomes mutually exclusive.
ReservedNewstill records"pass"beforefinalize_response()persists the response. With the new"unavailable"increments below, any store/body-read failure now emits both"pass"and"unavailable"for the same request, which makes the counter inaccurate.Suggested fix
ReserveResult::ReservedNew => { - state - .observability - .idempotency_outcomes_total() - .inc(IDEMPOTENCY_SURFACE, "pass"); parts.extensions.insert(raw_body.clone()); let request = Request::from_parts(parts, Body::from(raw_body.clone())); let response = next.run(request).await; finalize_response(state, scope, request_hash, response).await } @@ if state .service .inner() .complete_chat_completion( @@ return gateway_idempotency_error_response( StatusCode::SERVICE_UNAVAILABLE, "idempotency storage unavailable", "idempotency_unavailable", ); } + state + .observability + .idempotency_outcomes_total() + .inc(IDEMPOTENCY_SURFACE, "pass"); Response::from_parts(parts, Body::from(body_bytes)) }Also applies to: 195-244
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clients/rook/src/idempotency/middleware.rs` around lines 175 - 183, ReserveResult::ReservedNew currently increments observability.idempotency_outcomes_total("pass") before calling finalize_response, which can lead to duplicate/inaccurate labels when finalize_response later fails and increments "unavailable"; move or remove the pre-finalize "pass" increment so that outcomes remain mutually exclusive and only finalized results increment the counter. Specifically, in the ReserveResult::ReservedNew branch (and the similar sections around lines 195-244), avoid calling state.observability.idempotency_outcomes_total().inc(... "pass") before calling finalize_response(request_hash, ...); instead let finalize_response (or its success path) be responsible for incrementing "pass" and ensure all failure paths (store/body-read errors) increment "unavailable" (or the appropriate label) so each request produces exactly one outcome increment.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@clients/rook/src/observability.rs`:
- Around line 206-215: The secret-detection function looks_secret_like currently
checks lowercase.contains("api_key") and contains("api-key") but misses the
hyphenless form; update looks_secret_like to also check for
lowercase.contains("apikey") (and any other common concatenated variants you
deem necessary) so the function will catch tokens like "apikey" in addition to
the existing patterns (bearer, sk-, token, api_key, api-key).
In `@clients/rook/src/routing/mod.rs`:
- Around line 213-217: Add a regression test that asserts the fallback
logical_model is set on RoutingDecision when the upstream-failure path is taken:
simulate the upstream-failure route (the same test harness that currently
asserts account/pool/route) and add an assertion that decision.logical_model
equals the expected fallback value; target the code paths constructing
RoutingDecision (check where RoutingDecision is returned and where
decision.logical_model is read) so the test will fail if future changes silently
relabel metrics for the fallback case.
In `@openspec/changes/rook-680-production-observability-and-metrics/state.yaml`:
- Around line 4-19: The top-level status is "completed" but the
"phases.explore.status" is still "pending", creating an inconsistent state;
update the "explore" phase under "phases" so its status value is "completed"
(i.e., change phases.explore.status from pending to completed) to make all phase
statuses consistent with the overall status.
---
Outside diff comments:
In `@clients/rook/src/idempotency/middleware.rs`:
- Around line 175-183: ReserveResult::ReservedNew currently increments
observability.idempotency_outcomes_total("pass") before calling
finalize_response, which can lead to duplicate/inaccurate labels when
finalize_response later fails and increments "unavailable"; move or remove the
pre-finalize "pass" increment so that outcomes remain mutually exclusive and
only finalized results increment the counter. Specifically, in the
ReserveResult::ReservedNew branch (and the similar sections around lines
195-244), avoid calling state.observability.idempotency_outcomes_total().inc(...
"pass") before calling finalize_response(request_hash, ...); instead let
finalize_response (or its success path) be responsible for incrementing "pass"
and ensure all failure paths (store/body-read errors) increment "unavailable"
(or the appropriate label) so each request produces exactly one outcome
increment.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 787801d1-87f1-4fe3-bd62-33492b51d76a
📒 Files selected for processing (16)
clients/rook/src/dashboard/mod.rsclients/rook/src/doctor.rsclients/rook/src/gateway/handlers.rsclients/rook/src/idempotency/middleware.rsclients/rook/src/observability.rsclients/rook/src/routing/mod.rsclients/rook/src/server/mod.rsclients/rook/src/transport/middleware.rsclients/rook/src/transport/rate_limit.rsclients/rook/tests/doctor_operational_diagnostics.rsopenspec/changes/rook-680-production-observability-and-metrics/design.mdopenspec/changes/rook-680-production-observability-and-metrics/proposal.mdopenspec/changes/rook-680-production-observability-and-metrics/specs/gateway/spec.mdopenspec/changes/rook-680-production-observability-and-metrics/state.yamlopenspec/changes/rook-680-production-observability-and-metrics/tasks.mdopenspec/changes/rook-680-production-observability-and-metrics/verify-report.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: report / Contributor Quality Report
- GitHub Check: pr-checks
- GitHub Check: sonar
- GitHub Check: submit-gradle
- GitHub Check: Cloudflare Pages
🧰 Additional context used
📓 Path-based instructions (3)
**/*
⚙️ CodeRabbit configuration file
**/*: Security first, performance second.
Validate input boundaries, auth/authz implications, and secret management.
Look for behavioral regressions, missing tests, and contract breaks across modules.
Files:
openspec/changes/rook-680-production-observability-and-metrics/state.yamlclients/rook/src/routing/mod.rsopenspec/changes/rook-680-production-observability-and-metrics/tasks.mdopenspec/changes/rook-680-production-observability-and-metrics/proposal.mdopenspec/changes/rook-680-production-observability-and-metrics/design.mdopenspec/changes/rook-680-production-observability-and-metrics/specs/gateway/spec.mdopenspec/changes/rook-680-production-observability-and-metrics/verify-report.mdclients/rook/src/transport/rate_limit.rsclients/rook/src/dashboard/mod.rsclients/rook/src/transport/middleware.rsclients/rook/src/idempotency/middleware.rsclients/rook/tests/doctor_operational_diagnostics.rsclients/rook/src/doctor.rsclients/rook/src/gateway/handlers.rsclients/rook/src/server/mod.rsclients/rook/src/observability.rs
**/*.rs
⚙️ CodeRabbit configuration file
**/*.rs: Focus on Rust idioms, memory safety, and ownership/borrowing correctness.
Flag unnecessary clones, unchecked panics in production paths, and weak error context.
Prioritize unsafe blocks, FFI boundaries, concurrency races, and secret handling.
Files:
clients/rook/src/routing/mod.rsclients/rook/src/transport/rate_limit.rsclients/rook/src/dashboard/mod.rsclients/rook/src/transport/middleware.rsclients/rook/src/idempotency/middleware.rsclients/rook/tests/doctor_operational_diagnostics.rsclients/rook/src/doctor.rsclients/rook/src/gateway/handlers.rsclients/rook/src/server/mod.rsclients/rook/src/observability.rs
**/*.{md,mdx}
⚙️ CodeRabbit configuration file
**/*.{md,mdx}: Verify technical accuracy and that docs stay aligned with code changes.
For user-facing docs, check EN/ES parity or explicitly note pending translation gaps.
Files:
openspec/changes/rook-680-production-observability-and-metrics/tasks.mdopenspec/changes/rook-680-production-observability-and-metrics/proposal.mdopenspec/changes/rook-680-production-observability-and-metrics/design.mdopenspec/changes/rook-680-production-observability-and-metrics/specs/gateway/spec.mdopenspec/changes/rook-680-production-observability-and-metrics/verify-report.md
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Include threat/risk notes and rollback strategy for security, runtime, and gateway changes; add or update tests for boundary checks and failure modes
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Include threat/risk notes and rollback strategy for security, runtime, and gateway changes; add or update tests for boundary checks and failure modes
Applied to files:
openspec/changes/rook-680-production-observability-and-metrics/specs/gateway/spec.md
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/**/*.rs : Run `cargo fmt --all -- --check`, `cargo clippy --all-targets -- -D warnings`, and `cargo test` for code validation, or document which checks were skipped and why
Applied to files:
openspec/changes/rook-680-production-observability-and-metrics/verify-report.md
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/{security,gateway,tools,config}/**/*.rs : Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable
Applied to files:
clients/rook/src/dashboard/mod.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/main.rs : Preserve CLI contract unless change is intentional and documented; prefer explicit errors over silent fallback for unsupported critical paths
Applied to files:
clients/rook/src/idempotency/middleware.rsclients/rook/src/gateway/handlers.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/channels/**/*.rs : Implement `Channel` trait in `src/channels/` with consistent `send`, `listen`, and `health_check` semantics and cover auth/allowlist/health behavior with tests
Applied to files:
clients/rook/src/server/mod.rs
🪛 LanguageTool
openspec/changes/rook-680-production-observability-and-metrics/proposal.md
[style] ~15-~15: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ather than only individual log lines. - Expose upstream failure metrics for routed pro...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[style] ~16-~16: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...e already available and safe to emit. - Expose counters for rate-limit outcomes and id...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
openspec/changes/rook-680-production-observability-and-metrics/design.md
[style] ~253-~253: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...ams or request bodies. - status_class MUST stay coarse (2xx, 4xx, etc.). ### ...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🪛 markdownlint-cli2 (0.22.1)
openspec/changes/rook-680-production-observability-and-metrics/proposal.md
[warning] 11-11: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 20-20: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
openspec/changes/rook-680-production-observability-and-metrics/design.md
[warning] 1-1: First line in a file should be a top-level heading
(MD041, first-line-heading, first-line-h1)
[warning] 8-8: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 17-17: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 32-32: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 46-46: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 60-60: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 71-71: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 81-81: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 98-98: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 112-112: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 124-124: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
🔇 Additional comments (20)
clients/rook/src/dashboard/mod.rs (3)
30-33: LGTM — serial mutex for test isolation.The dual-mutex approach cleanly separates concerns:
ASSETS_READY_OVERRIDEprotects the value,ASSETS_READY_OVERRIDE_SERIALserializes test execution. Pattern aligns withDOCTOR_TEST_SERIALindoctor.rs.
42-47: Poison recovery is appropriate here.For test-only infrastructure, recovering from poisoned mutexes via
into_inner()prevents cascading test failures. This matches the existing pattern at line 39.
58-69: Guard design ensures test serialization.The
_serial_guardis held for the struct's lifetime, ensuring exclusive access during the test. Lock ordering is safe: serial acquired first innew(), override lock is only held transiently (lines 65, 74). Drop order correctly clears the override before releasing serialization.openspec/changes/rook-680-production-observability-and-metrics/proposal.md (1)
1-77: LGTM — Proposal is well-structured with appropriate risk mitigations.The proposal correctly scopes bounded metrics, addresses label cardinality risks, and includes a rollback strategy for the additive observability changes. The explicit exclusion of secrets/PII from metric labels aligns with the coding guidelines' security-first principle.
clients/rook/src/doctor.rs (2)
316-347: LGTM — Clean test serialization pattern.The
InitializedDbEnvwrapper correctly owns theTempDirto ensure the database file remains valid throughout test execution. TheOnceLock<Mutex<()>>pattern for lazy-init serialization is idiomatic and appropriate for preventing test interference.
349-366: LGTM — Tests correctly acquire serialization guard.The tests now properly await the serial guard before accessing shared state, preventing race conditions in DB-backed doctor tests.
clients/rook/src/server/mod.rs (5)
508-569: LGTM — Clean test helper extraction.The
seed_route_with_display_namehelper provides configurable account naming for metrics label assertions without duplicating seeding logic.
724-764: LGTM — Comprehensive endpoint label coverage with secret leak prevention.Good coverage of both
/v1/modelsand/v1/chat/completionsmetrics with stable surface labels. The assertions at lines 762-763 explicitly verify that request body content and model names don't leak into metric labels.
800-811: LGTM — Rate-limit outcome metrics properly tested.Assertions correctly validate the new outcome-based metrics (
allow/reject) for bothadmin_apiandgateway_v1surfaces.
1515-1673: LGTM — Thorough upstream failure metrics coverage.Excellent coverage of failure outcomes (
http_error,account_misconfigured,network_error,route_rejected) with explicit assertions that secret-like account names ("Bearer sk-secret") are normalized tounlabeled. The test at line 1669-1670 properly verifies safe bounded labels.
2148-2176: LGTM — Idempotency unavailable outcome now observable.The test correctly validates that storage failures emit the
unavailableoutcome metric, closing the observability gap for idempotency service failures.openspec/changes/rook-680-production-observability-and-metrics/design.md (1)
1-351: LGTM — Comprehensive design with clear safety boundaries.The design correctly documents the bounded label contract, instrumentation placement decisions, and explicitly addresses secret-safe normalization for
accountandmodellabels (lines 82-97). The Rust struct examples match the actual implementation inobservability.rs.openspec/changes/rook-680-production-observability-and-metrics/verify-report.md (1)
1-128: LGTM — Verification report is thorough.The report documents all required validation commands (
cargo fmt,cargo clippy,cargo test) and provides specific test evidence for each metrics category. The evidence references match actual test names in the codebase.openspec/changes/rook-680-production-observability-and-metrics/specs/gateway/spec.md (1)
1-192: LGTM — Spec defines clear, testable requirements with strong safety boundaries.The requirement at lines 161-175 explicitly forbids secret-bearing values and high-cardinality identifiers as metric labels. The scenarios provide testable criteria that align with the test coverage in
server/mod.rs.clients/rook/src/observability.rs (6)
1-26: LGTM — Clean metric family renames and new normalization infrastructure.The transition from rejection-only to outcome-based rate-limit metrics and the addition of bounded upstream failure metrics align with the design spec.
121-145: LGTM — Surface normalization helpers are well-structured.The normalization functions correctly map internal
RouteSurfaceandRateLimitedSurfaceenums to stable, bounded string labels for metrics.
147-185: LGTM — Endpoint normalization with safe fallbacks.The
normalize_surface_endpointfunction correctly:
- Returns
UNMATCHED_ENDPOINTfor missing or malformed paths- Prefixes surface-relative paths with
/apior/v1as appropriate- Preserves already-prefixed paths unchanged
217-263: LGTM — Bounded label normalization with length and character safety.The
normalize_bounded_labelfunction correctly:
- Trims and rejects empty/whitespace-only input
- Rejects secret-like values early
- Normalizes characters to a safe alphanumeric subset
- Enforces
MAX_LABEL_LENto prevent cardinality explosion- Removes leading/trailing separators
The early return on length overflow (line 253) is a good defensive measure against unbounded input.
302-354: LGTM — Handle implementations match new label contracts.The
RateLimitOutcomesHandleandUpstreamFailuresHandlecorrectly accept the new label dimensions (outcomefor rate-limit,vendor/account/model/outcomefor upstream failures).
513-562: LGTM — Comprehensive normalization tests with secret-safety validation.Tests cover:
- Surface/endpoint normalization
- Status class bucketing
- Secret-like value detection (
Bearer sk-secret,sk-secret-value)- Fallback to bounded defaults for invalid input
- Length overflow handling
Tighten secret-like label detection, lock fallback logical-model coverage, and keep idempotency outcome metrics mutually exclusive on finalize failure. Also align the OpenSpec phase state with the completed change status.
Deploying corvus with
|
| Latest commit: |
6455235
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://e1e5c6cb.corvus-42x.pages.dev |
| Branch Preview URL: | https://rook.corvus-42x.pages.dev |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@clients/rook/src/server/mod.rs`:
- Around line 2170-2178: Add a new test case that exercises a finalize-time
idempotency failure: simulate a successful reserve phase but make
complete_chat_completion return an error, then call request_text(app,
"/api/metrics") and assert StatusCode::OK and that the metrics body contains the
"rook_idempotency_outcomes_total{surface=\"gateway_chat_completions\",outcome=\"unavailable\"}
1" entry while still not containing the "…outcome=\"pass\"} 1" entry; to
implement this, reuse the existing test scaffold around request_text and the
metrics assertions, arrange the test double or mock so reserve succeeds but
complete_chat_completion (or the function you call to finish completions) fails,
and then verify the two metric strings in the response body.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: f4a4ed60-f542-4589-b7e9-1fa6efd93b76
📒 Files selected for processing (5)
clients/rook/src/idempotency/middleware.rsclients/rook/src/observability.rsclients/rook/src/routing/mod.rsclients/rook/src/server/mod.rsopenspec/changes/rook-680-production-observability-and-metrics/state.yaml
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: pr-checks
- GitHub Check: sonar
- GitHub Check: submit-gradle
- GitHub Check: Cloudflare Pages
🧰 Additional context used
📓 Path-based instructions (2)
**/*
⚙️ CodeRabbit configuration file
**/*: Security first, performance second.
Validate input boundaries, auth/authz implications, and secret management.
Look for behavioral regressions, missing tests, and contract breaks across modules.
Files:
openspec/changes/rook-680-production-observability-and-metrics/state.yamlclients/rook/src/routing/mod.rsclients/rook/src/idempotency/middleware.rsclients/rook/src/observability.rsclients/rook/src/server/mod.rs
**/*.rs
⚙️ CodeRabbit configuration file
**/*.rs: Focus on Rust idioms, memory safety, and ownership/borrowing correctness.
Flag unnecessary clones, unchecked panics in production paths, and weak error context.
Prioritize unsafe blocks, FFI boundaries, concurrency races, and secret handling.
Files:
clients/rook/src/routing/mod.rsclients/rook/src/idempotency/middleware.rsclients/rook/src/observability.rsclients/rook/src/server/mod.rs
🧠 Learnings (6)
📓 Common learnings
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Include threat/risk notes and rollback strategy for security, runtime, and gateway changes; add or update tests for boundary checks and failure modes
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Include threat/risk notes and rollback strategy for security, runtime, and gateway changes; add or update tests for boundary checks and failure modes
Applied to files:
clients/rook/src/routing/mod.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/main.rs : Preserve CLI contract unless change is intentional and documented; prefer explicit errors over silent fallback for unsupported critical paths
Applied to files:
clients/rook/src/routing/mod.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/channels/**/*.rs : Implement `Channel` trait in `src/channels/` with consistent `send`, `listen`, and `health_check` semantics and cover auth/allowlist/health behavior with tests
Applied to files:
clients/rook/src/server/mod.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/{security,gateway,tools,config}/**/*.rs : Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable
Applied to files:
clients/rook/src/server/mod.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/{security,gateway,tools}/**/*.rs : Treat `src/security/`, `src/gateway/`, `src/tools/` as high-risk surfaces and never broaden filesystem/network execution scope without explicit policy checks
Applied to files:
clients/rook/src/server/mod.rs
🔇 Additional comments (5)
openspec/changes/rook-680-production-observability-and-metrics/state.yaml (1)
4-19: State lifecycle is now internally consistent.
statusand all phase statuses are aligned tocompleted.clients/rook/src/routing/mod.rs (1)
66-67:logical_modelpropagation and fallback regression coverage look correct.The decision now carries route model context, and the fallback assertion locks expected labeling behavior.
Also applies to: 213-216, 931-934
clients/rook/src/idempotency/middleware.rs (1)
133-137: Idempotency outcome metrics are now correctly failure-aware and mutually exclusive.
"pass"is emitted only after successful completion, while reserve/finalize failures emit"unavailable".Also applies to: 194-198, 229-243
clients/rook/src/observability.rs (1)
49-67: Metrics contract migration and secret-label hardening look solid.The renamed families/label schema and normalization tests are consistent, including the added
apikeysecret-like detection path.Also applies to: 102-118, 206-216, 443-567
clients/rook/src/server/mod.rs (1)
1515-1523: Great coverage for upstream failure-label hardening and secret leakage prevention.The added scenarios for misconfigured/network/rejected routes plus secret-like account-name redaction materially strengthen the observability contract tests.
Also applies to: 1655-1673
Add a regression test for the reserved-then-complete-fails path so metrics continue to emit only the unavailable outcome and never a pass counter when finalize-time idempotency storage fails.
|



Related Issues
closes #680
Summary
/api/*and/v1/*while preserving/api/metricsas the single scrape endpointTested Information
cargo fmt --all -- --checkcargo clippy --all-targets -- -D warningscargo testDocumentation Impact
openspec/changes/rook-680-production-observability-and-metrics/proposal.mdopenspec/changes/rook-680-production-observability-and-metrics/specs/gateway/spec.mdopenspec/changes/rook-680-production-observability-and-metrics/design.mdopenspec/changes/rook-680-production-observability-and-metrics/tasks.mdopenspec/changes/rook-680-production-observability-and-metrics/verify-report.mdBreaking Changes
Checklist