feat(cerebro): expose operational metrics#731
Conversation
Deploying corvus with
|
| Latest commit: |
bf7f275
|
| Status: | ✅ Deploy successful! |
| Preview URL: | https://5fa708c8.corvus-42x.pages.dev |
| Branch Preview URL: | https://feat-695-cerebro-operational.corvus-42x.pages.dev |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds Prometheus metrics support to the Cerebro crate: new Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Server as CerebroService
participant Tools
participant Storage
participant Metrics as Prometheus
rect rgba(100,150,200,0.5)
Client->>Server: POST /jsonrpc (tools/call)
Server->>Metrics: increment CEREBRO_REQUESTS_TOTAL{method,status}
Server->>Server: authorize request
alt auth failure
Server->>Metrics: increment CEREBRO_AUTH_FAILURES_TOTAL
Server-->>Client: error response
else auth success
Server->>Tools: invoke tool
Tools->>Storage: perform storage ops
Storage-->>Tools: ok / Err(Storage)
alt storage error
Tools->>Metrics: increment CEREBRO_STORAGE_ERRORS_TOTAL{operation}
end
Tools-->>Server: tool result + elapsed
Server->>Metrics: observe CEREBRO_TOOL_LATENCY_SECONDS{tool,status}
Server-->>Client: JSON-RPC response
end
end
rect rgba(150,200,100,0.5)
Client->>Server: GET /metrics
Server->>Metrics: gather metric families
Server-->>Client: Prometheus text exposition
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@clients/cerebro/src/server.rs`:
- Around line 127-130: The metric uses the untrusted request.method string as a
label value (see metrics::CEREBRO_REQUESTS_TOTAL increments), which risks
unbounded label cardinality; replace direct use of request.method with a bounded
canonical label by mapping request.method to a fixed set of allowed values
(e.g., "tools.call", "tools.list", "auth", etc.) and falling back to a single
safe bucket like "unknown" or "other" for anything else, then use that mapped
value when calling .with_label_values(...) in the same places where
request.method is currently used (including the other increments around the
request handling code).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 99b4a552-1b18-4cdc-aa37-7cba5cf8897f
⛔ Files ignored due to path filters (2)
clients/agent-runtime/Cargo.lockis excluded by!**/*.lockclients/cerebro/Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (8)
clients/cerebro/Cargo.tomlclients/cerebro/README.mdclients/cerebro/src/lib.rsclients/cerebro/src/metrics.rsclients/cerebro/src/server.rsclients/cerebro/src/tools.rsclients/cerebro/tests/health_endpoints_test.rsopenspec/specs/cerebro/spec.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: pr-checks
- GitHub Check: sonar
- GitHub Check: submit-gradle
- GitHub Check: semgrep-cloud-platform/scan
- GitHub Check: Cloudflare Pages
🧰 Additional context used
📓 Path-based instructions (3)
**/*.rs
⚙️ CodeRabbit configuration file
**/*.rs: Focus on Rust idioms, memory safety, and ownership/borrowing correctness.
Flag unnecessary clones, unchecked panics in production paths, and weak error context.
Prioritize unsafe blocks, FFI boundaries, concurrency races, and secret handling.
Files:
clients/cerebro/src/lib.rsclients/cerebro/tests/health_endpoints_test.rsclients/cerebro/src/server.rsclients/cerebro/src/tools.rsclients/cerebro/src/metrics.rs
**/*
⚙️ CodeRabbit configuration file
**/*: Security first, performance second.
Validate input boundaries, auth/authz implications, and secret management.
Look for behavioral regressions, missing tests, and contract breaks across modules.
Files:
clients/cerebro/src/lib.rsclients/cerebro/Cargo.tomlopenspec/specs/cerebro/spec.mdclients/cerebro/tests/health_endpoints_test.rsclients/cerebro/README.mdclients/cerebro/src/server.rsclients/cerebro/src/tools.rsclients/cerebro/src/metrics.rs
**/*.{md,mdx}
⚙️ CodeRabbit configuration file
**/*.{md,mdx}: Verify technical accuracy and that docs stay aligned with code changes.
For user-facing docs, check EN/ES parity or explicitly note pending translation gaps.
Files:
openspec/specs/cerebro/spec.mdclients/cerebro/README.md
🧠 Learnings (2)
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/**/Cargo.toml : Do not add heavy dependencies for minor convenience; justify new crate additions
Applied to files:
clients/cerebro/Cargo.toml
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/**/Cargo.toml : Preserve release-size profile assumptions in `Cargo.toml` and avoid adding heavy dependencies unless clearly justified
Applied to files:
clients/cerebro/Cargo.toml
🪛 LanguageTool
openspec/specs/cerebro/spec.md
[style] ~295-~295: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...abels. - cerebro_storage_errors_total counter labeled by operation. The metrics en...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🔇 Additional comments (7)
clients/cerebro/src/lib.rs (1)
3-3: Good module exposure for metrics.
pub mod metrics;cleanly exposes the new metrics surface needed by the server/tool instrumentation.clients/cerebro/Cargo.toml (1)
19-19: Dependency addition looks scoped and justified.Prometheus is added with default features disabled, which keeps the footprint tighter while enabling the required metrics capability.
clients/cerebro/tests/health_endpoints_test.rs (1)
144-176: Nice integration coverage for/metrics.This test validates both endpoint availability and expected Cerebro metric families in the exposition payload.
openspec/specs/cerebro/spec.md (1)
285-312: Operational metrics requirement is well specified.The requirement and scenarios are concrete and aligned with the implemented metric families and
/metricsscrape behavior.clients/cerebro/README.md (1)
85-97: README observability update is clear and aligned.Good callout that
/metricsis unauthenticated and must be protected at the network boundary, plus a clear metric catalog.clients/cerebro/src/tools.rs (1)
294-314: Storage-error instrumentation is centralized well.
track_storagekeeps error counting consistent across storage paths, and using static operation labels is a solid guard against label-cardinality drift.Also applies to: 508-510, 530-541, 579-584, 597-602, 621-624, 655-658, 683-685, 734-744, 750-752
clients/cerebro/src/metrics.rs (1)
1-65: Metrics module structure looks solid.The static metric declarations plus explicit
init()registration/pre-warm pattern are consistent and easy to consume from server/tool paths.
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
clients/cerebro/src/server.rs (1)
159-169:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winAuthenticate
tools/listbefore returning the manifest.Line 159 returns the tool inventory before
authorize()runs, so any caller can enumerate the MCP surface without a Bearer token. It also means those requests never incrementcerebro_auth_failures_total, leaving the new metrics blind to that unauthorized path. Move this fast path after auth succeeds, but beforeparamsvalidation.As per coding guidelines:
**/*: Security first, performance second.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@clients/cerebro/src/server.rs` around lines 159 - 169, The tools/list fast-path currently returns the manifest before authorization, so move the block that handles request.method == "tools/list" to occur immediately after the call to authorize() (and after successful auth) but before any params validation; ensure you still increment metrics consistently (use metrics::CEREBRO_REQUESTS_TOTAL with the same labels and also make sure cerebro_auth_failures_total is incremented on auth failure paths), and call self.tools.list_manifest() and return the JsonRpcResponse only after authorize() succeeds so unauthenticated callers cannot enumerate tools.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@clients/cerebro/src/server.rs`:
- Around line 248-255: The tool latency histogram is being observed using the
request-scoped start timer (variable start), which causes
CEREBRO_TOOL_LATENCY_SECONDS to include auth/arg-redaction/event-bus time;
change the code to start a new timer at the tool execution boundary (e.g.,
create tool_start = Instant::now() immediately before invoking the tool) and use
tool_start.elapsed() (or its secs_f64) when calling
metrics::CEREBRO_TOOL_LATENCY_SECONDS.with_label_values(...).observe(...)
(update both occurrences referenced around the current block and the similar
block at 283-290) so the per-tool histogram measures only actual tool execution
time.
- Around line 387-395: The readiness handler currently returns error.to_string()
to unauthenticated probes (in the Err(error) branch where
metrics::CEREBRO_READINESS_FAILURES_TOTAL.inc() and
StatusCode::SERVICE_UNAVAILABLE with Json(...) are constructed); change this to
return a generic readiness payload (e.g.
{"status":"not_ready","error":"storage_unavailable"} or similar) and move the
detailed error into server-side logging (use your tracing::error! or existing
logger to log the full error with context like "readiness check failed") so no
backend names/paths/connection details are echoed to the client.
- Around line 400-421: The handler handle_metrics currently returns (StatusCode,
String) and drops the Prometheus exposition Content-Type; change handle_metrics
to return an impl IntoResponse (or a Response) and set the Content-Type header
to encoder.format_type() on the returned response: on success build a response
with StatusCode::OK, body from String::from_utf8(buffer), and header
CONTENT_TYPE = encoder.format_type().as_ref(); on error return
INTERNAL_SERVER_ERROR with the same header (or a suitable plain text
content-type) while preserving the existing tracing::error calls; locate
encoder, metric_families, and the String::from_utf8 match in handle_metrics to
implement this.
---
Outside diff comments:
In `@clients/cerebro/src/server.rs`:
- Around line 159-169: The tools/list fast-path currently returns the manifest
before authorization, so move the block that handles request.method ==
"tools/list" to occur immediately after the call to authorize() (and after
successful auth) but before any params validation; ensure you still increment
metrics consistently (use metrics::CEREBRO_REQUESTS_TOTAL with the same labels
and also make sure cerebro_auth_failures_total is incremented on auth failure
paths), and call self.tools.list_manifest() and return the JsonRpcResponse only
after authorize() succeeds so unauthenticated callers cannot enumerate tools.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 70c51cf0-9a05-4492-b170-b4df2b955765
📒 Files selected for processing (4)
clients/cerebro/README.mdclients/cerebro/src/metrics.rsclients/cerebro/src/server.rsopenspec/specs/cerebro/spec.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: sonar
- GitHub Check: pr-checks
- GitHub Check: semgrep-cloud-platform/scan
- GitHub Check: Cloudflare Pages
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{md,mdx}
⚙️ CodeRabbit configuration file
**/*.{md,mdx}: Verify technical accuracy and that docs stay aligned with code changes.
For user-facing docs, check EN/ES parity or explicitly note pending translation gaps.
Files:
clients/cerebro/README.mdopenspec/specs/cerebro/spec.md
**/*
⚙️ CodeRabbit configuration file
**/*: Security first, performance second.
Validate input boundaries, auth/authz implications, and secret management.
Look for behavioral regressions, missing tests, and contract breaks across modules.
Files:
clients/cerebro/README.mdclients/cerebro/src/metrics.rsopenspec/specs/cerebro/spec.mdclients/cerebro/src/server.rs
**/*.rs
⚙️ CodeRabbit configuration file
**/*.rs: Focus on Rust idioms, memory safety, and ownership/borrowing correctness.
Flag unnecessary clones, unchecked panics in production paths, and weak error context.
Prioritize unsafe blocks, FFI boundaries, concurrency races, and secret handling.
Files:
clients/cerebro/src/metrics.rsclients/cerebro/src/server.rs
🧠 Learnings (2)
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/{security,gateway,tools}/**/*.rs : Treat `src/security/`, `src/gateway/`, `src/tools/` as high-risk surfaces and never broaden filesystem/network execution scope without explicit policy checks
Applied to files:
clients/cerebro/src/server.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/{security,gateway,tools,config}/**/*.rs : Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable
Applied to files:
clients/cerebro/src/server.rs
🪛 LanguageTool
openspec/specs/cerebro/spec.md
[style] ~295-~295: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...abels. - cerebro_storage_errors_total counter labeled by operation. The metrics en...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@clients/cerebro/src/server.rs`:
- Around line 253-258: The tool completion and failure paths are computing
duration_ms from the request-scoped start instead of the tool-bound tool_start,
inflating ToolCallEvent.duration_ms and the "tool call completed/failed" log;
update both success and error branches that call self.tools.handle to compute
elapsed/duration_ms using tool_start (replace start.elapsed() with
tool_start.elapsed()), ensure metrics::CEREBRO_REQUESTS_TOTAL and any created
ToolCallEvent use that duration, and remove the request-wide start if it becomes
unused.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 8138ac04-98e5-4253-92ee-0c0b581f0b6b
📒 Files selected for processing (3)
clients/cerebro/src/server.rsclients/cerebro/tests/health_endpoints_test.rsclients/cerebro/tests/mcp_tools_contract.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: pr-checks
- GitHub Check: semgrep-cloud-platform/scan
- GitHub Check: Cloudflare Pages
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs
⚙️ CodeRabbit configuration file
**/*.rs: Focus on Rust idioms, memory safety, and ownership/borrowing correctness.
Flag unnecessary clones, unchecked panics in production paths, and weak error context.
Prioritize unsafe blocks, FFI boundaries, concurrency races, and secret handling.
Files:
clients/cerebro/tests/mcp_tools_contract.rsclients/cerebro/tests/health_endpoints_test.rsclients/cerebro/src/server.rs
**/*
⚙️ CodeRabbit configuration file
**/*: Security first, performance second.
Validate input boundaries, auth/authz implications, and secret management.
Look for behavioral regressions, missing tests, and contract breaks across modules.
Files:
clients/cerebro/tests/mcp_tools_contract.rsclients/cerebro/tests/health_endpoints_test.rsclients/cerebro/src/server.rs
🧠 Learnings (6)
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/main.rs : Preserve CLI contract unless change is intentional and documented; prefer explicit errors over silent fallback for unsupported critical paths
Applied to files:
clients/cerebro/tests/mcp_tools_contract.rsclients/cerebro/tests/health_endpoints_test.rsclients/cerebro/src/server.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/tools/**/*.rs : Implement `Tool` trait in `src/tools/` with strict parameter schema, validate and sanitize all inputs, and return structured `ToolResult` without panics in runtime path
Applied to files:
clients/cerebro/tests/mcp_tools_contract.rsclients/cerebro/src/server.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/channels/**/*.rs : Implement `Channel` trait in `src/channels/` with consistent `send`, `listen`, and `health_check` semantics and cover auth/allowlist/health behavior with tests
Applied to files:
clients/cerebro/tests/health_endpoints_test.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/**/*.rs : Run `cargo fmt --all -- --check`, `cargo clippy --all-targets -- -D warnings`, and `cargo test` for code validation, or document which checks were skipped and why
Applied to files:
clients/cerebro/tests/health_endpoints_test.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/{security,gateway,tools}/**/*.rs : Treat `src/security/`, `src/gateway/`, `src/tools/` as high-risk surfaces and never broaden filesystem/network execution scope without explicit policy checks
Applied to files:
clients/cerebro/src/server.rs
📚 Learning: 2026-02-17T12:31:17.076Z
Learnt from: CR
Repo: dallay/corvus PR: 0
File: clients/agent-runtime/AGENTS.md:0-0
Timestamp: 2026-02-17T12:31:17.076Z
Learning: Applies to clients/agent-runtime/src/{security,gateway,tools,config}/**/*.rs : Do not silently weaken security policy or access constraints; keep default behavior secure-by-default with deny-by-default where applicable
Applied to files:
clients/cerebro/src/server.rs
🔇 Additional comments (3)
clients/cerebro/src/server.rs (1)
395-434: Nice hardening on readiness and metrics handlers.Good call returning a fixed readiness error payload and preserving Prometheus
Content-Typeon/metricsresponses.clients/cerebro/tests/mcp_tools_contract.rs (1)
62-76: Good auth contract coverage fortools/list.This closes an important regression path by asserting the explicit unauthorized error contract.
clients/cerebro/tests/health_endpoints_test.rs (1)
144-184: Strong/metricscontract test.Status, Prometheus content type, and metric-name assertions provide solid coverage for this feature.
|



Related Issues
Fixes #695
Summary
Adds a Prometheus-compatible operational metrics layer for Cerebro so production operators can observe MCP request outcomes, tool latency, authentication failures, readiness failures, and storage errors.
Key changes:
/metricswith Prometheus text exposition.Tested Information
Ran:
cargo fmt --manifest-path clients/cerebro/Cargo.toml cargo clippy --manifest-path clients/cerebro/Cargo.toml cargo test --manifest-path clients/cerebro/Cargo.tomlPre-commit hooks also passed during commit creation.
Documentation Impact
clients/cerebro/README.mdopenspec/specs/cerebro/spec.mdBreaking Changes
None.
Checklist