
Track connection panic metrics#281

Merged
leynos merged 3 commits into main from
codex/add-production-monitoring-for-connection-panics on Aug 8, 2025

Conversation

@leynos (Owner) commented Aug 8, 2025

Summary

  • count connection task panics via wireframe_connection_panics_total
  • surface panic count in tests and docs with guidance for Prometheus alerts

closes #217

Testing

  • make fmt
  • make markdownlint
  • make nixie (fails: too many arguments/failed to link packages)
  • make lint
  • make test

https://chatgpt.com/codex/tasks/task_e_68953c8ac6f08322a60414af8a1d1406

Summary by Sourcery

Instrument connection tasks to record panics as a new Prometheus metric and update tests and documentation to use and verify the new counter.

New Features:

  • Introduce wireframe_connection_panics_total counter and inc_connection_panics API to record panicking connection tasks.

Enhancements:

  • Increment the connection panic counter in spawn_connection_task when a task panics.

Documentation:

  • Update the production resilience guide to explain the panic metric, its usage in Prometheus, and alerting guidance.
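As an illustration of the alerting guidance described above, a Prometheus rule of roughly the following shape could alert on a sustained panic rate. The threshold, durations, and labels here are illustrative assumptions, not taken from the guide:

```yaml
groups:
  - name: wireframe
    rules:
      - alert: WireframeConnectionPanics
        # Any non-zero panic rate over five minutes, sustained for ten.
        expr: rate(wireframe_connection_panics_total[5m]) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: Connection tasks are panicking
```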

Tests:

  • Add unit tests and integration tests to verify that connection panics increment the new metric.

@sourcery-ai Bot (Contributor) commented Aug 8, 2025

Reviewer's Guide

This PR introduces a new Prometheus counter to track connection task panics, wires it into the connection task’s catch_unwind handler, adds unit and integration tests to verify the metric is recorded, and extends the production resilience guide with alerting recommendations.

ER diagram for Prometheus connection panic metric

erDiagram
    CONNECTION_TASK ||--o{ CONNECTION_PANIC_METRIC : triggers
    CONNECTION_PANIC_METRIC {
        string name
        int count
    }

Class diagram for new connection panic metric tracking

classDiagram
    class Metrics {
        +inc_connection_panics()
        +CONNECTION_PANICS : &str
    }
    class ConnectionTask {
        +spawn_connection_task()
    }
    ConnectionTask ..> Metrics : uses

File-Level Changes

Change: Define and implement a new counter for connection task panics
(src/metrics.rs)
  • Add CONNECTION_PANICS constant with HELP and TYPE comments
  • Implement feature-gated inc_connection_panics() function

Change: Instrument connection task wrapper to record panics
(src/server/connection.rs)
  • Invoke inc_connection_panics() in the catch_unwind panic branch

Change: Add tests to ensure panic metric increments
(src/server/connection.rs, tests/metrics.rs)
  • Add async rstest in connection module to snapshot and assert panic counter
  • Add unit test in tests/metrics.rs for inc_connection_panics()

Change: Update documentation with panic metric guidance
(docs/hardening-wireframe-a-guide-to-production-resilience.md)
  • Describe wireframe_connection_panics_total in production resilience guide
  • Add Prometheus rate() and recorder setup recommendations

Assessment against linked issues

  • #217: Add a metrics counter to track the number of panics in connection
    tasks.
  • #217: Ensure metrics collection can be disabled via configuration (e.g.,
    feature flag or recorder not installed).
  • #217: Document the new metric and provide guidance for operators on
    monitoring and alerting (e.g., Prometheus, Grafana).

Possibly linked issues



@coderabbitai Bot (Contributor) commented Aug 8, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Summary by CodeRabbit

  • New Features

    • Introduced a new metric to track connection panics, enabling improved monitoring and observability via Prometheus and Grafana dashboards.
  • Documentation

    • Updated guidance to include details on the new panic metric and instructions for enabling and visualising it.
  • Tests

    • Added tests to verify that the connection panic metric increments correctly when triggered.

Walkthrough

Introduce a new Prometheus-compatible metric, wireframe_connection_panics_total, to track panics in connection tasks. Integrate the metric increment into the panic-handling logic, update the documentation with observability instructions, and add tests to verify correct metric behaviour when a connection task panics.

Changes

  • Metric Definition & API (src/metrics.rs): Define CONNECTION_PANICS metric
    constant and add inc_connection_panics function to increment the counter.
  • Connection Panic Instrumentation & Testing (src/server/connection.rs):
    Increment the panic counter in the connection panic handler; add a test to
    verify the metric increments.
  • Metrics Test (tests/metrics.rs): Add a test to ensure the panic metric
    increments with the correct peer address label.
  • Documentation Update
    (docs/hardening-wireframe-a-guide-to-production-resilience.md): Document
    the new metric, its purpose, setup instructions, and dashboard integration
    guidance.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Server
    participant MetricsRecorder

    Client->>Server: Initiate connection
    Server->>Server: spawn_connection_task
    Server-->>Server: Connection task panics
    Server->>MetricsRecorder: inc_connection_panics(peer_addr)
    MetricsRecorder-->>MetricsRecorder: Increment wireframe_connection_panics_total
    Server->>Server: Handle panic (log, isolate)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~15 minutes

Assessment against linked issues

  • Metrics are collected when connection tasks panic (#217)
  • Metrics include relevant dimensions (peer_addr pattern, timestamp) (#217)
  • Metrics collection can be disabled via configuration (#217)
  • Documentation explains how to set up monitoring dashboards (#217)
  • Minimal performance impact on happy path (#217)

Assessment against linked issues: Out-of-scope changes

No out-of-scope changes detected.

Possibly related PRs

Suggested reviewers

  • codescene-delta-analysis

Poem

When panics arise in the server's domain,
A metric now counts each crash and each strain.
Peer address in tow, the numbers ascend,
Grafana and Prometheus, your new steadfast friends.
So monitor calmly, let dashboards delight—
For resilience is measured, both day and night!
📈✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4428c6e and 8155f33.

📒 Files selected for processing (3)
  • src/metrics.rs (2 hunks)
  • src/server/connection.rs (3 hunks)
  • tests/metrics.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.rs

📄 CodeRabbit Inference Engine (AGENTS.md)

**/*.rs: Function documentation must include clear examples demonstrating the usage and outcome of the function. Test documentation should omit examples where the example serves only to reiterate the test logic.
No single code file may be longer than 400 lines. Long switch statements or dispatch tables should be broken up by feature and constituents colocated with targets. Large blocks of test data should be moved to external data files.
Clippy warnings MUST be disallowed.
Fix any warnings emitted during tests in the code itself rather than silencing them.
Where a function is too long, extract meaningfully named helper functions adhering to separation of concerns and CQRS.
Where a function has too many parameters, group related parameters in meaningfully named structs.
Where a function is returning a large error consider using Arc to reduce the amount of data returned.
Write unit and behavioural tests for new functionality. Run both before and after making any change.
Every module must begin with a module level (//! ) comment explaining the module's purpose and utility.
Document public APIs using Rustdoc comments (///) so documentation can be generated with cargo doc.
Prefer immutable data and avoid unnecessary mut bindings.
Handle errors with the Result type instead of panicking where feasible.
Avoid unsafe code unless absolutely necessary and document any usage clearly.
Place function attributes after doc comments.
Do not use return in single-line functions.
Use predicate functions for conditional criteria with more than two branches.
Lints must not be silenced except as a last resort.
Lint rule suppressions must be tightly scoped and include a clear reason.
Prefer expect over allow.
Prefer .expect() over .unwrap().
Use concat!() to combine long string literals rather than escaping newlines with a backslash.
Prefer semantic error enums. Derive std::error::Error (via the thiserror crate) for any condition the caller might inspect, retry, or map to an HTTP status....

Files:

  • tests/metrics.rs
  • src/metrics.rs
  • src/server/connection.rs

⚙️ CodeRabbit Configuration File

**/*.rs: * Seek to keep the cyclomatic complexity of functions no more than 12.

  • Adhere to single responsibility and CQRS

  • Place function attributes after doc comments.

  • Do not use return in single-line functions.

  • Move conditionals with >2 branches into a predicate function.

  • Avoid unsafe unless absolutely necessary.

  • Every module must begin with a //! doc comment that explains the module's purpose and utility.

  • Comments and docs must follow en-GB-oxendict (-ize / -our) spelling and grammar

  • Lints must not be silenced except as a last resort.

    • #[allow] is forbidden.
    • Only narrowly scoped #[expect(lint, reason = "...")] is allowed.
    • No lint groups, no blanket or file-wide suppression.
    • Include FIXME: with link if a fix is expected.
  • Use rstest fixtures for shared setup and to avoid repetition between tests.

  • Replace duplicated tests with #[rstest(...)] parameterised cases.

  • Prefer mockall for mocks/stubs.

  • Prefer .expect() over .unwrap()

  • Ensure that any API or behavioural changes are reflected in the documentation in docs/

  • Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/

  • Files must not exceed 400 lines in length

    • Large modules must be decomposed
    • Long match statements or dispatch tables should be decomposed by domain and collocated with targets
    • Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.

Files:

  • tests/metrics.rs
  • src/metrics.rs
  • src/server/connection.rs
🧬 Code Graph Analysis (1)
src/server/connection.rs (2)
src/metrics.rs (2)
  • inc_connection_panics (108-108)
  • inc_connection_panics (111-111)
tests/metrics.rs (1)
  • connection_panic_metric_increments (66-83)
🔍 MCP Research (1 server)

Deepwiki:

  • Issue Add metrics/monitoring for caught panics in connection tasks #217 specifies the need for a metrics counter to track panics in connection tasks, with minimal overhead and configurability to disable metrics collection if needed. Metrics should use standard libraries like Prometheus and avoid logging sensitive data. The metric should include relevant dimensions such as peer address and timestamp, and documentation should explain monitoring setup. (linked_issue Add metrics/monitoring for caught panics in connection tasks #217)

  • The wireframe project uses the metrics crate for instrumentation, with all metric functions defined in src/metrics.rs. Metrics are conditionally compiled with the metrics feature, which can be disabled to make metric functions no-ops. Metric names use a wireframe_ prefix and are defined as pub const strings. Counters are incremented using counter!(NAME).increment(value). Metrics can include labels for additional context. (src/metrics.rs)

  • Panic handling in connection tasks is implemented by wrapping connection futures in catch_unwind to isolate panics and prevent them from crashing the server. When a panic occurs, the panic message and peer address are logged as errors using tracing::error!. The server continues accepting new clients after a panic. (linked_issue Add metrics/monitoring for caught panics in connection tasks #217, src/server/connection.rs)

  • The new metric wireframe_connection_panics_total follows the naming convention and is incremented inside the panic handling block of the connection task wrapper, consistent with the team's approach to metrics and panic handling. (src/server/connection.rs)

🔇 Additional comments (3)
src/metrics.rs (1)

27-34: Metric addition meets standards

The constant, docs, and helper follow the existing style; compile-time gating and the label-free counter avoid the earlier cardinality concern. No further action required.

Also applies to: 91-111

tests/metrics.rs (1)

65-83: Test asserts the exact expected count

The tightened assert_eq!(1, count) prevents false positives. Implementation aligns with prior guidance.

src/server/connection.rs (1)

222-225: Avoid global recorder collisions

Installing a global recorder inside a test (recorder.install()) panics if any other test already set one and can race under parallel execution. Use metrics::with_local_recorder with a spawned single-thread runtime, gate the test with serial_test, or guard install() behind a Once to ensure exclusive registration.



@sourcery-ai Bot (Contributor) left a comment


Hey @leynos - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments

### Comment 1
<location> `docs/hardening-wireframe-a-guide-to-production-resilience.md:171` </location>
<code_context>
+Each occurrence also increments the `wireframe_connection_panics_total`
+counter, enabling alerts on unexpected spikes. Operators can chart
+`rate(wireframe_connection_panics_total[5m])` in Prometheus and create Grafana
+panels to visualise instability. To emit this metric, enable the `metrics`
+Cargo feature and install a recorder such as `metrics_exporter_prometheus`,
+which exposes an HTTP endpoint for scraping.
</code_context>

<issue_to_address>
'Visualise' uses en-GB spelling; en-GB-oxendict prefers 'visualize'.

The word 'visualise' should be spelled 'visualize' to conform to en-GB-oxendict spelling conventions.
</issue_to_address>


Comment thread docs/hardening-wireframe-a-guide-to-production-resilience.md Outdated
@leynos (Owner, Author) commented Aug 8, 2025

@coderabbitai review

@coderabbitai Bot (Contributor) commented Aug 8, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@leynos (Owner, Author) commented Aug 8, 2025

@coderabbitai review

@coderabbitai Bot (Contributor) commented Aug 8, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai Bot (Contributor) left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
docs/hardening-wireframe-a-guide-to-production-resilience.md (1)

168-174: Use Oxford “-ize” spelling

Replace “visualise” with “visualize” to follow en-GB-oxendict conventions.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ce4bce5 and 6c0dc4a.

📒 Files selected for processing (4)
  • docs/hardening-wireframe-a-guide-to-production-resilience.md (1 hunks)
  • src/metrics.rs (3 hunks)
  • src/server/connection.rs (3 hunks)
  • tests/metrics.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.md

⚙️ CodeRabbit Configuration File

**/*.md: * Avoid 2nd person or 1st person pronouns ("I", "you", "we")

  • Use en-GB-oxendict (-ize / -our) spelling and grammar
  • Paragraphs and bullets must be wrapped to 80 columns, except where a long URL would prevent this (in which case, silence MD013 for that line)
  • Code blocks should be wrapped to 120 columns.
  • Headings must not be wrapped.
  • Documents must start with a level 1 heading
  • Headings must correctly increase or decrease by no more than one level at a time
  • Use GitHub-flavoured Markdown style for footnotes and endnotes.
  • Numbered footnotes must be numbered by order of appearance in the document.

Files:

  • docs/hardening-wireframe-a-guide-to-production-resilience.md
**/*.rs

Files:

  • src/server/connection.rs
  • tests/metrics.rs
  • src/metrics.rs
🧬 Code Graph Analysis (1)
src/metrics.rs (1)
src/extractor.rs (1)
  • peer_addr (362-362)
🪛 LanguageTool
docs/hardening-wireframe-a-guide-to-production-resilience.md

[style] ~171-~171: Would you like to use the Oxford spelling “visualize”? The spelling ‘visualise’ is also correct.
Context: ...Prometheus and create Grafana panels to visualise instability. To emit this metric, enabl...

(OXFORD_SPELLING_Z_NOT_S)

🔍 MCP Research (1 server)

Deepwiki:

  • The WireframeServer architecture uses a factory pattern to spawn worker tasks that accept connections and spawn connection tasks wrapped with panic recovery using catch_unwind. When a panic occurs in a connection task, it is caught and logged with the peer address, and now also increments the wireframe_connection_panics_total metric. This ensures that panics do not crash the server and are observable via metrics. (docs/hardening-wireframe-a-guide-to-production-resilience.md)

  • The connection actor model and asynchronous messaging system use PushHandle and SessionRegistry with weak references to manage connection lifecycles and prevent resource leaks. Panic isolation and metric incrementing integrate with this architecture to provide resilience and observability. (docs/hardening-wireframe-a-guide-to-production-resilience.md)

  • Observability and monitoring documentation describes structured logging with tracing spans for connections and requests, and metrics collection integrated with Prometheus exporters. The new panic metric wireframe_connection_panics_total fits into this observability framework, enabling operators to monitor panic rates over time and create Grafana dashboards for visualization. (docs/hardening-wireframe-a-guide-to-production-resilience.md)

  • The production deployment documentation emphasizes the importance of enabling the metrics Cargo feature and installing a recorder like metrics_exporter_prometheus to expose an HTTP scraping endpoint for Prometheus. It also details how the panic metric counter supports operational monitoring and alerting for connection panics. (docs/hardening-wireframe-a-guide-to-production-resilience.md)

🔇 Additional comments (2)
tests/metrics.rs (1)

65-82: Test looks sound

The test correctly asserts that the counter increments under a local recorder.

src/server/connection.rs (1)

41-41: Instrumentation correctly placed

Incrementing the panic counter inside the catch_unwind branch is appropriate.

Comment thread src/metrics.rs
@coderabbitai Bot (Contributor) left a comment


Actionable comments posted: 5

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6c0dc4a and 4428c6e.

📒 Files selected for processing (4)
  • docs/hardening-wireframe-a-guide-to-production-resilience.md (1 hunks)
  • src/metrics.rs (2 hunks)
  • src/server/connection.rs (3 hunks)
  • tests/metrics.rs (1 hunks)
🧰 Additional context used
📓 Path-based instructions (4)
docs/**/*.md

📄 CodeRabbit Inference Engine (docs/contents.md)

Conventions for writing project documentation as described in the documentation style guide

docs/**/*.md: Use British English based on the Oxford English Dictionary (en-oxendict) for documentation text.
The word "outwith" is acceptable in documentation.
Keep US spelling when used in an API, for example color.
Use the Oxford comma in documentation text.
Treat company names as collective nouns in documentation (e.g., "Lille Industries are expanding").
Write headings in sentence case in documentation.
Use Markdown headings (#, ##, ###, etc.) in order without skipping levels.
Follow markdownlint recommendations for Markdown files.
Provide code blocks and lists using standard Markdown syntax.
Always provide a language identifier for fenced code blocks; use plaintext for non-code text.
Use - as the first level bullet and renumber lists when items change.
Prefer inline links using [text](url) or angle brackets around the URL; avoid reference-style links like [foo][bar].
Ensure blank lines before and after bulleted lists and fenced blocks in Markdown.
Ensure tables have a delimiter line below the header row in Markdown.
Expand any uncommon acronym on first use, for example, Continuous Integration (CI).
Wrap paragraphs at 80 columns in documentation.
Wrap code at 120 columns in documentation.
Do not wrap tables in documentation.
Use sequentially numbered footnotes referenced with [^1] and place definitions at the end of the file.
Where it adds clarity, include Mermaid diagrams in documentation.
When embedding figures, use ![alt text](path/to/image) and provide concise alt text describing the content.
Add a brief description before each Mermaid diagram in documentation for screen readers.

Document examples showing how to deprecate old message versions gracefully

Write the official documentation for the new features. Create separate guides for "Duplex Messaging & Pushes", "Streaming Responses", and "Message Fragmentation". Each guide must include runna...

Files:

  • docs/hardening-wireframe-a-guide-to-production-resilience.md
docs/**/*.{md,rs}

📄 CodeRabbit Inference Engine (docs/multi-packet-and-streaming-responses-design.md)

docs/**/*.{md,rs}: The official documentation and examples must exclusively use the declarative Response model for handler responses.
The async-stream pattern must be documented as the canonical approach for dynamic stream generation.

Files:

  • docs/hardening-wireframe-a-guide-to-production-resilience.md
**/*.md

📄 CodeRabbit Inference Engine (AGENTS.md)

**/*.md: Documentation must use en-GB-oxendict spelling and grammar. (EXCEPTION: the naming of the LICENSE file, which is to be left unchanged for community consistency.)
Markdown paragraphs and bullet points must be wrapped at 80 columns.
Code blocks in Markdown files must be wrapped at 120 columns.
Tables and headings in Markdown files must not be wrapped.
Use dashes (-) for list bullets in Markdown files.
Use GitHub-flavoured Markdown footnotes ([^1]) for references and footnotes.

Files:

  • docs/hardening-wireframe-a-guide-to-production-resilience.md

⚙️ CodeRabbit Configuration File

**/*.md: * Avoid 2nd person or 1st person pronouns ("I", "you", "we")

  • Use en-GB-oxendict (-ize / -our) spelling and grammar
  • Headings must not be wrapped.
  • Documents must start with a level 1 heading
  • Headings must correctly increase or decrease by no more than one level at a time
  • Use GitHub-flavoured Markdown style for footnotes and endnotes.
  • Numbered footnotes must be numbered by order of appearance in the document.

Files:

  • docs/hardening-wireframe-a-guide-to-production-resilience.md
**/*.rs

📄 CodeRabbit Inference Engine (AGENTS.md)

**/*.rs: Function documentation must include clear examples demonstrating the usage and outcome of the function. Test documentation should omit examples where the example serves only to reiterate the test logic.
No single code file may be longer than 400 lines. Long switch statements or dispatch tables should be broken up by feature and constituents colocated with targets. Large blocks of test data should be moved to external data files.
Clippy warnings MUST be disallowed.
Fix any warnings emitted during tests in the code itself rather than silencing them.
Where a function is too long, extract meaningfully named helper functions adhering to separation of concerns and CQRS.
Where a function has too many parameters, group related parameters in meaningfully named structs.
Where a function returns a large error type, consider wrapping it in Arc to reduce the amount of data returned.
Write unit and behavioural tests for new functionality. Run both before and after making any change.
Every module must begin with a module level (//! ) comment explaining the module's purpose and utility.
Document public APIs using Rustdoc comments (///) so documentation can be generated with cargo doc.
Prefer immutable data and avoid unnecessary mut bindings.
Handle errors with the Result type instead of panicking where feasible.
Avoid unsafe code unless absolutely necessary and document any usage clearly.
Place function attributes after doc comments.
Do not use return in single-line functions.
Use predicate functions for conditional criteria with more than two branches.
Lints must not be silenced except as a last resort.
Lint rule suppressions must be tightly scoped and include a clear reason.
Prefer expect over allow.
Prefer .expect() over .unwrap().
Use concat!() to combine long string literals rather than escaping newlines with a backslash.
Prefer semantic error enums. Derive std::error::Error (via the thiserror crate) for any condition the caller might inspect, retry, or map to an HTTP status....
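The semantic-error-enum guideline can be sketched without the thiserror derive; the hand-rolled equivalent of what `#[derive(thiserror::Error)]` would generate might look like the following (the enum and its variants are illustrative, not taken from the codebase):

```rust
use std::{error::Error, fmt, io};

// Hypothetical semantic error enum a caller can inspect, retry on, or map
// to a status code. With thiserror, both impl blocks below collapse into a
// single derive plus #[error("...")] attributes.
#[derive(Debug)]
enum ConnectionError {
    // A transient I/O failure the caller may choose to retry.
    Io(io::Error),
    // The peer closed the connection before the handshake completed.
    HandshakeAborted,
}

impl fmt::Display for ConnectionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Io(e) => write!(f, "I/O failure: {e}"),
            Self::HandshakeAborted => write!(f, "peer aborted the handshake"),
        }
    }
}

impl Error for ConnectionError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            Self::Io(e) => Some(e),
            Self::HandshakeAborted => None,
        }
    }
}
```

Because the variants carry meaning rather than a formatted string, callers can match on them instead of parsing error text.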

Files:

  • src/metrics.rs
  • src/server/connection.rs
  • tests/metrics.rs

⚙️ CodeRabbit Configuration File

**/*.rs: * Seek to keep the cyclomatic complexity of functions no more than 12.

  • Adhere to single responsibility and CQRS

  • Place function attributes after doc comments.

  • Do not use return in single-line functions.

  • Move conditionals with >2 branches into a predicate function.

  • Avoid unsafe unless absolutely necessary.

  • Every module must begin with a //! doc comment that explains the module's purpose and utility.

  • Comments and docs must follow en-GB-oxendict (-ize / -our) spelling and grammar

  • Lints must not be silenced except as a last resort.

    • #[allow] is forbidden.
    • Only narrowly scoped #[expect(lint, reason = "...")] is allowed.
    • No lint groups, no blanket or file-wide suppression.
    • Include FIXME: with link if a fix is expected.
  • Use rstest fixtures for shared setup and to avoid repetition between tests.

  • Replace duplicated tests with #[rstest(...)] parameterised cases.

  • Prefer mockall for mocks/stubs.

  • Prefer .expect() over .unwrap()

  • Ensure that any API or behavioural changes are reflected in the documentation in docs/

  • Ensure that any completed roadmap steps are recorded in the appropriate roadmap in docs/

  • Files must not exceed 400 lines in length

    • Large modules must be decomposed
    • Long match statements or dispatch tables should be decomposed by domain and collocated with targets
    • Large blocks of inline data (e.g., test fixtures, constants or templates) must be moved to external files and inlined at compile-time or loaded at run-time.
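The `#[rstest(...)]` guidance above replaces copy-pasted tests with one declarative case table. The same effect can be approximated in std-only Rust by iterating a slice of cases; the function and values below are illustrative, and rstest expresses the same table declaratively via `#[rstest]` with `#[case]` attributes:

```rust
// Hypothetical function under test: a 4-byte length prefix plus payload.
fn frame_len(payload: &[u8]) -> usize {
    4 + payload.len()
}

fn main() {
    // One data-driven check instead of three near-identical test functions.
    let cases: &[(&[u8], usize)] = &[(b"", 4), (b"ping", 8), (b"hello", 9)];
    for (payload, expected) in cases {
        assert_eq!(frame_len(payload), *expected, "payload {payload:?}");
    }
    println!("all cases passed");
}
```

The loop makes each case a row of data, so adding a regression case is a one-line change rather than a new test function.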

Files:

  • src/metrics.rs
  • src/server/connection.rs
  • tests/metrics.rs
🔍 MCP Research (1 server)

Deepwiki:

  • The panic handling mechanism in connection tasks is robust and tested: panics in connection tasks are caught and logged without crashing the server, allowing continued acceptance of new connections. This is supported by existing tests and documented in the WireframeServer wiki and test suite. The new metric addition supplements this by providing observability into panic frequency for operational monitoring. (src/server.rs, tests/world.rs, docs/hardening-wireframe-a-guide-to-production-resilience.md)

  • The documentation specifies that to emit the panic metric, the metrics Cargo feature must be enabled and a recorder such as metrics_exporter_prometheus installed to expose an HTTP scraping endpoint. This enables operators to monitor panic rates over time and create Grafana dashboards for visualization and alerting. (docs/hardening-wireframe-a-guide-to-production-resilience.md)

⏰ Context from checks skipped due to timeout of 120000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build-test
🔇 Additional comments (1)
src/server/connection.rs (1)

40-43: Good instrumentation.

Incrementing the panic counter here fulfils the monitoring objective without exposing labels.
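The pattern being praised here, catching a task panic and bumping a process-wide counter so the server keeps accepting connections, can be sketched in std-only Rust. An `AtomicU64` stands in for the `wireframe_connection_panics_total` metric; the real code records it via the metrics crate (`inc_connection_panics`), and the handler below is purely illustrative:

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::atomic::{AtomicU64, Ordering};

// Stand-in for the Prometheus counter; the production code instead calls
// metrics::inc_connection_panics() to record
// wireframe_connection_panics_total.
static CONNECTION_PANICS: AtomicU64 = AtomicU64::new(0);

// Run one connection handler, isolating any panic and counting it so a
// single bad connection cannot take down the accept loop.
fn run_connection(handler: impl FnOnce()) {
    if panic::catch_unwind(AssertUnwindSafe(handler)).is_err() {
        CONNECTION_PANICS.fetch_add(1, Ordering::Relaxed);
        eprintln!("connection task panicked; server keeps accepting");
    }
}

fn main() {
    // Silence the default panic backtrace for the demonstration.
    panic::set_hook(Box::new(|_| {}));
    run_connection(|| println!("healthy connection"));
    run_connection(|| panic!("boom"));
    assert_eq!(CONNECTION_PANICS.load(Ordering::Relaxed), 1);
}
```

Keeping the counter label-free, as the review notes, mirrors the cardinality and privacy reasoning in the hardening guide.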

@leynos leynos merged commit 7bc3f76 into main Aug 8, 2025
5 checks passed
@leynos leynos deleted the codex/add-production-monitoring-for-connection-panics branch August 8, 2025 22:54
leynos commented Aug 8, 2025

@sourcery-ai review


@sourcery-ai sourcery-ai Bot left a comment


Hey @leynos - I've reviewed your changes and they look great!

Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments

### Comment 1
<location> `src/server/connection.rs:219` </location>
<code_context>
+    /// Panics increment the connection panic counter.
+    #[rstest]
+    #[tokio::test]
+    async fn connection_panic_metric_increments(
+        factory: impl Fn() -> WireframeApp + Send + Sync + Clone + 'static,
+    ) {
</code_context>

<issue_to_address>
Consider replacing the complex async integration test with a simple unit test that directly checks the metric increment.

Instead of spinning up a real TCP listener, driver tasks and I/O just to verify that `inc_connection_panics()` was called, you can drop that entire async test and replace it with a tiny unit‐test that directly exercises the metric. For example:

```rust
#[cfg(test)]
mod metrics_tests {
    use metrics_util::debugging::{DebugValue, DebuggingRecorder};
    use crate::metrics::{self, CONNECTION_PANICS};

    #[test]
    fn inc_connection_panics_increments_counter() {
        // Arrange: install a debugging recorder
        let recorder = DebuggingRecorder::new();
        let snapshotter = recorder.snapshotter();
        recorder.install().expect("failed to install recorder");

        // Act: invoke the panic‐increment function directly
        metrics::inc_connection_panics();

        // Assert: we saw exactly one increment on the CONNECTION_PANICS counter
        let metrics = snapshotter.snapshot().into_vec();
        assert!(metrics.iter().any(|(key, _, _, value)| {
            key.key().name() == CONNECTION_PANICS
                && matches!(value, DebugValue::Counter(c) if *c == 1)
        }), "expected CONNECTION_PANICS == 1, got {metrics:#?}");
    }
}
```

Steps:
1. Remove the entire `async fn connection_panic_metric_increments` and its TCP boilerplate.
2. Add the above `inc_connection_panics_increments_counter` test.
3. Verify `cargo test` still passes and coverage for `inc_connection_panics()` remains.
</issue_to_address>

### Comment 2
<location> `docs/hardening-wireframe-a-guide-to-production-resilience.md:168` </location>
<code_context>
 Connection tasks are wrapped with `catch_unwind` to log and discard panics.
 Each panicking connection is isolated so it cannot terminate the entire server.

+Each occurrence also increments the `wireframe_connection_panics_total`
+counter, enabling alerts on unexpected spikes. The counter intentionally omits
+peer address labels to limit cardinality and protect personally identifiable
</code_context>

<issue_to_address>
This paragraph is not wrapped to 80 columns as required by the style guide.

Please wrap this paragraph so that no line exceeds 80 columns, in accordance with the documentation formatting standards.
</issue_to_address>

### Comment 3
<location> `docs/hardening-wireframe-a-guide-to-production-resilience.md:169` </location>
<code_context>
 Each panicking connection is isolated so it cannot terminate the entire server.

+Each occurrence also increments the `wireframe_connection_panics_total`
+counter, enabling alerts on unexpected spikes. The counter intentionally omits
+peer address labels to limit cardinality and protect personally identifiable
+information. Operators can chart `rate(wireframe_connection_panics_total[5m])`
</code_context>

<issue_to_address>
This line is part of a paragraph that exceeds the 80 column limit.

Ensure all lines in paragraphs and bullet points are wrapped to 80 columns for readability and consistency.
</issue_to_address>

### Comment 4
<location> `docs/hardening-wireframe-a-guide-to-production-resilience.md:170` </location>
<code_context>

+Each occurrence also increments the `wireframe_connection_panics_total`
+counter, enabling alerts on unexpected spikes. The counter intentionally omits
+peer address labels to limit cardinality and protect personally identifiable
+information. Operators can chart `rate(wireframe_connection_panics_total[5m])`
+in Prometheus and create Grafana panels to visualize instability. To emit this
</code_context>

<issue_to_address>
This line is part of a paragraph that is not wrapped to 80 columns.

Please reformat this paragraph so that each line does not exceed 80 columns.
</issue_to_address>

### Comment 5
<location> `docs/hardening-wireframe-a-guide-to-production-resilience.md:171` </location>
<code_context>
+Each occurrence also increments the `wireframe_connection_panics_total`
+counter, enabling alerts on unexpected spikes. The counter intentionally omits
+peer address labels to limit cardinality and protect personally identifiable
+information. Operators can chart `rate(wireframe_connection_panics_total[5m])`
+in Prometheus and create Grafana panels to visualize instability. To emit this
+metric, enable the `metrics` Cargo feature and install a recorder such as
</code_context>

<issue_to_address>
This line is part of a paragraph that is not wrapped to 80 columns.

Wrap this paragraph to 80 columns as per documentation standards.
</issue_to_address>

### Comment 6
<location> `docs/hardening-wireframe-a-guide-to-production-resilience.md:172` </location>
<code_context>
+counter, enabling alerts on unexpected spikes. The counter intentionally omits
+peer address labels to limit cardinality and protect personally identifiable
+information. Operators can chart `rate(wireframe_connection_panics_total[5m])`
+in Prometheus and create Grafana panels to visualize instability. To emit this
+metric, enable the `metrics` Cargo feature and install a recorder such as
+`metrics_exporter_prometheus`, which exposes an HTTP endpoint for scraping.
</code_context>

<issue_to_address>
This line is part of a paragraph that is not wrapped to 80 columns.

Please ensure all paragraph lines are wrapped to 80 columns.
</issue_to_address>

### Comment 7
<location> `docs/hardening-wireframe-a-guide-to-production-resilience.md:173` </location>
<code_context>
+peer address labels to limit cardinality and protect personally identifiable
+information. Operators can chart `rate(wireframe_connection_panics_total[5m])`
+in Prometheus and create Grafana panels to visualize instability. To emit this
+metric, enable the `metrics` Cargo feature and install a recorder such as
+`metrics_exporter_prometheus`, which exposes an HTTP endpoint for scraping.
+
</code_context>

<issue_to_address>
This line is part of a paragraph that is not wrapped to 80 columns.

Reformat this paragraph so that each line is wrapped to 80 columns.
</issue_to_address>

### Comment 8
<location> `docs/hardening-wireframe-a-guide-to-production-resilience.md:174` </location>
<code_context>
+information. Operators can chart `rate(wireframe_connection_panics_total[5m])`
+in Prometheus and create Grafana panels to visualize instability. To emit this
+metric, enable the `metrics` Cargo feature and install a recorder such as
+`metrics_exporter_prometheus`, which exposes an HTTP endpoint for scraping.
+
 ### 3.2 Leak-Proof Registries with `Weak`/`Arc`
</code_context>

<issue_to_address>
This line is part of a paragraph that is not wrapped to 80 columns.

Please wrap this paragraph to 80 columns for consistency with the documentation style guide.
</issue_to_address>


Successfully merging this pull request may close these issues.

Add metrics/monitoring for caught panics in connection tasks