Add metrics/monitoring for caught panics in connection tasks

## Context

Following PR #215, which implemented panic handling for connection tasks, we should add production monitoring to track how often connection tasks panic.

## Problem

Currently, panics in connection tasks are caught and logged, but there's no easy way for operators to:
- Monitor panic frequency in production
- Set up alerts for elevated panic rates
- Analyze trends in connection task failures
- Correlate panics with specific deployment changes or traffic patterns

## Proposed Solutions

### Option 1: Metrics Counter
Add a metrics counter using a standard metrics library:

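```rust
use metrics::counter;

// In the panic handling code
tracing::error!("Connection task panicked: {:?}, peer_addr: {:?}", panic_info, peer_addr);
// Syntax for metrics 0.22 and later; earlier releases take the increment as a macro argument.
// A raw peer_addr label is omitted: it is high-cardinality and exposes client addresses.
counter!("wireframe.connection.panics").increment(1);
```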

### Option 2: Custom Hook/Callback
Expose a configurable hook for panic events:

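```rust
use std::any::Any;
use std::net::SocketAddr;

pub trait PanicHandler: Send + Sync {
    /// Called once for each caught connection-task panic.
    /// `panic_info` is the panic payload (the contents of a `Box<dyn Any + Send>`).
    fn on_connection_panic(&self, peer_addr: Option<SocketAddr>, panic_info: &(dyn Any + Send));
}

// In server configuration
impl ServerBuilder {
    /// Stores the handler as a `Box<dyn PanicHandler>` for connection tasks to call.
    pub fn with_panic_handler<H: PanicHandler + 'static>(mut self, handler: H) -> Self {
        self.panic_handler = Some(Box::new(handler));
        self
    }
}
```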
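To illustrate how such a hook might be used, here is a minimal, std-only sketch. `CountingPanicHandler` and `run_connection` are hypothetical names introduced for this example, `run_connection` is a synchronous stand-in for the real connection task wrapper, and the payload is assumed to be typed as `dyn Any + Send`.

```rust
use std::any::Any;
use std::net::SocketAddr;
use std::panic::{self, AssertUnwindSafe};
use std::sync::atomic::{AtomicU64, Ordering};

/// Same shape as the proposed trait, with the payload typed as `dyn Any + Send` (assumption).
pub trait PanicHandler: Send + Sync {
    fn on_connection_panic(&self, peer_addr: Option<SocketAddr>, panic_info: &(dyn Any + Send));
}

/// Hypothetical handler that only counts caught panics.
#[derive(Default)]
pub struct CountingPanicHandler {
    panics: AtomicU64,
}

impl CountingPanicHandler {
    /// Number of panics reported so far.
    pub fn count(&self) -> u64 {
        self.panics.load(Ordering::Relaxed)
    }
}

impl PanicHandler for CountingPanicHandler {
    fn on_connection_panic(&self, _peer_addr: Option<SocketAddr>, _panic_info: &(dyn Any + Send)) {
        self.panics.fetch_add(1, Ordering::Relaxed);
    }
}

/// Hypothetical stand-in for the connection task wrapper: runs `task` and,
/// if it panics and a handler is configured, reports the payload to the handler.
pub fn run_connection<F: FnOnce()>(
    task: F,
    peer_addr: Option<SocketAddr>,
    handler: Option<&dyn PanicHandler>,
) {
    if let Err(payload) = panic::catch_unwind(AssertUnwindSafe(task)) {
        if let Some(handler) = handler {
            handler.on_connection_panic(peer_addr, &*payload);
        }
    }
}
```

With no handler configured, the only extra work on a panic is the `Option` check, and the non-panicking path never touches the handler at all.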

### Option 3: Structured Logging Enhancement
Enhance the existing tracing with structured fields for easier monitoring:

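```rust
// `type_name_of_val` on a `dyn Any` only names the trait object, so record the message instead.
let panic_message = panic_info
    .downcast_ref::<&str>()
    .copied()
    .or_else(|| panic_info.downcast_ref::<String>().map(String::as_str))
    .unwrap_or("non-string panic payload");
tracing::error!(peer_addr = ?peer_addr, panic_message, "Connection task panicked");
```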
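A panic payload is an opaque `Box<dyn Any + Send>`, but for panics raised through `panic!` the message can usually be recovered by downcasting to `&str` or `String`. The following std-only sketch shows one way to do that. `panic_message` and `message_if_panicked` are hypothetical helpers for illustration, not part of the existing code.

```rust
use std::any::Any;
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical helper: best-effort message extraction from a panic payload.
pub fn panic_message(payload: &(dyn Any + Send)) -> &str {
    if let Some(msg) = payload.downcast_ref::<&'static str>() {
        // `panic!("literal")` carries a `&'static str`.
        *msg
    } else if let Some(msg) = payload.downcast_ref::<String>() {
        // `panic!("formatted {}", x)` carries a `String`.
        msg.as_str()
    } else {
        // `std::panic::panic_any` may carry any other type.
        "non-string panic payload"
    }
}

/// Runs `task` and returns the panic message if it panicked.
pub fn message_if_panicked<F: FnOnce()>(task: F) -> Option<String> {
    panic::catch_unwind(AssertUnwindSafe(task))
        .err()
        .map(|payload| panic_message(&*payload).to_owned())
}
```

A string field produced this way can be attached to the structured log event, which makes it possible to group panics by message in a log aggregator.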

## Implementation Considerations

1. **Minimal overhead**: Metrics collection should add no measurable cost to the non-panicking path
2. **Configurable**: Allow disabling metrics collection if not needed
3. **Standard integration**: Use common metrics libraries (Prometheus, StatsD, etc.)
4. **Privacy**: Avoid recording sensitive connection data, such as full peer addresses, in metrics
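One way to reconcile the privacy point with useful metric dimensions is to map each peer address to a coarse, low-cardinality label before attaching it to a metric. The sketch below groups by address family only. `peer_family_label` is a hypothetical helper, and the choice of grouping is an assumption, since this issue does not define what a "peer_addr pattern" should be.

```rust
use std::net::SocketAddr;

/// Hypothetical helper: maps a peer address to a low-cardinality label
/// that does not reveal the client's address or port.
pub fn peer_family_label(peer_addr: Option<SocketAddr>) -> &'static str {
    match peer_addr {
        Some(SocketAddr::V4(_)) => "ipv4",
        Some(SocketAddr::V6(_)) => "ipv6",
        // The peer address could not be determined.
        None => "unknown",
    }
}
```

Because the label set is fixed at three values, it keeps the number of time series bounded regardless of how many distinct clients connect.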

## Benefits

- **Operational visibility**: Monitor connection stability in production
- **Proactive debugging**: Identify problematic connection patterns
- **Performance insights**: Correlate panics with system load/configuration
- **SLA monitoring**: Track reliability metrics for the server

## Acceptance Criteria

- [ ] Metrics are collected when connection tasks panic
- [ ] Metrics include relevant dimensions (peer_addr pattern, timestamp)
- [ ] Metrics collection can be disabled via configuration
- [ ] Documentation explains how to set up monitoring dashboards
- [ ] Minimal performance impact on happy path

## References

- PR #215: Implement panic handling improvements  
- Comment: //pull/215#issuecomment-1753556519
- Issues: https://github.com/leynos/wireframe/issues/217
