feat: Improve Coraza metrics#1835
Conversation
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
WalkthroughCoraza middleware metrics API overhaul replacing separate RecordRequest/RecordLatency calls with unified ObserveRequest using EWMA-based "recent" metrics. Added proper WAF instance cleanup on reload and improved request-scoped snapshot handling. Updated documentation to clarify security implications and observability behavior. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 18 minutes and 49 seconds.Comment |
There was a problem hiding this comment.
Code Review
This pull request enhances the Coraza middleware by implementing lock-free EWMA-based metrics for recent latency and block rates, replacing the previous online average calculation. It also improves resource management by ensuring old WAF instances are properly closed during initialization and reloads. Documentation has been updated with security recommendations regarding rule identifiers and trusted proxy configurations. Feedback suggests optimizing the middleware's hot path by consolidating engine state retrieval to minimize redundant lock acquisitions.
| metrics := e.Metrics() | ||
| metrics.RecordRequest() | ||
| blocked := false | ||
|
|
||
| defer func() { | ||
| metrics.RecordLatency(time.Since(startTime)) | ||
| if metrics != nil { | ||
| metrics.ObserveRequest(time.Since(startTime), blocked) | ||
| } | ||
| }() | ||
|
|
||
| currentWAF, currentSupportsOptions, currentWAFWithOptions, currentErr := e.snapshot() |
There was a problem hiding this comment.
The middleware performs multiple redundant lock acquisitions on the engine's mutex for every request (via e.Metrics() at line 232 and e.snapshot() at line 241). While RLock is generally inexpensive, reducing atomic operations in the hot path is beneficial for a high-performance middleware like Fiber. Consider refactoring to capture all necessary engine state (WAF instance, metrics collector, and configuration) in a single protected call or a unified snapshot at the beginning of the handler to minimize lock contention.
There was a problem hiding this comment.
Resolved in the latest commit.
The middleware now reads WAF state, metrics, and block message through a single internal snapshot, so the hot path no longer acquires separate read locks for those values.
There was a problem hiding this comment.
Pull request overview
This PR improves the Coraza middleware’s observability by switching request “trend” metrics to EWMA-based values, simplifying collector recording to a single call, and tightening documentation guidance around security responses and reverse-proxy IP handling.
Changes:
- Replace lifetime average latency (
AvgLatencyMs) with EWMA-based trend metrics (RecentLatencyMs,RecentBlockRate) while keeping cumulative counters/rates. - Update middleware/engine lifecycle behavior to close the previous WAF instance on successful init/reload.
- Update docs and tests to match the new metrics model and provide safer security guidance.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| v3/coraza/metrics.go | Reworks metrics collector API + snapshot fields; implements EWMA tracking. |
| v3/coraza/coraza.go | Updates middleware metrics recording and closes previous WAF on init/reload. |
| v3/coraza/coraza_test.go | Updates/extends tests for EWMA metrics and WAF closing behavior. |
| v3/coraza/README.md | Documents new metrics fields and adds proxy/trusted-proxy security notes. |
| README.md | Adds Coraza workflow status badge to top-level README list. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // MetricsSnapshot represents the current request metrics for an Engine. | ||
| type MetricsSnapshot struct { | ||
| // TotalRequests is the number of requests observed by the middleware. | ||
| TotalRequests uint64 `json:"total_requests"` | ||
| // BlockedRequests is the number of requests interrupted by the WAF. | ||
| BlockedRequests uint64 `json:"blocked_requests"` | ||
| // AvgLatencyMs is the average middleware latency in milliseconds. | ||
| AvgLatencyMs float64 `json:"avg_latency_ms"` | ||
| // BlockRate is the ratio of blocked requests to total requests. | ||
| // BlockRate is the cumulative ratio of blocked requests to total requests. | ||
| BlockRate float64 `json:"block_rate"` | ||
| // RecentLatencyMs is the EWMA of recent middleware latency in milliseconds. | ||
| RecentLatencyMs float64 `json:"recent_latency_ms"` | ||
| // RecentBlockRate is the EWMA of recent blocked-request outcomes. | ||
| RecentBlockRate float64 `json:"recent_block_rate"` | ||
| // Timestamp is when the snapshot was generated. | ||
| Timestamp time.Time `json:"timestamp"` |
There was a problem hiding this comment.
Thanks for the note.
I’m intentionally keeping this change. The module is still pre-v1.0.0 (v0.1.1), and this PR deliberately moves the metrics model from lifetime average latency to EWMA-based recent metrics, so I do not plan to restore AvgLatencyMs here.
| SecRule ARGS:attack "@streq 1" "id:1001,phase:2,deny,status:403,msg:'attack detected'"` | ||
|
|
||
| func assertFloat64Within(t *testing.T, got, want float64, label string) { | ||
| t.Helper() |
There was a problem hiding this comment.
Resolved in the latest commit.
The helper now explicitly rejects NaN and Inf before applying the epsilon-based comparison.
| // MetricsCollector records lightweight request metrics for a Coraza Engine. | ||
| type MetricsCollector interface { | ||
| RecordRequest() | ||
| RecordBlock() | ||
| RecordLatency(duration time.Duration) | ||
| ObserveRequest(duration time.Duration, blocked bool) | ||
| GetMetrics() *MetricsSnapshot | ||
| Reset() | ||
| } |
There was a problem hiding this comment.
Thanks for the note.
I’m intentionally keeping this API change. The module is still pre-v1.0.0 (v0.1.1), and the new ObserveRequest(duration, blocked) shape is part of the metrics refactor itself, so I do not plan to preserve the old collector interface in this PR.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@v3/coraza/coraza.go`:
- Around line 438-443: Remove the unused Engine.snapshot method: delete the
entire func (e *Engine) snapshot() implementation (which returns e.waf,
e.supportsOptions, e.wafWithOptions, e.initErr) because middlewareSnapshot()
supersedes it; then search and replace any remaining callers to use
middlewareSnapshot() if found and run golangci-lint to verify the
unused-function lint error is resolved.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d7c63f48-7660-4e4b-8300-9eb458e8f7d9
📒 Files selected for processing (5)
README.mdv3/coraza/README.mdv3/coraza/coraza.gov3/coraza/coraza_test.gov3/coraza/metrics.go
📜 Review details
🧰 Additional context used
🪛 GitHub Actions: Linter
v3/coraza/coraza.go
[error] 438-438: golangci-lint: func (*Engine).snapshot is unused (unused)
🪛 GitHub Check: lint / lint (v3/coraza)
v3/coraza/coraza.go
[failure] 459-459:
func (*Engine).blockMessageValue is unused (unused)
[failure] 438-438:
func (*Engine).snapshot is unused (unused)
Reopened from #1809 — original was closed due to a broken force-push that orphaned main's history (now restored).
Summary by CodeRabbit
Release Notes
New Features
Improvements