Add prometheus throttle#283
Conversation
f759e54 to
d5f5074
Compare
| throttledResult, _, err := c.prometheusAPI.Query(ctx, throttledQuery, queryTime) | ||
| if err != nil { | ||
| return 0, fmt.Errorf("error querying throttled periods: %w", err) | ||
| } | ||
|
|
||
| // Query total periods rate | ||
| totalResult, _, err := c.prometheusAPI.Query(ctx, totalQuery, queryTime) |
There was a problem hiding this comment.
💡 Performance: Two separate Prometheus queries could be combined into one
The collectContainerCPUThrottleMetrics function issues two separate Prometheus queries — one for container_cpu_cfs_throttled_periods_total and one for container_cpu_cfs_periods_total. In clusters with many containers, this doubles the query load for this feature (2 extra queries × N containers per collection cycle).
These can be combined into a single PromQL expression that computes the fraction server-side:
sum(rate(container_cpu_cfs_throttled_periods_total{...}[5m])) / sum(rate(container_cpu_cfs_periods_total{...}[5m]))
This returns NaN when the denominator is zero, which you already handle with the guard at line 763-766. This halves the Prometheus query load for this metric.
Was this helpful? React with 👍 / 👎
Code Review 👍 Approved with suggestions 1 resolved / 3 findingsClean implementation of CPU throttle metrics with good NaN/Inf handling. The previous finding about missing 💡 Performance: Two separate Prometheus queries could be combined into one📄 internal/collector/container_resource_collector.go:737-743 The These can be combined into a single PromQL expression that computes the fraction server-side: This returns 💡 Quality: CPU throttle queries not gated by DisableNetworkIOMetrics flag📄 internal/collector/container_resource_collector.go:370-380 The CPU throttle metric collection at line 370-380 runs whenever Currently this doesn't cause a problem because the Prometheus client is only initialized when both Consider either:
✅ 1 resolved✅ Edge Case: NaN from Prometheus bypasses clamping, propagates into gRPC
OptionsAuto-apply is off → Gitar will not commit updates to this branch. Comment with these commands to change:
Was this helpful? React with 👍 / 👎 | Gitar |
* Add prometheus throttle * Fix up math issue * Update proto * Update readme
[Title]
📚 Description of Changes
Provide an overview of your changes and why they're needed. Link to any related issues (e.g., "Fixes #123"). If your PR fixes a bug, resolves a feature request, or updates documentation, please explain how.
What Changed:
(Describe the modifications, additions, or removals.)
Why This Change:
(Explain the problem this PR addresses or the improvement it provides.)
Affected Components:
(Which component does this change affect? - put x for all components)
Compose
K8s
Other (please specify)
❓ Motivation and Context
Why is this change required? What problem does it solve?
Context:
(Provide background information or link to related discussions/issues.)
Relevant Tasks/Issues:
(e.g., Fixes: #GitHub Issue)
🔍 Types of Changes
Indicate which type of changes your code introduces (check all that apply):
🔬 QA / Verification Steps
Describe the steps a reviewer should take to verify your changes:
make testto verify all tests pass.")make create-kind && make deploy.")✅ Global Checklist
Please check all boxes that apply:
Summary by Gitar
cpu_throttle_fraction(field 18) toContainerMetricItemproto for tracking CPU CFS throttle ratios (0.0-1.0)collectContainerCPUThrottleMetrics()queryingcontainer_cpu_cfs_throttled_periods_totalandcontainer_cpu_cfs_periods_totalwith 5-minute rate windowsmpa_server.goBroadcast()to includeCpuThrottleFractionin real-time metric streams to connected clientsThis will update automatically on new commits.