better oom detection #370
CI failed: The CI build failed due to a transient infrastructure error during the setup of the Kind environment, unrelated to the code changes.
Overview
The CI pipeline encountered a failure during the environment setup phase where the
Failures
Kind Installation Checksum Failure (confidence: high)
Summary
Code Review ✅ Approved 3 resolved / 3 findings
Improves OOM detection logic by resolving data races on reconciler fields, implementing pagination for pod sweeps, and centralizing snapshot construction. No issues found.
✅ 3 resolved
✅ Bug: Data race on mpaPublisher and oomReconciler fields
✅ Performance: Sweep lists all pods without filtering or pagination
✅ Quality: Duplicate OOM snapshot construction in two places
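The pagination finding refers to the sweep listing pods page by page instead of all at once. The PR's actual sweep code is not shown here, so the following is a minimal Go sketch that assumes a hypothetical `PodLister` interface modeling the limit/continue-token pagination the Kubernetes list API provides. `Pod`, `PodPage`, `PodLister`, `sweepOOMKilled`, and `sliceLister` are all illustrative names, not the PR's code.

```go
package main

import (
	"fmt"
)

// Pod is a hypothetical, trimmed-down stand-in for a Kubernetes pod;
// only the fields this sweep needs are modeled.
type Pod struct {
	Namespace string
	Name      string
	OOMKilled bool
}

// PodPage mirrors a paginated list response: one batch of items plus an
// opaque continue token that is "" when there are no more pages.
type PodPage struct {
	Items    []Pod
	Continue string
}

// PodLister is a hypothetical interface over a paginated list call.
type PodLister interface {
	List(limit int, continueToken string) (PodPage, error)
}

// sweepOOMKilled walks every page and collects pods flagged as OOM-killed,
// so no single request returns the whole cluster's pod list.
func sweepOOMKilled(l PodLister, pageSize int) ([]Pod, error) {
	if pageSize <= 0 {
		return nil, fmt.Errorf("pageSize must be positive, got %d", pageSize)
	}
	var found []Pod
	token := ""
	for {
		page, err := l.List(pageSize, token)
		if err != nil {
			return nil, fmt.Errorf("listing pods: %w", err)
		}
		for _, p := range page.Items {
			if p.OOMKilled {
				found = append(found, p)
			}
		}
		if page.Continue == "" {
			return found, nil
		}
		token = page.Continue
	}
}

// sliceLister is an in-memory PodLister for the example; its continue
// token is simply the index of the next item to return.
type sliceLister struct{ pods []Pod }

func (s sliceLister) List(limit int, token string) (PodPage, error) {
	start := 0
	if token != "" {
		if _, err := fmt.Sscanf(token, "%d", &start); err != nil {
			return PodPage{}, err
		}
	}
	end := start + limit
	if end >= len(s.pods) {
		return PodPage{Items: s.pods[start:]}, nil
	}
	return PodPage{Items: s.pods[start:end], Continue: fmt.Sprint(end)}, nil
}

func main() {
	l := sliceLister{pods: []Pod{
		{"default", "a", false},
		{"default", "b", true},
		{"prod", "c", true},
		{"prod", "d", false},
		{"prod", "e", true},
	}}
	oom, err := sweepOOMKilled(l, 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(oom)) // 3
}
```

With a real client, the same loop would pass the page size and continue token through the list options and stop once the returned continue token is empty; a bounded page size caps how many pods any single sweep request returns.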
Pull request was closed
* better oom detection
* fix data race on mpaPublisher/oomReconciler
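The commit that fixes the data race on mpaPublisher/oomReconciler is not reproduced here. As a hedged illustration, a common fix for this kind of race is to guard late-initialized fields with a `sync.RWMutex`: write under `Lock`, and read both fields under a single `RLock`. The `collector`, `publisher`, and `reconciler` types and their methods are hypothetical stand-ins for the real types.

```go
package main

import (
	"fmt"
	"sync"
)

// publisher and reconciler are hypothetical placeholders for the real
// mpaPublisher and oomReconciler types, which this sketch does not model.
type publisher struct{ name string }
type reconciler struct{ interval string }

// collector guards two late-initialized fields with a RWMutex so that a
// goroutine setting them cannot race with goroutines reading them.
type collector struct {
	mu            sync.RWMutex
	mpaPublisher  *publisher
	oomReconciler *reconciler
}

// SetMPAPublisher stores the publisher under the write lock.
func (c *collector) SetMPAPublisher(p *publisher) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.mpaPublisher = p
}

// SetOOMReconciler stores the reconciler under the write lock.
func (c *collector) SetOOMReconciler(r *reconciler) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.oomReconciler = r
}

// deps returns both fields under a single read lock, giving callers a
// consistent view of the pair.
func (c *collector) deps() (*publisher, *reconciler) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.mpaPublisher, c.oomReconciler
}

func main() {
	c := &collector{}
	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); c.SetMPAPublisher(&publisher{name: "mpa"}) }()
	go func() { defer wg.Done(); _, _ = c.deps() }()
	wg.Wait()
	c.SetOOMReconciler(&reconciler{interval: "1m"})
	p, r := c.deps()
	fmt.Println(p.name, r.interval) // mpa 1m
}
```

Because every access goes through the mutex, running this with `go run -race` should report no race; reading both fields in one `deps` call also avoids observing one field updated while the other is not.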
[Title]
📚 Description of Changes
Provide an overview of your changes and why they’re needed. Link to any related issues (e.g., "Fixes #123"). If your PR fixes a bug, resolves a feature request, or updates documentation, please explain how.
What Changed:
(Describe the modifications, additions, or removals.)
Why This Change:
(Explain the problem this PR addresses or the improvement it provides.)
Affected Components:
(Which components does this change affect? Put an x next to each affected component.)
Compose
K8s
Other (please specify)
❓ Motivation and Context
Why is this change required? What problem does it solve?
Context:
(Provide background information or link to related discussions/issues.)
Relevant Tasks/Issues:
(e.g., Fixes: #GitHub Issue)
🔍 Types of Changes
Indicate which type of changes your code introduces (check all that apply):
🔬 QA / Verification Steps
Describe the steps a reviewer should take to verify your changes:
`make test` to verify all tests pass.
`make create-kind && make deploy`.
✅ Global Checklist
Please check all boxes that apply:
Summary by Gitar
* `OOMReconciler` for periodic K8s API sweeps to capture OOM events missed by informer-based collectors.
* `combinedChannel`.
* `ReasonOOMKilled` and `ReasonStartError` constants.
* `MpaServer` broadcast logic to correctly populate the `OomKillCount` field in gRPC metric items.