fix: prefer Stalled=True as failure witness, fall back to newest condition #45
Merged
LittleChimera merged 1 commit into main on May 2, 2026
getFailureConditionTime previously scanned for any "failure-indicating"
condition (Stalled=True, Ready=False, Progressing=False, etc.) and returned
the latest LastTransitionTime among them. That worked for resources with a
stable failure signal but mis-attributed the witness on resources whose
"failure" conditions cycle (e.g. Kruise Rollout's Stalled flips False→True
during retry-clear-set, then back). The cycling timestamp leaked into
HealthCheck.LastErrorTime and made post-retry failures look fresh to the
rollout-controller's bake check.
Switch to a layered witness:
1. Prefer the LastTransitionTime of the Stalled=True condition when present;
it is the kstatus-standard authoritative failure signal.
2. Otherwise fall back to the newest LastTransitionTime across any condition,
which gives the caller a stable witness instead of nil for resources that
don't expose Stalled.
Stops the "every-condition-is-failure" scan and removes the misleading
isFailureCondition helper.
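
The layered witness described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `Condition` is a simplified stand-in for the Kubernetes condition type, `failureConditionTime` is a hypothetical name for the helper, the nil return for an empty condition list is an assumption, and `at` exists only to build example inputs.

```go
package main

import "time"

// Condition is a simplified stand-in for a Kubernetes status condition.
// Assumption: the real code works with metav1.Condition or an equivalent.
type Condition struct {
	Type               string
	Status             string
	LastTransitionTime time.Time
}

// failureConditionTime (hypothetical name) returns the layered witness:
//  1. the LastTransitionTime of Stalled=True, if such a condition exists;
//  2. otherwise the newest LastTransitionTime across all conditions;
//  3. nil when there are no conditions at all (assumed behavior).
func failureConditionTime(conds []Condition) *time.Time {
	var newest *time.Time
	for _, c := range conds {
		if c.Type == "Stalled" && c.Status == "True" {
			// Authoritative failure signal: stop looking.
			t := c.LastTransitionTime
			return &t
		}
		if newest == nil || c.LastTransitionTime.After(*newest) {
			t := c.LastTransitionTime
			newest = &t
		}
	}
	return newest
}

// at is a hypothetical helper for examples: a fixed base time plus minutes.
func at(minutes int) time.Time {
	base := time.Date(2026, time.May, 1, 10, 0, 0, 0, time.UTC)
	return base.Add(time.Duration(minutes) * time.Minute)
}
```

With Ready=False transitioning at `at(5)` and Stalled=True at `at(2)`, the sketch returns `at(2)`: the Stalled=True timestamp wins even though another condition changed more recently. Without a Stalled=True condition it returns the newest timestamp of any condition, whatever its type or status.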
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>