summarize: Consider obj status condition in result #703
Merged
stefanprodan merged 1 commit into main on May 3, 2022
80a0261 to f7dd60a
darkowlzz commented Apr 29, 2022
    afterFunc: func(t *WithT, obj conditions.Setter, patchOpts *patch.HelperOptions) {
        t.Expect(patchOpts.IncludeStatusObservedGeneration).To(BeFalse())
    },
},
Contributor
Author
Duplicate test case.
f7dd60a to d64def3
SummarizeAndPatch() should also consider the object's status conditions when computing and returning the runtime results, to avoid any inconsistency between the runtime result and the status conditions of the object. When an object's Ready condition is False, the reconciler should retry unless it is in a stalled condition.

Signed-off-by: Sunny <darkowlzz@protonmail.com>
d64def3 to 2240106
stefanprodan (Member) approved these changes on Apr 30, 2022
LGTM
Thanks @darkowlzz 🏅
starlingx-github pushed a commit to starlingx/ansible-playbooks that referenced this pull request on Dec 7, 2022
Attempts were made to fix several FluxCD issues by implementing
recovery logic in the app framework. A new issue was observed that may
be hit more often because of that recovery logic: Flux does not finish
updating a HelmRelease resource to Ready=True and only displays a
message containing this signature:
'''the object has been modified...'''. The resource must be correctly
updated by Flux.
Aside from this, the recovery logic was put in place for HelmChart
resources that are in the Ready=False state, but this should have been
handled by Flux.
I believe the 2 issues described above map to [1] and [2].
To pull in [1] and [2], update the FluxCD release to the latest
available [3].
Release is composed of manifests and container image used.
Update containers used: helm-controller from v0.15.0 to v0.27.0,
source-controller from v0.20.1 to v0.32.1.
Update Flux CRDs, RBAC, deployment.
Changelog claims we get k8s 1.25 support as part of the upversion.
Disclaimer for tests:
1) Recovery logic concerned with flipping spec.suspend was removed.
   Introduced by ongoing [4].
2) Optimization by flipping spec.suspend was removed.
   Introduced by ongoing [4].
3) cert-manager, nginx-ingress-controller, platform-integ-apps had the
reconciliation interval decreased to 1m to allow Flux to manage the
resources by itself in a reasonable time interval.
There will be future commits per app updating reconciliation interval.
Tests on AIO-SX:
PASS: bootstrap
PASS: unlocked enabled available
PASS: apps applied
PASS: tested on k8s 1.24.4 and 1.21.8, infer that versions in between
should be OK
PASS: delete pods and see they are recreated and no errors
PASS: inspect flux pod logs for errors
PASS: re-test known trigger for 1996747 and 1995748
[1]: fluxcd/source-controller#703
[2]: fluxcd/source-controller#202
[3]: https://github.com/fluxcd/flux2/releases/tag/v0.37.0
[4]: https://review.opendev.org/c/starlingx/config/+/866862
Related-Bug: 1995748
Related-Bug: 1996747
Closes-Bug: 1999032
Signed-off-by: Dan Voiculeasa <dan.voiculeasa@windriver.com>
Change-Id: Id295599a2946f48081e7d27e2ab8e06063c3c88d
SummarizeAndPatch() should also consider the object's status conditions when computing and returning the runtime results, to avoid any inconsistency between the runtime result and the status conditions of the object.
When an object's Ready condition is False, the reconciler should retry unless it is in a stalled state.
Detailed description
Summarize and patch takes the abstracted result and error and computes the runtime result from them. Separately, it takes the object with its status conditions, calculates the status summary, and patches the object status.
This can lead to a situation where the (sub)reconcilers return a successful result without any error, producing a successful runtime result, while a negative status condition set on the object by one of the sub-reconcilers causes the Ready status condition to be False. The object then has Ready=False and the reconciler does not attempt to reconcile it again. We expect this behavior only when the status condition is Stalled=True. Unless Ready=True, the reconciler should always retry.
To make the model consistent, the summarize and patch process can take into account the status condition summary for the Ready condition.
Runtime result behavior with respect to the Ready condition:
- If Ready=True, but the abstracted result is ResultRequeue and nil error, the runtime result should have Requeue: true.
- If Ready=True, but the abstracted result is ResultSuccess and nil error, the runtime result should have RequeueAfter: <interval>.
- If Ready=True, but the reconciler error is not nil, a runtime error is returned, resulting in the runtime requeuing the object for reconciliation. Since an error in itself can't be used to set Ready=False with a reason and message, we can't change the status condition to reflect this failure. The retry will happen in the background without anything visible in the object status.
- [New behavior] If Ready=False, but the abstracted result is Success and the reconciler error is nil, the reconciler error can be changed to reflect the reason for Ready=False by setting the error to the Ready condition message. This results in the object having Ready=False and the reconciler performing a retry.
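The cases listed above can be sketched as a small decision function. This is an illustration only, not the code from this PR: the types `Result`, `RuntimeResult`, and `Conditions` and the function `computeRuntimeResult` are hypothetical stand-ins for the abstracted result, controller-runtime's result type, and the object's status conditions. Skipping the error conversion when Stalled=True is an assumption based on the statement that a stalled object should not be retried.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Result is a hypothetical stand-in for the reconciler's abstracted result.
type Result int

const (
	ResultEmpty Result = iota
	ResultRequeue
	ResultSuccess
)

// RuntimeResult is a simplified, hypothetical stand-in for the runtime result.
type RuntimeResult struct {
	Requeue      bool
	RequeueAfter time.Duration
}

// Conditions holds only the status conditions discussed above.
type Conditions struct {
	Ready        bool
	ReadyMessage string
	Stalled      bool
}

// computeRuntimeResult maps the abstracted result, the reconciler error and
// the object's conditions to a runtime result and error.
func computeRuntimeResult(c Conditions, res Result, recErr error, interval time.Duration) (RuntimeResult, error) {
	// New behavior: Ready=False with a successful result and nil error is
	// turned into an error carrying the Ready message, so the runtime retries.
	// Assumption: a stalled object is left alone.
	if !c.Ready && !c.Stalled && res == ResultSuccess && recErr == nil {
		recErr = errors.New(c.ReadyMessage)
	}
	// Any error is returned as a runtime error, which requeues the object.
	if recErr != nil {
		return RuntimeResult{}, recErr
	}
	switch res {
	case ResultRequeue:
		return RuntimeResult{Requeue: true}, nil
	case ResultSuccess:
		return RuntimeResult{RequeueAfter: interval}, nil
	default:
		return RuntimeResult{}, nil
	}
}

func main() {
	interval := 10 * time.Minute
	// Sub-reconcilers reported success, but the object is Ready=False.
	res, err := computeRuntimeResult(
		Conditions{Ready: false, ReadyMessage: "artifact fetch failed"},
		ResultSuccess, nil, interval)
	fmt.Println(res, err)
}
```

In this sketch the Ready=False case is folded into the existing error path, so the retry reuses whatever backoff the runtime already applies to errors rather than introducing a separate requeue rule.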