Sporadically incorrect remediation actions

I have noticed that one of our end-to-end tests is flaky (`install-test-fail`), and sometimes acts in such a way that the failures counter suddenly resets and the wrong remediation strategy is taken.

From a first observation of the logs this problem does not seem to just exist for the end-to-end tests, but may also occur in real environments. I have a suspicion that it may have to do with the fact that we do not retry failed status updates, and seem to be running into `the object has been modified; please apply your changes to the latest version and try again` errors quote often, especially for the `install-test-fail` scenario.

This may be an indication that:

1. Multiple processes are touching the same resource
2. The (during reconciliation) modified object is not passed on correctly everywhere, and we have a split brain problem (aggravated by the fact that we do not perform retries on failed updates)

[34_Debug failure.txt](https://github.com/fluxcd/helm-controller/files/5404562/34_Debug.failure.txt)



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sporadically incorrect remediation actions #120

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Sporadically incorrect remediation actions #120

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions