I have noticed that one of our end-to-end tests (`install-test-fail`) is flaky: sometimes the failures counter suddenly resets and the wrong remediation strategy is taken.
From a first look at the logs, this problem does not seem to be limited to the end-to-end tests and may also occur in real environments. I suspect it is related to the fact that we do not retry failed status updates, and that we run into `the object has been modified; please apply your changes to the latest version and try again` errors quite often, especially in the `install-test-fail` scenario.
This may be an indication that:
- Multiple processes are touching the same resource
- The object modified during reconciliation is not passed on correctly everywhere, leaving us with a split-brain problem (aggravated by the fact that we do not retry failed updates)
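
To make the suspected failure mode concrete, here is a minimal, self-contained sketch of how a stale copy triggers the conflict error and how a retry that re-reads the latest object avoids losing the counter increment. Everything in it (`FakeStore`, `Release`, `update_with_retry`) is hypothetical and only simulates optimistic concurrency in memory; it is not our controller code and not a real Kubernetes client.

```python
from dataclasses import dataclass, replace


class ConflictError(Exception):
    """Stands in for the API server's 'object has been modified' error."""


@dataclass(frozen=True)
class Release:
    # Hypothetical, stripped-down object: a version plus the failures counter.
    resource_version: int = 0
    failures: int = 0


class FakeStore:
    """Hypothetical in-memory store with optimistic concurrency control."""

    def __init__(self):
        self._obj = Release()

    def get(self) -> Release:
        return self._obj

    def update(self, obj: Release) -> Release:
        # Reject writes that are based on an outdated version.
        if obj.resource_version != self._obj.resource_version:
            raise ConflictError("the object has been modified")
        self._obj = replace(obj, resource_version=obj.resource_version + 1)
        return self._obj


def update_with_retry(store, mutate, attempts=5):
    """Re-read the latest object and re-apply the change on every conflict."""
    for _ in range(attempts):
        latest = store.get()
        try:
            return store.update(mutate(latest))
        except ConflictError:
            continue
    raise ConflictError(f"still conflicting after {attempts} attempts")


store = FakeStore()
stale = store.get()                       # reconciler reads the object
store.update(replace(stale, failures=1))  # a second writer gets in first

# Writing from the stale copy is rejected; without a retry the increment is lost.
try:
    store.update(replace(stale, failures=stale.failures + 1))
    lost = False
except ConflictError:
    lost = True

# Retrying against the latest version keeps the other writer's change.
result = update_with_retry(store, lambda r: replace(r, failures=r.failures + 1))
print(lost, result)
```

In a Go controller the same pattern is usually expressed with `retry.RetryOnConflict` from `k8s.io/client-go/util/retry`, with the mutation applied to a freshly fetched object inside the retry function. Whether that fits our reconciler is something I have not verified.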
34_Debug failure.txt