Bug 1981549: lib/resourcemerge: handle container env var deletions#2800
Conversation
The logic in the resourcemerge functions only iterate through required variables, meaning any removed variable is not handled. As a fix to bug 1981549, this adds removal check for env vars, which ensures that e.g. a removed proxy will correctly propagate to the daemonset definition, which needs the proxy injected as an environment variable to allow MCD to pull os image updates. This of course is also a problem for everything else being synced via these lib functions, but for now I only added the fix to EnvVar for proxy, as it is the most likely issue for users to hit. If we want to fix all other variables, we should probably also consider reworking the resource* libraries in general, since they are outdated and error prone. To test, you can pause the pool, add a cluster proxy and then remove it, checking the MCD daemonset after both steps to see the proxy being added/removed. Interestingly, the adding of the proxy is almost instant, whereas the removal can take up to 10 minutes due to the MCO seemingly not resyncing (no action from proxy informer?). I am investigating that separately. Signed-off-by: Yu Qi Zhang <jerzhang@redhat.com>
This aims to add some initial tests for EnvVar handling specifically, to make sure we don't regress proxy. Other unit tests should be added as we clean up the code. Signed-off-by: Yu Qi Zhang <jerzhang@redhat.com>
|
@yuqi-zhang: This pull request references Bugzilla bug 1981549, which is invalid:
Comment DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/bugzilla refresh |
|
@yuqi-zhang: This pull request references Bugzilla bug 1981549, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
Requesting review from QA contact: DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/retest |
This code has to exist in other places...what are other operators using? Also I'd guess all "gitops" things like argocd must have similar code. |
|
CVO, as an example, has their own fork as well: https://github.com/openshift/cluster-version-operator/tree/master/lib. There is a card https://issues.redhat.com/browse/GRPA-3832 but I am not sure what came out of that discussion |
|
In the past there was an attempt in MCO repo to move away from using old version of resourcemerge and move to use library-go #829 but didn't made it to get merged. We have corresponding jira card https://issues.redhat.com/browse/MCO-17 as well, if it makes sense we can prioritize to get it fixed. |
I think it still makes sense, although I am not sure how much effort that would be, since it looks like the library-go version and the MCO fork now looks significantly different. Since this is blocking CI for other teams, would you be ok with merging this as a temp fix, and we can re-evaluate MCO-17 during a later planning? |
sinnykumari
left a comment
There was a problem hiding this comment.
Thanks Jerry for fixing this!
Tested locally, proxy changes get applied as expected to MCD pods.
/lgtm
Yes, that's what I meant. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: sinnykumari, yuqi-zhang The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
4 similar comments
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
3 similar comments
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
@yuqi-zhang: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
3 similar comments
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
@yuqi-zhang: All pull requests linked via external trackers have merged: Bugzilla bug 1981549 has been moved to the MODIFIED state. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
@palonsoro: new pull request created: #3057 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
(I removed my cherry-pick comment because I thought it had a typo but it didn't) |
The logic in the resourcemerge functions only iterate through required
variables, meaning any removed variable is not handled.
As a fix to bug 1981549, this adds removal check for env vars, which
ensures that e.g. a removed proxy will correctly propagate to the
daemonset definition, which needs the proxy injected as an environment
variable to allow MCD to pull os image updates.
This of course is also a problem for everything else being synced via
these lib functions, but for now I only added the fix to EnvVar for
proxy, as it is the most likely issue for users to hit. If we want to
fix all other variables, we should probably also consider reworking
the resource* libraries in general, since they are outdated and error
prone.
To test, you can pause the pool, add a cluster proxy and then remove
it, checking the MCD daemonset after both steps to see the proxy being
added/removed. Interestingly, the adding of the proxy is almost instant,
whereas the removal can take up to 10 minutes due to the MCO seemingly
not resyncing (no action from proxy informer?). I am investigating that
separately.
Also add some initial tests for EnvVar handling specifically, to make sure
we don't regress proxy. Other unit tests should be added as we clean up
the code.