-
Notifications
You must be signed in to change notification settings - Fork 97
Bug 1848478: Invalid egressCIDR value causes sdn pods to fail on startup #169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@tssurya: This pull request references Bugzilla bug 1848478, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
You should add unit tests in the corresponding files 😄 |
|
/assign @danwinship |
I know :) and I tried, but there are no corresponding files for subnets.go. I might have to start one or let me see if this stuff gets tested elsewhere. |
don't worry for subnets_test.go by now, at least the ones that exist, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I think we should do in case of failure is, verify is the configuration in kubectl.kubernetes.io/last-applied-configuration is valid, if it's valid apply the egressIPs and egressCIDRs and if it's not only remove the bad ones. @danwinship do you agree?
I just did a quick test and in the case of patching the hostsubnet with oc patch (most of our users will do that because that's how our docs tell to do it) there is no kubectl.kubernetes.io/last-applied-configuration. So at the end of the day it's less like for it to be updated than updated or present.
Removing the bad ones is just the safest IMO.
danwinship
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the real fix here is that we need to have an admission controller...
hmm I'll have to dig in more to see the details of adding an admission controller. Anyhow, in this particular case, we used to check the invalid egressCIDR values set by the users in 3.11. But in 4.x we simply don't do anything and let the values get set and then blowup when the sdn worker pods get restarted. So the user understandably feels like we caused a regression. |
|
/test e2e-gcp |
|
/retest |
HostSubnet is half system-maintained (subnet allocations) and half user-maintained (EgressIPs). If there is any invalid value that is set on the user-maintained fields, we should just ignore them and continue with the SDN startup procedure instead of failing. In this patch we remove the validation of the user-maintained fields from ValidateHostSubnet function. We can do a separate validation for the user-maintained fields. Signed-off-by: Surya Seetharaman suryaseetharaman.9@gmail.com
|
/test e2e-gcp |
|
/lgtm |
|
@danwinship : PTAL whenever you have some time. |
We don't have a proper validation procedure for egress ips/cidrs. This patch adds a ValidateHostSubnetEgress function which will check for the validity of the user-maintained values in the hostsubet object. Signed-off-by: Surya Seetharaman suryaseetharaman.9@gmail.com
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: danwinship, juanluisvaladas, tssurya The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
@tssurya: All pull requests linked via external trackers have merged: Bugzilla bug 1848478 has been moved to the MODIFIED state. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/cherry-pick release 4.5 |
|
@tssurya: cannot checkout release 4.5: error checking out release 4.5: exit status 1. output: error: pathspec 'release 4.5' did not match any file(s) known to git DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/cherry-pick release-4.5 |
|
@tssurya: new pull request created: #187 DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/cherry-pick release-4.4 |
|
@tssurya: #169 failed to apply on top of branch "release-4.4": DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
If the user sets an invalid egressCIDR value, the API accepts this
and sets it in the hostsubnet field making the user think all went
well. However in the background sdn emits a warning in the logs
saying it ignores the invalid hostsubnet value but the user does
not become aware of this. Upon restarting the sdn-node pod, the pod
fails to come up and barfs that the egressCIDR is invalid.
This patch fixes this by making the sdn-controller wipe out the
invalid egressCIDR/IP field once the validation fails. This is the
same mechanism that the controller uses upon start-up when it comes
across an invalid hostsubnet.
Signed-off-by: Surya Seetharaman suryaseetharaman.9@gmail.com