Bug 1848478: Invalid egressCIDR value causes sdn pods to fail on startup #169

tssurya · 2020-07-28T14:20:17Z

If the user sets an invalid egressCIDR value, the API accepts it
and stores it in the hostsubnet, making the user think all went
well. In the background, however, sdn only logs a warning saying
it is ignoring the invalid hostsubnet value, so the user never
becomes aware of the problem. Upon restarting the sdn-node pod,
the pod fails to come up and barfs that the egressCIDR is invalid.

This patch fixes this by making the sdn-controller wipe out the
invalid egressCIDR/IP field once the validation fails. This is the
same mechanism that the controller uses upon start-up when it comes
across an invalid hostsubnet.

Signed-off-by: Surya Seetharaman suryaseetharaman.9@gmail.com

openshift-ci-robot · 2020-07-28T14:20:24Z

@tssurya: This pull request references Bugzilla bug 1848478, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target release (4.6.0) matches configured target release for branch (4.6.0)
bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Details

In response to this:

Bug 1848478: Invalid egressCIDR value is silently ignored

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

aojea · 2020-07-29T09:36:32Z

You should add unit tests in the corresponding files 😄

pkg/network/common/validation.go

aojea · 2020-07-29T09:45:11Z

/assign @danwinship

tssurya · 2020-07-29T11:13:12Z

> You should add unit tests in the corresponding files

I know :) and I tried, but there are no corresponding files for subnets.go. I might have to start one or let me see if this stuff gets tested elsewhere.

aojea · 2020-07-29T13:45:37Z

> > You should add unit tests in the corresponding files
>
> I know :) and I tried, but there are no corresponding files for subnets.go. I might have to start one or let me see if this stuff gets tested elsewhere.

Don't worry about subnets_test.go for now. At least extend the test files that already exist, pkg/network/common/validation_test.go and pkg/network/common/egressip_test.go, to cover the new functionality and avoid regressions.

juanluisvaladas

What I think we should do in case of failure is verify whether the configuration in kubectl.kubernetes.io/last-applied-configuration is valid. If it's valid, apply its egressIPs and egressCIDRs; if it's not, only remove the bad ones. @danwinship do you agree?

I just did a quick test, and when the hostsubnet is patched with oc patch (which most of our users will do, because that's how our docs say to do it) there is no kubectl.kubernetes.io/last-applied-configuration. So at the end of the day the annotation is more likely to be stale or missing than up to date and present.

Removing the bad ones is just the safest IMO.
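A minimal sketch of the "only remove the bad ones" idea, assuming the egress values arrive as plain string slices. The helper name `filterEgress` and its signature are hypothetical and are not taken from this PR; only the standard library's `net.ParseIP` and `net.ParseCIDR` are used to decide what counts as valid.

```go
package main

import (
	"fmt"
	"net"
)

// filterEgress is a hypothetical helper (not code from this PR). It keeps
// the egress IPs and CIDRs that parse and collects the rest in bad, so a
// caller could write the cleaned lists back and report what was dropped.
func filterEgress(ips, cidrs []string) (goodIPs, goodCIDRs, bad []string) {
	for _, ip := range ips {
		if net.ParseIP(ip) != nil {
			goodIPs = append(goodIPs, ip)
		} else {
			bad = append(bad, ip)
		}
	}
	for _, cidr := range cidrs {
		if _, _, err := net.ParseCIDR(cidr); err == nil {
			goodCIDRs = append(goodCIDRs, cidr)
		} else {
			bad = append(bad, cidr)
		}
	}
	return goodIPs, goodCIDRs, bad
}

func main() {
	ips, cidrs, bad := filterEgress(
		[]string{"10.0.0.5", "not-an-ip"},
		[]string{"10.0.0.0/24", "10.0.0.0/33"},
	)
	fmt.Println("kept IPs:", ips)
	fmt.Println("kept CIDRs:", cidrs)
	fmt.Println("removed:", bad)
}
```

Valid entries are left untouched, so one typo in a list would not cost the user the egress configuration that was already working.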

pkg/network/common/validation.go

danwinship

So the real fix here is that we need to have an admission controller...

pkg/network/master/subnets.go

pkg/network/common/validation.go

tssurya · 2020-08-13T11:16:28Z

> So the real fix here is that we need to have an admission controller...

hmm I'll have to dig in more to see the details of adding an admission controller.

Anyhow, in this particular case, we used to check the invalid egressCIDR values set by the users in 3.11. But in 4.x we simply don't do anything: we let the values get set and then blow up when the sdn worker pods get restarted. So the user understandably feels like we caused a regression.

tssurya · 2020-08-25T09:50:50Z

/test e2e-gcp
/test e2e-aws

tssurya · 2020-08-26T11:44:56Z

/retest

pkg/network/node/subnets.go

HostSubnet is half system-maintained (subnet allocations) and half user-maintained (EgressIPs). If an invalid value is set on the user-maintained fields, we should just ignore it and continue with the SDN startup procedure instead of failing.

In this patch we remove the validation of the user-maintained fields from the ValidateHostSubnet function. We can do a separate validation for the user-maintained fields.

Signed-off-by: Surya Seetharaman suryaseetharaman.9@gmail.com
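The split described in this commit message could look roughly like the sketch below. It is only an illustration under stated assumptions: `hostSubnet` is a simplified stand-in for the real object, and `validateSystemFields` and `startNode` are hypothetical names, not the functions in pkg/network. The point is that only system-maintained fields can fail startup, while bad user-maintained values produce warnings.

```go
package main

import (
	"fmt"
	"net"
)

// hostSubnet is a simplified, hypothetical stand-in for the real HostSubnet.
type hostSubnet struct {
	Host        string   // system-maintained
	HostIP      string   // system-maintained
	Subnet      string   // system-maintained
	EgressCIDRs []string // user-maintained
}

// validateSystemFields checks only the fields the SDN itself maintains.
func validateSystemFields(hs hostSubnet) error {
	if net.ParseIP(hs.HostIP) == nil {
		return fmt.Errorf("invalid host IP %q", hs.HostIP)
	}
	if _, _, err := net.ParseCIDR(hs.Subnet); err != nil {
		return fmt.Errorf("invalid subnet %q: %v", hs.Subnet, err)
	}
	return nil
}

// startNode fails only on bad system-maintained fields. Invalid
// user-maintained egress CIDRs are skipped and reported as warnings.
func startNode(hs hostSubnet) (warnings []string, err error) {
	if err := validateSystemFields(hs); err != nil {
		return nil, err
	}
	for _, cidr := range hs.EgressCIDRs {
		if _, _, perr := net.ParseCIDR(cidr); perr != nil {
			warnings = append(warnings,
				fmt.Sprintf("ignoring invalid egressCIDR %q on %s", cidr, hs.Host))
		}
	}
	return warnings, nil
}

func main() {
	hs := hostSubnet{
		Host:        "node-1",
		HostIP:      "192.168.1.10",
		Subnet:      "10.128.0.0/23",
		EgressCIDRs: []string{"192.168.1.0/24", "192.168.1.0/99"},
	}
	warnings, err := startNode(hs)
	fmt.Println("startup error:", err)
	for _, w := range warnings {
		fmt.Println("warning:", w)
	}
}
```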

tssurya · 2020-08-28T11:10:32Z

/test e2e-gcp
/test e2e-aws

pkg/network/common/common_test.go

juanluisvaladas · 2020-09-07T09:27:14Z

/lgtm

tssurya · 2020-09-08T10:03:49Z

@danwinship : PTAL whenever you have some time.

pkg/network/common/common_test.go

pkg/network/common/egressip.go

We don't have a proper validation procedure for egress IPs/CIDRs. This patch adds a ValidateHostSubnetEgress function which will check for the validity of the user-maintained values in the hostsubnet object.

Signed-off-by: Surya Seetharaman suryaseetharaman.9@gmail.com
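For readers who want a feel for what a separate egress validation can report, here is a hedged sketch. It is not the ValidateHostSubnetEgress implementation from this PR, whose signature is not shown in this thread; the `egressFields` type and `validateEgressFields` function are hypothetical, and the checks are limited to what `net.ParseIP` and `net.ParseCIDR` accept.

```go
package main

import (
	"fmt"
	"net"
)

// egressFields holds only the user-maintained values. It is a hypothetical
// stand-in and not the real API type.
type egressFields struct {
	EgressIPs   []string
	EgressCIDRs []string
}

// validateEgressFields returns one error per invalid entry, naming the
// field and index so the user can find the offending value.
func validateEgressFields(e egressFields) []error {
	var errs []error
	for i, ip := range e.EgressIPs {
		if net.ParseIP(ip) == nil {
			errs = append(errs,
				fmt.Errorf("egressIPs[%d]: %q is not a valid IP address", i, ip))
		}
	}
	for i, cidr := range e.EgressCIDRs {
		if _, _, err := net.ParseCIDR(cidr); err != nil {
			errs = append(errs,
				fmt.Errorf("egressCIDRs[%d]: %q is not a valid CIDR", i, cidr))
		}
	}
	return errs
}

func main() {
	errs := validateEgressFields(egressFields{
		EgressIPs:   []string{"192.168.1.100", "192.168.1.300"},
		EgressCIDRs: []string{"192.168.1.0/24", "192.168.1.0"},
	})
	for _, err := range errs {
		fmt.Println(err)
	}
}
```

Returning every error rather than stopping at the first lets a caller log all bad values in one pass.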

danwinship · 2020-09-09T11:39:09Z

/lgtm

openshift-ci-robot · 2020-09-09T11:39:27Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, juanluisvaladas, tssurya

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [danwinship]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci-robot · 2020-09-09T11:44:22Z

@tssurya: All pull requests linked via external trackers have merged:

openshift/sdn#169

Bugzilla bug 1848478 has been moved to the MODIFIED state.

Details

In response to this:

Bug 1848478: Invalid egressCIDR value causes sdn pods to fail on startup


tssurya · 2020-09-14T08:05:31Z

/cherry-pick release 4.5
/cherry-pick release 4.4

openshift-cherrypick-robot · 2020-09-14T08:05:40Z

@tssurya: cannot checkout release 4.5: error checking out release 4.5: exit status 1. output: error: pathspec 'release 4.5' did not match any file(s) known to git

Details

In response to this:

/cherry-pick release 4.5
/cherry-pick release 4.4


tssurya · 2020-09-14T08:06:14Z

/cherry-pick release-4.5
/cherry-pick release-4.4

openshift-cherrypick-robot · 2020-09-14T08:06:21Z

@tssurya: new pull request created: #187

Details

In response to this:

/cherry-pick release-4.5
/cherry-pick release-4.4


tssurya · 2020-09-23T08:43:59Z

/cherry-pick release-4.4

openshift-cherrypick-robot · 2020-09-23T08:44:02Z

@tssurya: #169 failed to apply on top of branch "release-4.4":

Applying: Bug 1848478: Invalid egressCIDR value causes sdn pods to fail on startup
Using index info to reconstruct a base tree...
M	pkg/network/common/validation.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/network/common/validation.go
CONFLICT (content): Merge conflict in pkg/network/common/validation.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Bug 1848478: Invalid egressCIDR value causes sdn pods to fail on startup
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

Details

In response to this:

/cherry-pick release-4.4


openshift-ci-robot added the bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. label Jul 28, 2020

openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Jul 28, 2020

openshift-ci-robot requested review from aojea and dcbw July 28, 2020 14:20

aojea reviewed Jul 29, 2020

pkg/network/common/validation.go (outdated)

openshift-ci-robot assigned danwinship Jul 29, 2020

juanluisvaladas suggested changes Aug 6, 2020

pkg/network/common/validation.go (outdated)

danwinship reviewed Aug 7, 2020

pkg/network/master/subnets.go (outdated)

pkg/network/common/validation.go (outdated)

tssurya changed the title ~~Bug 1848478: Invalid egressCIDR value is silently ignored~~ [WIP] Bug 1848478: Invalid egressCIDR value is silently ignored Aug 11, 2020

openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 11, 2020

tssurya force-pushed the bug-1848478 branch from 9ad8cee to 65c2a8c Compare August 24, 2020 18:03

tssurya changed the title ~~[WIP] Bug 1848478: Invalid egressCIDR value is silently ignored~~ Bug 1848478: Invalid egressCIDR value causes sdn pods to fail on startup Aug 24, 2020

openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 24, 2020

danwinship reviewed Aug 26, 2020

pkg/network/node/subnets.go (outdated)

tssurya force-pushed the bug-1848478 branch from 65c2a8c to d7c4421 Compare August 27, 2020 11:34

tssurya force-pushed the bug-1848478 branch from d7c4421 to 14b93af Compare August 27, 2020 11:42

squeed reviewed Aug 31, 2020

pkg/network/common/common_test.go

tssurya force-pushed the bug-1848478 branch from 14b93af to 62c8231 Compare September 3, 2020 18:31

openshift-ci-robot assigned juanluisvaladas Sep 7, 2020

openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 7, 2020

danwinship reviewed Sep 8, 2020

pkg/network/common/common_test.go (outdated)

pkg/network/common/egressip.go (outdated)

tssurya force-pushed the bug-1848478 branch from 62c8231 to a86d6a9 Compare September 9, 2020 08:46

openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Sep 9, 2020

openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 9, 2020

openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 9, 2020

openshift-merge-robot merged commit 79fd230 into openshift:master Sep 9, 2020

openshift-cherrypick-robot mentioned this pull request Sep 14, 2020

[release-4.5] Bug 1878624: Invalid egressCIDR value causes sdn pods to fail on startup #187

Merged

8 participants
