@tssurya
Contributor

@tssurya tssurya commented Jul 28, 2020

If the user sets an invalid egressCIDR value, the API accepts this
and sets it in the hostsubnet field making the user think all went
well. However in the background sdn emits a warning in the logs
saying it ignores the invalid hostsubnet value but the user does
not become aware of this. Upon restarting the sdn-node pod, the pod
fails to come up and barfs that the egressCIDR is invalid.

This patch fixes this by making the sdn-controller wipe out the
invalid egressCIDR/IP field once the validation fails. This is the
same mechanism that the controller uses upon start-up when it comes
across an invalid hostsubnet.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
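
For readers following along, here is a minimal sketch of the "wipe on validation failure" idea described in the commit message above. It is not the code from this patch: the `HostSubnet` struct is a hypothetical, trimmed-down stand-in for the real API type, `validateEgress` and `clearInvalidEgress` are made-up names, and clearing both fields at once is a simplifying assumption.

```go
package main

import (
	"fmt"
	"net"
)

// HostSubnet is a hypothetical stand-in for the real API type; only the
// user-maintained egress fields are modeled here.
type HostSubnet struct {
	Host        string
	EgressIPs   []string
	EgressCIDRs []string
}

// validateEgress returns an error for the first egress IP or CIDR that
// does not parse.
func validateEgress(hs *HostSubnet) error {
	for _, ip := range hs.EgressIPs {
		if net.ParseIP(ip) == nil {
			return fmt.Errorf("invalid egress IP %q on host %q", ip, hs.Host)
		}
	}
	for _, cidr := range hs.EgressCIDRs {
		if _, _, err := net.ParseCIDR(cidr); err != nil {
			return fmt.Errorf("invalid egress CIDR %q on host %q: %v", cidr, hs.Host, err)
		}
	}
	return nil
}

// clearInvalidEgress wipes the egress fields when validation fails and
// reports whether anything was cleared, so a caller would know to write
// the object back to the API.
func clearInvalidEgress(hs *HostSubnet) bool {
	if err := validateEgress(hs); err != nil {
		fmt.Printf("clearing egress fields: %v\n", err)
		hs.EgressIPs = nil
		hs.EgressCIDRs = nil
		return true
	}
	return false
}

func main() {
	hs := &HostSubnet{Host: "node-1", EgressCIDRs: []string{"10.0.0.0/33"}}
	fmt.Println(clearInvalidEgress(hs), hs.EgressCIDRs)
}
```

The point of returning a boolean is that the user-visible object changes only when something was actually wrong, which is what makes the problem visible instead of leaving it buried in the logs.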

@openshift-ci-robot openshift-ci-robot added the bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. label Jul 28, 2020
@openshift-ci-robot
Contributor

@tssurya: This pull request references Bugzilla bug 1848478, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.6.0) matches configured target release for branch (4.6.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)
Details

In response to this:

Bug 1848478: Invalid egressCIDR value is silently ignored

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot openshift-ci-robot added the bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. label Jul 28, 2020
@openshift-ci-robot openshift-ci-robot requested review from aojea and dcbw July 28, 2020 14:20
@aojea
Contributor

aojea commented Jul 29, 2020

You should add unit tests in the corresponding files 😄

@aojea
Contributor

aojea commented Jul 29, 2020

/assign @danwinship

@tssurya
Contributor Author

tssurya commented Jul 29, 2020

> You should add unit tests in the corresponding files

I know :) and I tried, but there are no corresponding files for subnets.go. I might have to start one or let me see if this stuff gets tested elsewhere.

@aojea
Contributor

aojea commented Jul 29, 2020

> > You should add unit tests in the corresponding files
>
> I know :) and I tried, but there are no corresponding files for subnets.go. I might have to start one or let me see if this stuff gets tested elsewhere.

Don't worry about subnets_test.go for now; at least extend the test files that already exist, pkg/network/common/validation_test.go and pkg/network/common/egressip_test.go, to cover the new functionality and avoid regressions.

Contributor

@juanluisvaladas juanluisvaladas left a comment

What I think we should do in case of failure is verify whether the configuration in kubectl.kubernetes.io/last-applied-configuration is valid: if it is valid, apply the egressIPs and egressCIDRs, and if it is not, remove only the bad ones. @danwinship do you agree?

I just did a quick test: when the hostsubnet is patched with oc patch (which most of our users will do, because that's what our docs tell them to do), there is no kubectl.kubernetes.io/last-applied-configuration annotation. So at the end of the day, the annotation is more likely to be outdated or absent than up to date.

Removing the bad ones is just the safest IMO.
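
To illustrate the "remove only the bad ones" option, a rough sketch follows. The helper names `dropInvalidCIDRs` and `dropInvalidIPs` are hypothetical and do not come from this repository; the sketch only shows that valid entries can be kept while invalid ones are reported back.

```go
package main

import (
	"fmt"
	"net"
)

// dropInvalidCIDRs splits the input into entries that parse as CIDRs and
// entries that were rejected.
func dropInvalidCIDRs(cidrs []string) (valid, rejected []string) {
	for _, c := range cidrs {
		if _, _, err := net.ParseCIDR(c); err != nil {
			rejected = append(rejected, c)
			continue
		}
		valid = append(valid, c)
	}
	return valid, rejected
}

// dropInvalidIPs does the same for plain IP addresses.
func dropInvalidIPs(ips []string) (valid, rejected []string) {
	for _, ip := range ips {
		if net.ParseIP(ip) == nil {
			rejected = append(rejected, ip)
			continue
		}
		valid = append(valid, ip)
	}
	return valid, rejected
}

func main() {
	valid, rejected := dropInvalidCIDRs([]string{"192.168.1.0/24", "192.168.1.0/99"})
	fmt.Println("kept:", valid, "dropped:", rejected)
}
```

Returning the rejected entries separately would let the caller log or surface exactly what was removed, rather than dropping it silently.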

Contributor

@danwinship danwinship left a comment

So the real fix here is that we need to have an admission controller...

@tssurya tssurya changed the title from "Bug 1848478: Invalid egressCIDR value is silently ignored" to "[WIP] Bug 1848478: Invalid egressCIDR value is silently ignored" Aug 11, 2020
@openshift-ci-robot openshift-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 11, 2020
@tssurya
Contributor Author

tssurya commented Aug 13, 2020

> So the real fix here is that we need to have an admission controller...

hmm I'll have to dig in more to see the details of adding an admission controller.

Anyhow, in this particular case: in 3.11 we used to check for invalid egressCIDR values set by users. But in 4.x we simply don't do anything; we let the values get set and then blow up when the sdn worker pods get restarted. So the user understandably feels like we caused a regression.

@tssurya tssurya changed the title from "[WIP] Bug 1848478: Invalid egressCIDR value is silently ignored" to "Bug 1848478: Invalid egressCIDR value causes sdn pods to fail on startup" Aug 24, 2020
@openshift-ci-robot openshift-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Aug 24, 2020
@tssurya
Contributor Author

tssurya commented Aug 25, 2020

/test e2e-gcp
/test e2e-aws

@tssurya
Contributor Author

tssurya commented Aug 26, 2020

/retest

HostSubnet is half system-maintained (subnet allocations) and half
user-maintained (EgressIPs). If there is any invalid value that is set
on the user-maintained fields, we should just ignore them and continue
with the SDN startup procedure instead of failing.

In this patch we remove the validation of the user-maintained fields
from the ValidateHostSubnet function. We can do a separate validation
for the user-maintained fields.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
@tssurya
Contributor Author

tssurya commented Aug 28, 2020

/test e2e-gcp
/test e2e-aws

@juanluisvaladas
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 7, 2020
@tssurya
Contributor Author

tssurya commented Sep 8, 2020

@danwinship : PTAL whenever you have some time.

We don't have a proper validation procedure for egress IPs/CIDRs.
This patch adds a ValidateHostSubnetEgress function which will
check the validity of the user-maintained values in the hostsubnet
object.

Signed-off-by: Surya Seetharaman <suryaseetharaman.9@gmail.com>
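
The split described in the last two commit messages (strict validation of the system-maintained fields, separate validation of the user-maintained egress fields) could look roughly like the sketch below. This is an illustration under stated assumptions, not the merged code: the `HostSubnet` struct, its `HostIP` and `Subnet` field names, and the function names `validateSystemFields` and `validateEgressFields` are all hypothetical, and the real ValidateHostSubnet and ValidateHostSubnetEgress signatures are not shown in this thread.

```go
package main

import (
	"fmt"
	"net"
)

// HostSubnet is a hypothetical stand-in for the real API type.
type HostSubnet struct {
	Host        string
	HostIP      string   // system-maintained (assumed field name)
	Subnet      string   // system-maintained (assumed field name)
	EgressIPs   []string // user-maintained
	EgressCIDRs []string // user-maintained
}

// validateSystemFields checks only the system-maintained fields; a failure
// here is treated as fatal by the caller.
func validateSystemFields(hs *HostSubnet) error {
	if net.ParseIP(hs.HostIP) == nil {
		return fmt.Errorf("host %q: invalid host IP %q", hs.Host, hs.HostIP)
	}
	if _, _, err := net.ParseCIDR(hs.Subnet); err != nil {
		return fmt.Errorf("host %q: invalid subnet %q: %v", hs.Host, hs.Subnet, err)
	}
	return nil
}

// validateEgressFields checks only the user-maintained fields and collects
// every problem instead of stopping at the first one.
func validateEgressFields(hs *HostSubnet) []error {
	var errs []error
	for _, ip := range hs.EgressIPs {
		if net.ParseIP(ip) == nil {
			errs = append(errs, fmt.Errorf("host %q: invalid egress IP %q", hs.Host, ip))
		}
	}
	for _, cidr := range hs.EgressCIDRs {
		if _, _, err := net.ParseCIDR(cidr); err != nil {
			errs = append(errs, fmt.Errorf("host %q: invalid egress CIDR %q: %v", hs.Host, cidr, err))
		}
	}
	return errs
}

func main() {
	hs := &HostSubnet{
		Host:        "node-1",
		HostIP:      "10.0.0.11",
		Subnet:      "10.128.2.0/23",
		EgressCIDRs: []string{"10.0.0.0/33"},
	}
	// A bad system-maintained field stops startup in this sketch.
	if err := validateSystemFields(hs); err != nil {
		fmt.Println("fatal:", err)
		return
	}
	// A bad user-maintained field is only reported; startup continues.
	for _, err := range validateEgressFields(hs) {
		fmt.Println("warning, ignoring:", err)
	}
}
```

Keeping the two checks separate means a typo in a user-edited egress field can no longer prevent the node from coming up, while corrupt subnet allocations are still caught.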
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Sep 9, 2020
@danwinship
Contributor

/lgtm

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Sep 9, 2020
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, juanluisvaladas, tssurya

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 9, 2020
@openshift-merge-robot openshift-merge-robot merged commit 79fd230 into openshift:master Sep 9, 2020
@openshift-ci-robot
Contributor

@tssurya: All pull requests linked via external trackers have merged:

Bugzilla bug 1848478 has been moved to the MODIFIED state.

@tssurya
Contributor Author

tssurya commented Sep 14, 2020

/cherry-pick release 4.5
/cherry-pick release 4.4

@openshift-cherrypick-robot

@tssurya: cannot checkout release 4.5: error checking out release 4.5: exit status 1. output: error: pathspec 'release 4.5' did not match any file(s) known to git

@tssurya
Contributor Author

tssurya commented Sep 14, 2020

/cherry-pick release-4.5
/cherry-pick release-4.4

@openshift-cherrypick-robot

@tssurya: new pull request created: #187

@tssurya
Contributor Author

tssurya commented Sep 23, 2020

/cherry-pick release-4.4

@openshift-cherrypick-robot

@tssurya: #169 failed to apply on top of branch "release-4.4":

Applying: Bug 1848478: Invalid egressCIDR value causes sdn pods to fail on startup
Using index info to reconstruct a base tree...
M	pkg/network/common/validation.go
Falling back to patching base and 3-way merge...
Auto-merging pkg/network/common/validation.go
CONFLICT (content): Merge conflict in pkg/network/common/validation.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Bug 1848478: Invalid egressCIDR value causes sdn pods to fail on startup
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. bugzilla/severity-medium Referenced Bugzilla bug's severity is medium for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. lgtm Indicates that a PR is ready to be merged.

8 participants