OCPBUGS-16051, OCPBUGS-3176: Enables IP Forwarding config in CNO#1952
Conversation
Skipping CI for Draft Pull Request.

db31009 to e6f97c1 (force-push)
@trozet: This pull request references Jira Issue OCPBUGS-3176, which is valid. 3 validation(s) were run on this bug:
Requesting review from QA contact. The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/hold until I can verify it's working as expected
/test

@trozet: The following commands are available to trigger optional jobs:
/test 4.14-upgrade-from-stable-4.13-e2e-aws-ovn-upgrade

/retest

e6f97c1 to 20c1d91 (force-push)
CNO will detect an upgrade scenario from 4.13 -> 4.14. In this case, IP Forwarding is set in the API to Global mode. This is desired so as not to break users during upgrade who have features enabled that rely on this behavior (like MetalLB). In 4.13, IP Forwarding is set via MCO. During upgrade, CNO upgrades first, detects the upgrade, and enables global IP forwarding. Upon node reboot during upgrade, MCO will remove the forwarding files, but the ovnkube-node container will re-enable forwarding at startup. For non-4.13->4.14 upgrade scenarios, whatever setting is in the API will be used (by default this is restricted forwarding). Signed-off-by: Tim Rozet <trozet@redhat.com>
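The commit message above boils down to a version check at bootstrap. A minimal sketch of that decision, with hypothetical helper and constant names (`pickIPForwarding` is invented for illustration; the real CNO code reads the previously deployed version from the operator status rather than a plain string):

```go
package main

import (
	"fmt"
	"strings"
)

// IPForwardingMode mirrors the operator API enum; the names are assumptions.
type IPForwardingMode string

const (
	IPForwardingRestricted IPForwardingMode = "Restricted"
	IPForwardingGlobal     IPForwardingMode = "Global"
)

// pickIPForwarding decides the mode to apply at bootstrap. If the previously
// deployed operator version is a 4.13 release (i.e. a 4.13 -> 4.14 upgrade is
// in progress), force Global to preserve the pre-4.14 behavior; otherwise
// keep whatever mode is already configured in the API.
func pickIPForwarding(prevVersion string, configured IPForwardingMode) IPForwardingMode {
	if prevVersion == "4.13" || strings.HasPrefix(prevVersion, "4.13.") {
		return IPForwardingGlobal
	}
	return configured
}

func main() {
	fmt.Println(pickIPForwarding("4.13.10", IPForwardingRestricted)) // upgrade: Global
	fmt.Println(pickIPForwarding("", IPForwardingRestricted))        // fresh install: Restricted
}
```

Because the check runs only against the recorded previous version, a fresh 4.14 install (no previous version) keeps the restricted default, matching the behavior the commit message describes.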
20c1d91 to e77c0f6 (force-push)

/test 4.14-upgrade-from-stable-4.13-e2e-aws-ovn-upgrade
/test 4.14-upgrade-from-stable-4.13-e2e-gcp-ovn-upgrade

/retest
@trozet: The following tests failed:

Full PR test history. Your PR dashboard.
ipsecStatus.Version = ipsecDaemonSet.GetAnnotations()["release.openshift.io/version"]
}
// If we are upgrading from 4.13 -> 4.14 set new API for IP Forwarding mode to Global.
This also includes any version older than 4.14, right? (I recently got to know workers can directly upgrade from 4.12 -> 4.14, and we should support that as well.)

@deads2k told me the pods have to upgrade 4.12 -> 4.13 -> 4.14; only MCO will jump the nodes from 4.12 -> 4.14.
// Generate the objects
- objs, progressing, err := network.Render(&operConfig.Spec, bootstrapResult, ManifestPath, r.client, r.featureGates)
+ objs, progressing, err := network.Render(&newOperConfig.Spec, bootstrapResult, ManifestPath, r.client, r.featureGates)
ha this seems like a bigger bug? :)

yeah I looked into it, it never really mattered before because Alex updated the bootstrapResult and relied on that to sync the new change rather than the operConfig. But I need to use it now, and I don't think it should hurt anything.

This change caused https://issues.redhat.com/browse/OCPBUGS-18517. And yeah, it might not be clever to inject values in the Render phase. Please take a look at #1988 and the approach tried there.
// If we are upgrading from 4.13 -> 4.14 set new API for IP Forwarding mode to Global.
// This is to ensure backwards compatibility.
if masterStatus != nil {
is this like the IC hack? can be removed in 4.15?

yeah, once we're past 4.14 the API will exist, so we will remove this. In that case a user will be upgrading from 4.14 -> 4.15 and already have their API set.
if conf.Spec.DefaultNetwork.OVNKubernetesConfig.GatewayConfig == nil {
	conf.Spec.DefaultNetwork.OVNKubernetesConfig.GatewayConfig = &operv1.GatewayConfig{}
}
conf.Spec.DefaultNetwork.OVNKubernetesConfig.GatewayConfig.IPForwarding = operv1.IPForwardingGlobal
IIUC, we are basically keeping IPForwarding global for all older clusters?
perhaps I am missing something, but are we really patching the API object to reflect this?
conf here is meant to be read-only, right? or is CNO actually updating this after you set this?

maybe ... is a better place for this? feels like this render will be called always, versus the bootstrap logic being a one-time thing?

Yeah, the agreement we came to with @knobunc and @cgoncalves was that for clusters upgrading we would enable global forwarding. This preserves their existing functionality in case they rely on MetalLB or some other feature that requires forwarding. For fresh 4.14 installs we will keep the default as disabled.
re: "is a better place for this? feels like this render will be called always versus the bootstrap logic is a one time thing?" and "or is CNO actually updating this after you set this?"
It works because after the bootstrap we do:
// Bootstrap any resources
bootstrapResult, err := network.Bootstrap(newOperConfig, r.client)
if err != nil {
	log.Printf("Failed to reconcile platform networking resources: %v", err)
	r.status.SetDegraded(statusmanager.OperatorConfig, "BootstrapError",
		fmt.Sprintf("Internal error while reconciling platform networking resources: %v", err))
	return reconcile.Result{}, err
}

if !reflect.DeepEqual(operConfig, newOperConfig) {
	if err := r.UpdateOperConfig(ctx, newOperConfig); err != nil {
		log.Printf("Failed to update the operator configuration: %v", err)
		r.status.SetDegraded(statusmanager.OperatorConfig, "UpdateOperatorConfig",
			fmt.Sprintf("Internal error while updating operator configuration: %v", err))
		return reconcile.Result{}, err
	}
}
Where we update the API config if it changed. The modification to newOperConfig is done in the Bootstrap function; should I be doing it somewhere else? Either way, this should only happen one time even if the code is executed more than once.
ack no i won't push for a different place since it's just a one time thing... lgtm
/retest-required

/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: trozet, tssurya. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@trozet: Jira Issue OCPBUGS-3176: Some pull requests linked via external trackers have merged:
The following pull requests linked via external trackers have not merged:
These pull requests must merge or be unlinked from the Jira bug in order for it to move to the next state. Jira Issue OCPBUGS-3176 has not been moved to the MODIFIED state.
Since openshift#1952, `network.Render` is no longer supplied with the operConfig used later, but with a copy of it. This means that the MTU set by Kuryr's render phase won't be used for setting Network.Status and will lead to a panic. We have this issue because for Kuryr the default MTU is detected in the bootstrap phase and there's no fixed value. This commit solves it by making 0 the default value for Kuryr's MTU: an MTU of 0 means that Kuryr will attempt to autodetect it. Moreover, the MTU on KuryrConfig will now be set in the bootstrap phase, so the update of the OperConfig done after bootstrap will update it too.
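The 0-as-autodetect convention from the commit message above can be sketched as follows; `resolveKuryrMTU` and the sample values are hypothetical names invented for illustration, not the actual Kuryr code:

```go
package main

import "fmt"

// resolveKuryrMTU applies the convention described in the commit message:
// a configured MTU of 0 means "autodetect", so the value detected during
// the bootstrap phase wins; any non-zero configured value is kept as-is.
func resolveKuryrMTU(configured, detected uint32) uint32 {
	if configured == 0 {
		return detected
	}
	return configured
}

func main() {
	fmt.Println(resolveKuryrMTU(0, 1442))    // autodetected value used
	fmt.Println(resolveKuryrMTU(9000, 1442)) // explicit value kept
}
```

Using 0 as the sentinel lets the field have a fixed default at render time while deferring the real value to bootstrap, which is what lets the post-bootstrap OperConfig update carry the detected MTU.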
Fix included in accepted release 4.14.0-0.nightly-2023-09-11-201102

Fix included in accepted release 4.14.0-0.nightly-2023-09-12-024050

Fix included in accepted release 4.14.0-0.nightly-2023-09-15-101929
Render has side effects on the passed-in operConfig that are reflected later in the status when it is updated. This stopped working when the passed-in operConfig was changed to a copy that was not then used when applying the status. It makes sense to use the updated operConfig for everything that comes after, not just Render, so change that to fix the above issue. Fixes: openshift#1952 Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
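The failure mode this commit message describes can be reproduced with a toy example: mutate a copy during render, then read the original when applying status. The types and the render function here are invented for illustration, not the actual CNO code:

```go
package main

import "fmt"

type Spec struct{ MTU int }
type Config struct{ Spec Spec }

// render has a side effect: it fills in a detected MTU when none is set,
// standing in for network.Render mutating the config it is given.
func render(c *Config) {
	if c.Spec.MTU == 0 {
		c.Spec.MTU = 1400 // stand-in for autodetection
	}
}

func main() {
	operConfig := Config{}
	newOperConfig := operConfig // value copy, analogous to DeepCopy

	render(&newOperConfig)

	// Bug: building the status from the original drops render's side effect.
	fmt.Println(operConfig.Spec.MTU) // 0
	// Fix: use the same copy for everything after render.
	fmt.Println(newOperConfig.Spec.MTU) // 1400
}
```

The fix is exactly the one the commit message states: once the copy exists, every later consumer (status application included) must read the copy, not the original.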