OCPBUGS-16051, OCPBUGS-3176: Enables IP Forwarding config in CNO#1952
Conversation
Skipping CI for Draft Pull Request.

db31009 to e6f97c1 (force-push)
@trozet: This pull request references Jira Issue OCPBUGS-3176, which is valid. 3 validation(s) were run on this bug:
Requesting review from QA contact. The bug has been updated to refer to the pull request using the external bug tracker.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/hold until I can verify it's working as expected
/test

@trozet: The following commands are available to trigger optional jobs:
/test 4.14-upgrade-from-stable-4.13-e2e-aws-ovn-upgrade

/retest

e6f97c1 to 20c1d91 (force-push)
CNO will detect an upgrade scenario from 4.13 -> 4.14. In this case, IP Forwarding is set in the API to Global mode. This is desired so as not to break users during upgrade who have features enabled that rely on this behavior (like MetalLB). In 4.13, IP Forwarding is set via MCO. During upgrade, CNO upgrades first, detects the upgrade, and enables global IP forwarding. Upon node reboot during upgrade, MCO will remove the forwarding files, but the ovnkube-node container will re-enable forwarding at startup. For non-4.13->4.14 upgrade scenarios, whatever setting is in the API will be used (by default this is restricted forwarding). Signed-off-by: Tim Rozet <trozet@redhat.com>
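The commit message above boils down to a version check at bootstrap. A minimal sketch of that decision, with hypothetical helper and constant names (`pickIPForwarding` is invented for illustration; the real CNO code reads the previously deployed version from the operator status rather than a plain string):

```go
package main

import (
	"fmt"
	"strings"
)

// IPForwardingMode mirrors the operator API enum; the names are assumptions.
type IPForwardingMode string

const (
	IPForwardingRestricted IPForwardingMode = "Restricted"
	IPForwardingGlobal     IPForwardingMode = "Global"
)

// pickIPForwarding decides the mode to apply at bootstrap. If the previously
// deployed operator version is a 4.13 release (i.e. a 4.13 -> 4.14 upgrade is
// in progress), force Global to preserve the pre-4.14 behavior; otherwise
// keep whatever mode is already configured in the API.
func pickIPForwarding(prevVersion string, configured IPForwardingMode) IPForwardingMode {
	if prevVersion == "4.13" || strings.HasPrefix(prevVersion, "4.13.") {
		return IPForwardingGlobal
	}
	return configured
}

func main() {
	fmt.Println(pickIPForwarding("4.13.10", IPForwardingRestricted)) // upgrade: Global
	fmt.Println(pickIPForwarding("", IPForwardingRestricted))        // fresh install: Restricted
}
```

Because the check runs only against the recorded previous version, a fresh 4.14 install (no previous version) keeps the restricted default, matching the behavior the commit message describes.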
20c1d91 to e77c0f6 (force-push)

/test 4.14-upgrade-from-stable-4.13-e2e-aws-ovn-upgrade
/test 4.14-upgrade-from-stable-4.13-e2e-gcp-ovn-upgrade

/retest
@trozet: The following tests failed:

Full PR test history. Your PR dashboard.
ipsecStatus.Version = ipsecDaemonSet.GetAnnotations()["release.openshift.io/version"]
}
// If we are upgrading from 4.13 -> 4.14 set new API for IP Forwarding mode to Global.
This also includes any version older than 4.14, right? (I recently got to know workers can directly upgrade from 4.12 -> 4.14, and we should support that as well.)

@deads2k told me the pods have to upgrade 4.12 -> 4.13 -> 4.14; only MCO will jump the nodes from 4.12 -> 4.14.
// Generate the objects
- objs, progressing, err := network.Render(&operConfig.Spec, bootstrapResult, ManifestPath, r.client, r.featureGates)
+ objs, progressing, err := network.Render(&newOperConfig.Spec, bootstrapResult, ManifestPath, r.client, r.featureGates)
ha this seems like a bigger bug? :)

yeah I looked into it, it never really mattered before because Alex updated the bootstrapResult and relied on that to sync the new change rather than the operConfig. But I need to use it now, and I don't think it should hurt anything.

This change caused https://issues.redhat.com/browse/OCPBUGS-18517. And yeah, it might not be clever to inject values in the Render phase. Please take a look at #1988 and the approach tried there.
// If we are upgrading from 4.13 -> 4.14 set new API for IP Forwarding mode to Global.
// This is to ensure backwards compatibility.
if masterStatus != nil {
is this like the IC hack? can be removed in 4.15?

yeah, once we're past 4.14 the API will exist, so we will remove this. In that case a user will be upgrading from 4.14 -> 4.15 and already have their API set.
if conf.Spec.DefaultNetwork.OVNKubernetesConfig.GatewayConfig == nil {
	conf.Spec.DefaultNetwork.OVNKubernetesConfig.GatewayConfig = &operv1.GatewayConfig{}
}
conf.Spec.DefaultNetwork.OVNKubernetesConfig.GatewayConfig.IPForwarding = operv1.IPForwardingGlobal
IIUC, we are basically keeping IPForwarding global for all older clusters?
perhaps I am missing something, but are we really patching the API object to reflect this?
conf here is meant to be read-only, right? or is CNO actually updating this after you set this?

maybe ... is a better place for this? feels like this render will be called always, versus the bootstrap logic being a one-time thing?

Yeah, the agreement we came to with @knobunc and @cgoncalves was that for clusters upgrading we would enable global forwarding. This preserves their existing functionality in case they rely on MetalLB or some other feature that requires forwarding. For fresh 4.14 installs we will keep the default as disabled.
re: "is a better place for this? feels like this render will be called always versus the bootstrap logic is a one time thing?" and "or is CNO actually updating this after you set this?"
It works because after the bootstrap we do:
// Bootstrap any resources
bootstrapResult, err := network.Bootstrap(newOperConfig, r.client)
if err != nil {
	log.Printf("Failed to reconcile platform networking resources: %v", err)
	r.status.SetDegraded(statusmanager.OperatorConfig, "BootstrapError",
		fmt.Sprintf("Internal error while reconciling platform networking resources: %v", err))
	return reconcile.Result{}, err
}

if !reflect.DeepEqual(operConfig, newOperConfig) {
	if err := r.UpdateOperConfig(ctx, newOperConfig); err != nil {
		log.Printf("Failed to update the operator configuration: %v", err)
		r.status.SetDegraded(statusmanager.OperatorConfig, "UpdateOperatorConfig",
			fmt.Sprintf("Internal error while updating operator configuration: %v", err))
		return reconcile.Result{}, err
	}
}
Where we update the API config if it changed. The modification to newOperConfig is done in the Bootstrap function; should I be doing it somewhere else? Either way, this should only happen one time even if the code is executed more than once.
ack no i won't push for a different place since it's just a one time thing... lgtm
/retest-required

/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: trozet, tssurya. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@trozet: Jira Issue OCPBUGS-3176: Some pull requests linked via external trackers have merged:
The following pull requests linked via external trackers have not merged:
These pull requests must merge or be unlinked from the Jira bug in order for it to move to the next state. Jira Issue OCPBUGS-3176 has not been moved to the MODIFIED state.
Since openshift#1952, `network.Render` is no longer supplied with the operConfig used later, but with a copy of it. This means that the MTU set by Kuryr's render phase won't be used for setting Network.Status and will lead to a panic. We have this issue because for Kuryr the default MTU is detected in the bootstrap phase and there's no fixed value. This commit solves it by making 0 the default value for Kuryr's MTU: an MTU of 0 means that Kuryr will attempt to autodetect it. Moreover, the MTU on KuryrConfig will now be set in the bootstrap phase, so the update of the OperConfig done after bootstrap will update it too.
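The 0-as-autodetect convention from the commit message above can be sketched as follows; `resolveKuryrMTU` and the sample values are hypothetical names invented for illustration, not the actual Kuryr code:

```go
package main

import "fmt"

// resolveKuryrMTU applies the convention described in the commit message:
// a configured MTU of 0 means "autodetect", so the value detected during
// the bootstrap phase wins; any non-zero configured value is kept as-is.
func resolveKuryrMTU(configured, detected uint32) uint32 {
	if configured == 0 {
		return detected
	}
	return configured
}

func main() {
	fmt.Println(resolveKuryrMTU(0, 1442))    // autodetected value used
	fmt.Println(resolveKuryrMTU(9000, 1442)) // explicit value kept
}
```

Using 0 as the sentinel lets the field have a fixed default at render time while deferring the real value to bootstrap, which is what lets the post-bootstrap OperConfig update carry the detected MTU.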
Fix included in accepted release 4.14.0-0.nightly-2023-09-11-201102

Fix included in accepted release 4.14.0-0.nightly-2023-09-12-024050

Fix included in accepted release 4.14.0-0.nightly-2023-09-15-101929
Render has side effects on the passed-in operConfig that are reflected later in the status when it is updated. This stopped working when the passed-in operConfig was changed to a copy that was not then used when applying the status. It makes sense to use the updated operConfig for everything that comes after, not just Render, so change that to fix the above issue. Fixes: openshift#1952 Signed-off-by: Jaime Caamaño Ruiz <jcaamano@redhat.com>
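The failure mode this commit message describes can be reproduced with a toy example: mutate a copy during render, then read the original when applying status. The types and the render function here are invented for illustration, not the actual CNO code:

```go
package main

import "fmt"

type Spec struct{ MTU int }
type Config struct{ Spec Spec }

// render has a side effect: it fills in a detected MTU when none is set,
// standing in for network.Render mutating the config it is given.
func render(c *Config) {
	if c.Spec.MTU == 0 {
		c.Spec.MTU = 1400 // stand-in for autodetection
	}
}

func main() {
	operConfig := Config{}
	newOperConfig := operConfig // value copy, analogous to DeepCopy

	render(&newOperConfig)

	// Bug: building the status from the original drops render's side effect.
	fmt.Println(operConfig.Spec.MTU) // 0
	// Fix: use the same copy for everything after render.
	fmt.Println(newOperConfig.Spec.MTU) // 1400
}
```

The fix is exactly the one the commit message states: once the copy exists, every later consumer (status application included) must read the copy, not the original.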