Bug 2048352: ovn-kubernetes: Fixed ofport_request for physical ports#2941
Conversation
|
/assign @trozet |
|
Verification: And the logs will show: And the script looks like this: |
|
@andreaskaris: This pull request references Bugzilla bug 2048352, which is invalid:
Comment DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/bugzilla refresh |
|
@andreaskaris: This pull request references Bugzilla bug 2048352, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug
Requesting review from QA contact: DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
@jcaamano Wdyt? |
|
/retest |
1 similar comment
|
/retest |
ba7d11c to
99a127a
Compare
|
@andreaskaris on your verification, when you restart NM, how does the NM dhcp request correlate in time with the execution of the dispatcher script? Because there is a time window when the ofport is not the one we expect right? |
|
Yeah, we lose 1 DHCP DISCOVER, at least for my test. That's not bad though: 2048352-nm-pre-up.txt |
88a667e to
2b8f100
Compare
|
@jcaamano I modified the script, now it will:
It works for linux vlan, linux bond, ovs bond (I tested those on a VM). The only thing that comes to my mind that I haven't tested is an OVS VLAN on the interface and attached to the bridge, if that's even possible. |
|
Just FYI - no idea why it landed on ID 2 for this run, but it's stable on 2 with the script: |
|
/retest |
|
Output of the latest test: |
|
/retest |
d57a7b7 to
3aa55ed
Compare
|
/lgtm |
|
@trozet can you PTAL and give final review? /test e2e-agnostic-upgrade |
There was a problem hiding this comment.
With pre-up, has the interface been created in OVS already? Are we guaranteed that?
There was a problem hiding this comment.
According to the documentation: https://developer-old.gnome.org/NetworkManager/stable/NetworkManager.html
pre-up | The interface is connected to the network but is not yet fully activated. Scripts acting on this event must be placed or symlinked into the /etc/NetworkManager/dispatcher.d/pre-up.d directory, and NetworkManager will wait for script execution to complete before indicating to applications that the interface is fully activated.
-- | --
So, The interface is connected to the network. There are several traces attached above with debug logging for NM where we can see that the interface is attached to OVS first, and pre-up is run only after that. I also tested on a dedicated virtual machine (RHEL 8.4) with the following setup with various port types (vlan, bond, standalone interfaces), and it works and the behavior is consistent:
[root@rhel84 ~]# ovs-vsctl show
0c29b398-2f10-497a-ba0f-ad834737cbd6
Bridge br-ex
Port eth2
Interface eth2
type: system
Port vlan10
Interface vlan10
type: system
Port br-ex
Interface br-ex
type: internal
Port bond0
Interface eth4
type: system
Interface eth5
type: system
Port patch-to-br-int
Interface patch-to-br-int
type: patch
options: {peer=patch-to-br-ex}
Port eth1
Interface eth1
type: system
Port bond1
Interface bond1
type: system
Bridge br-int
Port patch-to-br-ex
Interface patch-to-br-ex
type: patch
options: {peer=patch-to-br-int}
Port br-int
Interface br-int
type: internal
Bridge br-ex2
Port eth3
Interface eth3
type: system
ovs_version: "2.15.3"
[root@rhel84 ~]# nmcli conn
NAME UUID TYPE DEVICE
Wired connection 2 923b22f9-8cc8-32fa-aa0d-54cffb024215 ethernet eth10
Wired connection 3 2b1f9ef1-5fb3-35f5-9269-5ab7ef70f5c9 ethernet eth8
Wired connection 4 9432aa2c-d1aa-30a1-b852-970bc72b687d ethernet eth9
Wired connection 1 a63944ec-3426-3ec1-b485-5b1603aadf3d ethernet eth0
ovs-if-br-ex 74c03eca-f6a0-4409-b4e5-2698e0e0665a ovs-interface br-ex
bond1 cd4c8622-8ccd-4a3e-b2a1-9b1d57667e80 bond bond1
bond1-eth6 23969750-49ef-4145-99c3-6af53b54f4a4 ethernet eth6
bond1-eth7 d20b59c8-99c8-4521-980f-9422b1682ab4 ethernet eth7
bond1-ovsport 1cc55059-da8f-4838-873c-6458e4c2682b ovs-port bond1
br-ex 9e0055c1-23e7-473c-8444-015d4aa11dac ovs-bridge br-ex
br-ex2 34109401-5f47-44ca-9f3b-5fc1b37422a5 ovs-bridge br-ex2
ovs-if-eth1 ec425a9c-9128-4d6e-a367-8cb0493f688f ethernet eth1
ovs-if-eth2 c98f0470-7955-4195-bc0e-2174dc49085f ethernet eth2
ovs-if-eth3 195a5964-2eff-4ead-8bda-0b071739e9d6 ethernet eth3
ovs-port-br-ex 0d18b6a6-2fe5-4c44-b705-d2fc4301aafc ovs-port br-ex
ovs-port-br-ex2 8ba11cc7-a222-40f5-86a4-c569a85f81f4 ovs-port br-ex2
ovs-port-eth1 47871482-b875-4ba0-bb26-5e213135583f ovs-port eth1
ovs-port-eth2 5b9bdfeb-7238-45dc-b7af-714b9a12ab11 ovs-port eth2
ovs-port-eth3 2f4f0995-7ead-4ffe-873c-e1f544ce1b7d ovs-port eth3
ovs-slave-bond0 676b62a4-63b8-4c85-8cba-c1a091616e19 ovs-port bond0
ovs-slave-eth4 261b0374-1446-4f19-a2dd-5494b70212fe ethernet eth4
ovs-slave-eth5 5d0fc00d-17e0-471a-b0a9-4bd851876d83 ethernet eth5
vlan10 4dfb2bb4-45ed-4fff-b746-6432874fde5a vlan vlan10
vlan10-ovsport 1e7cf87e-a6c0-467c-ae88-70fb0e0d701b ovs-port vlan10
ovs-if-br-ex2 4f70d1f0-48f7-42fa-b4bd-d7f69e4477de ovs-interface --
All traces show that the interface is connected first (and gets a preassigned ID) and then the script kicks in.
There was a problem hiding this comment.
I threw in 545bc39#diff-bfb1096d30a23088db321d88004b44a807716f56cad84e920f3e5451f3d6f1e0R65 as a sanity check to exit gracefully if the interface does not exist.
There was a problem hiding this comment.
Conversation with Dan: https://coreos.slack.com/archives/CDCP2LA9L/p1645621075899249
There was a problem hiding this comment.
this is really an array. Why use an array to store a single value? There can only be one physical port at a time right?
There was a problem hiding this comment.
a) I wrote the script on a RHEL VM and purposefully added lots of interfaces and interface variatoins to make the script more capable, flexible and future proof. I did not want the script to be a point of failure if for some reason we decide to change anything (as unlikely as it may be).
b) As a more compelling argument, OVS bonds are implemented based on flows and they consist of multiple interfaces each with their own index, so they look like this:
Bridge br-ex
Port bond0
Interface eth4
type: system
Interface eth5
type: system
And each of them get their own port ID:
5(eth4): addr:52:54:00:df:9c:35
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
6(eth5): addr:52:54:00:15:ec:29
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
So the array solution is a must if we don't want to exclude OVS bonding. Granted, ovs-configure.sh doesn't do ovs-bonding at the moment, but we don't know for how long this script would be around and what changes we make to ovs-configure.sh, so me might as well do it right in this script here and keep it general purpose and have it handle anything we throw at it.
3aa55ed to
455006c
Compare
bb3f353 to
9c63e55
Compare
When NetworkManager is restarted, all interfaces are removed from the external bridge (br-ex / breth0). By default, ports' OVS IDs change and the existing flows point to wrong 'output:' port numbers leading to packet drops of all traffic that originates at LOCAL and the node will not be able to request an IP address via DHCP. Request fixed ofport numbers via the ofport_request field. Signed-off-by: Andreas Karis <ak.karis@gmail.com>
9c63e55 to
545bc39
Compare
|
/assign @kikisdeliveryservice |
|
@kikisdeliveryservice Can you approve this? Thanks :-) |
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: andreaskaris, jcaamano, kikisdeliveryservice, trozet The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
@andreaskaris: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
1 similar comment
|
/retest-required Please review the full test history for this PR and help us cut down flakes. |
|
@andreaskaris: All pull requests linked via external trackers have merged: Bugzilla bug 2048352 has been moved to the MODIFIED state. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
When NetworkManager is restarted, all interfaces are removed from the
external bridge (br-ex / breth0). By default, ports' OVS IDs
change and the existing flows point to wrong 'output:' port numbers
leading to packet drops of all traffic that originates at
LOCAL and the node will not be able to request an IP address via DHCP.
Request fixed ofport numbers via the ofport_request field.
Signed-off-by: Andreas Karis ak.karis@gmail.com
- What I did
- How to verify it
- Description for the changelog