Bug 1847674: mount /var/run/netns rslave in ovnkube #579

Merged
openshift-merge-robot merged 1 commit into openshift:master from haircommander:bidirectional-netns-ovnkube-master
Apr 19, 2020

@haircommander
Member

haircommander commented Apr 8, 2020

While switching CRI-O over to managing its own network namespaces in OpenShift, we have run into problems with Multus. Specifically, we see the error:
2020-04-07T17:46:52Z [error] delegateAdd: error invoking DelegateAdd - "ovn-k8s-cni-overlay": error in getting result from AddNetwork: CNI request failed with status 400: '[openshift-dns/dns-default-98tll] failed to configure pod interface: failed to open netns "/var/run/netns/76716600-f7fd-462f-b7df-ae054dbd144e": unknown FS magic on "/var/run/netns/76716600-f7fd-462f-b7df-ae054dbd144e": 1021994
'

Through testing, I've found that the netns is definitely mounted as nsfs and not tmpfs, so I suspect we are seeing containernetworking/plugins#69.
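
One way to read the trailing number in that error, as a minimal sketch: assuming the value is printed in hexadecimal, it can be compared against the filesystem magic constants from the kernel's linux/magic.h. The helper names below are illustrative only and are not part of any CNI library.

```python
# Filesystem magic numbers as defined in the Linux header <linux/magic.h>.
KNOWN_FS_MAGIC = {
    0x6E736673: "nsfs",   # NSFS_MAGIC: where namespace files normally live
    0x00009FA0: "proc",   # PROC_SUPER_MAGIC: namespace files under /proc
    0x01021994: "tmpfs",  # TMPFS_MAGIC
}

# Filesystem types on which a file can plausibly be a network namespace.
NETNS_FS = {"nsfs", "proc"}


def classify_fs_magic(hex_magic: str) -> str:
    """Map a hex magic string (assumed hex, as in the log) to a name."""
    return KNOWN_FS_MAGIC.get(int(hex_magic, 16), "unknown")


def looks_like_netns(hex_magic: str) -> bool:
    """True if the magic belongs to a filesystem that can hold a netns."""
    return classify_fs_magic(hex_magic) in NETNS_FS


if __name__ == "__main__":
    magic = "1021994"  # value copied from the error message
    print(magic, "->", classify_fs_magic(magic))
```

Under that assumption, 1021994 decodes to tmpfs. That would be consistent with the container seeing a plain file on a tmpfs-backed directory instead of the nsfs bind mount that exists on the host.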

To fix this, try mounting /var/run/netns with HostToContainer mount propagation (rslave) in the ovnkube container.
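
Kubernetes documents HostToContainer as equivalent to rslave propagation. To check which propagation mode a mount actually ended up with, the optional fields of /proc/self/mountinfo can be inspected. The following is a rough sketch: propagation_of is a hypothetical helper, and the sample line is made up for illustration rather than taken from a real node.

```python
def propagation_of(mountinfo_line: str) -> str:
    """Classify mount propagation from one /proc/<pid>/mountinfo line.

    Fields 0-5 are fixed; zero or more optional fields follow, and a
    lone "-" separates them from the filesystem type and source.
    """
    fields = mountinfo_line.split()
    sep = fields.index("-", 6)   # separator that ends the optional fields
    optional = fields[6:sep]     # e.g. ["shared:12", "master:3"]
    shared = any(f.startswith("shared:") for f in optional)
    slave = any(f.startswith("master:") for f in optional)
    if shared and slave:
        return "shared+slave"
    if shared:
        return "shared"
    if slave:
        return "slave"
    if "unbindable" in optional:
        return "unbindable"
    return "private"


if __name__ == "__main__":
    # Hypothetical line for a slave mount of /var/run/netns.
    sample = ("120 95 0:23 /netns /var/run/netns rw,nosuid,nodev "
              "master:5 - tmpfs tmpfs rw,mode=755")
    print(propagation_of(sample))
```

With a slave mount, namespaces bind-mounted on the host after the container starts should become visible inside the container, while mounts made inside the container do not propagate back to the host.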

I have verified that this works for 4.4 by creating a cluster with #576, replacing the crio and pinns binaries, running crio with managed namespaces, and verifying that the node comes up and that the namespaces are in /var/run/netns. Thus, this PR is ready for full review.

@haircommander
Member Author

/retest

@fidencio

fidencio commented Apr 9, 2020

I've tested the 4.4 version of this PR and it does solve the issue I've faced with kata, when combined with cri-o/cri-o#3530 (plus two patches @haircommander most likely would like to have added to that cri-o PR ;-)).

It's worth mentioning that I've faced some OVN weirdness, such as:
level=error msg="Error while checking pod to CNI network "multus-cni-network": neither IPv4 nor IPv6 found when retrieving network status: [Unexpected com...

I can't say with confidence that the error above is related to this patch, as my environment is far from stable (it's an Azure cluster spawned using cluster bot, with kata installed via a bleeding-edge DaemonSet that has known issues).

Thanks for digging this out, @haircommander!

@haircommander
Member Author

/retest

another day, another attempt at happy tests

@haircommander
Member Author

/retest

@haircommander
Member Author

/retest

plz

@openshift-ci-robot
Contributor

@haircommander: The /retest command does not accept any targets.
The following commands are available to trigger jobs:

  • /test e2e-aws-multitenant
  • /test e2e-aws-ovn
  • /test e2e-gcp
  • /test e2e-gcp-ovn
  • /test e2e-gcp-ovn-upgrade
  • /test e2e-gcp-upgrade
  • /test e2e-metal-ipi
  • /test e2e-ovn-hybrid-step-registry
  • /test e2e-ovn-step-registry
  • /test images
  • /test unit
  • /test verify

Use /test all to run all jobs.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@haircommander
Member Author

/retest

@haircommander
Member Author

@dcbw @danwinship @alexanderConstantinescu happy green tests, PTAL

@juanluisvaladas
Contributor

/lgtm

openshift-ci-robot added the lgtm label Apr 15, 2020
Comment thread bindata/network/ovn-kubernetes/ovnkube-node.yaml Outdated
haircommander force-pushed the bidirectional-netns-ovnkube-master branch from 354009e to 5df8c3f April 17, 2020 17:05
openshift-ci-robot removed the lgtm label Apr 17, 2020
@fidencio

Works like a charm for both OCI and VM runtime types.

@haircommander
Member Author

/retest

haircommander force-pushed the bidirectional-netns-ovnkube-master branch from 5df8c3f to 71fcfb1 April 17, 2020 18:37
@haircommander
Member Author

commits squashed and comment updated 😄 PTAL @danwinship

haircommander changed the title from "mount /var/run/netns shared in ovnkube" to "mount /var/run/netns rslave in ovnkube" Apr 17, 2020
Signed-off-by: Peter Hunt <pehunt@redhat.com>
haircommander force-pushed the bidirectional-netns-ovnkube-master branch from 71fcfb1 to cce9496 April 17, 2020 18:38
@danwinship
Contributor

/lgtm

openshift-ci-robot added the lgtm label Apr 17, 2020
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danwinship, haircommander, juanluisvaladas

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-ci-robot
Contributor

openshift-ci-robot commented Apr 19, 2020

@haircommander: The following test failed, say /retest to rerun all failed tests:

Test name | Commit | Details | Rerun command
ci/prow/e2e-gcp-ovn | cce9496 | link | /test e2e-gcp-ovn

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

@openshift-cherrypick-robot

@haircommander: new pull request created: #600

In response to this:

/cherry-pick release-4.4

s1061123 added a commit to s1061123/cluster-network-operator that referenced this pull request May 25, 2020
To fix the DHCP pod being unable to get netns info due to openshift#579, the mount option for the DHCP CNI server needs to change.
This fix changes it.
haircommander changed the title from "mount /var/run/netns rslave in ovnkube" to "Bug 1847675: mount /var/run/netns rslave in ovnkube" Jun 16, 2020
@openshift-ci-robot
Contributor

@haircommander: All pull requests linked via external trackers have merged: . Bugzilla bug 1847675 has been moved to the MODIFIED state.

@haircommander
Member Author

let's see if robots can do the work for me

/bugzilla refresh

@openshift-ci-robot
Contributor

@haircommander: Bugzilla bug 1847675 is in an unrecognized state (MODIFIED) and will not be moved to the MODIFIED state.

haircommander changed the title from "Bug 1847675: mount /var/run/netns rslave in ovnkube" to "Bug 1847674: mount /var/run/netns rslave in ovnkube" Jun 16, 2020
@openshift-ci-robot
Contributor

@haircommander: All pull requests linked via external trackers have merged: . Bugzilla bug 1847674 has been moved to the MODIFIED state.

Labels: approved, lgtm
