
[mcbs] Fast forward to master#2860

Merged
openshift-merge-robot merged 116 commits into mcbs from master
Dec 6, 2021

Conversation

@cgwalters
Member

We're going to iterate on this branch, so the first step is to update it to match git master.

kikisdeliveryservice and others added 30 commits August 20, 2021 16:53
The clusteroperator version object is updated to the new version
when Progressing=False. However, syncAvailableStatus is using the
incoming version in its status before syncProgressing officially
updates the clusteroperator object. This yields available at the incoming
version before we are finished.
The most common sync error that we see is RequiredPoolsFailed,
which does not mean that the operator itself is impaired. Let's
only set Available = False when operand syncs fail.
KNI CoreDNS does not resolve hostnames if the nameservers in the forwarders return a DNS response larger than 512 bytes

So name resolution of github.com from an application pod fails if the forward section of the Corefile has an upstream DNS server that returns a DNS response larger than 512 bytes.

The response size from dig for github.com is 602 bytes using upstream nameserver 10.11.142.1, whereas it is 55 bytes using nameserver 10.11.5.19.

The limit for UDP DNS messages is 512 bytes. Well-behaved DNS servers are supposed to truncate the message and set the truncated bit. See RFC 1035 section 4.2.1.

https://datatracker.ietf.org/doc/html/rfc1035#section-4.2.1

https://datatracker.ietf.org/doc/html/rfc1035#section-2.3.4

The difference between the two dig queries for github.com is the response size: for 10.11.142.1 it is 602 > 512, and for 10.11.5.19 it is 55 < 512.

CoreDNS will compress messages that exceed 512 bytes, unless the client allows a larger maximum size by sending the corresponding edns0 option in the request.

dig in particular sends a buffer size > 512 by default. I think the exact number depends on the dig version or perhaps the environment... on my OCP nodes it defaults to 4096 - I think this is most common.

[miheer@localhost ~]$ oc debug node/mykrbid-vcd8j-worker-0-hlkmf
Starting pod/mykrbid-vcd8j-worker-0-hlkmf-debug ...
To use host binaries, run `chroot /host`
Pod IP: 192.168.0.94
If you don't see a command prompt, try pressing enter.
sh-4.4#
sh-4.4# sysctl -a | grep  rmem
net.core.rmem_default = 212992
net.core.rmem_max = 212992
net.ipv4.tcp_rmem = 4096	87380	6291456
net.ipv4.udp_rmem_min = 4096

So, we should be setting 512 as bufsize for KNI coredns to avoid this issue.
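As a rough illustration of the size comparison above, the following shell sketch reads dig output and reports whether the reply fits in a 512-byte UDP message. The function name check_msg_size and the UDP_LIMIT variable are made up for this example; it relies only on the ";; MSG SIZE  rcvd: N" trailer that dig prints.

```shell
# Hypothetical helper (not part of the MCO): classify a dig reply by size.
UDP_LIMIT=512

check_msg_size() {
    local size
    # Pull the byte count out of dig's ";; MSG SIZE  rcvd: N" trailer.
    size=$(sed -n 's/^;; MSG SIZE  *rcvd: *\([0-9][0-9]*\).*$/\1/p' | head -n 1)
    if [ -z "$size" ]; then
        # No trailer found on stdin, so nothing can be concluded.
        echo "unknown"
        return 2
    fi
    if [ "$size" -gt "$UDP_LIMIT" ]; then
        echo "oversized ($size > $UDP_LIMIT)"
        return 1
    fi
    echo "ok ($size <= $UDP_LIMIT)"
}
```

It could be used as `dig github.com @10.11.142.1 | check_msg_size`, which for the 602-byte reply described above would report the response as oversized.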
Update the CRD description as well as inline comments/docs, so users
looking to use maxUnavailable can better understand some nuances.

Signed-off-by: Yu Qi Zhang <jerzhang@redhat.com>
Vendor: update library-go for ibmcloud provider
…-4.10-ose-machine-config-operator

Updating ose-machine-config-operator images to be consistent with ART
…atus

Bug 1955300: tighten operator availability conditions
Currently br-ex defaults to the mode stable-privacy because that's
the default for new nmcli connections. However, the host interface
is actually configured for eui64, which causes inconsistency from
initial boot to bridge configuration.

To fix this we can just persist the addr-gen-mode like we do with
the DHCP parameters.
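A minimal sketch of what persisting the mode might look like, assuming the host connection's value can be read with nmcli's `-g` (get-values) option. The function name addr_gen_mode_args is hypothetical and this is not claimed to be the code in configure-ovs.sh.

```shell
# Hypothetical helper: build the extra nmcli arguments needed to carry the
# IPv6 addr-gen-mode of an existing connection over to a new one.
addr_gen_mode_args() {
    local conn="$1" mode
    # Read the mode (for example eui64 or stable-privacy) from the old connection.
    mode=$(nmcli -g ipv6.addr-gen-mode connection show "$conn")
    if [ -n "$mode" ]; then
        echo "ipv6.addr-gen-mode $mode"
    fi
}
```

The printed pair could then be appended to the nmcli call that creates or modifies the bridge connection, so the bridge does not fall back to the stable-privacy default.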
mDNS functionality was removed in 4.8, but we left some of the code
in the interest of not risking issues on upgrade. Now that we're
multiple releases past the initial removal we should be able to get
rid of the last vestiges.
  openshift/api
  openshift/client-go
  openshift/library-go
Bug 1990625: configure-ovs: Persist addr-gen-mode for ipv6 connections
instead of relying on the default config.

Signed-off-by: Peter Hunt <pehunt@redhat.com>
Under some circumstances, the network infrastructure might miss the
GARPs sent when a VIP fails over. This causes the VIP to become
inaccessible because the network doesn't know where it is. Worse, in
some cases the network might not realize it doesn't have the correct
information because it has previously cached the old VIP location.

To get around this, configure keepalived to periodically send GARPs.
This will ensure that even if the failover is missed, the network
will eventually catch up.
Updating openshift libraries
The Keepalived default ingress track script checks whether a node is running an instance of a default router pod.
We noticed that Keepalived failed to run this script as a command. This PR moves the default ingress track script
to a separate file (like chk_ocp_lb and chk_ocp_both).
Bug 1991067: [on-prem] Set coredns bufsize to 512
Remove all references to mdns
This commit starts using "embed" go module to store manifests in
the binary. It is easier and more natural than doing so with go-bindata.
We don't need this library anymore as we've switched to "embed".
Bug 1970021: Revert ephemeral NM configuration change
Fix external cloud provider GCP unittests, add vSphere related ones.
Reorder tests a bit.
Every now and again NM fails to bring up the phys0 connection.
Retry a couple of times.

configure-ovs.sh[1280]: ++ nmcli -g connection.master connection show uuid 3739be42-f3bc-4ac0-8c69-7a4de7d9e464
configure-ovs.sh[1280]: + '[' c5abc0d7-e93c-4a2d-ac6b-71cb1ead98bf '!=' ens5 ']'
configure-ovs.sh[1280]: + continue
configure-ovs.sh[1280]: + nmcli conn up br-ex
configure-ovs.sh[1280]: Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/5)
configure-ovs.sh[1280]: + nmcli conn up ovs-if-phys0
configure-ovs.sh[1280]: Error: Connection activation failed: A dependency of the connection failed
configure-ovs.sh[1280]: Hint: use 'journalctl -xe NM_CONNECTION=3739be42-f3bc-4ac0-8c69-7a4de7d9e464 + NM_DEVICE=ens5' to get more details.
configure-ovs.sh[1280]: + handle_exit_error
configure-ovs.sh[1280]: + e=4
configure-ovs.sh[1280]: + '[' 4 -eq 0 ']'
configure-ovs.sh[1280]: + set +e
configure-ovs.sh[1280]: + nmcli c show
configure-ovs.sh[1280]: NAME                UUID                                  TYPE        DEVICE
configure-ovs.sh[1280]: Wired connection 1  06f63e11-cc3b-3d8e-9a3c-e685b2a11f24  ethernet    ens5
configure-ovs.sh[1280]: br-ex               6b41000c-f7f5-4bc7-9a52-e4388ce360e3  ovs-bridge  br-ex
configure-ovs.sh[1280]: ovs-port-br-ex      e9041443-4386-4547-9747-49202456d736  ovs-port    br-ex
configure-ovs.sh[1280]: ovs-port-phys0      c5abc0d7-e93c-4a2d-ac6b-71cb1ead98bf  ovs-port    ens5
configure-ovs.sh[1280]: ovs-if-phys0        3739be42-f3bc-4ac0-8c69-7a4de7d9e464  ethernet    --
configure-ovs.sh[1280]: + nmcli conn down ovs-if-phys0
configure-ovs.sh[1280]: Error: 'ovs-if-phys0' is not an active connection.
configure-ovs.sh[1280]: Error: no active connection provided.
configure-ovs.sh[1280]: + nmcli conn up 06f63e11-cc3b-3d8e-9a3c-e685b2a11f24
configure-ovs.sh[1280]: Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/10)
configure-ovs.sh[1280]: + exit 4
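The trace above shows a single failed `nmcli conn up ovs-if-phys0` ending the script. A retry wrapper along the following lines is one way to "try a couple times"; the function name retry, its argument order, and the attempt and delay values are assumptions for illustration, not the actual change in configure-ovs.sh.

```shell
# Hypothetical retry helper.
# Usage: retry <attempts> <delay-seconds> <command> [args...]
retry() {
    local attempts="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        # Run the command; stop as soon as it succeeds.
        if "$@"; then
            return 0
        fi
        echo "attempt $i/$attempts failed: $*" >&2
        # Only sleep when another attempt will follow.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
        i=$((i + 1))
    done
    return 1
}
```

For example, `retry 3 5 nmcli conn up ovs-if-phys0` would attempt the activation up to three times with five seconds between attempts, and return non-zero only if all attempts fail.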
openshift-merge-robot and others added 11 commits November 29, 2021 22:20
Bug 2017756: Remove crio settings that overwrite /etc/containers/storage.conf
The storage.conf(5) override_kernel_check option was removed from the
containers/storage library in early 2019:

  containers/storage@bd6cac9

With recent versions of CRI-O present in OCP >= 4.9, the presence of
this field causes sandbox creation failure when using user namespaces:

  Warning  FailedCreatePodSandBox  SSs (xN over MMm)  kubelet
  (combined from similar events): Failed to create pod sandbox: rpc
  error: code = Unknown desc = error creating pod sandbox with name
  "{NAME}": error creating an ID-mapped copy of layer "{HASH}":
  time="{TIMESTAMP}" level=warning msg="Failed to decode the keys
  [\"storage.options.override_kernel_check\"] from
  \"/etc/containers/storage.conf\"."

Remove the override_kernel_check option.

Signed-off-by: Fraser Tweedale <ftweedal@redhat.com>
Bug 2023657: Only write ssh keys if core user exists
…field

storage.conf: remove obsolete option override_kernel_check
Don't reboot for GPG key changes
Includes existing approvers and new members in MCO team
Bug 2024826: Allow resolv prepender without default search domain
owners: add reviewers for MCO repo
Besides eliminating code duplication, this will allow more easily unit
testing functions that call resourceapply.* functions, which will help
in migrating from using ./lib to library-go

Created a new type manifestPaths to wrap all the arguments needed for
applyManifests
Signed-off-by: Peter Hunt <pehunt@redhat.com>
Consolidate duplicated code into applyManifests
@openshift-ci openshift-ci Bot requested review from EmilienM and cybertron December 3, 2021 18:45
Contributor

@kikisdeliveryservice kikisdeliveryservice left a comment

Wowzers.

@openshift-ci openshift-ci Bot added the "approved" label (Indicates a PR has been approved by an approver from all required OWNERS files.) Dec 3, 2021
Bug 2028802: crio: fix a segfault on 4.9->4.10 upgrade
@cgwalters
Member Author

This just needs a lgtm

@kikisdeliveryservice
Contributor

This just needs a lgtm

since it's just a ff i'll do it ;)

/lgtm

@mkenigs
Contributor

mkenigs commented Dec 6, 2021

Should we rename this branch to layering?

@openshift-ci openshift-ci Bot added the "lgtm" label (Indicates that a PR is ready to be merged.) Dec 6, 2021
@openshift-ci
Contributor

openshift-ci Bot commented Dec 6, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cgwalters, kikisdeliveryservice, sinnykumari

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [kikisdeliveryservice,sinnykumari]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sinnykumari
Contributor

Should we rename this branch to layering?

Not sure if it is worth the effort, considering it is a test branch. Also, renaming the branch would mean making corresponding config changes in the openshift/release repo too and getting approval from the corresponding team.

@kikisdeliveryservice
Contributor

Agree branch name is fine as is.

@openshift-bot
Contributor

/retest-required

Please review the full test history for this PR and help us cut down flakes.


Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • layering
  • lgtm: Indicates that a PR is ready to be merged.