[mcbs] Fast forward to master #2860
The clusteroperator version object is updated to the new version when Progressing=False. However, syncAvailableStatus uses the incoming version in its status before syncProgressing officially updates the clusteroperator object. As a result, the operator reports Available at the incoming version before the update has finished.
The most common sync error that we see is RequiredPoolsFailed, which does not mean that the operator itself is impaired. Let's only set Available = False when operand syncs fail.
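The rule in this commit message can be sketched roughly as follows. This is only an illustration, not the operator's actual code: the `syncResult` type, its fields, and the sync names used in `main` are hypothetical and are not taken from the MCO source.

```go
package main

import "fmt"

// syncResult is a hypothetical record of one sync function's outcome.
type syncResult struct {
	name    string // name of the sync step (illustrative only)
	operand bool   // true if the step deploys or updates an operand
	failed  bool   // true if the step returned an error
}

// operatorAvailable returns false only when a failed step manages an operand.
// Failures such as RequiredPoolsFailed, which do not mean the operator itself
// is impaired, leave availability untouched.
func operatorAvailable(results []syncResult) bool {
	for _, r := range results {
		if r.failed && r.operand {
			return false
		}
	}
	return true
}

func main() {
	results := []syncResult{
		{name: "RequiredPools", operand: false, failed: true},
		{name: "SomeOperand", operand: true, failed: false},
	}
	fmt.Println("available:", operatorAvailable(results))
}
```

Keeping the decision in a small pure function like this makes the availability rule easy to unit test separately from the status-update plumbing.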
KNI CoreDNS does not resolve hostnames if the nameservers in the forwarders return a DNS response larger than 512 bytes. Name resolution of github.com from an application pod therefore fails if the forward section of the Corefile has an upstream DNS server that returns a response larger than 512 bytes.

The difference in the dig to github.com is the response size: using upstream nameserver 10.11.142.1 it is 602 (> 512), whereas using nameserver 10.11.5.19 it is 55 (< 512).

The limit for UDP DNS messages is 512 bytes. Well-behaved DNS servers are supposed to truncate the message and set the truncated bit. See RFC 1035 section 4.2.1.
https://datatracker.ietf.org/doc/html/rfc1035#section-4.2.1
https://datatracker.ietf.org/doc/html/rfc1035#section-2.3.4

CoreDNS will compress messages that exceed 512 bytes, unless the client allows a larger maximum size by sending the corresponding edns0 option in the request. dig in particular sends a buffer size > 512 by default. I think the exact number depends on the dig version or perhaps the environment... on my OCP nodes it defaults to 4096 - I think this is most common.

[miheer@localhost ~]$ oc debug node/mykrbid-vcd8j-worker-0-hlkmf
Starting pod/mykrbid-vcd8j-worker-0-hlkmf-debug ...
To use host binaries, run `chroot /host`
Pod IP: 192.168.0.94
If you don't see a command prompt, try pressing enter.
sh-4.4#
sh-4.4# sysctl -a | grep rmem
net.core.rmem_default = 212992
net.core.rmem_max = 212992
net.ipv4.tcp_rmem = 4096 87380 6291456
net.ipv4.udp_rmem_min = 4096

So, we should set 512 as the bufsize for KNI CoreDNS to avoid this issue.
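The size arithmetic in this commit message can be checked with a small sketch. It assumes the usual rule that a request without an EDNS0 option (or with an advertised size below 512) is limited to 512 bytes over UDP. The function names are made up for illustration and are not CoreDNS APIs.

```go
package main

import "fmt"

// classicUDPLimit is the DNS-over-UDP message limit from RFC 1035.
const classicUDPLimit = 512

// effectiveLimit returns the largest UDP response a client accepts.
// ednsBufsize is the size advertised in the request's EDNS0 option,
// or 0 when the request carries no such option.
func effectiveLimit(ednsBufsize int) int {
	if ednsBufsize > classicUDPLimit {
		return ednsBufsize
	}
	return classicUDPLimit
}

// exceedsLimit reports whether a response of responseSize bytes does not fit
// in a single UDP reply for the given advertised buffer size.
func exceedsLimit(responseSize, ednsBufsize int) bool {
	return responseSize > effectiveLimit(ednsBufsize)
}

func main() {
	// 602 and 55 are the response sizes quoted in the commit message.
	for _, size := range []int{602, 55} {
		fmt.Printf("size=%d no-edns0=%v bufsize-4096=%v bufsize-512=%v\n",
			size, exceedsLimit(size, 0), exceedsLimit(size, 4096), exceedsLimit(size, 512))
	}
}
```

With an advertised buffer of 4096 the 602-byte answer fits, which is why dig succeeds by default. With 512, or with no EDNS0 option at all, it does not fit.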
Update the CRD description as well as inline comments/docs, so users looking to use maxUnavailable can better understand some nuances. Signed-off-by: Yu Qi Zhang <jerzhang@redhat.com>
Pull in support for ibmcloud added in openshift/library-go#1161
Vendor: update library-go for ibmcloud provider
…-4.10-ose-machine-config-operator Updating ose-machine-config-operator images to be consistent with ART
…atus Bug 1955300: tighten operator availability conditions
Currently br-ex defaults to the mode stable-privacy because that's the default for new nmcli connections. However, the host interface is actually configured for eui64, which causes inconsistency from initial boot to bridge configuration. To fix this we can just persist the addr-gen-mode like we do with the DHCP parameters.
mDNS functionality was removed in 4.8, but we left some of the code in the interest of not risking issues on upgrade. Now that we're multiple releases past the initial removal we should be able to get rid of the last vestiges.
openshift/api openshift/client-go openshift/library-go
Bug 1990625: configure-ovs: Persist addr-gen-mode for ipv6 connections
instead of relying on the default config. Signed-off-by: Peter Hunt <pehunt@redhat.com>
Under some circumstances, the network infrastructure might miss the GARPs sent when a VIP fails over. This causes the VIP to become inaccessible because the network doesn't know where it is. Worse, in some cases the network might not realize it doesn't have the correct information because it has previously cached the old VIP location. To get around this, configure keepalived to periodically send GARPs. This will ensure that even if the failover is missed, the network will eventually catch up.
Updating openshift libraries
The Keepalived default ingress track script checks whether a node is running an instance of a default router pod. We noticed that Keepalived failed to run this script as a command, so this PR moves the default ingress track script to a separate file (like chk_ocp_lb and chk_ocp_both).
Send WARN message to stderr
Bug 1991067: [on-prem] Set coredns bufsize to 512
Remove all references to mdns
This commit starts using Go's "embed" package to store manifests in the binary. It is easier and more natural than doing so with go-bindata.
We don't need this library anymore as we've switched to "embed".
Bug 1970021: Revert ephemeral NM configuration change
Fix external cloud provider GCP unittests, add vSphere related ones. Reorder tests a bit.
Every now and again NM fails to bring up the phys0 connection. Try a couple times.

configure-ovs.sh[1280]: ++ nmcli -g connection.master connection show uuid 3739be42-f3bc-4ac0-8c69-7a4de7d9e464
configure-ovs.sh[1280]: + '[' c5abc0d7-e93c-4a2d-ac6b-71cb1ead98bf '!=' ens5 ']'
configure-ovs.sh[1280]: + continue
configure-ovs.sh[1280]: + nmcli conn up br-ex
configure-ovs.sh[1280]: Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/5)
configure-ovs.sh[1280]: + nmcli conn up ovs-if-phys0
configure-ovs.sh[1280]: Error: Connection activation failed: A dependency of the connection failed
configure-ovs.sh[1280]: Hint: use 'journalctl -xe NM_CONNECTION=3739be42-f3bc-4ac0-8c69-7a4de7d9e464 + NM_DEVICE=ens5' to get more details.
configure-ovs.sh[1280]: + handle_exit_error
configure-ovs.sh[1280]: + e=4
configure-ovs.sh[1280]: + '[' 4 -eq 0 ']'
configure-ovs.sh[1280]: + set +e
configure-ovs.sh[1280]: + nmcli c show
configure-ovs.sh[1280]: NAME UUID TYPE DEVICE
configure-ovs.sh[1280]: Wired connection 1 06f63e11-cc3b-3d8e-9a3c-e685b2a11f24 ethernet ens5
configure-ovs.sh[1280]: br-ex 6b41000c-f7f5-4bc7-9a52-e4388ce360e3 ovs-bridge br-ex
configure-ovs.sh[1280]: ovs-port-br-ex e9041443-4386-4547-9747-49202456d736 ovs-port br-ex
configure-ovs.sh[1280]: ovs-port-phys0 c5abc0d7-e93c-4a2d-ac6b-71cb1ead98bf ovs-port ens5
configure-ovs.sh[1280]: ovs-if-phys0 3739be42-f3bc-4ac0-8c69-7a4de7d9e464 ethernet --
configure-ovs.sh[1280]: + nmcli conn down ovs-if-phys0
configure-ovs.sh[1280]: Error: 'ovs-if-phys0' is not an active connection.
configure-ovs.sh[1280]: Error: no active connection provided.
configure-ovs.sh[1280]: + nmcli conn up 06f63e11-cc3b-3d8e-9a3c-e685b2a11f24
configure-ovs.sh[1280]: Connection successfully activated (D-Bus active path: /org/freedesktop/NetworkManager/ActiveConnection/10)
configure-ovs.sh[1280]: + exit 4
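The "try a couple times" idea is a bounded retry loop. The real change lives in the configure-ovs shell script, so the Go sketch below only illustrates the pattern. The `retry` helper, the attempt count, the delay, and the `activate` stand-in for `nmcli conn up ovs-if-phys0` are all assumptions made for this example.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errActivation mimics the failure seen in the log above.
var errActivation = errors.New("connection activation failed")

// retry calls fn up to attempts times and sleeps for delay after each
// failure except the last. It returns nil on the first success.
func retry(attempts int, delay time.Duration, fn func() error) error {
	if attempts < 1 {
		return errors.New("attempts must be at least 1")
	}
	var err error
	for i := 1; i <= attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if i < attempts {
			time.Sleep(delay)
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	// activate is a hypothetical stand-in for bringing the connection up;
	// here it fails twice and then succeeds.
	activate := func() error {
		calls++
		if calls < 3 {
			return errActivation
		}
		return nil
	}
	err := retry(5, 10*time.Millisecond, activate)
	fmt.Println("calls:", calls, "err:", err)
}
```

Bounding the number of attempts matters here: a persistent failure should still reach the script's error handler rather than loop forever.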
Bug 2017756: Remove crio settings that overwrite /etc/containers/storage.conf
The storage.conf(5) override_kernel_check option was removed from the containers/storage library in early 2019: containers/storage@bd6cac9

With recent versions of CRI-O present in OCP >= 4.9, the presence of this field causes sandbox creation failure when using user namespaces:

Warning FailedCreatePodSandBox SSs (xN over MMm) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = error creating pod sandbox with name "{NAME}": error creating an ID-mapped copy of layer "{HASH}": time="{TIMESTAMP}" level=warning msg="Failed to decode the keys [\"storage.options.override_kernel_check\"] from \"/etc/containers/storage.conf\"."

Remove the override_kernel_check option.

Signed-off-by: Fraser Tweedale <ftweedal@redhat.com>
Bug 2023657: Only write ssh keys if core user exists
…field storage.conf: remove obsolete option override_kernel_check
Don't reboot for GPG key changes
Includes existing approvers and new members in MCO team
Bug 2024826: Allow resolv prepender without default search domain
owners: add reviewers for MCO repo
Besides eliminating code duplication, this makes it easier to unit test functions that call resourceapply.* functions, which will help in migrating from ./lib to library-go. Created a new type, manifestPaths, to wrap all the arguments needed for applyManifests.
Signed-off-by: Peter Hunt <pehunt@redhat.com>
Consolidate duplicated code into applyManifests
Bug 2028802: crio: fix a segfault on 4.9->4.10 upgrade
This just needs a lgtm
since it's just a ff i'll do it ;) /lgtm
Should we rename this branch to layering?
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: cgwalters, kikisdeliveryservice, sinnykumari
Not sure if it is worth that effort, considering it is a test branch. Also, renaming the branch would mean making corresponding config changes in the openshift/release repo too and getting approval from the corresponding team.
Agree branch name is fine as is.
/retest-required Please review the full test history for this PR and help us cut down flakes.
We're going to iterate on this branch, so the first step is to update it to match git master.