Bug 1751978: templates/baremetal: Fix keepalived dysfunction on vrrp iface change #1124
Depends on openshift/baremetal-runtimecfg#20 being merged.
/assign runcom
Is runtimecfg still used by anyone, or can it be dropped from https://github.com/openshift/baremetal-runtimecfg/?
It is used by mdns-publisher
There is a chance to save one (!) character by dropping the extra space.
You remove it and start listening to it on the next line? Why?
If the container is restarted, the socket file may still be there, and socat won't be able to start until it is removed.
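The restart problem can be reproduced with a minimal Python sketch. It assumes a Unix-domain socket like the one socat listens on. The socket file name below is hypothetical and is not the one the template uses. Closing a listening socket does not remove its file, so a second bind to the same path fails with `EADDRINUSE` unless the stale file is unlinked first. Removing the file before starting socat does exactly that.

```python
import errno
import os
import socket
import tempfile


def bind_listener(path: str, remove_stale: bool) -> socket.socket:
    """Bind a Unix-domain listening socket at `path`.

    With remove_stale=True, any leftover socket file is deleted first,
    mirroring an `rm -f` before starting `socat UNIX-LISTEN:...`.
    """
    if remove_stale:
        try:
            os.unlink(path)
        except FileNotFoundError:
            pass  # nothing left over from a previous run
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(path)
    except OSError:
        sock.close()
        raise
    sock.listen(1)
    return sock


def simulate_restart(remove_stale: bool) -> str:
    """Simulate a container restart that leaves the socket file behind."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "keepalived.sock")  # hypothetical name
        first = bind_listener(path, remove_stale=False)
        first.close()  # closing does NOT delete the file from disk
        try:
            second = bind_listener(path, remove_stale=remove_stale)
        except OSError as exc:
            return "EADDRINUSE" if exc.errno == errno.EADDRINUSE else "other"
        second.close()
        return "ok"
```

Unlinking unconditionally is safe here because the container owns the path. A file that is already gone is simply ignored.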
There is a regression. Previously it would fail if the config did not exist.
Now, instead of failing, it just waits in socat for a reload once the config is written. What's the issue with that?
@celebdor: This pull request references Bugzilla bug 1751978, which is valid. The bug has been moved to the POST state.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Holding, since this depends on openshift/baremetal-runtimecfg#20 and on responses to the reviewer's questions above. /hold
Also updated the PR title; please use this format in the future :)
@celebdor, could you please elaborate on the logic of the LivenessProbe?
Absolutely :-)
If keepalived is not running, pgrep returns nothing and the kill command fails with exit code 1, so the liveness check is considered failed.
If keepalived is running, sending SIGUSR1 to the parent keepalived process makes keepalived write /tmp/keepalived.data, which contains information about each VRRP group, including its state. The state can be MASTER, BACKUP, or FAULT. If any of the states is FAULT, grep exits with code 1 and the liveness check fails.
Oops, I just realized I forgot to add the negation. Will fix it now.
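The probe logic described above can be modeled with a short Python sketch. The sketch picks the parent keepalived process with `pgrep -o`, which is an assumption about how the parent is selected. It also assumes that each VRRP instance in `/tmp/keepalived.data` carries a `State = ...` line. The exact dump layout varies between keepalived versions, so the regular expression is an assumption too. The functions are illustrative and are not the template's actual probe.

```python
import os
import re
import signal
import subprocess
import time

# ASSUMPTION: every VRRP instance in the dump has a line such as
# "   State = MASTER"; the layout may differ across keepalived versions.
STATE_RE = re.compile(r"^\s*State\s*=\s*(\w+)", re.MULTILINE)


def vrrp_states(data: str) -> list[str]:
    """Return the state of every VRRP instance found in a data dump."""
    return STATE_RE.findall(data)


def liveness_ok(pid: str, data: str) -> bool:
    """Healthy only if keepalived runs and no instance is in FAULT."""
    if not pid.strip():
        # Empty pgrep output: `kill -s SIGUSR1 ""` exits 1, so the probe fails.
        return False
    return "FAULT" not in vrrp_states(data)


def run_probe(data_path: str = "/tmp/keepalived.data") -> bool:
    """Live version: signal keepalived, then inspect the dump it writes."""
    pid = subprocess.run(["pgrep", "-o", "keepalived"],
                         capture_output=True, text=True).stdout.strip()
    if not pid:
        return False
    os.kill(int(pid), signal.SIGUSR1)  # ask keepalived to dump its state
    time.sleep(1)  # ASSUMPTION: one second is enough for the dump to land
    with open(data_path, encoding="utf-8") as f:
        return liveness_ok(pid, f.read())
```

Splitting the pure check (`liveness_ok`) from the signal handling (`run_probe`) keeps the FAULT detection testable without a running keepalived.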
@celebdor, I think "$pid" could be empty (the keepalived container starts before the monitor container, and the .conf file is missing). Should we change something in 'kill -s SIGHUP "$pid"'?
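One way to handle the empty-`$pid` case is to skip the signal when there is nothing to signal. In the shell template, that would amount to guarding the kill with something like `[ -n "$pid" ]`. The Python sketch below models that guard. The function names are hypothetical, and the `pgrep -o` lookup is an assumption about how the parent process is found.

```python
import os
import signal
import subprocess


def find_keepalived_pid() -> str:
    """Return the oldest (parent) keepalived PID, or "" if none is running."""
    result = subprocess.run(["pgrep", "-o", "keepalived"],
                            capture_output=True, text=True, check=False)
    return result.stdout.strip()


def reload_keepalived(pid: str) -> bool:
    """Send SIGHUP, which makes keepalived reload its configuration.

    Returns False instead of erroring when pid is empty, e.g. when the
    monitor runs before keepalived has started.
    """
    pid = pid.strip()
    if not pid:
        return False  # nothing to reload yet; try again on the next cycle
    os.kill(int(pid), signal.SIGHUP)
    return True
```

Returning a flag instead of raising lets a polling monitor retry on its next iteration rather than exiting.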
Branch updated from 1773fda to b2cf75e.
@celebdor, I'm not sure I (still :-) ) fully understand the liveness probe logic.
So, if some component (e.g., CNV) changes the network interface configuration, that may lead to a keepalived failure (State=FAULT), and this container will be restarted by the kubelet. IIUC, the problem will only be fixed after the monitor container updates the .conf file, right?
Could you please explain what added value we get from this liveness check?
The added value is that if management of any of the VIPs (ingress, API, or DNS) becomes faulty for any reason, we restart keepalived.
Verified successfully that the IP moved to brext and the VIPs moved with it. The downtime while the interfaces were reconfigured was about 90 seconds.
/hold cancel
When using CNV or other operators that modify how the node is connected to the network, we may end up in the case where the configured VRRP interface no longer has an address in the network that it is configured to hold virtual IPs in. This patch takes a page from what we do for HAProxy and adds a monitor sidecar container that checks keepalived and reloads it when necessary.
Signed-off-by: Antoni Segura Puimedon <antoni@redhat.com>
/retest
Looks good. /lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: celebdor, kikisdeliveryservice, phoracek, runcom, yboaron. The full list of commands accepted by this bot can be found here. The pull request process is described here.
@celebdor: All pull requests linked via external trackers have merged. Bugzilla bug 1751978 has been moved to the MODIFIED state.
When using CNV or other operators that modify how the node is connected to the network, we may end up in the case where the configured VRRP interface no longer has an address in the network that it is configured to hold virtual IPs in. This patch takes a page from what we do for HAProxy and adds a monitor sidecar container that checks keepalived and reloads it when necessary. This ports openshift#1124 to the OpenStack platform, along with fixes from openshift#1508 and openshift#1604.
When using CNV or other operators that modify how the node is connected
to the network, we may end up in the case where the configured VRRP
interface no longer has an address in the network that it is configured
to hold virtual IPs in.
This patch takes a page from what we do for HAProxy and adds a monitor
sidecar container that checks keepalived and reloads it when necessary.
Fixes: #1751978
Signed-off-by: Antoni Segura Puimedon <antoni@redhat.com>
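The condition this patch guards against can be expressed as a small check. First, does the configured VRRP interface still hold an address in the subnet that contains the VIP? If not, does some other interface, such as `brext` after CNV reconfigures the node, hold one now? The Python sketch below illustrates this under stated assumptions and is not the monitor's actual implementation. It assumes interface addresses are given as CIDR strings in a plain dict, and the function names are hypothetical.

```python
import ipaddress
from typing import Dict, List, Optional


def holds_vip_network(cidrs: List[str], vip: str) -> bool:
    """True if any of the CIDR addresses is on the same subnet as the VIP."""
    target = ipaddress.ip_address(vip)
    # ip_interface keeps the prefix length, so .network is the on-link subnet;
    # an IPv4 VIP is never "in" an IPv6 network and vice versa.
    return any(target in ipaddress.ip_interface(c).network for c in cidrs)


def interface_for_vip(iface_addrs: Dict[str, List[str]], vip: str) -> Optional[str]:
    """Return the first interface holding an address in the VIP's subnet."""
    for iface, cidrs in iface_addrs.items():
        if holds_vip_network(cidrs, vip):
            return iface
    return None


def needs_reload(configured_iface: str,
                 iface_addrs: Dict[str, List[str]], vip: str) -> bool:
    """True when the configured interface lost the VIP's subnet but another has it."""
    if holds_vip_network(iface_addrs.get(configured_iface, []), vip):
        return False  # configured interface is still valid
    return interface_for_vip(iface_addrs, vip) is not None
```

In a monitor built along these lines, a `True` result would be the trigger to re-render the keepalived configuration with the new interface and then reload keepalived.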