Change keepalived healthcheck to loadbalanced API endpoint #1744
cybertron wants to merge 3 commits into openshift:master
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cybertron
The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing /approve in a comment.
Force-pushed from 8980160 to ffe175a
/hold
If this proves to solve the baremetal api flake issues, I need to apply it to the other platforms.
/test e2e-metal-ipi
Force-pushed from ffe175a to f1f5590
It seems that /readyz is intercepting our healthchecks to the API, even though monitor-uri is only defined in a different listen block. This is probably good anyway, as it clarifies that this is not a healthcheck against the Kubernetes API.
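For reference, a rough sketch of how the relevant haproxy.cfg sections could be laid out (section names, bind ports, and server entries here are assumptions for illustration, not copied from the repo): monitor-uri answers the monitor probe in its own listen block, while the load-balanced API endpoint that keepalived now checks is a separate listen block on 9443 forwarding to the API servers.

    # health-check listener used by the haproxy monitor (names/ports assumed)
    listen health_check_http_url
      bind :50936
      mode http
      monitor-uri /haproxyready

    # load-balanced API endpoint that the keepalived check now targets
    listen api
      bind :9443
      mode tcp
      balance roundrobin
      option httpchk GET /readyz HTTP/1.0
      server master-0 192.0.2.10:6443 check check-ssl verify none
      server master-1 192.0.2.11:6443 check check-ssl verify none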
We don't necessarily want to move the API VIP just because the API server on the local node goes down. HAProxy will happily continue to distribute traffic to other nodes (as long as one is still up) and moving the VIP may cause connection disruptions like the ones we are seeing in CI. This switches the keepalived healthcheck to look at port 9443, which is where HAProxy listens, instead of 6443, which is just the local API server.

In order for this to work sanely, it was also necessary to change the timings on the healthchecks in keepalived and HAProxy. Previously, HAProxy had a check interval of 3 seconds and required two failing checks to mark a backend down. This meant up to 6 seconds to notice a dead backend. Keepalived had a check interval of only 1 second and no grace period for failed tests, which means it would trigger a failover much sooner than 6 seconds. Depending on how HAProxy chooses to distribute requests, it's possible the keepalived healthcheck would get sent to the down backend and trigger a VIP failover.

Since we'd prefer to let HAProxy handle outages (as moving the VIP may break connections), let's swap those timings so that HAProxy is checking every second with a fall value of 2 (so two failed tests to mark backend down), while keepalived checks every 2 seconds, also with a fall value of 2. This should give HAProxy time to notice an outage before the second keepalived healthcheck (HAProxy will respond within 2 seconds, where keepalived will take between 2 and 4 seconds). This should still result in acceptably quick failover but hopefully avoid doing so unnecessarily.
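As a concrete sketch of those timings (the surrounding option names and server entries are assumptions, not copied from this PR): HAProxy probes each backend every second and marks it down after two failures, so it reacts within roughly 2 seconds, while keepalived probes the load-balanced endpoint every 2 seconds and only fails over after two misses, giving HAProxy time to route around a dead API server first.

    # haproxy backend check: 1s interval, down after 2 failures (~2s to react)
      server master-0 192.0.2.10:6443 check check-ssl verify none inter 1s fall 2 rise 3

    # keepalived check of the load-balanced endpoint: 2s interval, 2 misses to fail (2-4s)
    vrrp_script chk_ocp {
        script "/usr/bin/curl -o /dev/null -kLfs https://localhost:9443/readyz"
        interval 2
        fall 2
        rise 2
        weight 6
    }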
Currently it's difficult/impossible to tell which check caused a failover of the VIP. Split the check so there's a separate one for each service and we can see which failed.

The weight for each check is also changed to 6. The reason for this number is that the bootstrap's priority is 50, while the masters start at 40. Each passing vrrp_script adds its weight to the priority, and since we don't want the masters to take the VIP until both haproxy and the api are up, we need to make sure that the master priority is only >50 when both checks pass.
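To make that arithmetic explicit, a minimal sketch of how the priorities and track_script weights interact (instance and script names assumed): the bootstrap holds the VIP at priority 50, and a master only overtakes it once both checks pass.

    vrrp_instance API_VIP {
        priority 40          # masters; the bootstrap node uses 50
        track_script {
            chk_ocp          # weight 6: load-balanced API answers on 9443
            chk_haproxy      # weight 6: local haproxy monitor answers on 50936
        }
        # both checks passing: 40 + 6 + 6 = 52 > 50, so the VIP moves off the bootstrap
        # only one passing:    40 + 6     = 46 < 50, so the bootstrap keeps the VIP
    }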
Force-pushed from f1f5590 to df74269
  inline: |
    vrrp_script chk_ocp {
-     script "/usr/bin/curl -o /dev/null -kLfs https://localhost:6443/readyz && /usr/bin/curl -o /dev/null -kLfs http://localhost:50936/readyz"
+     script "/usr/bin/curl -o /dev/null -kLfs https://localhost:9443/readyz"
So we change chk_ocp to only check the API status through the load balancer.
      script "/usr/bin/curl -o /dev/null -kLfs https://localhost:9443/readyz"
      interval 1
-     weight 50
+     weight 6
Why are we reducing weight? Are we still getting heavier weight than the bootstrap?
I should have mentioned that this was explained more extensively in the commit message that made this change: df74269
In short, we don't want the VIP to migrate to the masters until both checks are passing, so we need to make sure one check isn't enough to trigger a move from the bootstrap. The bootstrap priority is 50, the masters start at 40, and each passing check adds its weight, so when both pass the masters reach 52 (40 + 6 + 6) and trigger the move.
    }

+   vrrp_script chk_haproxy {
+     script "/usr/bin/curl -o /dev/null -kLfs http://localhost:50936/haproxyready"
When does /haproxyready report failure? When all the servers in the master backend are down?
I haven't made that change yet. For the purposes of this PR I wanted to limit the changes to what was necessary to change the healthcheck endpoint. I'm planning a followup to have the haproxy healthcheck fail when no backends are available.
I think right now this check only fails when haproxy isn't up at all. Come to think of it, I wonder if we even need a separate haproxy check now that we're checking the loadbalanced endpoint. If haproxy is down that check is going to fail anyway. I don't think it hurts anything to have it, but it's probably worth looking into.
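One way the follow-up could make the monitor endpoint report failure when no backends are left is HAProxy's monitor fail directive; a sketch, assuming the API servers live in a backend/listen named masters and the monitor listens on 50936:

    listen health_check_http_url
      bind :50936
      mode http
      monitor-uri /haproxyready
      # hypothetical addition: fail the monitor URI when no API backend is up
      acl no_backends nbsrv(masters) lt 1
      monitor fail if no_backends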
To complete this change I think we should:
A. Handle the firewall rule add/delete by Keepalived instead of haproxy-monitor
B. Haproxy-monitor should be responsible only for generating/updating haproxy configuration.
I think that the reason we see [1] in keepalived logs is the late start of the HAProxy LB; it takes 3*6 seconds for haproxy-monitor to render the configuration.
Maybe we should also change the monitor polling timing.
[1] /usr/bin/curl -o /dev/null -kLfs https://localhost:6443/readyz && /usr/bin/curl -o /dev/null -kLfs http://localhost:50936/readyz exited with status 7
I discussed this with Toni a bit yesterday and we realized that we actually need to keep the monitor check for the case where all loadbalancers go down, but at least one of the api servers is still up. If we just rely on keepalived to move the VIP then that wouldn't work because there's no functional haproxy for it to move to.
  inline: |
    vrrp_script chk_ocp {
-     script "/usr/bin/curl -o /dev/null -kLfs https://localhost:6443/readyz && /usr/bin/curl -o /dev/null -kLfs http://localhost:50936/readyz"
+     script "/usr/bin/curl -o /dev/null -kLfs https://localhost:9443/readyz"
The LB port is passed as a parameter in baremetal-runtimecfg and defaults to 9443 [1]; I assume it would be better to render it instead of using a hardcoded value.
[1] https://github.com/openshift/baremetal-runtimecfg/blob/master/cmd/monitor/monitor.go#L55
Good point. Will update.
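That could look something like the snippet below in the keepalived template; {{ .LBPort }} is a placeholder name here, not necessarily the variable baremetal-runtimecfg actually exposes:

    vrrp_script chk_ocp {
        script "/usr/bin/curl -o /dev/null -kLfs https://localhost:{{ .LBPort }}/readyz"
        interval 2
        fall 2
        rise 2
        weight 6
    }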
Needs rebase
@cybertron: The following tests failed, say /retest to rerun all failed tests:
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
This is now being handled by #1893
We don't necessarily want to move the API VIP just because the API
server on the local node goes down. HAProxy will happily continue to
distribute traffic to other nodes (as long as one is still up) and
moving the VIP may cause connection disruptions like the ones we are
seeing in CI.
This switches the keepalived healthcheck to look at port 9443, which
is where HAProxy listens, instead of 6443, which is just the local API
server.
- Description for the changelog
Change the keepalived healthcheck to avoid unnecessary failover of the API VIP.