Skip to content

Change keepalived healthcheck to loadbalanced API endpoint#1744

Closed
cybertron wants to merge 3 commits intoopenshift:masterfrom
cybertron:keepalived-lb-api
Closed

Change keepalived healthcheck to loadbalanced API endpoint#1744
cybertron wants to merge 3 commits intoopenshift:masterfrom
cybertron:keepalived-lb-api

Conversation

@cybertron
Copy link
Copy Markdown
Member

We don't necessarily want to move the API VIP just because the API
server on the local node goes down. HAProxy will happily continue to
distribute traffic to other nodes (as long as one is still up) and
moving the VIP may cause connection disruptions like the ones we are
seeing in CI.

This switches the keepalived healthcheck to look at port 9443, which
is where HAProxy listens, instead of 6443, which is just the local API
server.

- Description for the changelog
Change the keepalived healthcheck to avoid unnecessary failover of the API VIP.

@openshift-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cybertron
To complete the pull request process, please assign ericavonb
You can assign the PR to them by writing /assign @ericavonb in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cybertron cybertron force-pushed the keepalived-lb-api branch from 8980160 to ffe175a Compare May 20, 2020 20:46
@cybertron
Copy link
Copy Markdown
Member Author

/hold

If this proves to solve the baremetal api flake issues, I need to apply it to the other platforms.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 20, 2020
@cybertron
Copy link
Copy Markdown
Member Author

/test e2e-metal-ipi

@cybertron cybertron force-pushed the keepalived-lb-api branch from ffe175a to f1f5590 Compare May 22, 2020 21:48
@openshift-ci-robot openshift-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 7, 2020
cybertron added 3 commits June 9, 2020 15:48
It seems that /readyz is intercepting our healthchecks to the api,
even though monitor-uri is only defined in a different listen block.

This is probably good anyway as it clarifies that this is not a
healthcheck against the kubernetes api.
We don't necessarily want to move the API VIP just because the API
server on the local node goes down. HAProxy will happily continue to
distribute traffic to other nodes (as long as one is still up) and
moving the VIP may cause connection disruptions like the ones we are
seeing in CI.

This switches the keepalived healthcheck to look at port 9443, which
is where HAProxy listens, instead of 6443, which is just the local API
server.

In order for this to work sanely, it was also necessary to change the
timings on the healthchecks in keepalived and HAProxy. Previously,
HAProxy had a check interval of 3 seconds and required two failing
checks to mark a backend down. This meant up to 6 seconds to notice
a dead backend. Keepalived had a check interval of only 1 second and
no grace period for failed tests, which means it would trigger a
failover much sooner than 6 seconds. Depending on how HAProxy chooses
to distribute requests, it's possible the keepalived healthcheck would
get sent to the down backend and trigger a VIP failover.

Since we'd prefer to let HAProxy handle outages (as moving the VIP
may break connections), let's swap those timings so that HAProxy is
checking every second with a fall value of 2 (so two failed tests to
mark backend down), while keepalived checks every 2 seconds, also with
a fall value of 2. This should give HAProxy time to notice an outage
before the second keepalived healthcheck (HAProxy will respond within
2 seconds, where keepalived will take between 2 and 4 seconds). This
should still result in acceptably quick failover but hopefully avoid
doing so unnecessarily.
Currently it's difficult/impossible to tell which check caused a
failover of the VIP. Split the check so there's a separate one for
each service and we can see which failed.

The weight for each check is also changed to 6. The reason for this
number is that the bootstrap's priority is 50, while the masters
start at 40. Each passing vrrp_script adds its weight to the priority,
and since we don't want the masters to take the VIP until both haproxy
and the api are up, we need to make sure that the master priority
is only >50 when both checks pass.
@cybertron cybertron force-pushed the keepalived-lb-api branch from f1f5590 to df74269 Compare June 9, 2020 20:52
@openshift-ci-robot openshift-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 9, 2020
inline: |
vrrp_script chk_ocp {
script "/usr/bin/curl -o /dev/null -kLfs https://localhost:6443/readyz && /usr/bin/curl -o /dev/null -kLfs http://localhost:50936/readyz"
script "/usr/bin/curl -o /dev/null -kLfs https://localhost:9443/readyz"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we change to only chk_ocp to only check the API status through the load balancer.

script "/usr/bin/curl -o /dev/null -kLfs https://localhost:9443/readyz"
interval 1
weight 50
weight 6
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we reducing weight? Are we still getting heavier weight than the bootstrap?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have mentioned that this was explained more extensively in the commit message that made this change: df74269

In short, we don't want the VIP to migrate to the masters until both checks are passing, so that means we need to make sure one check isn't enough to trigger a move from the bootstrap. The bootstrap priority is 50, the masters start at 40, and each check is additive so when both pass the masters are 52 and trigger the move.

}

vrrp_script chk_haproxy {
script "/usr/bin/curl -o /dev/null -kLfs http://localhost:50936/haproxyready"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When does /haproxyready report failure? When all the servers in the master backend are down?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't made that change yet. For the purposes of this PR I wanted to limit the changes to what was necessary to change the healthcheck endpoint. I'm planning a followup to have the haproxy healthcheck fail when no backends are available.

I think right now this check only fails when haproxy isn't up at all. Come to think of it, I wonder if we even need a separate haproxy check now that we're checking the loadbalanced endpoint. If haproxy is down that check is going to fail anyway. I don't think it hurts anything to have it, but it's probably worth looking into.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To complete this change I think we should :
A. Handle the firewall rule add/delete by Keepalived instead of haproxy-monitor
B. Haproxy-monitor should be responsible only for generating/updating haproxy configuration.

I think that the reason we see [1] in keepalived logs is the late start of HAProxy LB, it takes 3*6 seconds for haproxy-monitor to render the configuration.
Maybe we should also change the monitor polling timing.

[1] /usr/bin/curl -o /dev/null -kLfs https://localhost:6443/readyz && /usr/bin/curl -o /dev/null -kLfs http://localhost:50936/readyz exited with status 7

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I discussed this with Toni a bit yesterday and we realized that we actually need to keep the monitor check for the case where all loadbalancers go down, but at least one of the api servers is still up. If we just rely on keepalived to move the VIP then that wouldn't work because there's no functional haproxy for it to move to.

inline: |
vrrp_script chk_ocp {
script "/usr/bin/curl -o /dev/null -kLfs https://localhost:6443/readyz && /usr/bin/curl -o /dev/null -kLfs http://localhost:50936/readyz"
script "/usr/bin/curl -o /dev/null -kLfs https://localhost:9443/readyz"
Copy link
Copy Markdown
Contributor

@yboaron yboaron Jun 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LB port passed as a parameter in baremetal-runtimecfg default to 9443, I assume it will be better to render it instead of using hardcoded value

[1] https://github.com/openshift/baremetal-runtimecfg/blob/master/cmd/monitor/monitor.go#L55

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Will update.

@stbenjam
Copy link
Copy Markdown
Member

Needs rebase

@openshift-ci-robot
Copy link
Copy Markdown
Contributor

@cybertron: The following tests failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
ci/prow/e2e-aws df74269 link /test e2e-aws
ci/prow/e2e-metal-ipi df74269 link /test e2e-metal-ipi
ci/prow/e2e-ovn-step-registry df74269 link /test e2e-ovn-step-registry
ci/prow/e2e-aws-proxy df74269 link /test e2e-aws-proxy

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@cybertron
Copy link
Copy Markdown
Member Author

This is now being handled by #1893

@cybertron cybertron closed this Aug 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants