Change keepalived healthcheck to loadbalanced API endpoint #1744
```diff
@@ -3,9 +3,19 @@ path: "/etc/kubernetes/static-pod-resources/keepalived/keepalived.conf.tmpl"
 contents:
   inline: |
     vrrp_script chk_ocp {
-        script "/usr/bin/curl -o /dev/null -kLfs https://localhost:6443/readyz && /usr/bin/curl -o /dev/null -kLfs http://localhost:50936/readyz"
+        script "/usr/bin/curl -o /dev/null -kLfs https://localhost:9443/readyz"
```
Contributor:
The LB port is passed as a parameter in baremetal-runtimecfg and defaults to 9443. I assume it would be better to render it instead of using a hardcoded value.
[1] https://github.com/openshift/baremetal-runtimecfg/blob/master/cmd/monitor/monitor.go#L55

Member (Author):
Good point. Will update.
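A sketch of what rendering the port in the template could look like; the `.LBConfig.ApiPort` field name is a hypothetical stand-in, not confirmed from baremetal-runtimecfg:

```
# Hypothetical Go-template rendering of the LB port inside
# keepalived.conf.tmpl; the field name .LBConfig.ApiPort is an assumption.
script "/usr/bin/curl -o /dev/null -kLfs https://localhost:{{ .LBConfig.ApiPort }}/readyz"
```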
```diff
         interval 1
-        weight 50
+        weight 6
```
Contributor:
Why are we reducing the weight? Are we still getting a heavier weight than the bootstrap?

Member (Author):
I should have mentioned that this was explained more extensively in the commit message that made this change: df74269. In short, we don't want the VIP to migrate to the masters until both checks are passing, so we need to make sure one check isn't enough to trigger a move from the bootstrap. The bootstrap priority is 50, the masters start at 40, and each check is additive, so when both pass the masters reach 52 and trigger the move.
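Written out, the priority arithmetic from the comment above (values taken from the comment itself):

```
# bootstrap VRRP priority:                 50
# master base priority:                    40
# master + one check passing   (40 + 6):   46  -> VIP stays on the bootstrap
# master + both checks passing (40 + 12):  52  -> VIP moves to a master
```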
```diff
+        rise 3
+        fall 2
     }
+
+    vrrp_script chk_haproxy {
+        script "/usr/bin/curl -o /dev/null -kLfs http://localhost:50936/haproxyready"
```
Contributor:
When does /haproxyready report failure? When all the servers in the master backend are down?

Member (Author):
I haven't made that change yet. For the purposes of this PR I wanted to limit the changes to what was necessary to change the healthcheck endpoint. I'm planning a followup to have the haproxy healthcheck fail when no backends are available; I think right now this check only fails when haproxy isn't up at all.
Come to think of it, I wonder if we even need a separate haproxy check now that we're checking the loadbalanced endpoint. If haproxy is down, that check is going to fail anyway. I don't think it hurts anything to have it, but it's probably worth looking into.

Contributor:
To complete this change I think we should:
I think that the reason we see [1] in the keepalived logs is the late start of the HAProxy LB; it takes 3*6 seconds for haproxy-monitor to render the configuration.
[1] /usr/bin/curl -o /dev/null -kLfs https://localhost:6443/readyz && /usr/bin/curl -o /dev/null -kLfs http://localhost:50936/readyz exited with status 7

Member (Author):
I discussed this with Toni a bit yesterday, and we realized that we actually need to keep the monitor check for the case where all load balancers go down but at least one of the API servers is still up. If we just relied on keepalived to move the VIP, that wouldn't work because there would be no functional haproxy for it to move to.
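For reference, a minimal sketch of what "fail when no backends are available" could look like, assuming haproxy exposes a stats socket at /var/run/haproxy/haproxy.sock and has a backend named "masters" (both are assumptions; this is not the baremetal-runtimecfg implementation):

```sh
#!/bin/sh
# Hypothetical healthcheck: exit 0 only if at least one server in the
# "masters" backend reports UP. Socket path and backend name are assumptions.
echo "show stat" \
  | socat unix-connect:/var/run/haproxy/haproxy.sock stdio \
  | awk -F, '$1 == "masters" && $2 != "BACKEND" && $18 ~ /^UP/ { up++ }
             END { exit (up > 0 ? 0 : 1) }'
```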
```diff
+        interval 1
+        weight 6
+        rise 3
+        fall 2
+    }

     # TODO: Improve this check. The port is assumed to be alive.
```
```diff
@@ -31,6 +41,7 @@ contents:
     }
     track_script {
         chk_ocp
+        chk_haproxy
     }
 }
```
So we change chk_ocp to only check the API status through the load balancer.