Closed
Original file line number Diff line number Diff line change
@@ -20,7 +20,7 @@ contents:
listen health_check_http_url
bind :::50936 v4v6
mode http
- monitor-uri /readyz
+ monitor-uri /haproxyready
option dontlognull
listen stats
bind localhost:{{`{{ .LBConfig.StatPort }}`}}
@@ -35,5 +35,5 @@ contents:
option log-health-checks
balance roundrobin
{{`{{- range .LBConfig.Backends }}
- server {{ .Host }} {{ .Address }}:{{ .Port }} weight 1 verify none check check-ssl inter 3s fall 2 rise 3
+ server {{ .Host }} {{ .Address }}:{{ .Port }} weight 1 verify none check check-ssl inter 1s fall 2 rise 3
{{- end }}`}}
Original file line number Diff line number Diff line change
@@ -100,7 +100,7 @@ contents:
livenessProbe:
initialDelaySeconds: 10
httpGet:
- path: /readyz
+ path: /haproxyready
port: 50936
terminationMessagePolicy: FallbackToLogsOnError
imagePullPolicy: IfNotPresent
Original file line number Diff line number Diff line change
@@ -3,9 +3,19 @@ path: "/etc/kubernetes/static-pod-resources/keepalived/keepalived.conf.tmpl"
contents:
inline: |
vrrp_script chk_ocp {
- script "/usr/bin/curl -o /dev/null -kLfs https://localhost:6443/readyz && /usr/bin/curl -o /dev/null -kLfs http://localhost:50936/readyz"
+ script "/usr/bin/curl -o /dev/null -kLfs https://localhost:9443/readyz"
Contributor

So we're changing chk_ocp to check the API status only through the load balancer.

Contributor @yboaron, Jun 11, 2020

The LB port is passed as a parameter in baremetal-runtimecfg and defaults to 9443 [1]. I think it would be better to render it than to use a hardcoded value.

[1] https://github.com/openshift/baremetal-runtimecfg/blob/master/cmd/monitor/monitor.go#L55
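
A minimal sketch of the suggestion above, rendering the port into the check script instead of hardcoding it. This uses Python's `string.Template` purely for illustration; the real templates are Go templates, and the placeholder name `lb_port` and the helper `render_chk_ocp` are hypothetical, not names taken from the repository.

```python
from string import Template

# Hypothetical placeholder name; the actual template field is not shown in this PR.
CHK_OCP_SCRIPT = Template(
    "/usr/bin/curl -o /dev/null -kLfs https://localhost:${lb_port}/readyz"
)


def render_chk_ocp(lb_port: int = 9443) -> str:
    """Render the chk_ocp script line for the given load balancer port.

    The default of 9443 mirrors the default mentioned in the comment.
    """
    return CHK_OCP_SCRIPT.substitute(lb_port=lb_port)


print(render_chk_ocp())       # uses the default port
print(render_chk_ocp(8443))   # a non-default port, chosen only for the example
```

With this shape, a deployment that overrides the LB port would get a matching health check without touching the template text.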

Member Author

Good point. Will update.

interval 1
- weight 50
+ weight 6
Contributor

Why are we reducing weight? Are we still getting heavier weight than the bootstrap?

Member Author

I should have mentioned that this was explained more extensively in the commit message that made this change: df74269

In short, we don't want the VIP to migrate to the masters until both checks are passing, so one passing check must not be enough to trigger a move from the bootstrap. The bootstrap priority is 50, the masters start at 40, and each check adds 6: with one check passing a master sits at 46, still below the bootstrap, and with both passing it reaches 52 and triggers the move.
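
The arithmetic in this comment can be sketched as follows. The three numbers come from the comment itself; the helper name `master_priority` is made up for the example, and the model assumes only that each passing check adds its weight to the base priority.

```python
# Values quoted in the review comment.
BOOTSTRAP_PRIORITY = 50
MASTER_BASE_PRIORITY = 40
CHECK_WEIGHT = 6  # weight of both chk_ocp and chk_haproxy


def master_priority(passing_checks: int) -> int:
    """Effective priority of a master with the given number of passing checks."""
    return MASTER_BASE_PRIORITY + CHECK_WEIGHT * passing_checks


for passing in range(3):
    priority = master_priority(passing)
    takes_vip = priority > BOOTSTRAP_PRIORITY
    print(f"{passing} passing -> priority {priority}, outranks bootstrap: {takes_vip}")
```

Under the old weight of 50, a single passing check would already have put a master above the bootstrap, which is the situation the reduced weight avoids.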

+ rise 3
+ fall 2
}

+ vrrp_script chk_haproxy {
+ script "/usr/bin/curl -o /dev/null -kLfs http://localhost:50936/haproxyready"
Contributor

When does /haproxyready report failure? When all the servers in the master backend are down?

Member Author

I haven't made that change yet. For the purposes of this PR I wanted to limit the changes to what was necessary to change the healthcheck endpoint. I'm planning a follow-up to have the haproxy healthcheck fail when no backends are available.

I think right now this check only fails when haproxy isn't up at all. Come to think of it, I wonder if we even need a separate haproxy check now that we're checking the load-balanced endpoint. If haproxy is down, that check is going to fail anyway. I don't think it hurts anything to have it, but it's probably worth looking into.
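
A simplified model of the overlap described in this comment may help. It rests on two assumptions taken from the discussion, not from the HAProxy or keepalived sources: `/haproxyready` succeeds whenever HAProxy is running, and the load-balanced `/readyz` needs HAProxy plus at least one ready API server behind it. The names `CheckResults` and `evaluate_checks` are invented for the sketch.

```python
from typing import NamedTuple


class CheckResults(NamedTuple):
    chk_ocp: bool      # curl https://localhost:9443/readyz, through the local load balancer
    chk_haproxy: bool  # curl http://localhost:50936/haproxyready


def evaluate_checks(haproxy_up: bool, any_api_ready: bool) -> CheckResults:
    """Model which keepalived checks pass on one node."""
    # Assumption: the monitor URI answers as long as HAProxy itself is running.
    chk_haproxy = haproxy_up
    # Assumption: the load-balanced endpoint needs HAProxy and a ready backend.
    chk_ocp = haproxy_up and any_api_ready
    return CheckResults(chk_ocp=chk_ocp, chk_haproxy=chk_haproxy)


for haproxy_up in (True, False):
    for any_api_ready in (True, False):
        print(haproxy_up, any_api_ready, evaluate_checks(haproxy_up, any_api_ready))
```

In this model chk_ocp never passes while chk_haproxy fails, which is the redundancy the comment raises.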

Contributor

To complete this change I think we should:
A. Have Keepalived, rather than haproxy-monitor, add and delete the firewall rule.
B. Make haproxy-monitor responsible only for generating and updating the HAProxy configuration.

I think the reason we see [1] in the keepalived logs is the late start of the HAProxy LB: it takes 3*6 seconds for haproxy-monitor to render the configuration.
Maybe we should also change the monitor's polling interval.

[1] /usr/bin/curl -o /dev/null -kLfs https://localhost:6443/readyz && /usr/bin/curl -o /dev/null -kLfs http://localhost:50936/readyz exited with status 7

Member Author

I discussed this with Toni a bit yesterday and we realized that we actually need to keep the monitor check for the case where all load balancers go down but at least one of the API servers is still up. If we just rely on keepalived to move the VIP, that wouldn't work because there's no functional haproxy for it to move to.

+ interval 1
+ weight 6
+ rise 3
+ fall 2
}

# TODO: Improve this check. The port is assumed to be alive.
@@ -31,6 +41,7 @@ contents:
}
track_script {
chk_ocp
+ chk_haproxy
}
}

Original file line number Diff line number Diff line change
@@ -3,7 +3,7 @@ path: "/etc/kubernetes/static-pod-resources/keepalived/keepalived.conf.tmpl"
contents:
inline: |
vrrp_script chk_ocp {
- script "/usr/bin/curl -o /dev/null -kLfs https://localhost:6443/readyz && /usr/bin/curl -o /dev/null -kLfs http://localhost:50936/readyz"
+ script "/usr/bin/curl -o /dev/null -kLfs https://localhost:9443/readyz && /usr/bin/curl -o /dev/null -kLfs http://localhost:50936/readyz"
interval 1
weight 50
}
Original file line number Diff line number Diff line change
@@ -3,7 +3,7 @@ path: "/etc/kubernetes/static-pod-resources/keepalived/keepalived.conf.tmpl"
contents:
inline: |
vrrp_script chk_ocp {
- script "/usr/bin/curl -o /dev/null -kLfs https://0:6443/readyz && /usr/bin/curl -o /dev/null -kLfs http://localhost:50936/readyz"
+ script "/usr/bin/curl -o /dev/null -kLfs https://0:9443/readyz && /usr/bin/curl -o /dev/null -kLfs http://localhost:50936/readyz"
interval 1
weight 50
}
Original file line number Diff line number Diff line change
@@ -8,7 +8,7 @@ contents:
{{ if .Infra.Status.PlatformStatus.VSphere -}}
{{ if .Infra.Status.PlatformStatus.VSphere.APIServerInternalIP -}}
vrrp_script chk_ocp {
- script "/usr/bin/curl -o /dev/null -kLfs https://localhost:6443/readyz && /usr/bin/curl -o /dev/null -kLfs http://localhost:50936/readyz"
+ script "/usr/bin/curl -o /dev/null -kLfs https://localhost:9443/readyz && /usr/bin/curl -o /dev/null -kLfs http://localhost:50936/readyz"
interval 1
weight 50
}