Add an initial delay to the router liveness probe #1658
openshift-bot merged 1 commit into openshift:master
Can you spawn an upstream issue and lay out the use cases you and Jimmy can think of where the static delay is insufficient? Jitter + backoff are probably indicated for problems like this.
@pweil- I've not seen my router killed due to liveness probes since I've applied this patch. LGTM
👍
[merge], link the upstream issue here when it exists.
10-4
continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_openshift3/1516/) (Image: devenv-fedora_1242)
Evaluated for origin up to 42f5a0d
Linking upstream issue kubernetes/kubernetes#6758
The router is experiencing issues with the liveness probe: the socket is probed before the plugin has a chance to write out the initial configs and start the router.
There were two issues to address:
The combination of writing the initial config plus the initial delay should reduce the chance that the probe fires before HAProxy has started. However, there is still a case where, if the router is writing a large file, the probe can fire before the router is ready.
We discussed in IRC the possibility of having the kubelet back off on probe failure based on configuration, which would ultimately be the right solution. As @jimmidyson pointed out in #1603 (comment), it's very likely that many containers need to perform a variable amount of work before they are considered ready, and a static delay would not cover that.
@sdodson could you give this a test manually (the combination of #1603 plus the delay) in your environment? I was unable to reproduce the loop after at least 10 e2e runs with this combination.
Reference issues:
#1504
#1575
@sdodson @jimmidyson @smarterclayton