Add an initial delay to the router liveness probe by pweil- · Pull Request #1658 · openshift/origin

pweil- · 2015-04-08T22:49:39Z

The router is experiences issues with the liveness probe where it is probing the socket before the plugin has a chance to write out the initial configs and start the router.

There were two issues to address:

The router never started until a watch event was received, fixed by The router should update config immediately on startup #1603
The probe occurred before haproxy was started

The combination of writing the initial config plus the initial delay should help reduce the chance the probe fires before HAProxy is started. However we still have a case where, if the router is writing a large file, we can probe before the router is ready.

We discussed in IRC the possibility of having the kubelet backoff on probe failure based on configuration which would ultimately be the right solution. As @jimmidyson pointed out in #1603 (comment) it's very likely that a lot of containers may need to perform X amount of variable size work before they are considered ready and a static delay would not cover it.

@sdodson could you give this a test manually (the combination of #1603 plus the delay) in your environment? I was unable to reproduce the loop after at least 10 e2e runs with this combination.

Reference issues:
#1504
#1575

@sdodson @jimmidyson @smarterclayton

smarterclayton · 2015-04-08T22:54:08Z

Can you spawn an upstream issue and lay out the use cases you and Jimmy can think of where the static delay is insufficient?

Jitter + backoff are probably indicated for problems like this.

On Apr 8, 2015, at 6:49 PM, Paul notifications@github.com wrote:

The router is experiences issues with the liveness probe where it is probing the socket before the plugin has a chance to write out the initial configs and start the router.

There were two issues to address:

The router never started until a watch event was received, fixed by #1603
The probe occurred before haproxy was started
The combination of writing the initial config plus the initial delay should help reduce the chance the probe fires before HAProxy is started. However we still have a case where, if the router is writing a large file, we can probe before the router is ready.

We discussed in IRC the possibility of having the kubelet backoff on probe failure based on configuration which would ultimately be the right solution. As @jimmidyson pointed out in #1603 (comment) it's very likely that a lot of containers may need to perform X amount of variable size work before they are considered ready and a static delay would not cover it.

@sdodson could you give this a test manually (the combination of #1603 plus the delay)? I was unable to reproduce the loop after at least 10 e2e runs with this combination.

Reference issues:
#1504
#1575

@sdodson @jimmidyson @smarterclayton

You can view, comment on, or merge this pull request online at:

#1658

Commit Summary

probe delay
File Changes

M pkg/cmd/experimental/router/router.go (1)
Patch Links:

https://github.com/openshift/origin/pull/1658.patch
https://github.com/openshift/origin/pull/1658.diff
—
Reply to this email directly or view it on GitHub.

sdodson · 2015-04-09T18:58:08Z

@pweil- I've not seen my router killed due to liveness probes since I've applied this patch. LGTM

pweil- · 2015-04-09T20:30:09Z

👍

smarterclayton · 2015-04-09T20:32:44Z

[merge], link the upstream issue here when it exists.

pweil- · 2015-04-09T20:34:51Z

10-4

openshift-bot · 2015-04-09T20:35:41Z

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/merge_pull_requests_openshift3/1516/) (Image: devenv-fedora_1242)

openshift-bot · 2015-04-09T20:35:41Z

Evaluated for origin up to 42f5a0d

Merged by openshift-bot

pweil- · 2015-04-13T13:31:54Z

Linking upstream issue kubernetes/kubernetes#6758

probe delay

42f5a0d

openshift-bot pushed a commit that referenced this pull request Apr 9, 2015

Merge pull request #1658 from pweil-/router-probe-delay

88f13de

Merged by openshift-bot

openshift-bot merged commit 88f13de into openshift:master Apr 9, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add an initial delay to the router liveness probe#1658

Add an initial delay to the router liveness probe#1658
openshift-bot merged 1 commit intoopenshift:masterfrom
pweil-:router-probe-delay

pweil- commented Apr 8, 2015

Uh oh!

smarterclayton commented Apr 8, 2015

Uh oh!

sdodson commented Apr 9, 2015

Uh oh!

pweil- commented Apr 9, 2015

Uh oh!

smarterclayton commented Apr 9, 2015

Uh oh!

pweil- commented Apr 9, 2015

Uh oh!

openshift-bot commented Apr 9, 2015

Uh oh!

openshift-bot commented Apr 9, 2015

Uh oh!

pweil- commented Apr 13, 2015

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

pweil- commented Apr 8, 2015

Uh oh!

smarterclayton commented Apr 8, 2015

Uh oh!

sdodson commented Apr 9, 2015

Uh oh!

pweil- commented Apr 9, 2015

Uh oh!

smarterclayton commented Apr 9, 2015

Uh oh!

pweil- commented Apr 9, 2015

Uh oh!

openshift-bot commented Apr 9, 2015

Uh oh!

openshift-bot commented Apr 9, 2015

Uh oh!

pweil- commented Apr 13, 2015

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants