During shutdown, don't let Conmon die with Systemd CGroups#3474
Conversation
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mheon The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
f596d5d to
54e13e8
Compare
There was a problem hiding this comment.
I'd the opposite here: set TimeoutStopUSec to math.MaxUint64 and also add newProp("KillSignal", syscall.SIGUSR1)) that is used internally by conmon and it won't be forwarded to the container process.
In this way we won't lose the retcode from the container.
There was a problem hiding this comment.
Oooh. I like that. Will do.
137ae13 to
e093197
Compare
|
Repushed. New version doesn't use 0, and instead uses a longer timeout with a different stop signal that won't kill Conmon. |
|
I'm gonna knock off the WIP |
e093197 to
5a155ef
Compare
|
LGTM |
|
Rebased and repushed, let's see if the tests go green |
5a155ef to
a32672a
Compare
|
Hmm. The F29/F30 failures sort of make sense, though I can't replicate locally - systemd cgroups are in use. The ubuntu ones are just flakes, not Cgroup issues - makes sense, no systemd cgroups there. |
|
☔ The latest upstream changes (presumably #3143) made this pull request unmergeable. Please resolve the merge conflicts. |
|
@mheon This seems to have gotten lost. Could you rebase? |
|
Ack. I'll try and push this the rest of the way. |
If we set SIGUSR1, we won't kill Conmon or force a SIGTERM to be sent to the container. Instead, the container will exit as per usual during system shutdown, and conmon will remain alive to record its exit status. Use a 10 minute timeout so we don't permanently halt system shutdown if an error occurs. Signed-off-by: Matthew Heon <matthew.heon@pm.me>
a32672a to
d2f96d0
Compare
|
☔ The latest upstream changes (presumably #3959) made this pull request unmergeable. Please resolve the merge conflicts. |
|
@mheon Needs a rebase. Do we need this for a release? |
|
Nope. We can go without it.
…On Thu, Sep 12, 2019, 16:18 Daniel J Walsh ***@***.***> wrote:
@mheon <https://github.com/mheon> Needs a rebase. Do we need this for a
release?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#3474>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AB3AOCABC7DBKPQVOGJEMIDQJKPZTANCNFSM4H44M6YA>
.
|
|
Probably not making 1.6.0 |
|
needs a rebase |
|
@mheon ping |
|
@mheon what is the scoop on this one? |
|
CI errors were proving very difficult to track down, and there wasn't much priority since we resolved the problem in other ways. Still would be nice to get in if I can find time. |
|
This pull request had no activity for 30 days. In the absence of activity or the "do-not-close" label, the pull request will be automatically closed within 7 days. |
|
@mheon: PR needs rebase. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
@mheon Still interested in this one? |
This should immediately hit Conmon with a SIGKILL, preventing potential shutdown ordering issues.