🐛 ensure envtest stops the whole process group#3447
🐛 ensure envtest stops the whole process group#3447k8s-ci-robot merged 1 commit intokubernetes-sigs:mainfrom
Conversation
|
Welcome @dnwe! |
|
Hi @dnwe. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
755af69 to
b909de1
Compare
|
/ok-to-test |
6aaf6a1 to
48df3b6
Compare
A previous pull request attempted to prevent envtest from leaking orphaned etcd / kube-apiserver etc. processes by adding a SysProcAttr of `Setpgid: true`. However, it made a few mistakes: - It incorrectly described the change as causing the process to "inherit the parent's process group id", which is the opposite of what it does. In fact it causes a brand new process group to be created for the child process, which is the actual behaviour we want, so correct the comment. - Whilst it added a new process group, it didn't update the signalling to send SIGTERM (and eventually SIGKILL) to the process group (by using a `-` prefix) and was still just sending the signal to the process itself. - It didn't add a unittest to assert on the desired behaviour before and after the changes. This PR should hopefully fix all of those problems and result in fewer process leaks. Signed-off-by: Dominic Evans <dominic.evans@uk.ibm.com>
48df3b6 to
6c7ba77
Compare
| // GetSysProcAttr returns the SysProcAttr to use for the process, | ||
| // for unix systems this returns a SysProcAttr with Setpgid set to true, | ||
| // which inherits the parent's process group id. | ||
| // which creates a new process group for the child process. |
There was a problem hiding this comment.
While your updated description is correctly describing what already happened, I am not sure I understand why that actually matters. Unless I am missing something, we directly start the apiserver and etcd binary without an intermediate like a shell so it shouldn't make a difference if we send a signal to them directly or to the processgroup?
I've seen the stale etcd and apiserver processes too, but my theory would be that happens because the test dies and Stop() doesn't run, not because the cleanup is buggy. But maybe I am missing something here?
There was a problem hiding this comment.
Ah yes sadly you're right I think that this won't help the etcd/apiserver case. I hadn't actually checked how they were being started, I'd just spotted the leaking processes and then found #2560 which looked a bit wrong.
As you say, it seems like in pkg/internal/testing/controlplane/plane.go we just start them one after the other as singleton processes. So I think the PR is still right, in that it corrects the behaviour when a group of processes is spawned, but it's more of a housekeeping thing rather than actually improving the enduser experience when running envtest 😔
There was a problem hiding this comment.
Would it make sense to slightly change the PR description (and PR title) to surface that this won't change anything for envtest?
|
LGTM label has been added. DetailsGit tree hash: 3916cadbe6f68a45ee0daef4f00af73c88272893 |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alvaroaleman, dnwe The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Previous PR #2560 attempted to prevent envtest from leaking orphaned etcd / kube-apiserver etc. processes by adding a SysProcAttr of
Setpgid: true. However, it made a few mistakes:-prefix) and was still just sending the signal to the process itself [1].While with this PR the whole process group is stopped when stopping envtest, this doesn't really result in behavioral changes because the processsgroup consists of one process in all circumstances (either kas or etcd).
[1] https://medium.com/@felixge/killing-a-child-process-and-all-of-its-children-in-go-54079af94773