Fix Process.ExitCode on mac for killed processes#29407
Fix Process.ExitCode on mac for killed processes#29407stephentoub merged 3 commits intodotnet:masterfrom
Conversation
dotnet#26291 changed process reaping from using waitpid to waitid. This caused a regression on mac, since for processes that are killed, (on mac) waitid does not return the signal number that caused the process to terminated. We change back to waitpid for reaping children and determining the exit code. waitid is used to find terminated children. Fixes https://github.com/dotnet/corefx/issues/29370
|
Thanks a lot for fixing this so quickly @tmds, @stephentoub I suppose this will have to get cherry-picked to release branch as well? |
|
This looks good from an initial look, but I would like @stephentoub to sign off on this |
|
@joperezr yes it needs to go into release. |
| if (RuntimeInformation.IsOSPlatform(OSPlatform.Linux)) | ||
| { | ||
| const int SIGKILL = 9; | ||
| Assert.Equal(128 + SIGKILL, p.ExitCode); |
There was a problem hiding this comment.
Can we verify the exact code on Mac? Or if not maybe a comment why?
There was a problem hiding this comment.
SIGKILL may change per platform. I grepped the Linux kernel, and it is 9 for every architecture.
From the bug report, it seems to be 9 for mac too.
Grepping the freebsd source, shows it is 9 there too.
I can remove the runtime check since probably it will pass on all platforms and add a comment about the platform difference.
| } | ||
| while (CheckInterrupted(result = waitid(idtype, static_cast<id_t>(pid), &siginfo, options))); | ||
| if (idtype == P_ALL && result == -1 && errno == ECHILD) | ||
| while (CheckInterrupted(result = waitid(P_ALL, 0, &siginfo, WEXITED | WNOHANG | WNOWAIT))); |
There was a problem hiding this comment.
Did you mean to change the pid from -1 to 0? From the man page, that would seem to restrict this to only be for child processes in the same group.
There was a problem hiding this comment.
id should be ignored for P_ALL. I've changed from -1 to 0 because id_t is unsigned and it gave me a compile error if I didn't add a cast.
There was a problem hiding this comment.
id should be ignored for P_ALL
Where is that documented? I'm not seeing it.
There was a problem hiding this comment.
Linux man page: https://linux.die.net/man/3/waitid
If idtype is P_ALL, waitid() shall wait for any children and id is ignored.
http://pubs.opengroup.org/onlinepubs/9699919799/functions/waitid.html has the same line.
| } | ||
| result = siginfo.si_pid; | ||
| } | ||
| return result; |
There was a problem hiding this comment.
Do we need all of this logic in native code? In 2.0, wasn't most of it in managed and we just P/Invoked for waitpid, WIFEXITED, etc.? We've been trying to keep the shims as thin as possible.
There was a problem hiding this comment.
Yes, 2.0 had pinvoke for WIFEXITED, WEXITSTATUS, WIFSIGNALED, WTERMSIG.
Those were not needed for waitid but now needed again. Personally I like having this packed together with the waitpid call and avoid the back-and-forth between native and managed. If you want, I can add back those functions.. Here or in a follow-up PR.
There was a problem hiding this comment.
Ok. We can keep it here in this PR, but I do think it'd be good to revisit it subsequently. I think we've been a bit lax about the intent of the shims and we've ended up with more native code than is desirable.
| p.Kill(); | ||
| p.WaitForExit(); | ||
|
|
||
| Assert.True(p.ExitCode > 128, "Exit code should be 128 + SIGKILL."); |
There was a problem hiding this comment.
nit, you could remove this assert now, but it's harmless...
There was a problem hiding this comment.
If we keep it, it'd be better to use InRange so that we get the actual value in the assert, e.g.
Assert.InRange(p.ExitCode, 129, int.MaxValue); // Exit code should be 128 + SIGKILLbut I agree we can just delete it at this point.
|
@dotnet/dnceng, another case of a leg getting stuck in xunit processing: |
|
Thanks, @tmds. |
|
@stephentoub no problem. I'm glad this regression won't be in 2.1 and hope this is all the fallout we'll see from the changes to the Process class. Thank you for thorough reviewing! |
* Fix Process.ExitCode on mac for killed processes dotnet#26291 changed process reaping from using waitpid to waitid. This caused a regression on mac, since for processes that are killed, (on mac) waitid does not return the signal number that caused the process to terminated. We change back to waitpid for reaping children and determining the exit code. waitid is used to find terminated children. Fixes https://github.com/dotnet/corefx/issues/29370 * TestExitCodeKilledChild: remove runtime check * TestExitCodeKilledChild: remove greater than assert
* Fix Process.ExitCode on mac for killed processes #26291 changed process reaping from using waitpid to waitid. This caused a regression on mac, since for processes that are killed, (on mac) waitid does not return the signal number that caused the process to terminated. We change back to waitpid for reaping children and determining the exit code. waitid is used to find terminated children. Fixes https://github.com/dotnet/corefx/issues/29370 * TestExitCodeKilledChild: remove runtime check * TestExitCodeKilledChild: remove greater than assert
* Fix Process.ExitCode on mac for killed processes dotnet/corefx#26291 changed process reaping from using waitpid to waitid. This caused a regression on mac, since for processes that are killed, (on mac) waitid does not return the signal number that caused the process to terminated. We change back to waitpid for reaping children and determining the exit code. waitid is used to find terminated children. Fixes https://github.com/dotnet/corefx/issues/29370 * TestExitCodeKilledChild: remove runtime check * TestExitCodeKilledChild: remove greater than assert Commit migrated from dotnet/corefx@a03f785
#26291 changed process reaping from using waitpid to waitid.
This caused a regression on mac, since for processes that are killed, (on mac) waitid does not return the signal number that caused the process to terminated.
We change back to waitpid for reaping children and determining the exit code. waitid is used to find terminated children.
Fixes https://github.com/dotnet/corefx/issues/29370
CC @stephentoub @akoeplinger @danmosemsft @ViktorHofer @rainersigwald