Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Fix Process.ExitCode on mac for killed processes#29407

Merged
stephentoub merged 3 commits intodotnet:masterfrom
tmds:mac_exitcode
May 1, 2018
Merged

Fix Process.ExitCode on mac for killed processes#29407
stephentoub merged 3 commits intodotnet:masterfrom
tmds:mac_exitcode

Conversation

@tmds
Copy link
Copy Markdown
Member

@tmds tmds commented Apr 30, 2018

#26291 changed process reaping from using waitpid to waitid.
This caused a regression on mac, since for processes that are killed, (on mac) waitid does not return the signal number that caused the process to terminated.

We change back to waitpid for reaping children and determining the exit code. waitid is used to find terminated children.

Fixes https://github.com/dotnet/corefx/issues/29370

CC @stephentoub @akoeplinger @danmosemsft @ViktorHofer @rainersigwald

dotnet#26291 changed process reaping
from using waitpid to waitid. This caused a regression on mac, since
for processes that are killed, (on mac) waitid does not return the
signal number that caused the process to terminated.

We change back to waitpid for reaping children and determining the
exit code. waitid is used to find terminated children.

Fixes https://github.com/dotnet/corefx/issues/29370
@joperezr
Copy link
Copy Markdown
Member

Thanks a lot for fixing this so quickly @tmds, @stephentoub I suppose this will have to get cherry-picked to release branch as well?

@joperezr
Copy link
Copy Markdown
Member

This looks good from an initial look, but I would like @stephentoub to sign off on this

@danmoseley
Copy link
Copy Markdown
Member

@joperezr yes it needs to go into release.

if (RuntimeInformation.IsOSPlatform(OSPlatform.Linux))
{
const int SIGKILL = 9;
Assert.Equal(128 + SIGKILL, p.ExitCode);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we verify the exact code on Mac? Or if not maybe a comment why?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SIGKILL may change per platform. I grepped the Linux kernel, and it is 9 for every architecture.
From the bug report, it seems to be 9 for mac too.
Grepping the freebsd source, shows it is 9 there too.
I can remove the runtime check since probably it will pass on all platforms and add a comment about the platform difference.

@danmoseley danmoseley requested a review from stephentoub April 30, 2018 17:38
@wtgodbe wtgodbe self-assigned this Apr 30, 2018
}
while (CheckInterrupted(result = waitid(idtype, static_cast<id_t>(pid), &siginfo, options)));
if (idtype == P_ALL && result == -1 && errno == ECHILD)
while (CheckInterrupted(result = waitid(P_ALL, 0, &siginfo, WEXITED | WNOHANG | WNOWAIT)));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to change the pid from -1 to 0? From the man page, that would seem to restrict this to only be for child processes in the same group.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

id should be ignored for P_ALL. I've changed from -1 to 0 because id_t is unsigned and it gave me a compile error if I didn't add a cast.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

id should be ignored for P_ALL

Where is that documented? I'm not seeing it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Linux man page: https://linux.die.net/man/3/waitid

If idtype is P_ALL, waitid() shall wait for any children and id is ignored.

http://pubs.opengroup.org/onlinepubs/9699919799/functions/waitid.html has the same line.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks.

}
result = siginfo.si_pid;
}
return result;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need all of this logic in native code? In 2.0, wasn't most of it in managed and we just P/Invoked for waitpid, WIFEXITED, etc.? We've been trying to keep the shims as thin as possible.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, 2.0 had pinvoke for WIFEXITED, WEXITSTATUS, WIFSIGNALED, WTERMSIG.
Those were not needed for waitid but now needed again. Personally I like having this packed together with the waitpid call and avoid the back-and-forth between native and managed. If you want, I can add back those functions.. Here or in a follow-up PR.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. We can keep it here in this PR, but I do think it'd be good to revisit it subsequently. I think we've been a bit lax about the intent of the shims and we've ended up with more native code than is desirable.

p.Kill();
p.WaitForExit();

Assert.True(p.ExitCode > 128, "Exit code should be 128 + SIGKILL.");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, you could remove this assert now, but it's harmless...

Copy link
Copy Markdown
Member

@stephentoub stephentoub May 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we keep it, it'd be better to use InRange so that we get the actual value in the assert, e.g.

Assert.InRange(p.ExitCode, 129, int.MaxValue); // Exit code should be 128 + SIGKILL

but I agree we can just delete it at this point.

@stephentoub
Copy link
Copy Markdown
Member

@dotnet/dnceng, another case of a leg getting stuck in xunit processing:
https://ci3.dot.net/job/dotnet_corefx/job/master/job/linux-TGroup_netcoreapp+CGroup_Release+AGroup_x64+TestOuter_false_prtest/12267/

@stephentoub
Copy link
Copy Markdown
Member

Thanks, @tmds.

@stephentoub stephentoub merged commit a03f785 into dotnet:master May 1, 2018
@karelz karelz modified the milestones: 2.1.0, 2.2.0 May 1, 2018
@tmds
Copy link
Copy Markdown
Member Author

tmds commented May 1, 2018

@stephentoub no problem. I'm glad this regression won't be in 2.1 and hope this is all the fallout we'll see from the changes to the Process class. Thank you for thorough reviewing!

joperezr pushed a commit to joperezr/corefx that referenced this pull request May 1, 2018
* Fix Process.ExitCode on mac for killed processes

dotnet#26291 changed process reaping
from using waitpid to waitid. This caused a regression on mac, since
for processes that are killed, (on mac) waitid does not return the
signal number that caused the process to terminated.

We change back to waitpid for reaping children and determining the
exit code. waitid is used to find terminated children.

Fixes https://github.com/dotnet/corefx/issues/29370

* TestExitCodeKilledChild: remove runtime check

* TestExitCodeKilledChild: remove greater than assert
stephentoub pushed a commit that referenced this pull request May 2, 2018
* Fix Process.ExitCode on mac for killed processes

#26291 changed process reaping
from using waitpid to waitid. This caused a regression on mac, since
for processes that are killed, (on mac) waitid does not return the
signal number that caused the process to terminated.

We change back to waitpid for reaping children and determining the
exit code. waitid is used to find terminated children.

Fixes https://github.com/dotnet/corefx/issues/29370

* TestExitCodeKilledChild: remove runtime check

* TestExitCodeKilledChild: remove greater than assert
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
* Fix Process.ExitCode on mac for killed processes

dotnet/corefx#26291 changed process reaping
from using waitpid to waitid. This caused a regression on mac, since
for processes that are killed, (on mac) waitid does not return the
signal number that caused the process to terminated.

We change back to waitpid for reaping children and determining the
exit code. waitid is used to find terminated children.

Fixes https://github.com/dotnet/corefx/issues/29370

* TestExitCodeKilledChild: remove runtime check

* TestExitCodeKilledChild: remove greater than assert


Commit migrated from dotnet/corefx@a03f785
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants