Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Process: avoid performing operations on a different process with recycled pid#28404

Merged
stephentoub merged 8 commits intodotnet:masterfrom
tmds:check_exited
May 29, 2018
Merged

Process: avoid performing operations on a different process with recycled pid#28404
stephentoub merged 8 commits intodotnet:masterfrom
tmds:check_exited

Conversation

@tmds
Copy link
Copy Markdown
Member

@tmds tmds commented Mar 23, 2018

With recent changes to the Process class, we know when our child processes have exited.
We can use this information to ensure a process instance doesn't start hitting a different process with a recycled pid.

Requires #27763

Example code:

            var processes = new Dictionary<int, Process>();
            for (int i = 0; i < 1_000_000; i++)
            {
                if (i % 100 == 0)
                {
                    Console.Write(".");
                }

                var process = new Process();
                process.StartInfo.FileName = "/usr/bin/sleep";
                process.StartInfo.Arguments = "5";
                process.EnableRaisingEvents = true;
                process.Start();

                Process recycled;
                if (processes.TryGetValue(process.Id, out recycled))
                {
                    Console.WriteLine("Killing recycled");
                    recycled.Kill();
                    recycled.WaitForExit();

                    Console.WriteLine("New process exited? " + process.HasExited);
                    return;
                }
                else
                {
                    processes.Add(process.Id, process);
                }

                process.Kill();
                process.WaitForExit();
            }

This code will throw when the Kill operation targets a recycled pid:

System.InvalidOperationException: Cannot process request because the process (12993) has exited.

CC @stephentoub @danmosemsft @wfurt

@tmds
Copy link
Copy Markdown
Member Author

tmds commented Mar 23, 2018

unrelated CI fail in ClientAsyncAuthenticate_AllServerVsIndividualClientSupportedProtocols_Success(clientProtocol: Tls) on OSX.1012:

   at System.Net.Sockets.Socket.EndConnect(IAsyncResult asyncResult) in /Users/dotnet-bot/j/workspace/dotnet_corefx/master/osx-TGroup_netcoreapp+CGroup_Debug+AGroup_x64+TestOuter_false_prtest/src/System.Net.Sockets/src/System/Net/Sockets/Socket.cs:line 2430
   at System.Net.Sockets.TcpClient.EndConnect(IAsyncResult asyncResult) in /Users/dotnet-bot/j/workspace/dotnet_corefx/master/osx-TGroup_netcoreapp+CGroup_Debug+AGroup_x64+TestOuter_false_prtest/src/System.Net.Sockets/src/System/Net/Sockets/TCPClient.cs:line 334
   at System.Net.Sockets.TcpClient.<>c.<ConnectAsync>b__27_1(IAsyncResult asyncResult) in /Users/dotnet-bot/j/workspace/dotnet_corefx/master/osx-TGroup_netcoreapp+CGroup_Debug+AGroup_x64+TestOuter_false_prtest/src/System.Net.Sockets/src/System/Net/Sockets/TCPClient.cs:line 278
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization) in /Users/buildagent/agent/_work/482/s/src/mscorlib/src/System/Threading/Tasks/FutureFactory.cs:line 533
--- End of stack trace from previous location where exception was thrown ---
   at System.Net.Security.Tests.ClientAsyncAuthenticateTest.ClientAsyncSslHelper(EncryptionPolicy encryptionPolicy, SslProtocols clientSslProtocols, SslProtocols serverSslProtocols) in /Users/dotnet-bot/j/workspace/dotnet_corefx/master/osx-TGroup_netcoreapp+CGroup_Debug+AGroup_x64+TestOuter_false_prtest/src/System.Net.Security/tests/FunctionalTests/ClientAsyncAuthenticateTest.cs:line 156
   at System.Net.Security.Tests.ClientAsyncAuthenticateTest.ClientAsyncAuthenticate_AllServerVsIndividualClientSupportedProtocols_Success(SslProtocols clientProtocol) in /Users/dotnet-bot/j/workspace/dotnet_corefx/master/osx-TGroup_netcoreapp+CGroup_Debug+AGroup_x64+TestOuter_false_prtest/src/System.Net.Security/tests/FunctionalTests/ClientAsyncAuthenticateTest.cs:line 107
--- End of stack trace from previous location where exception was thrown ---

@karelz karelz added this to the 2.1.0 milestone Mar 27, 2018
@karelz karelz removed this from the 2.1.0 milestone Mar 28, 2018
@stephentoub
Copy link
Copy Markdown
Member

@dotnet-bot test this please

@tmds
Copy link
Copy Markdown
Member Author

tmds commented Mar 29, 2018

I want to update the test that is added in #27763 to cover this. I'll do that when that PR is merged.


if (GetWaitState().GetExited(out _, refresh))
{
throw new InvalidOperationException(SR.Format(SR.ProcessHasExited, _processId.ToString(CultureInfo.CurrentCulture)));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the CultureInfo.CurrentCulture needed?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I lifted this exception from GetProcessHandle. I don't know if it is needed.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can remove it. It won't affect the default formatting.

{
HaveId = 0x1,
IsLocal = 0x2,
HaveNonExitedId = HaveId | 0x4,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was nice of Nt to leave a gap here :)

}
if (pws == null)
{
pws = new ProcessWaitState(processId, isChild: false, exitTime);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you help me understand this better? It's not clear to me from this code nor from the comment why a different ProcessWaitState object helps with anything. What's the key difference between the pws we nulled out and this one we're creating?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key difference is the new pws isn't in the exited state.
If we have a non-child Process with a recycled PID and we'd reused the exited pws, then the Process is unusable because HaveNonExitedId throws.

@danmoseley danmoseley added this to the 2.1.0 milestone Apr 2, 2018
@danmoseley
Copy link
Copy Markdown
Member

I suppose it is unlikely this would help fix https://github.com/dotnet/corefx/issues/28668

@tmds
Copy link
Copy Markdown
Member Author

tmds commented Apr 11, 2018

@dotnet-bot test this please

@danmoseley
Copy link
Copy Markdown
Member

@tmds @stephentoub given all the churn to pid management in the last month and we are about to enter ask -mode for 2.1 changes, do we need some focused stress testing to reduce risk here?

@stephentoub
Copy link
Copy Markdown
Member

Yes. I'm not comfortable with the level of churn and the associated risks on stability. I'm concerned we've been fixing some niche issues that could cause problems for some, but only in rare conditions, and in exchange potentially causing problems for main-stream cases. It's possible of course everything is fine, but...

@danmoseley
Copy link
Copy Markdown
Member

@stephentoub what do you suggest? Defer changes (which) or test more first? Perhaps take this change then do focused testing, based on that choose if and whether it is necessary to revert any part for 2.1?

Testing would have to cover distro /Mac matrix.

@stephentoub
Copy link
Copy Markdown
Member

I'd suggest testing more first. I'm not convinced this PR is necessary for 2.1.

@danmoseley
Copy link
Copy Markdown
Member

@ViktorHofer could you please look at this and related pid-tracking changes (@stephentoub can list them perhaps) and devise and execute a plan for stress tests? We can use CI/official runs to cover the matrix.

@stephentoub
Copy link
Copy Markdown
Member

The most recent two shown here: https://github.com/dotnet/corefx/commits/master/src/System.Diagnostics.Process/src?author=tmds. They overhaul how process waiting works.

@tmds
Copy link
Copy Markdown
Member Author

tmds commented Apr 11, 2018

These are the related PRs:

The first PR is the most important one. It is also the most risky/churny.
The second (and this PR) are about recycled pids. I think they solve issues that may occur in practice, but with a low probability. So I agree, this isn't must-have for 2.1.

edit: #26291 is in preview2 so it will get some external 'testing' too.
edit2: #27763 included some fixes. These were afaik not fixing issues introduced by #26291, rather the test added in that PR revealed some existing issues.

@stephentoub stephentoub removed this from the 2.1.0 milestone Apr 14, 2018
@stephentoub stephentoub added this to the 2.2.0 milestone Apr 14, 2018
@ViktorHofer
Copy link
Copy Markdown
Member

@ViktorHofer could you please look at this and related pid-tracking changes (@stephentoub can list them perhaps) and devise and execute a plan for stress tests? We can use CI/official runs to cover the matrix.

sorry, I missed that one. does this still apply?

@danmoseley
Copy link
Copy Markdown
Member

@ViktorHofer it's less urgent but yes, I would look at the changes above by @tmds, the tests he performed locally and/or added, and recommend whether we should add more test coverage/stress tests.

@stephentoub stephentoub reopened this May 17, 2018
@stephentoub
Copy link
Copy Markdown
Member

@dotnet-bot test this please

return;
}

if (GetWaitState().GetExited(out _, refresh))
Copy link
Copy Markdown
Member

@stephentoub stephentoub May 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe combine the above into:

if ((_waitStateHolder != null || refresh) && GetWaitState().GetExited(out _, refresh))

?

@danmoseley
Copy link
Copy Markdown
Member

@tmds seems I can merge, unless you want to update per suggestion above?

@tmds
Copy link
Copy Markdown
Member Author

tmds commented May 29, 2018

The combined check would be:

if ((_waitStateHolder == null || refresh) && GetWaitState().GetExited(out _, refresh))

I've left it separate and added a comment what we're checking for.

@stephentoub
Copy link
Copy Markdown
Member

The combined check would be:

It would? Doesn't the == need to be != ? Otherwise I'm confused.

@tmds
Copy link
Copy Markdown
Member Author

tmds commented May 29, 2018

It would? Doesn't the == need to be != ? Otherwise I'm confused.

GetWaitState assigns _waitStateHolder when it is null. When refresh is false we don't assign _waitStateHolder since it would just say we haven't exited.

@stephentoub
Copy link
Copy Markdown
Member

Ok, but how does that work with the current check? Currently you return if _waitStateHolder == null && !refresh. So not returning is !(_waitStateHolder == null && !refresh) which is _waitStateHolder != null || refresh.

@tmds
Copy link
Copy Markdown
Member Author

tmds commented May 29, 2018

Correct. The combined check is:

if ((_waitStateHolder != null || refresh) && GetWaitState().GetExited(out _, refresh))

I think you had suggested

if (_waitStateHolder != null && refresh && GetWaitState().GetExited(out _, refresh))

but that would have been different too.

I had added a comment about what we're checking to make it more clear. Do you prefer putting things in a single if statement? with the comment?

@stephentoub
Copy link
Copy Markdown
Member

The combined check is

Yes

I think you had suggested

Yes, we both had it wrong. I fixed mine after the fact.

Do you prefer putting things in a single if statement?

Either is fine. But I wanted to make sure what's there is what you intended since the combined version you shared initially didn't match the behavior.

@stephentoub stephentoub merged commit 4e27e0a into dotnet:master May 29, 2018
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…ycled pid (dotnet/corefx#28404)

* Process: avoid performing operations on a different process with recycled pid

* ThrowIfExited: don't create ProcessWaitState(Holder) if we are not refreshing

* Handle non-child pid recycling

* ProcessWaitState: update overview comment to reflect current implementation

* Update TestProcessRecycledPid

* Add comment to ThrowIfExited

* Remove CultureInfo.CurrentCulture from _processId.ToString


Commit migrated from dotnet/corefx@4e27e0a
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants