Skip to content
This repository was archived by the owner on Oct 12, 2022. It is now read-only.
/ druntime Public archive

fix Issue 20270 - [REG2.087] Deadlock in garbage collection when runn…#2816

Merged
dlang-bot merged 3 commits intodlang:stablefrom
rainers:issue20270
Oct 29, 2019
Merged

fix Issue 20270 - [REG2.087] Deadlock in garbage collection when runn…#2816
dlang-bot merged 3 commits intodlang:stablefrom
rainers:issue20270

Conversation

@rainers
Copy link
Member

@rainers rainers commented Oct 6, 2019

…ing processes in parallel

start scan threads while the world isn't suspended

@dlang-bot
Copy link
Contributor

Thanks for your pull request, @rainers!

Bugzilla references

Auto-close Bugzilla Severity Description
20270 regression [REG2.087] Deadlock in garbage collection when running processes in parallel

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub fetch digger
dub run digger -- build "stable + druntime#2816"

@dlang-bot dlang-bot added the Bug Fix Include reference to corresponding bugzilla issue label Oct 6, 2019
@thewilsonator
Copy link
Contributor

Is it possible to test?

@CyberShadow
Copy link
Member

Thank you! Confirmed the fix on my machine.

Is it possible to test?

There is a small test program in the bug report. It should be suitable to be added to the test suite.

@rainers
Copy link
Member Author

rainers commented Oct 6, 2019

There is a small test program in the bug report. It should be suitable to be added to the test suite.

It depends on phobos which is not allowed in druntime. The root cause is much more basic, but I'm not sure it is easy to reproduce.

@CyberShadow
Copy link
Member

I'll try to create a smaller test case.

@CyberShadow
Copy link
Member

CyberShadow commented Oct 6, 2019

Reduced test case:

Edit: this was a potential fork bomb, will update in a minute.

@CyberShadow
Copy link
Member

I think I found a new bug while trying to create a test case:

import core.stdc.stdlib : exit;
import core.sys.posix.sys.wait;
import core.sys.posix.unistd : fork;
import core.thread : Thread;

void main()
{
	foreach (t; 0 .. 10)
		new Thread({
			foreach (n; 0 .. 100)
			{
				foreach (x; 0 .. 100)
					new ubyte[x];
				auto f = fork();
				assert(f >= 0);
				if (f == 0)
					exit(0);
				else
					waitpid(f, null, 0);
			}
		}).start();
}

This program has a high chance of never exiting, because it is waiting on forked processes which are deadlocked. The forked processes have a single thread in core.internal.spinlock.SpinLock.yield (called by Gcx.removeRange).

It looks like the GC lock was not cleared after a fork?

@CyberShadow
Copy link
Member

Reduced test case:

For this bug / PR:

import core.sys.posix.sys.wait : waitpid;
import core.sys.posix.unistd : fork, _exit;
import core.thread : Thread;

void main()
{
	foreach (t; 0 .. 10)
		new Thread({
			foreach (n; 0 .. 100)
			{
				foreach (x; 0 .. 100)
					new ubyte[x];
				auto f = fork();
				assert(f >= 0);
				if (f == 0)
					_exit(0);
				else
					waitpid(f, null, 0);
			}
		}).start();
}

@CyberShadow
Copy link
Member

I think I found a new bug while trying to create a test case:

I meant to call _exit instead of exit. The latter calls finalizers incl. GC destruction, but the fork might have happened while the GC lock was being held. This is a very separate issue. Probably, it would be good to keep the GC working after a fork, so it would need to clear its lock in its atfork handler.

@rainers
Copy link
Member Author

rainers commented Oct 6, 2019

Thanks, I added the test case.

The latter calls finalizers incl. GC destruction, but the fork might have happened while the GC lock was being held. This is a very separate issue. Probably, it would be good to keep the GC working after a fork, so it would need to clear its lock in its atfork handler.

I think there are more issues in druntime like that. For example the chain of Threads all point to dead threads in the fork-child, so joining these will fail, too.

@CyberShadow
Copy link
Member

I think there are more issues in druntime like that. For example the chain of Threads all point to dead threads in the fork-child, so joining these will fail, too.

Hmm, probably we should try to fix them one by one.

What do you think about #2817 ?

@rainers
Copy link
Member Author

rainers commented Oct 14, 2019

@CyberShadow This is working now, unfortunately a day too late for the release.

@wilzbach
Copy link
Contributor

@rainers FYI: the master-stable branchoff hasn't happened yet.

@rainers
Copy link
Member Author

rainers commented Oct 14, 2019

@wilzbach I meant the 2.088.1 release, this PR is targeting stable.

version (Windows)
alias allThreadsDead = thread_DLLProcessDetaching;
else
enum allThreadsDead = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what happened here? A DLL with running threads was unloaded, terminating them (causing them to segfault)?

Copy link
Member Author

@rainers rainers Oct 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the process terminates while threads are still running. The threads are just terminated as well without DllMain(DLL_THREAD_DETACH) being called. DllMain(DLL_PROCESS_DETACH) is still ìnvoked, though.

…ing processes in parallel

start scan threads while the world isn't suspended
@rainers
Copy link
Member Author

rainers commented Oct 29, 2019

Rebased. Would be good to have this in the coming release as the Visual D installer is hit by the non-termination issue, too.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Bug Fix Include reference to corresponding bugzilla issue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants