Fix issue 17037 - std.concurrency has random segfaults #5004
dlang-bot merged 1 commit into dlang:master from
Thanks for your pull request and interest in making D better, @WalterWaldron! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please see CONTRIBUTING.md for more information. If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.
Bugzilla references
Testing this PR locally
If you don't have a local development environment set up, you can use Digger to test this PR: dub run digger -- build "master + phobos#5004"
Force-pushed from 4766211 to 3bd0258
std/concurrency.d
Outdated
        Thread.sleep(dur!("msecs")( 10 ));
    else
        dosleep = true;
    GC.collect;
A comment would be nice to explain why doing a collection helps here as it's not immediately obvious.
std.concurrency is designed around having a global variable (scheduler) set once at the outset. There is no synchronization for this variable and the implementation does not appear to support changing it on the fly.
However we need to test with both implementations of Scheduler: ThreadScheduler and FiberScheduler.
This function waits until it is the only running thread before modifying scheduler (i.e., it's a mutual-exclusion hack).
Collection helps because threads can wait on the finalization action of other threads (e.g. waiting for OwnerTerminated exceptions initiated by static ~this.)
Thanks for the explanation, I actually meant a comment in the source. I suggest something like // wait for all other threads to terminate, using GC.collect to trigger finalizers which may terminate threads (e.g. OwnerTerminated or LinkTerminated) at the top of the loop.
I know; I was giving the explanation in the interim, before updating the PR.
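The "wait until it is the only thread" idea described in this thread can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the PR: the helper name changeSchedulerSketch is hypothetical, and the loop relies on Thread.getAll() from core.thread to count the threads known to the runtime. It leaves out the GC.collect call, since whether that call is needed is the open question in this review.

```d
import core.thread : Thread;
import core.time : msecs;
import std.concurrency : Scheduler, ThreadScheduler, scheduler;

// Hypothetical helper, not the PR's implementation: replace the global
// scheduler only once the calling thread is the last one still running.
void changeSchedulerSketch(Scheduler s)
{
    // Thread.getAll() returns every thread the D runtime knows about,
    // including the caller, so a length of 1 means "only us".
    while (Thread.getAll().length > 1)
        Thread.sleep(10.msecs);

    // No other thread can observe the change, so no locking is needed.
    scheduler = s;
}

void main()
{
    changeSchedulerSketch(new ThreadScheduler);
    assert(cast(ThreadScheduler) scheduler !is null);
}
```

The polling loop is a blunt instrument: it hangs if any other thread never exits, which is acceptable only because the unittests are expected to let their spawned threads finish.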
Updated according to feedback.
@wilzbach in the same vein as #5515 (comment) why hasn't the bot suggested a reviewer?
@MartinNowak ping please! This has been open for 7 months!
Same answer as in #5515 (comment) (we turned the feature off due to too much noise), but #5573 looks very promising.
This has been all green for a while now, which I think should be a pretty good indicator that it at least shouldn't break anything, and if it does we can revert it. Unfortunately Martin is a pretty busy guy, so it's hard to say when he'll get to this.
It can't break code because it only modifies the unittests. My changes are:
What I was getting at is that any unittest build would fail if one of the tests was broken. It's pretty common for people to run the full test suite for Phobos locally, and a broken test would also break the auto-tester.
JackStouffer left a comment
I don't see any problems with this. I'll leave this open for two or three more days and merge if no one has any more comments.
std/concurrency.d
Outdated
        Thread.sleep(dur!("msecs")( 10 ));
    else
        sleepFirst = true;
    GC.collect;
That looks frightening. Do we really only send Owner/LinkTerminated messages when the thread object gets collected? That sounds horribly unreliable, as the other peer might hold some (implicit) reference to the thread.
If so, we should add some onThreadExit hook to core.thread or wrap the thread function with a scope(exit) guard.
Do we really only send Owner/LinkTerminated messages when the thread object gets collected?
They get sent when the module destructor is run for threads, and via scope(exit) for fibers, so the comment I added must be wrong.
I have to look at the code again (it's been so long since I made this PR) to see whether this was just a hack to force bad tests to hang (instead of failing randomly) or whether it was necessary.
    changeScheduler(new ThreadScheduler);
    scheduler.spawn(testdg);
    assert(receiveOnly!bool());
Somewhat unclear: is this really the last life-signal of the thread being spawned?
Yes, I made it so that the test result (failure/success) is communicated back to the main thread instead of relying on exceptions being re-thrown (as it had been previously).
Typically it was failing like this:
The problem is that the code being tested references the global variable ( |
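The approach mentioned above, where the spawned thread reports success or failure to the main thread as a message rather than relying on a re-thrown exception, might look something like the following. It is a minimal sketch, not the test from the PR: runChecks and worker are hypothetical names, and the check itself is a placeholder.

```d
import std.concurrency : ownerTid, receiveOnly, send, spawn;

// Hypothetical stand-in for the real test body.
bool runChecks()
{
    return 1 + 1 == 2;
}

// Runs the checks and always reports a result to the owner thread,
// so the owner never has to depend on an exception crossing threads.
void worker()
{
    bool ok = false;
    try
    {
        ok = runChecks();
    }
    catch (Throwable)
    {
        // Catching Throwable is normally discouraged; it is used here only
        // so that a failed assert is turned into a "false" result.
        ok = false;
    }
    ownerTid.send(ok);
}

void main()
{
    spawn(&worker);
    // Blocks until the worker sends its bool, then checks it.
    assert(receiveOnly!bool());
}
```

With this shape, the bool is the last thing the worker sends, so the owner's receiveOnly!bool() doubles as the signal that the test body has finished.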
I've removed the
I think it's still necessary to serialize changing the scheduler, however I don't think the
Is this still "frightening"? Is more review needed?
https://issues.dlang.org/show_bug.cgi?id=17037