Merged
Conversation
Fix singleton call in data layers: a Caffe singleton call from the DataLayer prefetch thread appeared to cause the phase to always be set to TRAIN (always random crops) and RNG failures.
Contributor
I'm not sure that this is related to anything, but rand() is not thread-safe.
Contributor (Author)
True. I had planned to follow up (probably tonight or tomorrow) with another PR that removes all uses of rand() throughout the codebase, by giving the prefetch thread its own private RNG object. Thanks for the comment.
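As a sketch of that follow-up idea: each prefetch thread could own a private RNG engine rather than calling rand(), whose hidden shared state is what makes it unsafe across threads. The class name PrefetchRNG and the choice of boost::mt19937 below are my assumptions for illustration, not the actual patch.

```cpp
#include <boost/random/mersenne_twister.hpp>
#include <boost/random/uniform_int_distribution.hpp>

// Hypothetical per-thread RNG wrapper (illustrative; not the real patch).
class PrefetchRNG {
 public:
  explicit PrefetchRNG(unsigned int seed) : engine_(seed) {}
  // Uniform integer in [0, n); safe across threads because each thread
  // owns its own engine, so there is no shared rand() state to race on.
  unsigned int Rand(unsigned int n) {
    boost::random::uniform_int_distribution<unsigned int> dist(0, n - 1);
    return dist(engine_);
  }
 private:
  boost::mt19937 engine_;
};
```

Each prefetch thread would construct its own PrefetchRNG, seeded from the main thread before the thread is launched.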
Member
Thanks for the rand() follow-up-to-be. I remember it coming up now and then, so it'll be nice to have it squared away.
Member
This has been quite the chase. I'd make some claim about shared pointers, threading, and freeing, but honestly I'm not sure exactly how this worked out either. However, this does fix the issue. Thanks Jeff!
shelhamer added a commit that referenced this pull request on Apr 22, 2014:
Fix singleton call in data layers. Note previous crash reports were not in fact due to RNG, although they led to improvements of the RNG code.
mitmul pushed a commit to mitmul/caffe that referenced this pull request on Sep 30, 2014:
Fix singleton call in data layers. Note previous crash reports were not in fact due to RNG, although they led to improvements of the RNG code.

I believe this fully fixes the apparent issue with the RNG mentioned in #335 (which, it turns out, my PR #336 did nothing to address, though it was still an improvement nonetheless).

Basically, the DataLayer prefetch thread has no knowledge of the main Caffe thread's Caffe singleton, but it was making a call to Caffe::phase(), which resulted in the prefetch thread constructing its own brand-new Caffe singleton instance. My theory, at least, is that the destruction of this singleton instance when the thread exits isn't handled correctly and somehow interferes with the main thread's memory. I'm not entirely sure about that explanation, but I do know that after at least 25 runs of the ImageNet architecture with this change, I have not seen a segfault (vs. current dev, which segfaults 25-75% of the time).
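For illustration, here is a minimal sketch of that theory, assuming a thread-local singleton accessor. This is a reconstruction for exposition, not Caffe's actual code:

```cpp
#include <boost/thread/tss.hpp>

// Sketch: if the singleton accessor is thread-local, every thread that
// calls Get() lazily constructs its own instance. The first access from
// the prefetch thread therefore builds a brand-new Caffe object with the
// default phase (TRAIN), independent of the main thread's singleton, and
// that object is destroyed when the thread exits.
class Caffe {
 public:
  enum Phase { TRAIN, TEST };

  static Caffe& Get() {
    if (!instance_.get()) instance_.reset(new Caffe());
    return *instance_;
  }
  static Phase phase() { return Get().phase_; }

 private:
  Caffe() : phase_(TRAIN) {}
  Phase phase_;
  static boost::thread_specific_ptr<Caffe> instance_;
};

boost::thread_specific_ptr<Caffe> Caffe::instance_;
```

Under that reading, the fix is simply not to touch the singleton from the prefetch thread at all: read whatever state is needed (e.g. the phase) in the main thread and hand it to the prefetch thread as plain data before the thread starts.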
I also thought I'd uncovered a long-standing DataLayer bug, since it seemed like the Caffe::phase() call in the prefetch thread should always return the default phase (TRAIN). But after adding debug printouts inside the prefetch thread and running from dev, it turns out the prefetch thread's phase is somehow already being set correctly, and I'm pretty sure I have no idea how...