Skip to content

Comments

DataLayer singleton bugfix#348

Merged
shelhamer merged 2 commits intoBVLC:devfrom
jeffdonahue:datalayer-singleton-bugfix
Apr 22, 2014
Merged

DataLayer singleton bugfix#348
shelhamer merged 2 commits intoBVLC:devfrom
jeffdonahue:datalayer-singleton-bugfix

Conversation

@jeffdonahue
Copy link
Contributor

I believe this fully fixes the apparent issue with the RNG mentioned in #335, (which it turns out my PR #336 did nothing to address, but was still nonetheless an improvement).

Basically, the DataLayer prefetch thread has no knowledge of the main Caffe thread's Caffe singleton, but it was making a call to Caffe::phase(), which resulted in the DataLayer constructing its own brand new Caffe singleton instance, and then, my theory at least is that somehow the destruction of this singleton instance when the thread exits isn't correctly handled and interferes with main thread memory? Not entirely sure about that explanation, but I do know that after at least 25 runs of the imagenet architecture with this change, I have not seen a segfault (vs. current dev which segfaults 25-75% of the time).

I also thought I'd uncovered a long-standing DataLayer bug as it seemed like the Caffe::phase() call in the prefetch thread should always return the default phase (TRAIN), but I added debug printouts inside the prefetch thread and ran from dev and it turns out that somehow the prefetch thread's phase is already being set correctly, but I'm pretty sure I have no idea how...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that this is related to anything, but rand() is not thread safe.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true - I had planned to followup (either tonight or tomorrow probably) with another PR which removes all uses of rand throughout the codebase (by giving the prefetch thread its own private RNG object). Thanks for the comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the rand() followup-to-be. I remember it coming up now and then so it'll be nice to have it squared away.

@shelhamer
Copy link
Member

This has been quite the chase. I'd make some claim about shared pointers, threading, and freeing but honestly I'm not sure exactly how this worked out either.

However, this does fix the issue. Thanks Jeff!

shelhamer added a commit that referenced this pull request Apr 22, 2014
Fix singleton call in data layers

Note previous crash reports were not in fact due to RNG, although they led to improvements of the RNG code.
@shelhamer shelhamer merged commit aeae177 into BVLC:dev Apr 22, 2014
@shelhamer
Copy link
Member

single shot.

@jeffdonahue jeffdonahue deleted the datalayer-singleton-bugfix branch April 22, 2014 18:37
mitmul pushed a commit to mitmul/caffe that referenced this pull request Sep 30, 2014
Fix singleton call in data layers

Note previous crash reports were not in fact due to RNG, although they led to improvements of the RNG code.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants