Less memory: Blob sharing across nets by name #1985
tnarihi wants to merge 5 commits into BVLC:master from
Conversation
Is this better than using Blob::ShareData() (and possibly ShareDiff)? E.g.
`blobs_[target_blob_id]->ShareData((*other->blobs()[i]));`

Then you don't need UpdateBlobPointers because each net just keeps its own blobs. I think this is more robust: ShareData() doesn't require that blob pointers be reassigned as in UpdateBlobPointers. If you assign blob pointers instead of using ShareData(), UpdateBlobPointers has to know every place where blob pointers live and keep them all in sync. That introduces tight coupling to the rest of the implementation, which is not robust but rather prone to breaking.
Actually, if you do ShareData(), I think you have to reshape before sharing because ShareData checks that the shape is the same.
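For reference, a minimal sketch of how that suggestion could be applied across two nets by blob name (the `ShareBlobDataByName` helper and its loop are illustrative, not code from this PR; `Blob::ShareData`/`ShareDiff`/`ReshapeLike` and the `Net` accessors are existing Caffe API):

```cpp
#include <string>
#include <vector>
#include <boost/shared_ptr.hpp>

#include "caffe/blob.hpp"
#include "caffe/net.hpp"

// Share the underlying data/diff of identically named blobs via ShareData()
// instead of swapping the shared_ptr<Blob> objects themselves.
template <typename Dtype>
void ShareBlobDataByName(const caffe::Net<Dtype>& source,
                         caffe::Net<Dtype>* target) {
  const std::vector<std::string>& names = target->blob_names();
  for (int i = 0; i < names.size(); ++i) {
    if (!source.has_blob(names[i])) { continue; }
    boost::shared_ptr<caffe::Blob<Dtype> > src = source.blob_by_name(names[i]);
    boost::shared_ptr<caffe::Blob<Dtype> > dst = target->blob_by_name(names[i]);
    // ShareData() CHECKs that the counts match, so reshape before sharing.
    dst->ReshapeLike(*src);
    dst->ShareData(*src);
    dst->ShareDiff(*src);
  }
}
```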
Yeah, that is what I tried first. However, I noticed that ShareData actually shares the SyncedMemory instance (a shared_ptr), and if you reshape the blob after sharing and the allocated capacity is not large enough for the new shape, a new SyncedMemory instance is created and the sharing is lost. This happens in cases such as an input layer that produces blobs of a different shape for each batch. That is why I decided to share the blobs directly. But yes, your point is right: using UpdateBlobPointers makes it a little complicated. If you have a better idea for implementing this, please help me. Thanks!
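To make the failure mode concrete, a tiny sketch with made-up shapes (not code from this PR): after `ShareData()`, growing the blob beyond its allocated capacity makes `Blob::Reshape()` allocate a fresh `SyncedMemory`, so the two blobs silently stop sharing.

```cpp
#include "caffe/blob.hpp"

void ShareThenReshapeExample() {
  caffe::Blob<float> train_blob(16, 3, 224, 224);
  caffe::Blob<float> test_blob(16, 3, 224, 224);
  // Both blobs now point at the same SyncedMemory.
  test_blob.ShareData(train_blob);

  // Later a data layer produces a larger batch for this net. The new count
  // exceeds the old capacity, so Reshape() allocates new storage and
  // test_blob quietly stops sharing with train_blob.
  test_blob.Reshape(32, 3, 224, 224);
}
```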
which saves memory. This should be called in solver initialization. In the current implementation, `Solver` initializes the training and testing nets independently, which leads to significant memory consumption.
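As a rough illustration of the intended call site (the `ShareTestNetBlobs` helper and the `ShareBlobsWith()` method below are assumed names, not necessarily what this PR adds), the sharing step would run once the solver has constructed both the training net and the test nets:

```cpp
#include <vector>
#include <boost/shared_ptr.hpp>

#include "caffe/net.hpp"

// Hypothetical call-site sketch only: ShareBlobsWith() stands in for whatever
// Net-level sharing method the PR introduces.
template <typename Dtype>
void ShareTestNetBlobs(
    const boost::shared_ptr<caffe::Net<Dtype> >& train_net,
    std::vector<boost::shared_ptr<caffe::Net<Dtype> > >* test_nets) {
  for (int i = 0; i < test_nets->size(); ++i) {
    // Point each test net's blobs at the training net's blobs of the same
    // name; the PR's UpdateBlobPointers step then refreshes the layers'
    // cached bottom/top blob pointers.
    (*test_nets)[i]->ShareBlobsWith(train_net.get());
  }
}
```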
Any news about this PR?
This PR implements a blob sharing feature which can reduce memory consumption in the solver. In the previous implementation, the blobs storing intermediate results are kept independently by each net, which usually causes redundant memory consumption for blob data. In this PR, the nets in the solver, i.e. `net` and `test_nets`, share their blobs by name to save memory.

However, if same-named blobs have different shapes across nets, e.g. because of a different batch size, it will break. Layers other than data layers are fine, since they are reshaped before every `Forward` call. I made some changes so that data layers always reshape their top blobs in `Forward`, which changes the default behavior of `DummyDataLayer`. Another issue is that it will break if the nets are used simultaneously, e.g. from multiple threads, but I believe nobody does that so far.

I have only thought about standard classification and segmentation nets, so I am not sure it will work for every possible network. I am also not sure whether this is the best solution (another idea is to use a global `Blob` pool or memory pool), but it is definitely helpful for anyone who cares about memory usage. In my experiment, it reduces memory usage by roughly 1 GB in VGG16 training (batch_size=16). Again, I summarize this PR here:

- Add an option to `SolverParameter` to enable blob sharing across nets (default=false).
- Add a `refill_constant` option to `DummyDataLayer` (refill_constant=false keeps the current behavior but might break share_blobs=true).
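For illustration, a minimal sketch (not the actual diff in this PR) of the data-layer behavior described above: the top blob is reshaped on every `Forward` call before the batch is copied in, so a blob shared across nets always takes the shape the currently running net needs. The `ForwardBatchToTop` helper name is made up.

```cpp
#include "caffe/blob.hpp"
#include "caffe/util/math_functions.hpp"

// Reshape the top blob to this batch's shape before filling it, so sharing
// the blob with another net (possibly using a different batch size) is safe.
template <typename Dtype>
void ForwardBatchToTop(const caffe::Blob<Dtype>& prefetched_batch,
                       caffe::Blob<Dtype>* top) {
  top->ReshapeLike(prefetched_batch);
  caffe::caffe_copy(prefetched_batch.count(), prefetched_batch.cpu_data(),
                    top->mutable_cpu_data());
}
```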