[Don't Merge] Rebase and Clean up Hdf5DataLayer Prefetch #2892
ronghanghu wants to merge 3 commits into BVLC:master
It might be best if @erictzeng takes a look at this given his recent hdf5 work.
Force-pushed from 2b7c2e4 to 70168ba.
If I understand correctly, shuffling is still preserved in
This still shuffles the order of the hdf5 files but no longer shuffles the rows within each hdf5 file. Generally rows >> files, so this change leaves far fewer possible orderings. It is no different from lmdb / leveldb in that respect, however.
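To make the difference concrete, here is a minimal sketch of the two read orders being compared. It is not the Caffe implementation: the function names, the `(file, row)` tuples, and the two-file layout are all hypothetical, chosen only to illustrate file-level versus row-level shuffling.

```python
import random


def file_shuffled_order(rows_per_file, rng):
    """Shuffle the order of files only; rows inside each file stay sequential."""
    file_ids = list(range(len(rows_per_file)))
    rng.shuffle(file_ids)
    return [(f, r) for f in file_ids for r in range(rows_per_file[f])]


def row_shuffled_order(rows_per_file, rng):
    """Shuffle the order of files, and also shuffle the rows inside each file."""
    file_ids = list(range(len(rows_per_file)))
    rng.shuffle(file_ids)
    order = []
    for f in file_ids:
        rows = list(range(rows_per_file[f]))
        rng.shuffle(rows)
        order.extend((f, r) for r in rows)
    return order


# Hypothetical layout: two files holding 4 and 3 rows.
rows_per_file = [4, 3]
files_only = file_shuffled_order(rows_per_file, random.Random(0))
files_and_rows = row_shuffled_order(rows_per_file, random.Random(0))
```

With this layout, file-only shuffling can produce just 2 distinct epochs (2! file orders), while shuffling rows as well allows 2! x 4! x 3! = 288, which is the "rows >> files" point in miniature.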
Force-pushed from 70168ba to 11d0d74.
Adapt HDF5DataLayer Prefetch to BVLC#2836
@shelhamer OK, I see. This may be a serious drawback of this PR. As an alternative to this PR, we could keep the current shuffle behavior and implement only the prefetch (using an additional prefetch memory blob, as in the other prefetching data layers). I am hacking on that directly on top of #2870.
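The alternative described above (leave the read order alone and add only a prefetch buffer) can be sketched roughly as follows. This is an illustration under assumptions, not code from #2870 or from Caffe: `Prefetcher`, `load_batch`, and the `depth` parameter are hypothetical names, and a bounded `queue.Queue` stands in for the extra prefetch blob.

```python
import queue
import threading


class Prefetcher:
    """Load batches on a background thread into a small bounded buffer."""

    def __init__(self, load_batch, num_batches, depth=2):
        # The bounded queue plays the role of the additional prefetch buffer.
        self._queue = queue.Queue(maxsize=depth)
        self._thread = threading.Thread(
            target=self._run, args=(load_batch, num_batches), daemon=True
        )
        self._thread.start()

    def _run(self, load_batch, num_batches):
        for i in range(num_batches):
            # put() blocks while the buffer is full, so at most `depth`
            # batches are held in memory ahead of the consumer.
            self._queue.put(load_batch(i))
        self._queue.put(None)  # sentinel: no more batches

    def __iter__(self):
        while True:
            batch = self._queue.get()
            if batch is None:
                return
            yield batch


def load_batch(i):
    # Hypothetical stand-in for reading one batch of rows from an HDF5 file.
    return [i * 4 + k for k in range(4)]


batches = list(Prefetcher(load_batch, num_batches=3))
```

Because the loader function alone decides which rows go into each batch, any existing shuffle logic can stay inside it unchanged; the prefetcher only overlaps loading with consumption.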
Another disadvantage of the PR (which I didn't realize until I started using HDF5DataLayer myself after writing the initial version of this PR) is that it loses the optimization for the case where the entire dataset is a single HDF5 file. With the current HDF5DataLayer, the entire file is loaded into memory initially and nothing is ever read from disk again. I'm not sure an HDF5 prefetching PR should be merged until the optimization of this important special case is somehow brought back.
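A rough sketch of the special case described above: if the file that is needed is already in memory, skip the disk read. The class and its `disk_reads` counter are hypothetical and exist only to make the effect measurable; `fake_read` stands in for an actual HDF5 read, and none of this is taken from the Caffe source.

```python
class CachedFileSource:
    """Keep the most recently loaded file in memory; reload only on a file change."""

    def __init__(self, filenames, read_file):
        self.filenames = list(filenames)
        self.read_file = read_file
        self.current_index = None
        self.current_rows = None
        self.disk_reads = 0  # counts calls that actually hit the (fake) disk

    def rows_for(self, file_index):
        if file_index != self.current_index:
            self.current_rows = self.read_file(self.filenames[file_index])
            self.current_index = file_index
            self.disk_reads += 1
        return self.current_rows


def fake_read(name):
    # Hypothetical stand-in for loading every row of one HDF5 file.
    return [f"{name}:{r}" for r in range(3)]


# Single-file dataset: five epochs, but the file is read only once.
single = CachedFileSource(["train.h5"], fake_read)
for epoch in range(5):
    for row in single.rows_for(0):
        pass
single_file_reads = single.disk_reads

# Two-file dataset: every switch between files triggers a reload.
multi = CachedFileSource(["a.h5", "b.h5"], fake_read)
for epoch in range(5):
    for idx in range(2):
        multi.rows_for(idx)
multi_file_reads = multi.disk_reads
```

Here the single-file source performs 1 read across all five epochs, while the two-file source performs 10, which is why a prefetching design that always re-reads from disk would be a regression for single-file datasets.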
Cleaned up #2271 to adapt to #2836. Original authors are @jeffdonahue and @pclove1
Note: this adds prefetching but disables shuffling of rows within each hdf5 file; shuffling of the file order is still preserved.
DO NOT MERGE