Skip to content

[Don't Merge] Rebase and Clean up Hdf5DataLayer Prefetch#2892

Open
ronghanghu wants to merge 3 commits intoBVLC:masterfrom
ronghanghu:hdf5-prefetch
Open

[Don't Merge] Rebase and Clean up Hdf5DataLayer Prefetch#2892
ronghanghu wants to merge 3 commits intoBVLC:masterfrom
ronghanghu:hdf5-prefetch

Conversation

@ronghanghu
Copy link
Member

Cleaned up #2271 to adapt to #2836. Original authors are @jeffdonahue and @pclove1

Note: this adds prefetching but disables shuffling rows for hdf5, and still preserve shuffling files.

DO NOT MERGE

@shelhamer shelhamer added the JD label Aug 9, 2015
@shelhamer
Copy link
Member

It might be best if @erictzeng takes a look at this given his recent hdf5 work.

@ronghanghu ronghanghu force-pushed the hdf5-prefetch branch 5 times, most recently from 2b7c2e4 to 70168ba Compare August 9, 2015 23:24
@ronghanghu
Copy link
Member Author

If I understand correctly, shuffling is still preserved in FillHDF5FileData

https://github.com/ronghanghu/caffe/blob/hdf5-prefetch/src/caffe/layers/hdf5_data_layer.cpp#L61-L64

@shelhamer
Copy link
Member

This still shuffles the order of hdf5 files but no longer shuffles rows within hdf5. Generally rows >> files so shuffling is limited by this change. This is no different than lmdb / leveldb however.

Adapt HDF5DataLayer Prefetch to BVLC#2836
@ronghanghu
Copy link
Member Author

@shelhamer OK, I see. This may be a serious drawback of this PR.

Instead of this PR, we can also keep the current shuffle behavior and just implement the prefetch (using additional prefetch memory blob, like in other prefetch data layers). I am hacking that directly based upon #2870.

@ronghanghu ronghanghu changed the title Rebase and Clean up Hdf5DataLayer Prefetch [Don't Merge] Rebase and Clean up Hdf5DataLayer Prefetch Aug 10, 2015
@jeffdonahue
Copy link
Contributor

Another disadvantage of the PR (which I didn't realize until I started using HDF5DataLayer myself after writing the initial version of this PR) is the optimization in the case of the entire dataset being a single HDF5 file -- with the current HDF5DataLayer, the entire file is loaded into memory initially and nothing is ever read from disk again. I'm not sure an HDF5 prefetching PR should be merged until the optimization of this important special case is somehow brought back.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants

Comments