fix #1362. prefetch HDF5DataLayer #2271

Closed
pclove1 wants to merge 3 commits into BVLC:master from pclove1:hdf5-prefetch

pclove1 commented Apr 7, 2015

@jeffdonahue, @shelhamer

This PR attempts to fix #1362.
I highly recommend reading #1362's description first to see what this PR tries to achieve.
In short, we want to prefetch HDF5 files and avoid excessive memory usage by reading HDF5 files partially.
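The prefetching idea can be illustrated with a minimal C++11 sketch: while the caller consumes one batch, a background `std::thread` fills the next one, and the destructor joins any in-flight fetch. `Prefetcher`, `Batch`, and `BatchFn` are hypothetical names, not Caffe's API; the fetch callable is an assumed stand-in for a partial read from an HDF5 file.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for "read the next batch of rows"; in a real layer
// this would be a partial read from the current HDF5 file(s).
using Batch = std::vector<float>;
using BatchFn = std::function<Batch()>;

// Double-buffered prefetcher: one batch is handed out while the next one
// is being fetched on a background thread.
class Prefetcher {
 public:
  explicit Prefetcher(BatchFn fetch) : fetch_(std::move(fetch)) {
    assert(fetch_ && "fetch callable must not be empty");
    Start();  // begin fetching the first batch immediately
  }

  // Join the in-flight fetch so the thread never outlives the members
  // (fetch_, buffer_) it reads and writes.
  ~Prefetcher() { Join(); }

  Prefetcher(const Prefetcher&) = delete;
  Prefetcher& operator=(const Prefetcher&) = delete;

  // Wait for the prefetched batch, hand it out, and start the next fetch.
  Batch Next() {
    Join();
    Batch out = std::move(buffer_);
    Start();
    return out;
  }

 private:
  void Start() {
    // worker_ is never joinable here, so move-assignment is safe.
    worker_ = std::thread([this] { buffer_ = fetch_(); });
  }
  void Join() {
    if (worker_.joinable()) worker_.join();
  }

  BatchFn fetch_;
  Batch buffer_;
  std::thread worker_;
};
```

Only one fetch is ever in flight, so the fetch callable itself needs no locking in this sketch; the cost is that `Next()` blocks if the consumer outpaces the reader.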

Below is a summary of the modifications made on top of #1362.

  • override DataLayerSetUp() instead of LayerSetUp(), following the existing design
  • Blobs are N-D arrays now, following #1970 ("Blobs are N-D arrays (for N not necessarily equals 4)")
  • wait for the prefetch thread to be joined when destroying HDF5DataLayer
  • preserve the ability to shuffle files
  • add a unit test case in which a batch is loaded from interleaved HDF5 files, covering an error found in the index math
  • plus a few minor bug fixes
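The index math for a batch that straddles file boundaries can be sketched as follows, assuming each file's row count is known up front. `Segment`, `Cursor`, and `PlanBatch` are hypothetical names: each `Segment` corresponds to one contiguous partial read, and the cursor wraps from the last file back to the first.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One contiguous partial read: rows [start, start + count) of file `file`.
struct Segment {
  std::size_t file;
  std::size_t start;
  std::size_t count;
};

// Position of the next unread row, as (file index, row index).
struct Cursor {
  std::size_t file = 0;
  std::size_t row = 0;
};

// Plan the reads that fill one batch of `batch_size` rows. rows[i] is the
// row count of the i-th file in visiting order. Advances `cur`, wrapping
// from the last file back to the first.
std::vector<Segment> PlanBatch(const std::vector<std::size_t>& rows,
                               std::size_t batch_size, Cursor& cur) {
  std::size_t total = 0;
  for (std::size_t r : rows) total += r;
  assert(total > 0 && "need at least one non-empty file");

  std::vector<Segment> plan;
  std::size_t remaining = batch_size;
  while (remaining > 0) {
    std::size_t available = rows[cur.file] - cur.row;
    std::size_t take = available < remaining ? available : remaining;
    if (take > 0) {
      plan.push_back({cur.file, cur.row, take});
      cur.row += take;
      remaining -= take;
    }
    if (cur.row == rows[cur.file]) {  // file exhausted (or empty): move on
      cur.file = (cur.file + 1) % rows.size();
      cur.row = 0;
    }
  }
  return plan;
}
```

For files with 3, 5, and 2 rows and a batch size of 4, the first batch is planned as rows 0 to 2 of file 0 plus row 0 of file 1, and the third batch wraps around from file 2 back to file 0.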

Caveats

  • rows within an HDF5 file are not shuffled, because of the introduction of partial reading
    • I believe this is also the case for LMDB/LevelDB, where users are expected to shuffle the data when they construct the databases.
  • anyone who needs to access the same HDF5 file concurrently must make sure to use a thread-safe build of the HDF5 library. (See this)
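As the first caveat notes, shuffling can still happen at the file level even though rows are read sequentially. A minimal sketch, assuming the file order is reshuffled with `std::shuffle` (the helpers `ShuffledFileOrder` and `VisitOrder` are hypothetical): only the order in which files are visited changes, and rows within each file keep their on-disk order, so partial reads stay contiguous.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Return a random permutation of file indices 0..num_files-1.
std::vector<std::size_t> ShuffledFileOrder(std::size_t num_files,
                                           std::mt19937& rng) {
  std::vector<std::size_t> order(num_files);
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::shuffle(order.begin(), order.end(), rng);
  return order;
}

// Expand a file order into the (file, row) sequence a reader would visit:
// files in shuffled order, rows inside each file in their original order.
std::vector<std::pair<std::size_t, std::size_t>> VisitOrder(
    const std::vector<std::size_t>& order,
    const std::vector<std::size_t>& rows) {
  std::vector<std::pair<std::size_t, std::size_t>> visits;
  for (std::size_t f : order) {
    assert(f < rows.size());
    for (std::size_t r = 0; r < rows[f]; ++r) visits.emplace_back(f, r);
  }
  return visits;
}
```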

@ronghanghu
Member

Rebased and cleaned up in #2892. Thanks to the original authors, @jeffdonahue and @pclove1. Authorship is preserved in the commit messages.

ronghanghu closed this Aug 9, 2015