HDF5: Empiric for Optimal Chunk Size #916
Conversation
At the moment, a naïve port of the logic causes a deadlock in the MPI tests; it gets stuck only there. After some tests, it looks like adding chunking makes the dataset declaration collective.
```cpp
//for( auto const& val : parameters.chunkSize )
//    chunk_dims.push_back(static_cast< hsize_t >(val));

herr_t status = H5Pset_chunk(datasetCreationProperty, chunk_dims.size(), chunk_dims.data());
```
Fun fact: H5Pset_chunk_opts (HDF5 1.10.0+):
> H5Pset_chunk_opts is used to specify storage options for chunks on the edge of a dataset's dataspace. This capability allows the user to tune performance in cases where the dataset size may not be a multiple of the chunk size and the handling of partial edge chunks can impact performance.
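Hypothetically, the two calls could combine like this on a dataset creation property list (a minimal sketch; the extents are invented for illustration, not taken from this PR):

```cpp
#include <hdf5.h>

// Sketch (invented extents): a chunked dataset-creation property list that
// also opts out of filtering partial edge chunks (HDF5 1.10.0+).
hid_t make_chunked_dcpl()
{
    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    hsize_t chunk_dims[2] = {128, 128};
    H5Pset_chunk(dcpl, 2, chunk_dims);
    // Do not apply filters (e.g. compression) to the smaller partial chunks
    // on the dataset's edge; relevant when the extent is not a multiple of
    // the chunk size.
    H5Pset_chunk_opts(dcpl, H5D_CHUNK_DONT_FILTER_PARTIAL_CHUNKS);
    return dcpl;
}
```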
I've added some scaffolding for JSON options in HDF5.
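For illustration, such an option could be selected via a JSON fragment along these lines (the key name follows the ``hdf5.dataset.chunks`` key discussed in the docs snippet further below; ``"auto"`` is the heuristic value mentioned in the To Do notes):

```json
{
  "hdf5": {
    "dataset": {
      "chunks": "auto"
    }
  }
}
```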
Finished the global options (JSON & env) to disable chunking when needed (mainly for HiPACE's legacy pipeline + potential regressions).
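A sketch of how the environment switch might be used (the value ``"none"`` is an assumption for "disable chunking"; the ``OPENPMD_HDF5_CHUNKS`` variable and the benchmark binary both appear in the To Do notes below):

```bash
# Disable the chunking heuristic globally via environment variable
# ("none" assumed here; "auto" is the default heuristic per the To Do notes).
OPENPMD_HDF5_CHUNKS="none" ./8_benchmark_parallel -w
```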
It's a bit concerning that the clang sanitizer parallel benchmark (8) runs into a time-out with the new patch, but I don't see an immediate relation. I'll turn the chunking off for this one.
I tried out whether this PR alone already enables extensible datasets in HDF5; apparently not. According to the documentation, only certain kinds of datasets can have their extents extended, and our datasets seem to fall into the second category. I guess, for extensible datasets, we would have to create unlimited datasets from the beginning?

In order to make a dataset resizable, we need to pass maximum dimensions (maxdims, e.g. H5S_UNLIMITED) when creating the dataspace. Setting unlimited maxdims in turn requires a chunked layout.
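For context, a minimal sketch of what "unlimited from the beginning" means in plain HDF5 (names and extents here are illustrative, not this PR's code):

```cpp
#include <hdf5.h>

// Sketch (illustrative extents): create a dataset that can later be resized
// with H5Dset_extent. Unlimited maxdims require a chunked layout.
hid_t create_resizable(hid_t file)
{
    hsize_t dims[1]    = {1000};
    hsize_t maxdims[1] = {H5S_UNLIMITED}; // allow growth along this axis
    hid_t space = H5Screate_simple(1, dims, maxdims);

    hid_t dcpl = H5Pcreate(H5P_DATASET_CREATE);
    hsize_t chunk_dims[1] = {256};
    H5Pset_chunk(dcpl, 1, chunk_dims); // mandatory for unlimited dimensions

    hid_t dset = H5Dcreate2(
        file, "data", H5T_NATIVE_DOUBLE, space,
        H5P_DEFAULT, dcpl, H5P_DEFAULT);

    // Later: hsize_t new_dims[1] = {2000}; H5Dset_extent(dset, new_dims);
    H5Pclose(dcpl);
    H5Sclose(space);
    return dset;
}
```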
This ports a prior empirical algorithm from libSplash to determine an optimal (large) chunk size for an HDF5 dataset based on its datatype and global extent.

Original implementation by Felix Schmitt @f-schmitt (ZIH, TU Dresden) in [libSplash](https://github.com/ComputationalRadiationPhysics/libSplash).

Original source:
- https://github.com/ComputationalRadiationPhysics/libSplash/blob/v1.7.0/src/DCDataSet.cpp
- https://github.com/ComputationalRadiationPhysics/libSplash/blob/v1.7.0/src/include/splash/core/DCHelper.hpp

Co-authored-by: Felix Schmitt <felix.schmitt@zih.tu-dresden.de>
The parallel, independent I/O pattern here is a corner case for what HDF5 can support, due to non-collective declarations of data sets. Testing shows that it does not work with chunking.
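For background, a hedged sketch of the constraint (illustrative, not openPMD's actual code): with the MPI-IO driver, metadata-changing calls such as H5Dcreate2 are collective over the communicator, while raw data transfers may be independent.

```cpp
#include <hdf5.h>
#include <mpi.h>

// Sketch: in parallel HDF5, dataset creation (metadata) must be called by
// all ranks of the communicator, even if the subsequent writes use
// independent transfer mode.
void parallel_pattern(MPI_Comm comm)
{
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, comm, MPI_INFO_NULL); // MPI-IO driver

    hid_t file = H5Fcreate("out.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    // Collective: every rank must participate in H5Dcreate2. Skipping it on
    // some ranks (a non-collective declaration) can deadlock, and chunked
    // layouts add further collective metadata traffic.
    // ... H5Dcreate2(file, ...) on all ranks ...

    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_INDEPENDENT); // independent raw I/O

    H5Pclose(dxpl);
    H5Fclose(file);
    H5Pclose(fapl);
}
```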
Runs into a timeout for unclear reasons with this patch:

```
15/32 Test #15: MPI.8_benchmark_parallel ...............***Timeout 1500.17 sec
```
Co-authored-by: Franz Pöschel <franz.poeschel@gmail.com>
```rst
A full configuration of the HDF5 backend:

.. literalinclude:: hdf5.json
```
@franzpoeschel I just realized I forgot to add an hdf5.json file here 🤪
Ah, I accidentally named it json.json
```rst
All keys found under ``hdf5.dataset`` are applicable globally (future: as well as per dataset).
Explanation of the single keys:

* ``adios2.dataset.chunks``: This key contains options for data chunking via `H5Pset_chunk <https://support.hdfgroup.org/HDF5/doc/RM/H5P/H5Pset_chunk.htm>`__.
```
Ouch, that should read ``hdf5.dataset.chunks``...
This ports a prior empirical algorithm from libSplash to determine an optimal (large) chunk size for an HDF5 dataset based on its datatype and global extent.
Original implementation by Felix Schmitt @f-schmitt (ZIH, TU Dresden) in libSplash.
Original source:
- https://github.com/ComputationalRadiationPhysics/libSplash/blob/v1.7.0/src/DCDataSet.cpp
- https://github.com/ComputationalRadiationPhysics/libSplash/blob/v1.7.0/src/include/splash/core/DCHelper.hpp
Close #406
Related to #898 (improve HDF5 baseline performance)
Required for #510: basis to extend resizable data sets (#829) to HDF5
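Not the ported libSplash code itself, but a minimal sketch of the general idea behind such an empirical heuristic (the byte budget and the halving strategy are assumptions for illustration; the real algorithm lives in libSplash's DCHelper):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical illustration of an empirical chunk-size heuristic: start from
// the global extent and halve the largest dimension until the chunk fits a
// byte budget. Not the actual ported libSplash algorithm.
std::vector<uint64_t> guessChunkDims(
    std::vector<uint64_t> extent, size_t typeSize,
    uint64_t targetBytes = 4u * 1024u * 1024u) // assumed 4 MiB budget
{
    auto bytes = [&]() {
        uint64_t n = typeSize;
        for (auto e : extent)
            n *= e;
        return n;
    };
    while (bytes() > targetBytes)
    {
        auto largest = std::max_element(extent.begin(), extent.end());
        if (*largest <= 1)
            break;                     // cannot shrink any further
        *largest = (*largest + 1) / 2; // halve, rounding up
    }
    return extent; // usable as chunk_dims for H5Pset_chunk
}
```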
To Do

* Test `OPENPMD_HDF5_INDEPENDENT="OFF"` and `OPENPMD_HDF5_ALIGNMENT="1048576"` for our `8_benchmark_parallel -w` benchmark
* Sample & bin directory: `du -hs bin samples` with `OPENPMD_HDF5_CHUNKS` on my laptop (4KiB blocksize).
* With `"auto"`, the `MPI.8_benchmark_parallel` test is significantly slower. Changing the 4D test to 3D brings the difference down to about a 20% slowdown (#1010). (Maybe the 4th, 10-element dimension is sub-ideal for chunking?)

Follow-Ups