Conversation
python/bindings.cpp
```cpp
m.def("make_collective_reader",
      [](py::object comm, bool collective_metadata, bool collective_transfer) {
#if SONATA_HAS_MPI == 1
```
When `SONATA_HAS_MPI` is false, I would rather we don't even expose the collective functionality, as it might be misleading to ask for a collective reader and get a normal one.
Summary: I think it's important that MPI is entirely optional in libsonata, even when writing code that wants to use collective MPI-IO. At most, I'd add a flag that prevents returning a suitable default.
The lines aren't precise because we'll likely move anything MPI-related to its own repo. The reason is that mpi4py is a build dependency, and one can't (nicely) have optional build dependencies.
Nevertheless, I think code that's written for collective IO should work even if the user fails to install mpi4py. HDF5 doesn't fail either if it can't do collective IO; instead it returns the correct values and sets an internal flag users can check. Functionally, collective IO is optional; it's only important for performance reasons under certain conditions.
The risk is that installing mpi4py, or libsonata with MPI support, will fail to build under certain conditions. I suspect many of our users will be unable or unwilling to debug these failures.

To me, there's a big difference between a large production run that requires MPI-IO to run reasonably fast, and something small (debugging) that runs fast simply because it's small. Large runs are always done on a cluster, and we can expect someone adept at these issues to install the required toolchain (including MPI-enabled libsonata); even if not, there's a real need, and therefore it makes sense to spend time debugging why something failed to build. Small runs are carried out anywhere, including laptops where MPI might not be installed or might not work correctly at that moment. In my opinion, I don't want to force users to figure these issues out just to run something that takes 2.3s with MPI-IO and 2.2s without. Similarly, I don't want every downstream project to have to remember to implement a graceful fallback.
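A graceful fallback of the kind argued for above could look like the following sketch. All names here (`make_reader`, `DefaultReader`, `CollectiveReader`, the `collective` flag) are hypothetical and for illustration only; this is not libsonata's actual API, just the HDF5-style "succeed either way, but set a checkable flag" behaviour:

```python
import importlib.util

class DefaultReader:
    """Stand-in for a default (serial) reader."""
    collective = False

class CollectiveReader:
    """Stand-in for a hypothetical MPI-IO collective reader."""
    collective = True

def make_reader(require_collective=False):
    # Mirror HDF5's behaviour: succeed either way, but expose a flag
    # (`reader.collective`) so callers can check what they actually got.
    if importlib.util.find_spec("mpi4py") is not None:
        return CollectiveReader()
    if require_collective:
        # The optional "no silent fallback" flag mentioned above.
        raise RuntimeError("collective IO requested, but mpi4py is not installed")
    return DefaultReader()

reader = make_reader()
print(type(reader).__name__)
```

With this shape, code written for collective IO runs unchanged on a laptop without mpi4py, and a production run can opt into a hard failure via `require_collective=True`.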
To detect whether the important applications are using MPI-IO we can:
- check GPFS waiters,
- run Darshan,
- use `ROMIO_PRINT_HINTS=1` and check the output, e.g., during module testing on the BB5.
Therefore, I think, on BB5, we're in a sufficiently strong position to notice if it's silently not using collective IO when it really should be.
Interesting design. As I understand it, you are implementing a collective reader already, so in Neurodamus we would basically need to call …
matz-e left a comment:
Nice! If I get this correctly, we can just pull the collective read interface out into its own library?
Yes, almost certainly, because mpi4py is a build dependency, and one can't (nicely) have optional build dependencies.
Force-pushed from 968b661 to 92321c7.
This PR has been heavily reworked since the previous round of discussions. Eventually it will only include the …
The questions for review are:
The next review would include questions like:
With the above, code written to use collective IO should automatically run even if the user can't install …
Last chance for comments @matz-e / @ferdonline; otherwise I'll approve this at the end of the day.
Force-pushed from c573ee4 to 858407a.
@mgeplf we're in a reasonably good place to hold off on merging this. With all the preliminary work out of the way, we won't suffer too much from merge conflicts, and we can work on multiple other things (like cleaning up a bit more after #319). We only need this in once we have it fully working and tested in neurodamus. Until then, retaining the freedom to change details might be nice.
This commit introduces the API for an `Hdf5Reader`. This reader abstracts the process of opening HDF5 files and reading a `libsonata.Selection` from a dataset. The default reader calls the existing `_readSelection`.
CI_BRANCHES:BLUECONFIGS_BRANCH=weji/libsonata_mpi
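The `Hdf5Reader` abstraction described above can be sketched roughly in Python (the real interface lives in C++; the `read_selection` name, its signature, and the list-based dataset are assumptions made here for illustration). A reader turns a Selection-like list of `[start, stop)` ranges into the concatenated values from those ranges:

```python
class Hdf5Reader:
    """Hypothetical sketch: reads a Selection-like list of half-open
    ranges from a dataset-like sequence."""

    def read_selection(self, dataset, selection):
        out = []
        for start, stop in selection:
            # Each range is half-open, like libsonata.Selection ranges.
            out.extend(dataset[start:stop])
        return out

reader = Hdf5Reader()
values = reader.read_selection(list(range(10)), [(1, 3), (7, 9)])
print(values)  # → [1, 2, 7, 8]
```

The point of the abstraction is that an MPI-IO reader could implement the same call with collective semantics, while callers stay unchanged.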
## Context

When using `WholeCell` load-balancing, the access pattern when reading parameters during synapse creation is extremely poor and is the main reason why we see long (10+ minute) periods of severe performance degradation of our parallel filesystem when running slightly larger simulations on BB5. Using Darshan and several PoCs, we established that the time required to read these parameters can be reduced by more than 8x, and IOps can be reduced by over 1000x, when using collective MPI-IO. Moreover, the "waiters" were reduced substantially as well. See BBPBGLIB-1070.

Following those findings, we concluded that neurodamus would need to use collective MPI-IO in the future. We've implemented most of the required changes directly in libsonata, allowing others to benefit from the same optimizations should the need arise. See BlueBrain/libsonata#309, BlueBrain/libsonata#307, and the preparatory work: BlueBrain/libsonata#315, BlueBrain/libsonata#314, BlueBrain/libsonata#298.

By instrumenting two simulations (SSCX and reduced MMB), we concluded that neurodamus was almost collective. However, certain attributes were read in a different order on different MPI ranks, possibly due to hashes being salted differently on different ranks.

## Scope

This PR enables neurodamus to use collective IO for the simulation described above.

## Testing

We successfully ran the reduced MMB simulation, but since SSCX hasn't been converted to SONATA, we can't run that simulation.

## Review

* [x] PR description is complete
* [x] Coding style (imports, function length, new functions, classes or files) is good
* [ ] Unit/scientific test added
* [ ] Updated README, in-code, developer documentation

Co-authored-by: Luc Grosheintz <luc.grosheintz@gmail.ch>
The idea is that one injects callbacks for reading selections from datasets. These callbacks implement:
The default calls `_readSelection`. Separate readers can implement variants suitable for their purpose, e.g. MPI-IO. The advantage is that the reader has control over both the required collective semantics and the HDF5 properties it sets. This allows some readers to have non-collective behaviour, such as short-circuiting for empty selections, while others implement strictly collective reading of datasets.

Additionally, we can create a suite of readers for different use cases: e.g. an MPI-IO reader for neurodamus, an aggregating reader for serial workloads on GPFS (and a separate one for Lustre if need be), and the default implementation, which minimizes the number of bytes read.