Skip to content

storeChunk: independent calls#554

Merged
ax3l merged 3 commits intoopenPMD:devfrom
ax3l:fix-storeChunkCalls
Oct 21, 2019
Merged

storeChunk: independent calls#554
ax3l merged 3 commits intoopenPMD:devfrom
ax3l:fix-storeChunkCalls

Conversation

@ax3l
Copy link
Member

@ax3l ax3l commented Aug 22, 2019

Parallel test: allow to call storeChunk() an independent number of times from various ranks. This currently deadlocks on Series::flush() for both ADIOS1 and HDF5 (ok!).

@C0nsultant @franzpoeschel can you please take a look at this? At least for ADIOS1 this should be a totally fine workflow. I think that is a bug in our library to require a matched number of calls to storeChunk() from all participating ranks.

To Do

  • add more test cases. Currently test the corner-case
    • one calls storeChunk, another does not (fails); but we also need
    • some ranks call storeChunk times, others <M>!=<N> times
    • and the same tests for loadChunk

@ax3l
Copy link
Member Author

ax3l commented Oct 4, 2019

@guj just quickly mentioned the list of collective HDF5 calls again, so if we do not want to re-implement the buffer functionality of ADIOS for HDF5, this backend might just not be as flexible: https://support.hdfgroup.org/HDF5/doc/RM/CollectiveCalls.html

Nevertheless, I seam to remember for ADIOS1 that calls such as adios_write_byid/adios_schedule_read are not collective per-se (only in corner cases, such as transports to PHDF5) and should be callable independently. What we might be doing right now is that we accidentally group them together with collective calls in the backend?

@RemiLehe also mentioned we could try H5FD_MPIO_INDEPENDENT again for HDF5, but from our past experiences (see mention) this does not work well (deadlocks or performance penalties at scale)... (maybe better these days).

@guj
Copy link
Contributor

guj commented Oct 4, 2019

It should work if one calls adios_open/close methods in all ranks, and write var in some ranks.

@RemiLehe
Copy link
Member

RemiLehe commented Oct 4, 2019

@guj Thanks! Is that true for both adios1 and adios2?

@guj
Copy link
Contributor

guj commented Oct 4, 2019 via email

@ax3l ax3l force-pushed the fix-storeChunkCalls branch from 428ed55 to 1853ceb Compare October 5, 2019 00:37
@ax3l
Copy link
Member Author

ax3l commented Oct 15, 2019

I am not sure why this is not deadlocking for BP3/ADIOS1 anymore, but that is good news. Potentially I mismatched it while running and designing the tests.

Update: funny, ADIOS1 BP works for me locally (mpich@3.3.1) but deadlocks in CI (ancient openmpi@1.6.5). Could be still a bug in there or it's just the really old MPI. For old OpenMPI versions... currently testing <= 2.1.6, I can reproduce the deadlock locally as well.

@ax3l
Copy link
Member Author

ax3l commented Oct 16, 2019

So, these are my own checks on my local machine, Ubuntu 18.04.3 LTS, GCC 7.4.0, CMake 3.15.4. ADIOS1 and ADIOS2 build via Spack, PHDF5 taken from system.

MPI backend result
MPICH 3.3.1 ADIOS2 2.5.0 ✔️
MPICH 3.3.1 ADIOS 1.13.1
MPICH 3.3.1 ADIOS 1.13.1 (**)
MPICH 3.3a2 PHDF5 1.10.0.1
MPICH 3.3.1 PHDF5 1.8.13
MPICH 3.3.1 PHDF5 1.8.13 (*) ✔️

(*) with export OPENPMD_HDF5_INDEPENDENT=ON
(**) with ROMIO hints, although ADIOS should not be using MPI-I/O but rather POSIX I/O and MPI communication, such as

romio_cb_write disable
romio_no_indep_rw false

Now ADIOS1, ADIOS2, HDF5 build via Spack for OpenMPI 3.1.4:

MPI backend result
OpenMPI 3.1.4 ADIOS2 2.5.0 ✔️
OpenMPI 3.1.4 ADIOS 1.13.1
OpenMPI 3.1.4 PHDF5 1.10.5
OpenMPI 4.0.2 PHDF5 1.10.5
OpenMPI 4.0.2 PHDF5 1.10.5 (*) ✔️

(*) with export OPENPMD_HDF5_INDEPENDENT=ON

But, when executed on Cori-KNL (NERSC), with the following commands

module swap craype-haswell craype-mic-knl
module swap PrgEnv-intel PrgEnv-gnu  # GCC 8.2.0
module load adios/1.13.1

export CC="$(which cc)"
export CXX="$(which CC)"
export CRAYPE_LINK_TYPE=dynamic

cd $SCRATCH
git clone https://github.com/openPMD/openPMD-api.git
cd openPMD-api
git remote add ax3l https://github.com/ax3l/openPMD-api.git
git checkout ax3l/fix-storeChunkCalls
mkdir build
cd build
cmake .. -DopenPMD_USE_HDF5=OFF -DopenPMD_USE_ADIOS1=ON -DopenPMD_USE_ADIOS2=OFF -DopenPMD_USE_PYTHON=OFF -DMPIEXEC_EXECUTABLE=$(which srun)
make -j 8

salloc --time=1:00:00 -N 1 -C knl -q interactive
CTEST_OUTPUT_ON_FAILURE=1 ctest -V -R ParallelIO
# or manually
srun -n 2 bin/ParallelIOTests

ADIOS1 does not deadlock:

MPI backend result
cray-mpich/7.7.6 ADIOS 1.13.1 ✔️

Legend

❌ : deadlock
✔️ : works

@ax3l ax3l mentioned this pull request Oct 18, 2019
9 tasks
@ax3l
Copy link
Member Author

ax3l commented Oct 18, 2019

For PHDF5, I added a new environment variable, OPENPMD_HDF5_INDEPENDENT=[OFF], that allows to set the transfer to non-collective I/O #576. With that, PHDF5 I/O with non-collective calls to storeChunk worked for me. (We will also add a Series constructor option for this, but #569 still needs some love.)

I am still puzzled why the ADIOS1 implementation deadlocks. Maybe @franzpoeschel can spot it quicker than I do.

.travis.yml Outdated
# run tests
# Independent I/O in PHDF5
# https://github.com/openPMD/openPMD-api/pull/554
- OPENPMD_HDF5_INDEPENDENT=ON
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be better if we add this for the specific test as JSON option #569, because otherwise we wildcard a good amount of potential issues in CI, but we do not have this in place yet.

Parallel test: allow to call storeChunk an independent number of
times from various ranks. This currently deadlocks on
`Series::flush()` for both ADIOS1 and HDF5.
@ax3l ax3l force-pushed the fix-storeChunkCalls branch from 1c3a70a to a089c92 Compare October 18, 2019 06:02
Add this via a JSON option to the specific tests where we want
this, otherwise we wildcard to much in our CI.
@ax3l ax3l force-pushed the fix-storeChunkCalls branch from a089c92 to 5f235f8 Compare October 18, 2019 06:03
@ax3l
Copy link
Member Author

ax3l commented Oct 18, 2019

printf debugging on my MPICH 3.3.1 setup locally:

$ mpiexec -prepend-rank -n 2 bin/ParallelIOTests adios_write_test_skip_chunk
[0] adios_init_noxml
[1] adios_init_noxml
[0] adios_read_init_method
[1] adios_read_init_method
[0] adios_declare_group
[0] adios_select_method
[1] adios_declare_group
[1] adios_select_method
[0] adios_define_var
[1] adios_define_var
[1] adios_define_var
[1] adios_define_var[0] adios_define_var
[0] adios_define_var
[1] 
[0] adios_define_var
[0] adios_define_var
[0] adios_define_var
[1] adios_define_var
[1] adios_define_var
[1] adios_define_var[1] 
[1] adios_open
[0] adios_define_attribute_byvalue
[0] adios_define_attribute_byvalue
[0] adios_define_attribute_byvalue
[0] adios_define_attribute_byvalue
[0] adios_define_attribute_byvalue
[0] adios_define_attribute_byvalue
[0] adios_define_attribute_byvalue[0] 
[0] adios_define_attribute_byvalue
[0] adios_define_attribute_byvalue
[0] adios_define_attribute_byvalue
[0] adios_define_attribute_byvalue
[0] adios_define_attribute_byvalue
[0] adios_define_attribute_byvalue
[0] adios_define_attribute_byvalue[0] 
[0] adios_define_attribute_byvalue
[0] adios_open
[0] WARN : MPI_AMR method: No OST to use. Set num_ost=NNN in the adios config xml file.
[0] adios_close
[1] adios_write
[1] adios_write
[1] adios_write
[1] adios_write
[1] adios_write
[1] adios_write[1] 
[1] adios_close
[1] adios_define_attribute_byvalue
[1] adios_define_attribute_byvalue
[1] adios_define_attribute_byvalue
[1] adios_define_attribute_byvalue
[1] adios_define_attribute_byvalue
[1] adios_define_attribute_byvalue
[1] adios_define_attribute_byvalue
[1] adios_define_attribute_byvalue[1] 
[1] adios_define_attribute_byvalue
[1] adios_define_attribute_byvalue[1] 
[1] adios_define_attribute_byvalue
[1] adios_define_attribute_byvalue[1] 
[1] adios_define_attribute_byvalue
[1] adios_define_attribute_byvalue
[1] adios_define_attribute_byvalue
[1] adios_open

For some reason, I see 3x adios_open in there.

Looks like a rouge work-around inside ParallelADIOS1IOHandlerImpl::~ParallelADIOS1IOHandlerImpl() is causing the problems.

This block which is also present in the serial backend seams to be the culpit:

for( auto& f : m_filePaths )
    if( m_openWriteFileHandles.find(f.second) == m_openWriteFileHandles.end() )
        m_openWriteFileHandles[f.second] = open_write(f.first);

Note: The m_openWriteFileHandles.find(f.second) == m_openWriteFileHandles.end() comparison has some code smell, it compares a shared pointer of string instead of the strings itself... This does not seam right.
Why is this even a std::unordered_map< std::shared_ptr< std::string >, int64_t > instead of a simple std::unordered_map< std::string, int64_t >?
This .find() construct is used at several places in the ADIOS1 implementation.

@guj
Copy link
Contributor

guj commented Oct 18, 2019

if a Series is constructed, there are two places adios_open() is called.
(1) Series::~Series(). this destructor calls a flush(), and this flush() leads to adios_open() when "while( !handler->m_work.empty() )". is true (ParallelADIOS1IOHandler.cpp: 135).
(2). Attributable::~Attributable(), because it is super class of Series.

In your test, rank 1 called adios_open() in both places, rank 0 has no work, so called at (2) only.
when (1) is true, CommonADIOS1IOHandler.cpp::766 calls open_write()
it is valid because no file has been created

I am wondering why (2) is necessary to call file creation in a destructor. Might make more sense to construct the file in the Series. constructor.

@ax3l
Copy link
Member Author

ax3l commented Oct 19, 2019

Thank you for finding these details!

I think although issuing an early flush() after series open will already work-around this quickly, we should investigate not to delay the adios_open. I think we will generally not benefit from delaying that command in the ADIOS1 backend after Series() creation since that's a point where we are collective anyway.

.flush() and storeChunk() on the other hand should ideally be non-collective/independent operations.

@ax3l
Copy link
Member Author

ax3l commented Oct 20, 2019

I think I might have now spotted the issue in ParallelADIOS1IOHandler.cpp and pushed a fix in the last commit :)

Files for writing should not be closed just to be opened again
immediately after. Fix deadlock in `storeChunk()` when called in
non-collective manner.
@ax3l ax3l force-pushed the fix-storeChunkCalls branch from f6694e1 to 15cc01e Compare October 21, 2019 02:04
@ax3l ax3l merged commit 22f9e95 into openPMD:dev Oct 21, 2019
@ax3l ax3l deleted the fix-storeChunkCalls branch October 21, 2019 15:35
@ax3l ax3l mentioned this pull request Jan 25, 2020
2 tasks
@ax3l ax3l added the MPI label Jan 25, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants