Workaround for independent writes to Iterations in parallel, better detection of BP5 which in turn uncovers more instances of the first issue (#1619)
Merged
ax3l merged 12 commits into openPMD:dev on Jun 11, 2024
Conversation
Force-pushed from 237b394 to 8bbc170
franzpoeschel commented on Jun 6, 2024:
```cpp
int const my_first_step = i_mpi_rank * int(local_Nz);
int const all_last_step = last_step + (i_mpi_size - 1) * int(local_Nz);

bool participate_in_barrier = true;
```
@ax3l Can you please check whether this bug also affects HiPACE++? Currently, the sequence of barriers and flushes doesn't match from rank to rank. This was uncovered only now, since flushing is effectively not collective in many situations, but this test now uses `adios2::Engine::PerformDataWrite()` of BP5, which is a bit stricter there.
ax3l (Member) replied:
Ok, great catch!
For HiPACE++, we changed the time stepping logic the other year, so that every MPI rank just writes to exactly one iteration. Thus, it cannot have this bug (anymore).
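The hazard discussed above can be illustrated with a small, self-contained model (hypothetical code, not part of openPMD-api): collective operations such as barriers and collective flushes must be issued in the same order by every rank. If one rank inserts an extra collective flush between two barriers while another does not, the call sequences diverge and the program can deadlock.

```python
# Hypothetical model of the bug: each (simulated) MPI rank records the
# ordered sequence of collective calls it would issue. Collective
# semantics require every rank to issue the identical sequence; any
# divergence, e.g. a flush interleaved on only some ranks, can hang.

def collective_sequences_match(sequences):
    """Return True if all ranks issue the same ordered sequence
    of collective operations."""
    first = sequences[0]
    return all(seq == first for seq in sequences[1:])

# Rank 0 flushes between its barriers, rank 1 does not: mismatch.
buggy = [
    ["barrier", "flush", "barrier"],  # rank 0
    ["barrier", "barrier"],           # rank 1
]

# Matching sequences on every rank: safe.
fixed = [
    ["barrier", "flush", "barrier"],  # rank 0
    ["barrier", "flush", "barrier"],  # rank 1
]
```

Checking `collective_sequences_match(buggy)` yields `False`, which is exactly the rank-to-rank mismatch described above; once `PerformDataWrite()` makes the flush strictly collective, such a mismatch stops being silently tolerated.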
Force-pushed from 9dd2a14 to 72a465c
- Somehow `PerformDataWrite()` leads to trouble with this pattern.
- This reverts commit 36597bd. No longer needed after rebasing on fix-iteration-flush.
- It used `Series::flush` non-collectively.
Force-pushed from ed25000 to 0a05f10
ax3l approved these changes on Jun 11, 2024
franzpoeschel added a commit to franzpoeschel/openPMD-api that referenced this pull request on Jun 28, 2024: Follow-up to openPMD#1619
This somewhat fixes #1616 until we add a better solution. With this PR:

- `seriesFlush()` will always flush the containing Iteration if called from within an Iteration (and will ignore missing `dirty` annotations).
- At the same time, I added a better detection for BP5-specific features. Since this means that `adios2::Engine::PerformDataWrite()` is used automatically more often, this uncovers further parallel flushing bugs. So, these two items are treated together in this PR.

In a follow-up PR later on, as a more breaking change, we would also flush all open iterations in MPI-parallel contexts on `series.flush()`, but for this we will first need functionality to reopen iterations after close (#1592).

TODO:
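As a rough sketch of the changed flush semantics, here is a hypothetical Python model (not the actual openPMD-api implementation, which is C++; the class and method names below are invented for illustration): a flush issued from within an iteration now flushes the containing iteration even if its dirty flag was never set, whereas a plain series-level flush still skips clean iterations.

```python
# Hypothetical model of the new rule: seriesFlush() called from inside
# an iteration flushes that iteration unconditionally, ignoring a
# missing dirty annotation.

class Iteration:
    def __init__(self, index):
        self.index = index
        self.dirty = False    # would be set when data is staged
        self.flushed = False

class Series:
    def __init__(self):
        self.iterations = {}

    def flush(self):
        # Series-level flush: only dirty iterations are written out.
        for it in self.iterations.values():
            if it.dirty:
                it.flushed = True

    def series_flush_from(self, index):
        # Flush triggered from within an iteration: the containing
        # iteration is flushed even without a dirty flag.
        self.iterations[index].flushed = True
        self.flush()

s = Series()
s.iterations[0] = Iteration(0)   # never marked dirty
s.series_flush_from(0)           # still flushes iteration 0
```

In this model, a plain `s.flush()` on a clean iteration would leave it unflushed, which is the gap that made the workaround necessary.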