Skip to content

ParticleSpecies & RecordComponent Serialize#963

Merged
ax3l merged 5 commits intoopenPMD:devfrom
ax3l:topic-serialize
Jul 3, 2021
Merged

ParticleSpecies & RecordComponent Serialize#963
ax3l merged 5 commits intoopenPMD:devfrom
ax3l:topic-serialize

Conversation

@ax3l
Copy link
Member

@ax3l ax3l commented Apr 9, 2021

Preparation to derive information necessary to construct a new series that can access the an equivalent ParticleSpecies or RecordComponent (see #952).

@ax3l ax3l requested a review from franzpoeschel April 9, 2021 03:08
@ax3l ax3l force-pushed the topic-serialize branch from 3359bed to 751f6a8 Compare April 9, 2021 03:12
@ax3l ax3l force-pushed the topic-serialize branch from 751f6a8 to 792d0dd Compare April 23, 2021 06:49
@franzpoeschel
Copy link
Contributor

I have implemented some of the suggestions above on a branch building on this one https://github.com/franzpoeschel/openPMD-api/tree/topic-serialize-suggestion

@ax3l
Copy link
Member Author

ax3l commented Jun 30, 2021

Thank you for the suggestion, I like the openPMDGroup touch, that makes it way more general :)
I didn't realize we have a myPath member on Attributable already, hah!

@ax3l ax3l force-pushed the topic-serialize branch from eaed32b to 65e6260 Compare June 30, 2021 01:22
ax3l and others added 4 commits July 2, 2021 15:35
Preparation to derive information necessary to construct a
new series that can access the same RecordComponent.
- pickle Record Components
- it's aliiive
@ax3l ax3l force-pushed the topic-serialize branch 4 times, most recently from 7a1fdda to 8a28920 Compare July 3, 2021 01:18
@ax3l ax3l changed the title [WIP] ParticleSpecies & RecordComponent Serialize ParticleSpecies & RecordComponent Serialize Jul 3, 2021
// This is a big hack for now, but it works for our use
// case, which is spinning up remote serial read series
// for DASK.
static auto series = openPMD::Series(
Copy link
Member Author

@ax3l ax3l Jul 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure how much of a hack this static member is to fix up lifetimes, but it seems to work pretty well :)
We only have little control when it gets destroyed, which is likely as late as process end/module unload.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens when a user wants to unpickle two different Series within the same application? The static auto series will not be constructed a second time and then be stuck with the old file path.
Do you do this for avoiding the Series from going out of scope?
Would something like return seriesAccessor( std::move( series ), group) help? This would ensure that the seriesAccessor takes no reference.

Copy link
Contributor

@franzpoeschel franzpoeschel Jul 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think I see the problem.

    add_pickle(
        cl,
        [](openPMD::Series & series, std::vector< std::string > const & group ) {
            uint64_t const n_it = std::stoull(group.at(1));
            return series.iterations[n_it].meshes[group.at(3)][group.at(4)];
        }
    )

This lets the Series go out of scope and returns only the record component..

I think that this is a fundamental design issue with our Python bindings: Python has garbage collection, but we don't use it. Realistically, calling del series should not be sufficient in Python to destroy the Series, since the series should stay active as long as there is an active dependent member like a contained RecordComponent.
So, moving away from a C++-centered memory model for our Python API should probably do the trick.

Three quick and dirty fixes, each quicker and dirtier than the last:

  1. Is it somehow possible in PyBind to quickly tell the garbage collection that in the above code the series and the returned MeshRecordComponent depend on each other? This should at least implement the above solution locally.
  2. Make a subclass PickledMeshRecordComponent that privately stores the series.
  3. Just do:
    static openPMD::Series series;
    series = openPMD::Series(...); // this is executed every time
    (This solution will only allow one deserialized object to be used at a time though)

Copy link
Member Author

@ax3l ax3l Jul 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fundamental design issue with our Python bindings: Python has garbage collection, but we don't use it.

Offline discussed: What we mean is we don't use the possibility to tell the garbage collector dependencies between our object hierarchy yet (in both directions) yet.

Is it somehow possible in PyBind to quickly tell the garbage collection that in the above code the series and the returned MeshRecordComponent depend on each other? This should at least implement the above solution locally.

Gotta investigate that, would be a nice and clean solution 👍

Currently not tested and likely to fail in this situation:

  • unpickle of multiple objects of the same class (I test different classes to be pickled/unpickled)

good follow-up to at least have the last object (of the same kind) always be in usable state.

@ax3l ax3l force-pushed the topic-serialize branch from 8a28920 to a696367 Compare July 3, 2021 05:12
@ax3l
Copy link
Member Author

ax3l commented Jul 3, 2021

And lift-off 🚀

openpmd_dask.mp4

DistributedDaskOpenPMD.zip

@ax3l ax3l merged commit 2fd519a into openPMD:dev Jul 3, 2021
@ax3l ax3l deleted the topic-serialize branch July 3, 2021 06:16
@ax3l ax3l mentioned this pull request Mar 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants