ParticleSpecies & RecordComponent Serialize #963
I have implemented some of the suggestions above on a branch building on this one: https://github.com/franzpoeschel/openPMD-api/tree/topic-serialize-suggestion
Thank you for the suggestion, I like the
Preparation to derive information necessary to construct a new series that can access the same RecordComponent.
- pickle Record Components - it's aliiive
7a1fdda to 8a28920
```cpp
// This is a big hack for now, but it works for our use
// case, which is spinning up remote serial read series
// for DASK.
static auto series = openPMD::Series(
```
Not sure how much of a hack this static member is to fix up lifetimes, but it seems to work pretty well :)
We have little control over when it gets destroyed, which is likely as late as process exit or module unload.
What happens when a user wants to unpickle two different Series within the same application? The `static auto series` will not be constructed a second time and will then be stuck with the old file path.
Do you do this to keep the `Series` from going out of scope?
Would something like `return seriesAccessor( std::move( series ), group )` help? This would ensure that `seriesAccessor` takes no reference.
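The concern about a second unpickle can be reproduced in plain C++. This is a minimal sketch with a hypothetical `FakeSeries` stand-in (not the openPMD-api `Series`) that only records its file path: the initializer of a function-local `static` runs on the first call only, so a later call with a different path still returns the series from the first call.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for openPMD::Series: it only remembers the path
// it was opened with.
struct FakeSeries
{
    std::string filePath;
    explicit FakeSeries(std::string path) : filePath(std::move(path)) {}
};

// Mirrors the `static auto series = openPMD::Series(...)` pattern: the
// initializer runs only on the first call, later arguments are ignored.
FakeSeries &openOnce(std::string const &path)
{
    static auto series = FakeSeries(path);
    return series;
}
```

Calling `openOnce("a.h5")` and then `openOnce("b.h5")` yields a series whose `filePath` is `"a.h5"` both times.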
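Ah, I think I see the problem.
```cpp
add_pickle(
    cl,
    [](openPMD::Series & series, std::vector< std::string > const & group ) {
        // group.at(1): iteration index, group.at(3): mesh record name,
        // group.at(4): record component name
        uint64_t const n_it = std::stoull(group.at(1));
        // returns only the record component; the Series it belongs to
        // is not kept alive by it
        return series.iterations[n_it].meshes[group.at(3)][group.at(4)];
    }
);
```
This lets the `Series` go out of scope and returns only the record component.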
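The index lookups in the lambda suggest that `group` holds the path segments of the component below the file root. Assuming the openPMD default layout `/data/<iteration>/meshes/<record>/<component>` (default `basePath` `/data/%T/` and `meshesPath` `meshes/`), a minimal sketch of that mapping, with hypothetical helpers `splitGroup` and `locateMeshComponent`, could look like this:

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split "/data/100/meshes/E/x" into
// {"data", "100", "meshes", "E", "x"}, skipping empty segments.
std::vector<std::string> splitGroup(std::string const &path)
{
    std::vector<std::string> parts;
    std::istringstream in(path);
    std::string part;
    while (std::getline(in, part, '/'))
        if (!part.empty())
            parts.push_back(part);
    return parts;
}

// Hypothetical result type for the three lookups done in the lambda.
struct MeshLocation
{
    std::uint64_t iteration;
    std::string record;
    std::string component;
};

// Same indices as the unpickle lambda:
// group.at(1) -> iteration, group.at(3) -> record, group.at(4) -> component.
MeshLocation locateMeshComponent(std::vector<std::string> const &group)
{
    return MeshLocation{
        static_cast<std::uint64_t>(std::stoull(group.at(1))),
        group.at(3),
        group.at(4)};
}
```

Using `.at()` rather than `operator[]` means a malformed path throws `std::out_of_range` instead of reading past the end of the vector.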
I think that this is a fundamental design issue with our Python bindings: Python has garbage collection, but we don't use it. Realistically, calling `del series` in Python should not be sufficient to destroy the `Series`, since the series should stay active as long as there is an active dependent member, like a contained `RecordComponent`.
So, moving away from a C++-centered memory model for our Python API should probably do the trick.
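The ownership model argued for here can be sketched in plain C++ with `std::shared_ptr`: every component co-owns the series state, so dropping the user-facing handle (the analogue of `del series`) does not close the series while a component is still alive. All types below are hypothetical and do not reflect how openPMD-api is structured internally; in the Python bindings, the corresponding dependency would instead have to be communicated to pybind11, for example via a `py::keep_alive` call policy.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical shared state of an open series (file handle, path, ...).
struct SeriesData
{
    std::string filePath;
};

// Hypothetical component handle that co-owns the series it came from.
struct Component
{
    std::shared_ptr<SeriesData> owner; // keeps the series state alive
    std::string name;
};

// Hypothetical user-facing series handle.
struct SeriesHandle
{
    std::shared_ptr<SeriesData> data;

    explicit SeriesHandle(std::string path)
        : data(std::make_shared<SeriesData>(SeriesData{std::move(path)}))
    {
    }

    // Every component handed out shares ownership of the series state.
    Component component(std::string name) const
    {
        return Component{data, std::move(name)};
    }
};
```

The series state is released only when the last owner, handle or component, goes away.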
Three quick and dirty fixes, each quicker and dirtier than the last:
- Is it somehow possible in PyBind to quickly tell the garbage collection that in the above code the `series` and the returned `MeshRecordComponent` depend on each other? This should at least implement the above solution locally.
- Make a subclass `PickledMeshRecordComponent` that privately stores the `series`.
- Just do:
  ```cpp
  static openPMD::Series series;
  series = openPMD::Series(...); // this is executed every time
  ```
  (This solution will only allow one deserialized object to be used at a time, though.)
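The trade-off of the third option can be sketched the same way, again with hypothetical stand-in types rather than openPMD-api classes: because the static is reassigned on every call, the newest unpickled object works, but every earlier handle now silently refers to the newest series.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for openPMD::Series.
struct FakeSeries
{
    std::string filePath;
    explicit FakeSeries(std::string path) : filePath(std::move(path)) {}
};

// Hypothetical stand-in for a record component that only points to
// (does not own) the series it was obtained from.
struct ComponentHandle
{
    FakeSeries const *series;
    std::string const &path() const { return series->filePath; }
};

// Third option: the static is declared once but reassigned on every call.
ComponentHandle unpickle(std::string const &path)
{
    static FakeSeries series{""};
    series = FakeSeries(path); // executed every time
    return ComponentHandle{&series};
}
```

After `unpickle("a.h5")` followed by `unpickle("b.h5")`, both handles report `"b.h5"`, which is why only the last deserialized object stays usable with this approach.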
> fundamental design issue with our Python bindings: Python has garbage collection, but we don't use it.
Discussed offline: what we mean is that we do not yet use the possibility of telling the garbage collector about the dependencies within our object hierarchy (in both directions).
> Is it somehow possible in PyBind to quickly tell the garbage collection that in the above code the `series` and the returned `MeshRecordComponent` depend on each other? This should at least implement the above solution locally.
Gotta investigate that, would be a nice and clean solution 👍
Currently not tested and likely to fail in this situation:
- unpickling multiple objects of the same class (I test pickling/unpickling of different classes)

A good follow-up would be to at least keep the last object (of the same kind) always in a usable state.
And lift-off 🚀 (video: openpmd_dask.mp4)
Preparation to derive information necessary to construct a new series that can access an equivalent `ParticleSpecies` or `RecordComponent` (see #952).