ParticleSpecies: Read to dask.dataframe #935
Conversation
examples/11_particle_dataframe.py
Outdated
    # example2: momentum histogram
    h, bins = da.histogram(df["momentum_z"], bins=50, range=[-8.0e-23, 8.0e-23])
    # weights=df["weighting"]
There was a problem hiding this comment.
I hit an issue when I pass this argument: an error along the lines of "Series has no attribute chunks" from deep inside Dask... Not sure if it refers to our series, though - I think not 😅
Also, let's ask the RAPIDS team tomorrow if we can also generate 2D and ND histograms. That would be tremendously helpful, but I cannot spot such a function in the docs.
Update: opened as dask/dask#7307
There was a problem hiding this comment.
It's a dask.dataframe.core.Series, which does not have a chunks attribute...
There was a problem hiding this comment.
Adding .to_dask_array() fixes this
    # TODO: implement available_chunks for constant record components
    #       and fall back to a single, big chunk here
    if chunks is None:
There was a problem hiding this comment.
@franzpoeschel I tried to query available_chunks from a constant BaseRecordComponent and realized this throws a backend error.
Probably the cleanest way for us to handle this would be to check for constant() in the frontend and return the full extent as a single chunk in that case. What do you think?
There was a problem hiding this comment.
Ah good catch, yeah that's probably the best solution.
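A rough sketch of that fallback, written against hypothetical stand-in classes rather than the real openPMD-api types. `Chunk`, `FakeRecordComponent` and `chunks_or_full_extent` are illustrative names only, and the stand-in raises `RuntimeError` merely to mimic the backend error mentioned above.

```python
class Chunk:
    """Hypothetical chunk description: offset and extent per dimension."""

    def __init__(self, offset, extent):
        self.offset = tuple(offset)
        self.extent = tuple(extent)


class FakeRecordComponent:
    """Hypothetical stand-in for a record component (not the openPMD-api class)."""

    def __init__(self, shape, constant, written_chunks=()):
        self.shape = tuple(shape)
        self.constant = constant
        self._written_chunks = list(written_chunks)

    def available_chunks(self):
        if self.constant:
            # mimic the backend error for constant components
            raise RuntimeError("no chunks stored for a constant component")
        return self._written_chunks


def chunks_or_full_extent(rc):
    """Return the stored chunks, or one chunk spanning the full extent if constant."""
    if rc.constant:
        return [Chunk(offset=(0,) * len(rc.shape), extent=rc.shape)]
    return rc.available_chunks()


constant_rc = FakeRecordComponent(shape=(100,), constant=True)
regular_rc = FakeRecordComponent(
    shape=(100,),
    constant=False,
    written_chunks=[Chunk((0,), (60,)), Chunk((60,), (40,))],
)

single = chunks_or_full_extent(constant_rc)
stored = chunks_or_full_extent(regular_rc)
```

Doing the check in one place keeps callers free of special cases: they always get a list of chunks back.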
    for k_r, r in particle_species.items():
        for k_rc, rc in r.items():
            if not rc.constant:
                chunks = rc.available_chunks()
There was a problem hiding this comment.
This assumes that chunks are equal across components. What happens if they're not? Will things just be less efficient or will things not work? In the latter case, we should probably guard for this case and throw an error.
There was a problem hiding this comment.
Yep, will just be less efficient.
(Also very unlikely.)
    "implemented, use pandas dataframes.")

    def read_chunk(species, chunk):
        stride = np.s_[chunk.offset[0]:chunk.extent[0]]
There was a problem hiding this comment.
Similarly, this assumes that we are dealing with particle data (and hence 1D). Is this checked? We do have a Python class <openPMD.ParticleSpecies>, so this could theoretically be guarded against.
(Or are those lines enough checking?):
ParticleSpecies.to_df = particles_to_dataframe # noqa
ParticleSpecies.to_dask = particles_to_daskdataframe # noqa
There was a problem hiding this comment.
Good idea to check that the chunk is 1D, yep.
Since this is implemented as species and ParticleSpecies, there should be little chance to accidentally pass in a field. Particle arrays are always 1D.
There was a problem hiding this comment.
Would suggest moving this to a module-level function instead of a closure. While using closures should work, there's extra overhead pickling closures vs. module-level functions.
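A sketch of what such a module-level reader could look like, including the 1D check discussed above. `FakeSpecies`, `Chunk` and `read_chunk_to_df` are hypothetical names, and `to_df` taking a slice is an assumed signature. The stop index is computed as offset plus extent on the assumption that `extent` is a length; the quoted line slices `offset:extent`, which would only match for chunks starting at 0.

```python
class Chunk:
    """Hypothetical chunk description: offset and extent per dimension."""

    def __init__(self, offset, extent):
        self.offset = tuple(offset)
        self.extent = tuple(extent)


class FakeSpecies:
    """Hypothetical stand-in for a particle species holding plain lists."""

    def __init__(self, columns):
        self.columns = columns

    def to_df(self, rows):
        # assumed signature: take a slice, return the selected rows per column
        return {name: values[rows] for name, values in self.columns.items()}


def read_chunk_to_df(species, chunk):
    """Read one chunk of a species; defined at module level, not as a closure."""
    if len(chunk.offset) != 1 or len(chunk.extent) != 1:
        raise ValueError("particle chunks are expected to be 1D")
    start = chunk.offset[0]
    stop = start + chunk.extent[0]  # assumption: extent is a length
    return species.to_df(slice(start, stop))


species = FakeSpecies({"momentum_z": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]})
part = read_chunk_to_df(species, Chunk(offset=(2,), extent=(3,)))
```

Because the function takes `species` and `chunk` as explicit arguments, it captures no enclosing scope and can be referenced by name when it is serialized.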
There was a problem hiding this comment.
Thank you for the review & continued guidance!
Fixed in #951
This pull request introduces 1 alert when merging 8c5c367 into 6e3b8b2 - view on LGTM.com new alerts:
    # example1: average momentum in z
    print("<momentum_z>={}".format(df["momentum_z"].mean().compute()))

    # example2: momentum histogram
There was a problem hiding this comment.
2D histograms in dask for a phase space example:
Add a method that reads a particle species into a `dask.dataframe`. Feel the power 🔥 Co-authored-by: Dmitry Ganyushin <ganyushin@gmail.com>
This pull request introduces 1 alert when merging 62a5df4 into 24058e0 - view on LGTM.com new alerts:
ee44ed5 to 9b9b567
If all records are constant, use one large chunk.
This pull request introduces 1 alert when merging 0f5090a into 24058e0 - view on LGTM.com new alerts:
I think I found what we need for meshes: https://docs.dask.org/en/latest/array-api.html?highlight=from_array#other-functions Not entirely sure yet how to tell it that it needs to call our |
Add a method that reads a particle species into a dask.dataframe. Feel the power 🔥
Cheers to @dmitry-ganyushin for helping with this!