At today's Pangeo Forge meeting, @alxmrs told us a bit more about Xarray Beam. I'm happy to note that we are aligning a bit around certain abstractions. I'm not proposing to merge or share code or anything at this stage. Just happy to see that we have independently arrived at some similar concepts. Comparing side by side may also suggest improvements.
Xarray Beam
https://xarray-beam.readthedocs.io/en/latest/data-model.html#keys-in-xarray-beam
To keep track of how individual records could be combined into a larger (virtual) dataset, Xarray-Beam defines a Key object. Key objects consist of:
- offsets: integer offsets for chunks from the origin in an immutabledict
- vars: The subset of variables included in each chunk, either as a frozenset, or as None to indicate “all variables”.
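To make the comparison concrete, here is a minimal sketch of the two fields described above. It is not the actual xarray_beam.Key class: `SketchKey`, `create`, and `offset` are hypothetical names, and a sorted tuple of pairs stands in for the immutabledict so the example needs only the standard library.

```python
from dataclasses import dataclass
from typing import FrozenSet, Mapping, Optional, Tuple


@dataclass(frozen=True)
class SketchKey:
    """Hypothetical stand-in for a chunk key (NOT the xarray_beam.Key class)."""

    # Integer offsets of this chunk from the origin, stored as sorted
    # (dimension, offset) pairs so the key is immutable and hashable.
    offsets: Tuple[Tuple[str, int], ...]
    # Subset of variables in this chunk, or None to mean "all variables".
    vars: Optional[FrozenSet[str]] = None

    @classmethod
    def create(cls, offsets: Mapping[str, int], vars=None) -> "SketchKey":
        frozen_vars = None if vars is None else frozenset(vars)
        return cls(tuple(sorted(offsets.items())), frozen_vars)

    def offset(self, dim: str) -> int:
        """Look up the offset along one dimension."""
        return dict(self.offsets)[dim]


# A chunk of one variable starting at time=0, lat=100.
key = SketchKey.create({"time": 0, "lat": 100}, vars={"temperature"})
# A chunk holding all variables, starting at time=365.
everything = SketchKey.create({"time": 365})
```

Because both fields are immutable, keys like these can be compared and used as dictionary keys, which is what makes them useful for tracking where each chunk belongs in the larger dataset.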
Pangeo Forge
We currently call this an "index," although I agree that key is a better name. Our use of indexes is less well documented as it is not really public API but more of an internal thing. We mention it here: https://pangeo-forge.readthedocs.io/en/latest/recipe_user_guide/file_patterns.html#inspect-a-filepattern
The index is its own special type of object used internally by recipes: a pangeo_forge_recipes.patterns.Index, which is basically a tuple of one or more pangeo_forge_recipes.patterns.DimIndex objects.
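As a rough illustration of the "tuple of per-dimension entries" idea, here is a simplified sketch. The class names `SketchDimIndex` and `SketchIndex`, the fields `name`, `position`, and `length`, and the `position_of` helper are all assumptions made up for this example; the linked source is the authority on what the real classes contain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchDimIndex:
    """Hypothetical per-dimension entry (NOT the real DimIndex)."""

    name: str      # assumed: name of the dimension this entry refers to
    position: int  # assumed: position of the item along that dimension
    length: int    # assumed: total number of items along that dimension


class SketchIndex(tuple):
    """Hypothetical index: a tuple of one or more SketchDimIndex objects."""

    def position_of(self, name: str) -> int:
        """Return the position along the named dimension."""
        for dim_index in self:
            if dim_index.name == name:
                return dim_index.position
        raise KeyError(name)


# The fourth of ten time steps, for the first of two variables.
index = SketchIndex(
    (
        SketchDimIndex("time", 3, 10),
        SketchDimIndex("variable", 0, 2),
    )
)
```

Since the index is just a tuple of frozen entries, it is hashable in the same way the Xarray-Beam key is, so the two abstractions play a similar role.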
The code is pretty straightforward and the comments explain things better.
https://github.com/pangeo-forge/pangeo-forge-recipes/blob/49997cb52cff466bd394c1348ef23981e782a4d9/pangeo_forge_recipes/patterns.py#L70-L114
These indexes are used both by FilePatterns (see #31) as well as for parallel writing of Zarr datasets.
cc @cisaacstern