add v3 store classes (zarr v3 support part 1) #874
Define the StoreV3 class and create v3 versions of most existing stores. Add a test_storage_v3.py with test classes inheriting from their v2 counterparts. Only the subset of methods involving differences in v3 behavior was overridden.
Codecov Report
@@ Coverage Diff @@
## master #874 +/- ##
==========================================
- Coverage 99.94% 98.02% -1.93%
==========================================
Files 32 33 +1
Lines 11216 12359 +1143
==========================================
+ Hits 11210 12115 +905
- Misses 6 244 +238
@grlee77 : while you hammer out the actions fun, is there anything in particular you could use feedback on?
I will try to summarize a bit here later today
Probably my biggest question regarding this PR is how to handle extended data types. I have not done this in a way consistent with the spec yet. Specifically there is a description of how The
The unicode and byte array cases are not covered in our current Zarr test suite, but they are used in the Xarray test suite downstream. Does each of these need to have its own separate extension URI that we can point to in the I don't think we should be introducing all of the
Actually, I see now that there are indeed already (draft) extensions for two of the above here: Let me update this PR to use
Sounds good. I do think those URIs are fairly key to the extension mechanism, so before we get too far we should have the most core ones well-defined as good examples. cc: @alimanfoo
c24305c to 9950a4d
I have not worked on any new URI text yet, but I did update The newly introduced
Also, are any of these cases where we should be specifying a fallback? An array of complex128 can, for instance, be read as a double-length array of type float64 with real and imaginary values interleaved.
zarr_version should not be in the array metadata, only the base store metadata
compressor should be absent when there is no compression
classmethods adapted from zarrita code
joshmoore left a comment
so a dict seemed an easy JSON-compatible way to store that.
nods
should bytestring and unicode be combined into one?
I don't know the use case(s) for bytestring. Have you seen anything mentioned?
are any of these cases where we should be specifying a fallback? An array of complex128 can be read as a double-length array of type float64 for instance with real and imaginary values interleaved?
Is the fallback the extent of "extension subclassing"? If so, it would be good to have examples. So far it's always been time->long or similar. The complex -> 2 floats would certainly test the concept further.
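To make the complex -> 2 floats fallback concrete, here is a minimal sketch using only the standard library. It assumes little-endian storage with each complex128 element laid out as a real float64 followed by an imaginary float64; the function names are hypothetical and are not part of this PR or of Zarr.

```python
import struct


def complex128_to_bytes(values):
    """Serialize complex numbers as interleaved little-endian float64 pairs."""
    flat = []
    for z in values:
        flat.extend((z.real, z.imag))
    return struct.pack(f"<{len(flat)}d", *flat)


def read_as_float64(buf):
    """Fallback reader: interpret the same bytes as a plain float64 array."""
    count = len(buf) // 8  # 8 bytes per float64
    return list(struct.unpack(f"<{count}d", buf))


def read_as_complex128(buf):
    """Extension-aware reader: pair up the interleaved floats again."""
    floats = read_as_float64(buf)
    return [complex(re, im) for re, im in zip(floats[0::2], floats[1::2])]


buf = complex128_to_bytes([1 + 2j, 3 - 4j])
print(read_as_float64(buf))      # [1.0, 2.0, 3.0, -4.0]
print(read_as_complex128(buf))   # [(1+2j), (3-4j)]
```

A reader that does not know the complex extension would see twice as many elements as the array shape declares, so a fallback of this kind would presumably also need to say how the shape changes.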
from typing import cast, Union, Any, List, Mapping as MappingType, Optional

ZARR_FORMAT = 2
ZARR_FORMAT_v3 = 3
Not sure if this is realistic, but I could see also adding ZARR_FORMAT_V2 such that ZARR_FORMAT=ZARR_FORMAT_V2 defines the current default with the hope of swapping that at some (well-defined?) point in the future.
Could be fine to add that. Not sure how this is used.
Currently ZARR_FORMAT_v3 is only used once, to set the ZARR_FORMAT attribute in the Metadata3 class, so we could just remove it if there is no other use for it.
looks like the other ZARR_FORMAT is used mostly in test cases, but also when setting metadata in n5.py
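The suggestion in this thread could look roughly like the sketch below. ZARR_FORMAT_V2 is the name proposed in the review, ZARR_FORMAT_v3 is the name from the diff, and `resolve_zarr_format` is a hypothetical helper added only for illustration; none of this is confirmed to match what was eventually merged.

```python
# Explicit per-version constants.
ZARR_FORMAT_V2 = 2
ZARR_FORMAT_v3 = 3

# The default format. Swapping the default later is a one-line change here.
ZARR_FORMAT = ZARR_FORMAT_V2


def resolve_zarr_format(zarr_version=None):
    """Hypothetical helper: fall back to the default when no version is given."""
    if zarr_version is None:
        return ZARR_FORMAT
    if zarr_version not in (ZARR_FORMAT_V2, ZARR_FORMAT_v3):
        raise ValueError(f"unsupported zarr_version: {zarr_version!r}")
    return zarr_version
```

Code that currently imports ZARR_FORMAT (the test cases and n5.py mentioned above) would keep working unchanged under this arrangement.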
@staticmethod
def _valid_key(key: str) -> bool:
    """
❤️ docs!
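The body of `_valid_key` is cut off in the snippet above. As an illustration only, a key check of this kind might look like the following sketch. The allowed character set (a-z, A-Z, 0-9 and `/.-_`) is an assumption based on the v3 draft spec, and the function below is a standalone stand-in, not the method from this PR.

```python
from string import ascii_letters, digits

# Assumed character set: ASCII letters, digits, and "/", ".", "-", "_".
_ALLOWED_KEY_CHARS = frozenset(ascii_letters + digits + "/.-_")


def valid_key(key) -> bool:
    """Return True if `key` is a str made up only of the allowed characters.

    Note that the empty string passes this character check; rejecting it
    would need a separate rule.
    """
    return isinstance(key, str) and all(ch in _ALLOWED_KEY_CHARS for ch in key)


print(valid_key("meta/root/foo.array.json"))  # True
print(valid_key("data/root/a b"))             # False (space is not allowed)
```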
# FLOAT_FILLS = {"NaN": np.nan, "Infinity": np.PINF, "-Infinity": np.NINF}

_default_entry_point_metadata_v3 = {
Do you have a feeling how much this is going to grow? i.e. if we should plan for this to have its own module?
IIRC there is some test case in Xarray that currently tests zarr v2 with both unicode and bytestring arrays, but we don't currently have tests for either of those in this repository!
The usual case for bytestring is to hold something like an image.
avoids these tests running a second time when this file is called
Closing; these commits, along with additional fixes and tests, are incorporated in #898.
This PR contains changes to storage.py and meta.py for v3 support. I split this out from the full v3 branch and will have follow-up PRs adding v3 support to the Array and Group classes as well as the higher-level routines.
This PR defines StoreV3 versions of most existing Store types (no N5Store or ABSStore, currently). This will need some rounds of revision/feedback, and there are a number of "TODO" comments in the code where I was unsure about the desired behavior of some aspect.
For the test cases, I tried to avoid a lot of the duplication by subclassing the v2 test classes, overriding methods only where changes were necessary. These v3 classes are currently in a separate file, but could instead be appended to the existing file if that is preferred. Another possible approach is to keep just the existing classes, but add parameterization based on the Zarr version.
For test coverage, there are a handful of functions like rmdir that support StoreLike inputs and only have test coverage via higher-level functions which are not yet included in this PR, so there will currently be some reduction in test coverage unless that code is removed or we add test cases.
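The subclassing approach to the tests can be sketched as follows. Everything here is a hypothetical stand-in: the dict-based stores, the class names, and the `meta/`/`data/` key-prefix rule are assumptions chosen to keep the example self-contained, and do not reproduce the actual classes in test_storage_v3.py.

```python
class DictStoreV2(dict):
    """Stand-in for a v2 store: a plain key/value mapping."""


class DictStoreV3(dict):
    """Stand-in for a v3 store that only accepts 'meta/' or 'data/' keys."""

    def __setitem__(self, key, value):
        if not key.startswith(("meta/", "data/")):
            raise ValueError(f"invalid v3 key: {key!r}")
        super().__setitem__(key, value)


class StoreTestsV2:
    root = ""  # key prefix used by the shared tests

    def create_store(self):
        return DictStoreV2()

    def test_set_get(self):
        store = self.create_store()
        store[self.root + "foo"] = b"bar"
        assert store[self.root + "foo"] == b"bar"
        assert len(store) == 1


class StoreTestsV3(StoreTestsV2):
    # Only the pieces that differ in v3 are overridden;
    # test_set_get is inherited unchanged.
    root = "meta/root/"

    def create_store(self):
        return DictStoreV3()

    def test_rejects_unprefixed_key(self):
        store = self.create_store()
        try:
            store["foo"] = b"bar"
        except ValueError:
            return
        raise AssertionError("expected ValueError for an unprefixed key")


# Run the tests by hand; under pytest the classes would be named Test*.
StoreTestsV2().test_set_get()
StoreTestsV3().test_set_get()
StoreTestsV3().test_rejects_unprefixed_key()
```

The parameterization alternative mentioned above would keep a single test class and pass the Zarr version in as a fixture instead, at the cost of more branching inside each test.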