Make the default zarr_format infer when reading files? #2175

@TomAugspurger

Zarr version

v3

Numcodecs version

n/a

Python Version

n/a

Operating System

n/a

Installation

n/a

Description

While working on consolidated metadata, I got myself confused when I did

import zarr

store = zarr.store.LocalStore(root="../zarr-v2/air-v2.zarr")
zarr.open_group(store=store)
...
File ~/gh/zarr-developers/zarr-python/src/zarr/abc/store.py:74, in Store._check_writable(self)
     72 def _check_writable(self) -> None:
     73     if self.mode.readonly:
---> 74         raise ValueError("store mode does not support writing")

ValueError: store mode does not support writing

I didn't include zarr_format=2 in the call to open_group, so we defaulted to V3. When we failed to find the V3 zarr.json in the store, we fell back to trying to create the group, hence the "store mode does not support writing" error.

How can we make this nicer? The tricky part is that inferring the Zarr version requires I/O to check for the presence of a .zgroup or .zarray file in the store.

I think my preference would be to temporarily change the default zarr_format in AsyncGroup.open and maybe AsyncArray.open to be "infer". We'll continue to try to load the V3 metadata first. If that fails, we'll fall back to trying to parse the V2 metadata.

If we fail to load V2 metadata, we'd continue to fall back to the current behavior of switching over to create mode.

If we do successfully load V2 metadata, we'll use that (as if the user had passed zarr_format=2). I think we might also want to emit a warning that we'll require the user to explicitly provide zarr_format=2 in the future, assuming we don't want to pay the cost of an extra store lookup long term whenever we fail to load V3 metadata.

To disable inference (and save a store lookup that you know will fail), you can pass zarr_format=3 or zarr_format=2.
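To make the proposal concrete, here is a minimal sketch of the "try V3, fall back to V2" lookup for a group. It is not zarr-python's actual implementation: the store is modelled as a plain dict, and `get_key` and `load_group_metadata` are hypothetical names used only for illustration. Returning `(None, None)` stands in for the existing "switch over to create mode" behavior.

```python
import asyncio
import json
import warnings
from typing import Optional


async def get_key(store: dict, key: str) -> Optional[bytes]:
    """Hypothetical async store read; returns None when the key is missing."""
    return store.get(key)


async def load_group_metadata(store: dict, zarr_format="infer"):
    """Return (format, metadata), or (None, None) if no group metadata is found.

    zarr_format may be 2, 3, or "infer". With "infer" we try V3 first and
    only then pay for a second lookup to check for V2 metadata.
    """
    if zarr_format in (3, "infer"):
        raw = await get_key(store, "zarr.json")
        if raw is not None:
            return 3, json.loads(raw)

    if zarr_format in (2, "infer"):
        raw = await get_key(store, ".zgroup")
        if raw is not None:
            if zarr_format == "infer":
                # Assumed wording; the issue only proposes that a warning exist.
                warnings.warn(
                    "Inferred zarr_format=2; pass zarr_format=2 explicitly.",
                    FutureWarning,
                    stacklevel=2,
                )
            return 2, json.loads(raw)

    # Nothing found: the caller would fall back to its create-mode path.
    return None, None


if __name__ == "__main__":
    v2_store = {".zgroup": b'{"zarr_format": 2}'}
    print(asyncio.run(load_group_metadata(v2_store)))
```

Passing `zarr_format=3` against the V2 store above skips the `.zgroup` lookup entirely and returns `(None, None)`, which is the "disable inference" path.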

As an alternative to that "try v3, fall back to v2" approach, we could concurrently try to load V3 and V2 metadata. This would be faster for users relying on inference, but would be slightly slower for the someday common(?) case of trying to load V3 metadata (and logs will get polluted with 404s / key errors whenever we fail to load the V2 metadata).
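The concurrent alternative could look roughly like the sketch below. As before, the dict-backed store and the names `get_key` and `load_group_metadata_concurrent` are hypothetical stand-ins, not zarr-python API. Both lookups are issued together with `asyncio.gather`, and V3 metadata wins if both are present.

```python
import asyncio
import json
from typing import Optional


async def get_key(store: dict, key: str) -> Optional[bytes]:
    """Hypothetical async store read; returns None when the key is missing."""
    await asyncio.sleep(0)  # yield control, as a real I/O call would
    return store.get(key)


async def load_group_metadata_concurrent(store: dict):
    """Issue the V3 and V2 lookups together and prefer V3 when both exist."""
    v3_raw, v2_raw = await asyncio.gather(
        get_key(store, "zarr.json"),
        get_key(store, ".zgroup"),
    )
    if v3_raw is not None:
        return 3, json.loads(v3_raw)
    if v2_raw is not None:
        return 2, json.loads(v2_raw)
    return None, None


if __name__ == "__main__":
    v2_store = {".zgroup": b'{"zarr_format": 2}'}
    print(asyncio.run(load_group_metadata_concurrent(v2_store)))
```

The trade-off is visible in the code: every call pays for two lookups, even for a V3 store where the `.zgroup` request is guaranteed to miss.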

Steps to reproduce

zarr.open_group(store=store) on a store with Zarr V2 metadata.

Additional output

No response
