Skip to content

Switch default for Zarr reading/writing to consolidated=True? #5251

@shoyer

Description

@shoyer

Consolidated metadata was a new feature in Zarr v2.3, which was released over two year ago (March 22, 2019).

Since then, I have used consolidated=True every time I've written or opened a Zarr store. As far as I can tell, this is almost always a good idea:

  • With local storage, it usually doesn't really matter. You spend a bit of time writing the consolidated metadata and have one extra file on disk, but the overhead is typically negligible.
  • With Cloud object stores or network filesystems, it can matter quite a large amount. Without consolidated metadata, these systems can be unusably slow for opening datasets. Cloud storage is of course the main use-case for Zarr. If you're using a local disk, you might as well stick with single files such as netCDF.

I wonder if consolidated metadata is mature enough now that we could consider switching the default behavior in Xarray. From my perspective, this is a big "gotcha" for getting good performance with Zarr. More than one of my colleagues has been unimpressed with the performance of Zarr until they learned to set consolidated=True.

I would suggest doing this in way is almost entirely backwards compatible, with only a minor performance costs for reading non-consolidated datasets:

  • to_zarr() switches the default to consolidated=True. The consolidate_metadata() will thus happen by default.
  • open_zarr() switches the default to consolidated=None, which means "Try reading consolidated metadata, and fall-back to non-consolidated if that fails." This will be slightly slower for non-consolidated metadata due to the extra file-lookup, but given that opening with non-consolidated metadata already requires a moderately large number of file look-ups, I doubt anyone will notice the difference.

CC @rabernat

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions