Reading a list of zarr stores with intake-xarray #110

@sebastienlanglois

Hi everyone and thank you for this great library!

Use case

I would like to pass a list of zarr stores located in S3 to xr.open_mfdataset([...], engine='zarr'). As far as I understand, this is not yet possible with intake-xarray because the ZarrSource class assumes a single path as urlpath. In early 2021 there were pull requests to address this, but they were abandoned because xarray was going to provide the logic itself. I believe this has since been added to xarray.

How I addressed the use case

I was able to address my use case by:

  • using the NetCDFSource class
  • passing engine (zarr), consolidated and storage_options through xarray_kwargs
  • removing the fsspec logic below, which is required for files (netcdf, etc.) but is problematic with zarr stores:

        if self._can_be_local:
            url = fsspec.open_local(self.urlpath, **self.storage_options)
        else:
            # https://github.com/intake/filesystem_spec/issues/476#issuecomment-732372918
            url = fsspec.open(self.urlpath, **self.storage_options).open()
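For illustration, the workaround above boils down to roughly the following direct xarray call. This is only a minimal sketch: the helper names zarr_open_kwargs and open_zarr_stores are hypothetical, and it assumes a recent xarray whose zarr engine accepts consolidated and storage_options as keyword arguments.

```python
def zarr_open_kwargs(storage_options=None, consolidated=True):
    """Keyword arguments forwarded to xarray for zarr stores (hypothetical helper)."""
    return {
        "engine": "zarr",
        "consolidated": consolidated,
        "storage_options": dict(storage_options or {}),
    }


def open_zarr_stores(urlpaths, storage_options=None, consolidated=True, **extra):
    """Open several zarr stores as one dataset, with no fsspec pre-opening."""
    import xarray as xr  # imported lazily so the helper above has no dependencies

    kwargs = zarr_open_kwargs(storage_options, consolidated)
    kwargs.update(extra)  # e.g. combine="nested", concat_dim="time"
    # The URLs are passed through as plain strings; the zarr backend is
    # assumed to resolve them itself using storage_options.
    return xr.open_mfdataset(list(urlpaths), **kwargs)
```

A call would then look like open_zarr_stores(["s3://my-bucket/a.zarr", "s3://my-bucket/b.zarr"], storage_options={"anon": True}), where the bucket and store names are placeholders.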

Question

I was wondering if we could rewrite the ZarrSource class along the lines of NetCDFSource, but with the fsspec logic removed from intake-xarray, since xarray already takes care of it. This would allow intake-xarray to read one or multiple zarr stores and would address the issue of xr.open_zarr being deprecated. As I'm not familiar with the inner workings of intake, there may be a better way of doing this, but if this approach sounds interesting, I'm available to submit a PR with the required tests.
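To make the proposal more concrete, here is a rough sketch of the dispatch such a class might perform. ZarrStoresSketch is a hypothetical stand-in, not intake-xarray's actual ZarrSource and not an intake DataSource subclass; it only shows the idea of choosing between xr.open_dataset and xr.open_mfdataset and handing the paths to xarray untouched. Glob patterns are not handled here.

```python
class ZarrStoresSketch:
    """Hypothetical stand-in for a rewritten ZarrSource (not intake's class)."""

    def __init__(self, urlpath, storage_options=None, **xarray_kwargs):
        self.urlpath = urlpath  # a single store path or a list of store paths
        self.storage_options = storage_options or {}
        self.xarray_kwargs = xarray_kwargs

    def opener_name(self):
        # A list or tuple of stores goes to open_mfdataset, one store to open_dataset.
        multiple = isinstance(self.urlpath, (list, tuple))
        return "open_mfdataset" if multiple else "open_dataset"

    def open_kwargs(self):
        # storage_options is forwarded to xarray instead of being consumed
        # by fsspec.open / fsspec.open_local beforehand.
        kwargs = {"engine": "zarr", "storage_options": self.storage_options}
        kwargs.update(self.xarray_kwargs)
        return kwargs

    def open(self):
        import xarray as xr  # imported lazily; only needed when actually opening

        opener = getattr(xr, self.opener_name())
        return opener(self.urlpath, **self.open_kwargs())
```

With this shape, a single path and a list of paths go through the same class, and the only zarr-specific part is the default engine.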

Thanks!
