Description
Hi everyone and thank you for this great library!
Use case
I would like to pass a list of zarr stores located in S3 to xr.open_mfdataset([...], engine='zarr'). As far as I understand, this is not yet possible with intake-xarray because the ZarrSource class assumes a single path as urlpath. In early 2021 there were pull requests to address this, but they were abandoned because xarray was going to provide the logic for this use case. I believe this has now been added to xarray.
How I addressed the use case
I was able to address my use case by:
- using the NetCDFSource class
- passing the engine ('zarr'), consolidated and storage_options arguments through xarray_kwargs
- removing the fsspec logic, which is required for files (netcdf, etc.) but is problematic with zarr stores:
intake-xarray/intake_xarray/netcdf.py
Lines 86 to 90 in f1ca02d
    if self._can_be_local:
        url = fsspec.open_local(self.urlpath, **self.storage_options)
    else:
        # https://github.com/intake/filesystem_spec/issues/476#issuecomment-732372918
        url = fsspec.open(self.urlpath, **self.storage_options).open()
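For reference, here is a minimal sketch of the plain xarray call that the workaround boils down to once the fsspec step is skipped. The helper names (zarr_mfdataset_kwargs, open_zarr_stores) are hypothetical and not part of intake-xarray. It assumes a recent xarray with the zarr engine, dask, and an fsspec backend such as s3fs installed, and that storage_options can be passed straight through to the zarr backend.

```python
def zarr_mfdataset_kwargs(storage_options=None, consolidated=True):
    """Build the keyword arguments forwarded to xr.open_mfdataset.

    Hypothetical helper: it only collects the three arguments mentioned
    above (engine, consolidated, storage_options).
    """
    kwargs = {"engine": "zarr", "consolidated": consolidated}
    if storage_options is not None:
        # Assumption: the zarr backend accepts storage_options directly.
        kwargs["storage_options"] = storage_options
    return kwargs


def open_zarr_stores(urlpaths, storage_options=None, consolidated=True):
    """Open several zarr stores as one dataset, letting xarray resolve the URLs."""
    import xarray as xr  # imported lazily so the kwargs helper works without xarray

    # The URLs go to xarray unchanged: no fsspec.open / open_local beforehand.
    return xr.open_mfdataset(
        list(urlpaths),
        **zarr_mfdataset_kwargs(storage_options, consolidated),
    )
```

Calling open_zarr_stores with a list of s3:// URLs and, for example, storage_options={"anon": True} for a public bucket would then be the equivalent of the NetCDFSource workaround described above.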
Question
I was wondering if we could rewrite the ZarrSource class along the lines of NetCDFSource, but with the fsspec logic removed from intake-xarray, since xarray already takes care of it. This would allow intake-xarray to read one or multiple zarr stores and would also address xr.open_zarr being deprecated. I'm not familiar with the inner workings of intake, so there may be a better way of doing this, but if this approach sounds interesting, I'm available to submit a PR with the required tests.
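To make the proposal more concrete, here is a rough sketch of the dispatch I have in mind, written as standalone functions rather than an actual intake source. The names needs_mfdataset and open_zarr are hypothetical, and the real ZarrSource would of course have to fit intake's DataSource interface, which this sketch ignores. It assumes an xarray version whose zarr engine accepts storage_options and, for the glob case, one that can expand remote glob patterns for zarr.

```python
def needs_mfdataset(urlpath):
    """Return True when urlpath refers to several stores (a list/tuple or a glob)."""
    if isinstance(urlpath, (list, tuple)):
        return True
    return "*" in urlpath


def open_zarr(urlpath, storage_options=None, **xarray_kwargs):
    """Open one or many zarr stores, leaving URL handling to xarray."""
    import xarray as xr  # imported lazily; only needed when data is opened

    kwargs = dict(xarray_kwargs)
    kwargs["engine"] = "zarr"
    if storage_options is not None:
        # Assumption: forwarded as-is to the zarr backend.
        kwargs["storage_options"] = storage_options

    # Single store: open_dataset. Several stores: open_mfdataset.
    opener = xr.open_mfdataset if needs_mfdataset(urlpath) else xr.open_dataset
    return opener(urlpath, **kwargs)
```

With this shape, a catalog entry could keep a single urlpath for existing users while also accepting a list of stores, and neither path goes through xr.open_zarr.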
Thanks!