geotiff: support fsspec URIs in read_geotiff_dask (#1749)#1755
Conversation
…1749) read_geotiff_dask failed on s3://, gs://, az://, memory:// and other fsspec URIs because the metadata-only step (_read_geo_info) used a plain open(source, 'rb') call. The eager path already handled cloud URIs via _read_to_array + _CloudSource, so the dask graph's per-chunk pixel reads were already cloud-aware; only the upfront metadata read broke. Detect fsspec URIs in _read_geo_info and pull the file bytes via _CloudSource.read_all(). Local-path mmap fast path is unchanged. HTTP path is unchanged and continues to use _parse_cog_http_meta. Closes #1749.
There was a problem hiding this comment.
Pull request overview
This PR fixes open_geotiff(..., chunks=...) / read_geotiff_dask failures for fsspec-backed URIs (e.g., s3://, gs://, az://, memory://) by making the upfront metadata read (_read_geo_info) cloud-aware, aligning dask reads with the existing eager read path.
Changes:
- Route fsspec/cloud URIs in
_read_geo_infothrough_CloudSourceinstead ofopen(..., 'rb'). - Add a regression test that reads a GeoTIFF from
memory://using dask chunks and validates results against eager reads and the original array.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
xrspatial/geotiff/__init__.py |
Adds an fsspec URI branch in _read_geo_info to avoid local open() for cloud/memory schemes. |
xrspatial/geotiff/tests/test_features.py |
Adds a regression test covering dask reads from an fsspec memory:// URI. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| elif isinstance(source, str) and _is_fsspec_uri(source): | ||
| # fsspec URI (s3://, gs://, az://, memory://, ...): pull the | ||
| # whole file via _CloudSource for metadata parsing. Per-chunk | ||
| # pixel reads in the dask graph go through _read_to_array | ||
| # which opens its own _CloudSource, so this fetch is metadata-only. |
The prior commit added an fsspec branch in _read_geo_info that called _CloudSource.read_all() to parse metadata. For a large COG on S3 that pulls the whole object into memory just to learn its shape/transform, which defeats the O(1) memory intent of _read_geo_info. Route fsspec URIs through _parse_cog_http_meta (same path as HTTP COGs). It only requires a read_range-having source, which _CloudSource satisfies, and grows a bounded buffer (capped at MAX_HTTP_HEADER_BYTES) until the IFD chain resolves. Applies to both _read_geo_info itself (used by the .raster accessor) and the read_geotiff_dask metadata prefetch. Per-chunk tile reads in the dask graph now also dispatch through _fetch_decode_cog_http_tiles with a _CloudSource instead of falling back to read_to_array's read_all path. Adds read_ranges and read_ranges_coalesced methods to _CloudSource so the tiled COG decode path can drive a cloud source the same way it drives an HTTP source. Extends the regression test to assert _CloudSource.read_all is not called during dask graph construction or chunk materialisation. Addresses PR #1755 review comment.
|
Fixed in 16b76ea. fsspec URIs now go through
Test |
Summary
read_geotiff_daskfailed ons3://,gs://,az://,memory://and other fsspec URIs withFileNotFoundError. The eager path already handled cloud URIs via_read_to_array+_CloudSource, but the dask path's metadata-only step (_read_geo_infoinxrspatial/geotiff/__init__.py) used a plainopen(source, 'rb')call._read_to_arraywhich dispatches to_CloudSource), so only the upfront metadata read needed fixing._read_geo_infothrough_CloudSource.read_all(). The local-pathmmapfast path is unchanged. The HTTP path (which uses_parse_cog_http_meta) is unchanged.Closes #1749.
Test plan
test_dask_path_fsspec_uri_1749inTestCloudStoragewrites a small TIFF, copies it into the fsspecmemory://filesystem, and verifiesopen_geotiff('memory:///...', chunks=4)returns a dask-backed DataArray whose values match both the eager read and the original array.TestCloudStorageclass still passes (6/6).xrspatial/geotiff/tests/suite shows the same pre-existing failures asmain(GPU/matplotlib unrelated to this change).