geotiff: bound eager fsspec reads with max_cloud_bytes #1932
Merged
Eager reads from cloud sources used to call _CloudSource.read_all() unconditionally, pulling the entire object into memory before the TIFF header was parsed or the max_pixels guard ran. A crafted s3://, gs://, or memory:// object could therefore exhaust memory and bandwidth before any dimension check fired.

Add a max_cloud_bytes budget (default 256 MiB, env override XRSPATIAL_GEOTIFF_MAX_CLOUD_BYTES, per-call kwarg on read_to_array and open_geotiff). _CloudSource already knows the object size from fsspec.size() at construction, so the check runs before any bytes are fetched. Pass max_cloud_bytes=None to opt out and restore pre-#1928 behaviour. The HTTP path already reads only what it needs via range requests and is unaffected.

When the cloud object exceeds the budget, a new CloudSizeLimitError (a ValueError subclass) is raised, with a message that points at the kwarg, the env var, and the dask windowed-read alternative.
Pull request overview
Adds a configurable byte budget for eager fsspec GeoTIFF reads so oversized cloud objects are rejected before full-object download.
Changes:
- Introduces `MAX_CLOUD_BYTES_DEFAULT`, env/kwarg resolution, and `CloudSizeLimitError`.
- Plumbs `max_cloud_bytes` through `open_geotiff` for eager reads.
- Adds regression tests covering defaults, env overrides, explicit disable, public entry point behavior, and unaffected local/HTTP paths.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `xrspatial/geotiff/_reader.py` | Adds cloud byte-budget resolution and enforcement before `_CloudSource.read_all()`. |
| `xrspatial/geotiff/__init__.py` | Adds `max_cloud_bytes` to `open_geotiff` and forwards it to eager reads. |
| `xrspatial/geotiff/tests/test_cloud_read_byte_limit_1928.py` | Adds regression coverage for the new cloud eager-read size guard. |
Comments suppressed due to low confidence (1)
xrspatial/geotiff/tests/test_cloud_read_byte_limit_1928.py:63
- This default-precedence assertion (and the default-path reads later in the file) depends on the ambient `XRSPATIAL_GEOTIFF_MAX_CLOUD_BYTES` being unset. If a developer or CI job has that variable set to a positive value, `_resolve_max_cloud_bytes(_MAX_CLOUD_BYTES_SENTINEL)` returns the env value instead of `MAX_CLOUD_BYTES_DEFAULT`; clear it with `monkeypatch.delenv(...)` before asserting default behavior, as other env-default tests in this suite do.

```python
def test_sentinel_returns_default(self):
    assert _resolve_max_cloud_bytes(
        _MAX_CLOUD_BYTES_SENTINEL
    ) == MAX_CLOUD_BYTES_DEFAULT
```
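The isolation the comment asks for, sketched with the standard library so it runs outside pytest: `mock.patch.dict(os.environ)` snapshots the environment and restores it on exit, which has the same effect as calling `monkeypatch.delenv("XRSPATIAL_GEOTIFF_MAX_CLOUD_BYTES", raising=False)` at the top of a pytest test. `resolve_default_budget` and `check_sentinel_returns_default` are hypothetical stand-ins, not the real resolver or test.

```python
import os
from unittest import mock

ENV_VAR = "XRSPATIAL_GEOTIFF_MAX_CLOUD_BYTES"
MAX_CLOUD_BYTES_DEFAULT = 256 * 1024 * 1024


def resolve_default_budget():
    """Hypothetical stand-in for _resolve_max_cloud_bytes(_MAX_CLOUD_BYTES_SENTINEL)."""
    raw = os.environ.get(ENV_VAR)
    return int(raw) if raw else MAX_CLOUD_BYTES_DEFAULT


def check_sentinel_returns_default():
    # patch.dict records os.environ on entry and restores it on exit, so
    # removing the variable here cannot leak into other tests.
    with mock.patch.dict(os.environ):
        os.environ.pop(ENV_VAR, None)
        assert resolve_default_budget() == MAX_CLOUD_BYTES_DEFAULT
```

Without the `pop`, the assertion would fail on any machine where the variable happens to be set, which is exactly the flakiness the review flags.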
Comment on lines +16 to +18

```python
import importlib.util
import os
```
Comment on lines +2999 to +3001

```python
f"Pass max_cloud_bytes=None to override, or raise "
f"the limit via the XRSPATIAL_GEOTIFF_MAX_CLOUD_BYTES "
f"environment variable.")
```
- Drop unused ``importlib.util`` and ``os`` imports from the cloud-read byte-limit test module (flake8 F401).
- Tighten the unknown-size CloudSizeLimitError message: raising the byte budget does not unblock a source whose size is unknown, so the only working override is ``max_cloud_bytes=None``. Mention that explicitly.
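A minimal sketch of the enforcement step, assuming the resolved budget and the object size (or `None` when the filesystem cannot report one) are already known. It shows why the unknown-size branch should offer only `max_cloud_bytes=None`: no finite budget can be compared against a missing size. `check_cloud_size` and the message wording are hypothetical; only `CloudSizeLimitError` being a `ValueError` subclass and the listed workarounds come from the PR.

```python
class CloudSizeLimitError(ValueError):
    """Raised when a cloud object exceeds the eager-read byte budget."""


def check_cloud_size(size, budget):
    """Raise before any download if the object cannot fit the budget.

    size:   object size in bytes, or None if unknown (assumed convention).
    budget: resolved byte budget, or None when the guard is disabled.
    """
    if budget is None:
        return  # max_cloud_bytes=None: guard disabled
    if size is None:
        # Raising the budget cannot help here, so only the opt-out is offered.
        raise CloudSizeLimitError(
            "Cloud object size is unknown, so the max_cloud_bytes budget "
            f"({budget} bytes) cannot be checked. Pass max_cloud_bytes=None "
            "to read it anyway."
        )
    if size > budget:
        raise CloudSizeLimitError(
            f"Cloud object is {size} bytes, exceeding max_cloud_bytes="
            f"{budget}. Pass a larger max_cloud_bytes (or None), raise the "
            "limit via the XRSPATIAL_GEOTIFF_MAX_CLOUD_BYTES environment "
            "variable, or pass chunks=... for a windowed dask read."
        )
```

Because the size comes from metadata, both checks run before `read_all()` would fetch a single byte.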
This was referenced May 15, 2026
brendancol added a commit that referenced this pull request May 15, 2026
…1958) PR #1932 added the max_cloud_bytes kw-only param to open_geotiff but did not update _CANONICAL_ORDER in test_reader_kwarg_order_1935.py. The signature-parity test has been failing on main on every Python version ever since, blocking CI for every open PR. Insert max_cloud_bytes between max_pixels and on_gpu_failure to match the actual signature. The other readers (read_geotiff_dask, read_geotiff_gpu, read_vrt) don't expose this kwarg yet; the _assert_canonical helper intersects with the canonical tuple, so they keep passing.
Closes #1928.
Summary
- `max_cloud_bytes` budget (default 256 MiB) checked against the compressed object size before any bytes are downloaded from an fsspec source (`s3://`, `gs://`, `az://`, `abfs://`, `memory://`). `_CloudSource` already pulls the object size via `fsspec.size()` at construction, so the guard is free.
- `read_to_array(max_cloud_bytes=...)` / `open_geotiff(max_cloud_bytes=...)`. Env-wide override via `XRSPATIAL_GEOTIFF_MAX_CLOUD_BYTES`. Pass `None` to disable (pre-#1928 behaviour).
- `CloudSizeLimitError` (`ValueError` subclass) names the offending size, the budget, and the workarounds (raise the kwarg, set the env var, switch to `chunks=...` for a windowed dask read).

Test plan

- Tests in `test_cloud_read_byte_limit_1928.py` cover precedence (kwarg > env > default), explicit `None`, oversized rejection, env-driven rejection, the `open_geotiff` entry point, local-path no-op, and HTTP-path no-op.
- Existing tests in `test_features.py` still pass (default budget is 256 MiB; test fixtures are well under).
- `xrspatial/geotiff/tests` suite: no new failures (the 8 pre-existing failures in `test_predictor2_big_endian_gpu_1517.py` and `test_size_param_validation_gpu_vrt_1776.py` come from a stale monkeypatch and stale `tile_size` validation, and are unrelated to this PR).