geotiff: bound HTTP full-image strip reads (#2051) by brendancol · Pull Request #2057 · xarray-contrib/xarray-spatial

brendancol · 2026-05-18T18:26:04Z

Closes #2051.

Attack

A crafted GeoTIFF whose header declares a tiny raster (e.g. 100x100, well within _check_dimensions) can be served over HTTP with a multi-gigabyte body. _HTTPSource.read_all() had no Content-Length check and no streaming cap, so the full body was allocated before TIFF parsing got a chance to reject anything. The full-image stripped branch in _fetch_decode_cog_http_strips (window=None) was the entry point.

The windowed branch was already safe: it uses per-strip ranged GETs bounded by XRSPATIAL_COG_MAX_TILE_BYTES.

Fix (option b)

I went with the byte-budget cap (option b from the security report) rather than refactoring the full-image path to use ranged GETs (option a). Option a would have required either extracting _read_strips's post-decode logic (sparse strips, LERC masked_fill, predictor, planar layouts) or synthesising a full-file byte buffer from ranged GETs. Option b is a smaller, more targeted change for the same threat.

_HTTPSource.read_all() gains max_bytes: int | None = None. When set, it rejects an oversized Content-Length up front, then streams the body and aborts past max_bytes + 1 bytes. The + 1 is the over-shoot probe.
_request() gains preload_content=False plumbing so the streaming path actually streams instead of buffering into resp.data. The redirect handler now releases connections on the streaming path so it does not leak pool slots.
_fetch_decode_cog_http_strips computes the budget from the strip table: max(offset + byte_count) + 4 MiB slack for the TIFF header and trailing metadata.
_compute_full_image_byte_budget handles missing / all-sparse strip tables by falling back to the per-strip safety cap.

max_bytes=None preserves the legacy unbounded behaviour for non-HTTP callers (cloud reads are already gated by max_cloud_bytes; file reads are bounded by file size).

Test plan

Content-Length advertised over budget raises OSError before the read.
Content-Length omitted entirely (chunked transfer) -> streaming cap catches the over-shoot.
Server lies low (advertises small, sends big) -> urllib3 truncates at the advertised length; behaviour locked in.
Legitimate full-image COG reads still pass end-to-end.
A legitimate COG body padded with 64 MiB of zeros is rejected.
Budget helper unit tests: strip-table max + slack, fallback for empty / all-sparse tables.

10 new tests, all green. Full HTTP regression suite (122 tests) still passes.

`_HTTPSource.read_all()` was unbounded: no Content-Length validation, no streaming cap. The full-image stripped HTTP path in `_fetch_decode_cog_http_strips` only ran `_check_dimensions` against the TIFF header's declared pixel count, then handed the full body to `_read_strips`. A tiny declared raster (100x100, well within `_check_dimensions`) could still be served as a multi-gigabyte body and the whole body was allocated before TIFF parsing rejected anything. Fix (option b from the security report): - `_HTTPSource.read_all()` gains a `max_bytes: int | None = None` parameter. When set, it rejects an oversized `Content-Length` up front via `OSError`, then streams the body (urllib3 `preload_content=False` / stdlib `resp.read` in 64 KiB chunks) and aborts past `max_bytes + 1`. The `+ 1` is the over-shoot probe: a body that exactly matches the cap passes, but a server that omits or lies about Content-Length is caught the moment the first extra byte arrives. - `_request()` gains `preload_content=False` plumbing and releases the connection on the redirect path so streaming responses do not leak pool slots. - `_fetch_decode_cog_http_strips` computes the budget from the strip table (`max(offset + byte_count) + 4 MiB slack` for TIFF header / trailing metadata) and passes it through. - New `_compute_full_image_byte_budget` helper handles the degenerate cases (missing or all-sparse strip table) by falling back to the per-strip safety cap. `max_bytes=None` preserves the legacy unbounded behaviour for the non-HTTP callers (cloud reads are already gated by `max_cloud_bytes`; file reads are bounded by file size). Tests cover: - Content-Length advertised over budget -> OSError before the read. - Server omits Content-Length entirely (chunked) -> streaming cap catches the over-shoot. - Server lies low (advertises small, sends big) -> urllib3 truncates at the advertised length, locked in as the expected behaviour. - Legitimate full-image COG reads still work end-to-end. - Attacker pads a legitimate COG body with 64 MiB of zeros -> read rejected by either the Content-Length pre-flight or the streaming cap. - Budget helper returns the strip-table maximum plus slack, and falls back to the per-strip cap for empty / degenerate tables. Closes #2051.

- Stdlib fallback: add three tests that force ``self._pool = None`` so the urllib.request branch in ``read_all`` is actually exercised. One for the Content-Length pre-flight check, one for the streaming cap on chunked bodies, and one for the legitimate-body pass-through. - Redirect on the streaming path: drain the redirect response body before ``release_conn`` so urllib3 can return the connection to the pool instead of closing it. 3xx bodies are tiny so the drain is cheap. - Doc tweak in ``_check_content_length``: spell out that returning silently on missing / unparseable headers is intentional because the streaming cap is the real defence. - Test docstring fix: the padded-body test rejects before the body is fully buffered, not before it is allocated at all.

#2050 / #2055 dropped the stdlib ``urllib.request`` HTTP fallback and made urllib3 a hard dependency. The three ``test_read_all_stdlib_fallback_*`` cells that exercised the fallback's byte-budget enforcement no longer have a code path to hit; the urllib3-only equivalents above keep the contract covered.

* geotiff: record security sweep pass 18 (MEDIUM Cat 1 finding) Re-audit of the geotiff subpackage on 2026-05-18 (deep-sweep p1). NEW MEDIUM (Cat 1): the eager read_geotiff_gpu path skips the per-tile byte cap that the CPU paths (_read_tiles, _fetch_decode_cog_http_tiles) enforce via _max_tile_bytes_from_env(). A malformed local TIFF with TileByteCounts pointing into a large file region can pass through GPU decode at sizes between 256 MiB (CPU cap) and ~90% of free VRAM (GPU sum guard). All other security categories verified clean: JPEG bomb cap (#1792), HTTP read_all byte budget (#2057), VRT XML cap, DOCTYPE rejection, path containment, SSRF defenses, dimension caps, IFD entry caps, MAX_IFDS, MAX_PIXEL_ARRAY_COUNT, GPU bounds guards, atomic writes, realpath canonicalization, dtype validation. * geotiff: apply per-tile byte cap on GPU read path (Cat 1 MEDIUM) The CPU readers ``_read_tiles`` and ``_fetch_decode_cog_http_tiles`` reject tiles whose declared ``TileByteCount`` exceeds the env-driven ``_max_tile_bytes_from_env()`` cap (default 256 MiB). The eager GPU read path skipped this check; ``validate_tile_layout`` only bounds the offsets array length, not the byte-count entries. A crafted local TIFF with multi-hundred-MB ``TileByteCount`` values could pass through to GPU decode where ``_check_gpu_memory`` catches only the aggregate sum at ~90% of free VRAM, leaving the per-tile budget asymmetric across the CPU and GPU contracts. Add the same per-tile loop in ``read_geotiff_gpu`` (after ``validate_tile_layout``) so a single oversized tile is rejected before any decode work runs. Mirrors the wording of the existing CPU guards so callers see consistent error messages across backends. Add ``test_gpu_tile_byte_cap_2026_05_18.py`` covering the rejection, the wording (forged value + cap appear in the error), the env override escape hatch, and the legit-read pass-through under the default cap. The tests use the same forged-TIFF helpers as the CPU companion suite ``test_local_tile_byte_cap_1664.py``. * geotiff: extend per-tile byte cap to dask + GPU paths (#2113 review) Self-review caught three issues. Address them. - Extend the per-tile cap to the dask + GPU chunked path. The eager GPU path's cap landed at the right spot but the chunked path (``_read_geotiff_gpu_dask_path`` -> ``_read_geotiff_gpu_chunked_gds`` or fall-through to ``read_geotiff_dask``) parsed IFDs and built a dask graph without applying the cap, so a forged file could slip past at graph-build time. The cap now fires at metadata parse time inside ``_read_geotiff_gpu_dask_path`` before any qualification probe, and is also applied inside ``_read_geotiff_gpu_chunked_gds`` for the case where that function is reached directly. - Hoist the TIFF byte-surgery helper out of the per-test file into ``tests/_tiff_surgery.py``. The CPU companion test and the new GPU test were carrying near-identical copies of the same struct-packing helper. Drift between the two would silently desync the cap tests. - Spell the sparse-tile pass-through behaviour in the eager and chunked GPU comments so a future reader does not add a ``bc > 0`` guard and accidentally reject sparse files. Sparse tiles (``byte_count == 0``) pass under any positive cap by design, mirroring the CPU path in ``_reader.py``. - Tighten ``test_env_override_lifts_cap`` to assert the cap message is *not* in any exception fired, rather than swallowing all errors via a bare ``except``. A regression that re-fires the cap through a different error path now produces a focused failure. - Drop the misleading underscore-prefix loop variables in the eager GPU cap loop; both names are read inside the body. - Cover the chunked + GPU cap with a parallel regression test.

github-actions Bot added the performance PR touches performance-sensitive code label May 18, 2026

brendancol added 3 commits May 18, 2026 13:33

brendancol force-pushed the issue-2051 branch from 34afad4 to 8cd5172 Compare May 18, 2026 20:34

brendancol merged commit 2e625bc into main May 18, 2026
4 of 5 checks passed

brendancol mentioned this pull request May 18, 2026

geotiff: first-class backend parity contract with a golden test corpus #1930

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

geotiff: bound HTTP full-image strip reads (#2051)#2057

geotiff: bound HTTP full-image strip reads (#2051)#2057
brendancol merged 3 commits into
mainfrom
issue-2051

brendancol commented May 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

brendancol commented May 18, 2026

Attack

Fix (option b)

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant