geotiff: bound HTTP full-image strip reads (#2051)#2057
Merged
Conversation
`_HTTPSource.read_all()` was unbounded: no Content-Length validation, no streaming cap. The full-image stripped HTTP path in `_fetch_decode_cog_http_strips` only ran `_check_dimensions` against the TIFF header's declared pixel count, then handed the full body to `_read_strips`. A tiny declared raster (100x100, well within `_check_dimensions`) could still be served as a multi-gigabyte body and the whole body was allocated before TIFF parsing rejected anything. Fix (option b from the security report): - `_HTTPSource.read_all()` gains a `max_bytes: int | None = None` parameter. When set, it rejects an oversized `Content-Length` up front via `OSError`, then streams the body (urllib3 `preload_content=False` / stdlib `resp.read` in 64 KiB chunks) and aborts past `max_bytes + 1`. The `+ 1` is the over-shoot probe: a body that exactly matches the cap passes, but a server that omits or lies about Content-Length is caught the moment the first extra byte arrives. - `_request()` gains `preload_content=False` plumbing and releases the connection on the redirect path so streaming responses do not leak pool slots. - `_fetch_decode_cog_http_strips` computes the budget from the strip table (`max(offset + byte_count) + 4 MiB slack` for TIFF header / trailing metadata) and passes it through. - New `_compute_full_image_byte_budget` helper handles the degenerate cases (missing or all-sparse strip table) by falling back to the per-strip safety cap. `max_bytes=None` preserves the legacy unbounded behaviour for the non-HTTP callers (cloud reads are already gated by `max_cloud_bytes`; file reads are bounded by file size). Tests cover: - Content-Length advertised over budget -> OSError before the read. - Server omits Content-Length entirely (chunked) -> streaming cap catches the over-shoot. - Server lies low (advertises small, sends big) -> urllib3 truncates at the advertised length, locked in as the expected behaviour. - Legitimate full-image COG reads still work end-to-end. - Attacker pads a legitimate COG body with 64 MiB of zeros -> read rejected by either the Content-Length pre-flight or the streaming cap. - Budget helper returns the strip-table maximum plus slack, and falls back to the per-strip cap for empty / degenerate tables. Closes #2051.
- Stdlib fallback: add three tests that force ``self._pool = None`` so the urllib.request branch in ``read_all`` is actually exercised. One for the Content-Length pre-flight check, one for the streaming cap on chunked bodies, and one for the legitimate-body pass-through. - Redirect on the streaming path: drain the redirect response body before ``release_conn`` so urllib3 can return the connection to the pool instead of closing it. 3xx bodies are tiny so the drain is cheap. - Doc tweak in ``_check_content_length``: spell out that returning silently on missing / unparseable headers is intentional because the streaming cap is the real defence. - Test docstring fix: the padded-body test rejects before the body is fully buffered, not before it is allocated at all.
#2050 / #2055 dropped the stdlib ``urllib.request`` HTTP fallback and made urllib3 a hard dependency. The three ``test_read_all_stdlib_fallback_*`` cells that exercised the fallback's byte-budget enforcement no longer have a code path to hit; the urllib3-only equivalents above keep the contract covered.
brendancol
added a commit
that referenced
this pull request
May 19, 2026
* geotiff: record security sweep pass 18 (MEDIUM Cat 1 finding) Re-audit of the geotiff subpackage on 2026-05-18 (deep-sweep p1). NEW MEDIUM (Cat 1): the eager read_geotiff_gpu path skips the per-tile byte cap that the CPU paths (_read_tiles, _fetch_decode_cog_http_tiles) enforce via _max_tile_bytes_from_env(). A malformed local TIFF with TileByteCounts pointing into a large file region can pass through GPU decode at sizes between 256 MiB (CPU cap) and ~90% of free VRAM (GPU sum guard). All other security categories verified clean: JPEG bomb cap (#1792), HTTP read_all byte budget (#2057), VRT XML cap, DOCTYPE rejection, path containment, SSRF defenses, dimension caps, IFD entry caps, MAX_IFDS, MAX_PIXEL_ARRAY_COUNT, GPU bounds guards, atomic writes, realpath canonicalization, dtype validation. * geotiff: apply per-tile byte cap on GPU read path (Cat 1 MEDIUM) The CPU readers ``_read_tiles`` and ``_fetch_decode_cog_http_tiles`` reject tiles whose declared ``TileByteCount`` exceeds the env-driven ``_max_tile_bytes_from_env()`` cap (default 256 MiB). The eager GPU read path skipped this check; ``validate_tile_layout`` only bounds the offsets array length, not the byte-count entries. A crafted local TIFF with multi-hundred-MB ``TileByteCount`` values could pass through to GPU decode where ``_check_gpu_memory`` catches only the aggregate sum at ~90% of free VRAM, leaving the per-tile budget asymmetric across the CPU and GPU contracts. Add the same per-tile loop in ``read_geotiff_gpu`` (after ``validate_tile_layout``) so a single oversized tile is rejected before any decode work runs. Mirrors the wording of the existing CPU guards so callers see consistent error messages across backends. Add ``test_gpu_tile_byte_cap_2026_05_18.py`` covering the rejection, the wording (forged value + cap appear in the error), the env override escape hatch, and the legit-read pass-through under the default cap. The tests use the same forged-TIFF helpers as the CPU companion suite ``test_local_tile_byte_cap_1664.py``. * geotiff: extend per-tile byte cap to dask + GPU paths (#2113 review) Self-review caught three issues. Address them. - Extend the per-tile cap to the dask + GPU chunked path. The eager GPU path's cap landed at the right spot but the chunked path (``_read_geotiff_gpu_dask_path`` -> ``_read_geotiff_gpu_chunked_gds`` or fall-through to ``read_geotiff_dask``) parsed IFDs and built a dask graph without applying the cap, so a forged file could slip past at graph-build time. The cap now fires at metadata parse time inside ``_read_geotiff_gpu_dask_path`` before any qualification probe, and is also applied inside ``_read_geotiff_gpu_chunked_gds`` for the case where that function is reached directly. - Hoist the TIFF byte-surgery helper out of the per-test file into ``tests/_tiff_surgery.py``. The CPU companion test and the new GPU test were carrying near-identical copies of the same struct-packing helper. Drift between the two would silently desync the cap tests. - Spell the sparse-tile pass-through behaviour in the eager and chunked GPU comments so a future reader does not add a ``bc > 0`` guard and accidentally reject sparse files. Sparse tiles (``byte_count == 0``) pass under any positive cap by design, mirroring the CPU path in ``_reader.py``. - Tighten ``test_env_override_lifts_cap`` to assert the cap message is *not* in any exception fired, rather than swallowing all errors via a bare ``except``. A regression that re-fires the cap through a different error path now produces a focused failure. - Drop the misleading underscore-prefix loop variables in the eager GPU cap loop; both names are read inside the body. - Cover the chunked + GPU cap with a parallel regression test.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #2051.
Attack
A crafted GeoTIFF whose header declares a tiny raster (e.g. 100x100, well within
_check_dimensions) can be served over HTTP with a multi-gigabyte body._HTTPSource.read_all()had no Content-Length check and no streaming cap, so the full body was allocated before TIFF parsing got a chance to reject anything. The full-image stripped branch in_fetch_decode_cog_http_strips(window=None) was the entry point.The windowed branch was already safe: it uses per-strip ranged GETs bounded by
XRSPATIAL_COG_MAX_TILE_BYTES.Fix (option b)
I went with the byte-budget cap (option b from the security report) rather than refactoring the full-image path to use ranged GETs (option a). Option a would have required either extracting
_read_strips's post-decode logic (sparse strips, LERC masked_fill, predictor, planar layouts) or synthesising a full-file byte buffer from ranged GETs. Option b is a smaller, more targeted change for the same threat._HTTPSource.read_all()gainsmax_bytes: int | None = None. When set, it rejects an oversizedContent-Lengthup front, then streams the body and aborts pastmax_bytes + 1bytes. The+ 1is the over-shoot probe._request()gainspreload_content=Falseplumbing so the streaming path actually streams instead of buffering intoresp.data. The redirect handler now releases connections on the streaming path so it does not leak pool slots._fetch_decode_cog_http_stripscomputes the budget from the strip table:max(offset + byte_count) + 4 MiB slackfor the TIFF header and trailing metadata._compute_full_image_byte_budgethandles missing / all-sparse strip tables by falling back to the per-strip safety cap.max_bytes=Nonepreserves the legacy unbounded behaviour for non-HTTP callers (cloud reads are already gated bymax_cloud_bytes; file reads are bounded by file size).Test plan
OSErrorbefore the read.10 new tests, all green. Full HTTP regression suite (122 tests) still passes.