Skip to content

geotiff: bound HTTP full-image strip reads (#2051)#2057

Merged
brendancol merged 3 commits into
mainfrom
issue-2051
May 18, 2026
Merged

geotiff: bound HTTP full-image strip reads (#2051)#2057
brendancol merged 3 commits into
mainfrom
issue-2051

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #2051.

Attack

A crafted GeoTIFF whose header declares a tiny raster (e.g. 100x100, well within _check_dimensions) can be served over HTTP with a multi-gigabyte body. _HTTPSource.read_all() had no Content-Length check and no streaming cap, so the full body was allocated before TIFF parsing got a chance to reject anything. The full-image stripped branch in _fetch_decode_cog_http_strips (window=None) was the entry point.

The windowed branch was already safe: it uses per-strip ranged GETs bounded by XRSPATIAL_COG_MAX_TILE_BYTES.

Fix (option b)

I went with the byte-budget cap (option b from the security report) rather than refactoring the full-image path to use ranged GETs (option a). Option a would have required either extracting _read_strips's post-decode logic (sparse strips, LERC masked_fill, predictor, planar layouts) or synthesising a full-file byte buffer from ranged GETs. Option b is a smaller, more targeted change for the same threat.

  • _HTTPSource.read_all() gains max_bytes: int | None = None. When set, it rejects an oversized Content-Length up front, then streams the body and aborts past max_bytes + 1 bytes. The + 1 is the over-shoot probe.
  • _request() gains preload_content=False plumbing so the streaming path actually streams instead of buffering into resp.data. The redirect handler now releases connections on the streaming path so it does not leak pool slots.
  • _fetch_decode_cog_http_strips computes the budget from the strip table: max(offset + byte_count) + 4 MiB slack for the TIFF header and trailing metadata.
  • _compute_full_image_byte_budget handles missing / all-sparse strip tables by falling back to the per-strip safety cap.

max_bytes=None preserves the legacy unbounded behaviour for non-HTTP callers (cloud reads are already gated by max_cloud_bytes; file reads are bounded by file size).

Test plan

  • Content-Length advertised over budget raises OSError before the read.
  • Content-Length omitted entirely (chunked transfer) -> streaming cap catches the over-shoot.
  • Server lies low (advertises small, sends big) -> urllib3 truncates at the advertised length; behaviour locked in.
  • Legitimate full-image COG reads still pass end-to-end.
  • A legitimate COG body padded with 64 MiB of zeros is rejected.
  • Budget helper unit tests: strip-table max + slack, fallback for empty / all-sparse tables.

10 new tests, all green. Full HTTP regression suite (122 tests) still passes.

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 18, 2026
`_HTTPSource.read_all()` was unbounded: no Content-Length validation,
no streaming cap. The full-image stripped HTTP path in
`_fetch_decode_cog_http_strips` only ran `_check_dimensions` against
the TIFF header's declared pixel count, then handed the full body to
`_read_strips`. A tiny declared raster (100x100, well within
`_check_dimensions`) could still be served as a multi-gigabyte body
and the whole body was allocated before TIFF parsing rejected anything.

Fix (option b from the security report):

- `_HTTPSource.read_all()` gains a `max_bytes: int | None = None`
  parameter. When set, it rejects an oversized `Content-Length` up
  front via `OSError`, then streams the body (urllib3
  `preload_content=False` / stdlib `resp.read` in 64 KiB chunks) and
  aborts past `max_bytes + 1`. The `+ 1` is the over-shoot probe: a
  body that exactly matches the cap passes, but a server that omits
  or lies about Content-Length is caught the moment the first extra
  byte arrives.
- `_request()` gains `preload_content=False` plumbing and releases
  the connection on the redirect path so streaming responses do not
  leak pool slots.
- `_fetch_decode_cog_http_strips` computes the budget from the strip
  table (`max(offset + byte_count) + 4 MiB slack` for TIFF header /
  trailing metadata) and passes it through.
- New `_compute_full_image_byte_budget` helper handles the
  degenerate cases (missing or all-sparse strip table) by falling
  back to the per-strip safety cap.

`max_bytes=None` preserves the legacy unbounded behaviour for the
non-HTTP callers (cloud reads are already gated by
`max_cloud_bytes`; file reads are bounded by file size).

Tests cover:
- Content-Length advertised over budget -> OSError before the read.
- Server omits Content-Length entirely (chunked) -> streaming cap
  catches the over-shoot.
- Server lies low (advertises small, sends big) -> urllib3 truncates
  at the advertised length, locked in as the expected behaviour.
- Legitimate full-image COG reads still work end-to-end.
- Attacker pads a legitimate COG body with 64 MiB of zeros -> read
  rejected by either the Content-Length pre-flight or the streaming
  cap.
- Budget helper returns the strip-table maximum plus slack, and
  falls back to the per-strip cap for empty / degenerate tables.

Closes #2051.
- Stdlib fallback: add three tests that force ``self._pool = None`` so
  the urllib.request branch in ``read_all`` is actually exercised. One
  for the Content-Length pre-flight check, one for the streaming cap
  on chunked bodies, and one for the legitimate-body pass-through.
- Redirect on the streaming path: drain the redirect response body
  before ``release_conn`` so urllib3 can return the connection to the
  pool instead of closing it. 3xx bodies are tiny so the drain is
  cheap.
- Doc tweak in ``_check_content_length``: spell out that returning
  silently on missing / unparseable headers is intentional because
  the streaming cap is the real defence.
- Test docstring fix: the padded-body test rejects before the body
  is fully buffered, not before it is allocated at all.
#2050 / #2055 dropped the stdlib ``urllib.request`` HTTP fallback
and made urllib3 a hard dependency. The three
``test_read_all_stdlib_fallback_*`` cells that exercised the
fallback's byte-budget enforcement no longer have a code path to
hit; the urllib3-only equivalents above keep the contract covered.
@brendancol brendancol merged commit 2e625bc into main May 18, 2026
4 of 5 checks passed
brendancol added a commit that referenced this pull request May 19, 2026
* geotiff: record security sweep pass 18 (MEDIUM Cat 1 finding)

Re-audit of the geotiff subpackage on 2026-05-18 (deep-sweep p1).
NEW MEDIUM (Cat 1): the eager read_geotiff_gpu path skips the
per-tile byte cap that the CPU paths (_read_tiles,
_fetch_decode_cog_http_tiles) enforce via _max_tile_bytes_from_env().
A malformed local TIFF with TileByteCounts pointing into a large
file region can pass through GPU decode at sizes between 256 MiB
(CPU cap) and ~90% of free VRAM (GPU sum guard).

All other security categories verified clean: JPEG bomb cap (#1792),
HTTP read_all byte budget (#2057), VRT XML cap, DOCTYPE rejection,
path containment, SSRF defenses, dimension caps, IFD entry caps,
MAX_IFDS, MAX_PIXEL_ARRAY_COUNT, GPU bounds guards, atomic writes,
realpath canonicalization, dtype validation.

* geotiff: apply per-tile byte cap on GPU read path (Cat 1 MEDIUM)

The CPU readers ``_read_tiles`` and ``_fetch_decode_cog_http_tiles``
reject tiles whose declared ``TileByteCount`` exceeds the env-driven
``_max_tile_bytes_from_env()`` cap (default 256 MiB). The eager GPU
read path skipped this check; ``validate_tile_layout`` only bounds
the offsets array length, not the byte-count entries. A crafted
local TIFF with multi-hundred-MB ``TileByteCount`` values could
pass through to GPU decode where ``_check_gpu_memory`` catches only
the aggregate sum at ~90% of free VRAM, leaving the per-tile budget
asymmetric across the CPU and GPU contracts.

Add the same per-tile loop in ``read_geotiff_gpu`` (after
``validate_tile_layout``) so a single oversized tile is rejected
before any decode work runs. Mirrors the wording of the existing CPU
guards so callers see consistent error messages across backends.

Add ``test_gpu_tile_byte_cap_2026_05_18.py`` covering the rejection,
the wording (forged value + cap appear in the error), the env
override escape hatch, and the legit-read pass-through under the
default cap. The tests use the same forged-TIFF helpers as the CPU
companion suite ``test_local_tile_byte_cap_1664.py``.

* geotiff: extend per-tile byte cap to dask + GPU paths (#2113 review)

Self-review caught three issues. Address them.

- Extend the per-tile cap to the dask + GPU chunked path. The eager
  GPU path's cap landed at the right spot but the chunked path
  (``_read_geotiff_gpu_dask_path`` -> ``_read_geotiff_gpu_chunked_gds``
  or fall-through to ``read_geotiff_dask``) parsed IFDs and built a
  dask graph without applying the cap, so a forged file could slip
  past at graph-build time. The cap now fires at metadata parse time
  inside ``_read_geotiff_gpu_dask_path`` before any qualification
  probe, and is also applied inside ``_read_geotiff_gpu_chunked_gds``
  for the case where that function is reached directly.

- Hoist the TIFF byte-surgery helper out of the per-test file into
  ``tests/_tiff_surgery.py``. The CPU companion test and the new GPU
  test were carrying near-identical copies of the same struct-packing
  helper. Drift between the two would silently desync the cap tests.

- Spell the sparse-tile pass-through behaviour in the eager and
  chunked GPU comments so a future reader does not add a ``bc > 0``
  guard and accidentally reject sparse files. Sparse tiles
  (``byte_count == 0``) pass under any positive cap by design, mirroring
  the CPU path in ``_reader.py``.

- Tighten ``test_env_override_lifts_cap`` to assert the cap message
  is *not* in any exception fired, rather than swallowing all errors
  via a bare ``except``. A regression that re-fires the cap through a
  different error path now produces a focused failure.

- Drop the misleading underscore-prefix loop variables in the eager
  GPU cap loop; both names are read inside the body.

- Cover the chunked + GPU cap with a parallel regression test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Security] _HTTPSource.read_all() is unbounded — small declared raster can pull huge HTTP body

1 participant