Summary
_HTTPSource.read_all() in xrspatial/geotiff/_reader.py (around line 1219) does not validate Content-Length and does not cap the body it pulls down. A crafted GeoTIFF whose header declares a tiny raster (well within _check_dimensions) can still be served over HTTP with a multi-gigabyte body. The full-image strip path passes that body straight into _read_strips, allocating the whole thing before TIFF parsing has any chance to reject it.
Attack scenario
- Attacker hosts a URL that:
  - Returns a valid TIFF header declaring (for example) a 100x100 image
  - Has a Content-Length of several GB (or no Content-Length at all, streaming forever)
- Victim calls read_geotiff(url) or any path that lands in _fetch_decode_cog_http_strips with window is None
- _check_dimensions(100, 100, samples, max_pixels) passes because the declared raster is small
- source.read_all() happily downloads the whole body into memory before _read_strips runs
- Result: process OOMs or fills disk via swap
Affected code
- xrspatial/geotiff/_reader.py line 1219: _HTTPSource.read_all() has no max_bytes, no Content-Length check, no streaming cap
- xrspatial/geotiff/_reader.py lines 2416-2420: the window is None branch of _fetch_decode_cog_http_strips calls read_all() after a pixel-count check that has nothing to do with the on-wire body size
The windowed branch in the same function is already safe: it uses per-strip ranged GETs bounded by _max_tile_bytes_from_env() (default 256 MiB per strip, env override XRSPATIAL_COG_MAX_TILE_BYTES).
Proposed fix
Option (b) from the security report: add a max_bytes parameter to read_all().
- Validate Content-Length against max_bytes up front and raise OSError if it exceeds the cap
- Stream the body and abort once max_bytes + 1 bytes have arrived (catches lying servers and missing Content-Length)
- Caller in _fetch_decode_cog_http_strips computes max_bytes from the strip table: roughly max(offsets[i] + byte_counts[i]) plus a small overhead for the TIFF header
Option (a) — routing the full-image path through the per-strip ranged-GET loop — is cleaner in principle but requires either extracting _read_strips's post-decode logic (sparse strips, LERC masked_fill, predictor, planar layouts) or synthesising a full-file byte buffer from ranged GETs. The byte-budget cap is a smaller, more targeted change for the same threat.
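A minimal sketch of what the byte-budget cap in option (b) could look like. Everything here is illustrative rather than the actual patch: read_capped, max_bytes_from_strips, the 64 KiB overhead and the 1 MiB chunk size are hypothetical names and values, and the response is assumed to be any object with a file-like read(n) method. The real _HTTPSource may wrap a different HTTP client.

```python
import io

_CHUNK = 1 << 20  # 1 MiB per read; assumed value, not taken from _reader.py


def max_bytes_from_strips(offsets, byte_counts, overhead=64 * 1024):
    """Byte budget implied by the strip table (hypothetical helper).

    Takes the furthest strip end, max(offsets[i] + byte_counts[i]),
    and adds a small allowance for the header. The overhead is a guess.
    """
    if len(offsets) != len(byte_counts):
        raise ValueError("strip offsets and byte counts differ in length")
    end = max((o + n for o, n in zip(offsets, byte_counts)), default=0)
    return end + overhead


def read_capped(resp, max_bytes, content_length=None):
    """Read a whole body from a file-like response, refusing to exceed max_bytes."""
    # Cheap rejection first: trust an honest Content-Length when it is present.
    if content_length is not None and content_length > max_bytes:
        raise OSError(
            f"Content-Length {content_length} exceeds cap of {max_bytes} bytes"
        )
    chunks = []
    total = 0
    while True:
        # Never request more than one byte past the cap.
        chunk = resp.read(min(_CHUNK, max_bytes + 1 - total))
        if not chunk:
            break
        total += len(chunk)
        if total > max_bytes:
            raise OSError(f"HTTP body exceeds cap of {max_bytes} bytes")
        chunks.append(chunk)
    return b"".join(chunks)


# Usage with an in-memory stand-in for the HTTP response:
cap = max_bytes_from_strips([8, 5008], [5000, 5000])
data = read_capped(io.BytesIO(b"\0" * 10_008), cap)
```

Reading at most max_bytes + 1 bytes keeps peak memory close to the cap even when the server never stops sending, which is the property the full-image path currently lacks.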
Test plan
- Tiny declared raster + oversized HTTP body raises before the buffer is allocated
- Server lies about Content-Length (advertises 100 bytes, sends 100 MB)
- Server omits Content-Length entirely
- Legitimate full-image reads still work
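One way the missing-Content-Length case might be exercised is with a throwaway local server built on the standard library. This is a sketch only: NoLengthHandler, hostile_server and fetch_with_cap are hypothetical names, the 8 MiB body stands in for a multi-gigabyte one, and fetch_with_cap is a stand-in for the capped read_all(), not the real method.

```python
import contextlib
import http.server
import threading
import urllib.request


class NoLengthHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical fixture: streams a large body with no Content-Length."""

    body_bytes = 8 * 1024 * 1024  # small stand-in for a multi-GB body

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "image/tiff")
        self.end_headers()  # no Content-Length; closing the socket ends the body
        chunk = b"\0" * 65536
        try:
            for _ in range(self.body_bytes // len(chunk)):
                self.wfile.write(chunk)
        except OSError:
            pass  # the client hung up after hitting its cap

    def log_message(self, format, *args):
        pass  # keep test output quiet


@contextlib.contextmanager
def hostile_server(handler=NoLengthHandler):
    """Serve the handler on an ephemeral localhost port and yield a URL."""
    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        yield f"http://127.0.0.1:{server.server_address[1]}/evil.tif"
    finally:
        server.shutdown()
        server.server_close()


def fetch_with_cap(url, max_bytes):
    """Stand-in for a capped read_all(): read at most max_bytes + 1 bytes."""
    # An empty ProxyHandler keeps the request on localhost regardless of env.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=10) as resp:
        body = resp.read(max_bytes + 1)
    if len(body) > max_bytes:
        raise OSError(f"HTTP body exceeds cap of {max_bytes} bytes")
    return body
```

In a pytest suite this could be used as `with hostile_server() as url:` followed by `pytest.raises(OSError)` around the capped read, plus a second read with a generous cap to confirm that legitimate full-image fetches still succeed.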