Skip to content

[Security] _HTTPSource.read_all() is unbounded — small declared raster can pull huge HTTP body #2051

@brendancol

Description

@brendancol

Summary

_HTTPSource.read_all() in xrspatial/geotiff/_reader.py (around line 1219) does not validate Content-Length and does not cap the body it pulls down. A crafted GeoTIFF whose header declares a tiny raster (well within _check_dimensions) can still be served over HTTP with a multi-gigabyte body. The full-image strip path passes that body straight into _read_strips, allocating the whole thing before TIFF parsing has any chance to reject it.

Attack scenario

  1. Attacker hosts a URL that:
    • Returns a valid TIFF header declaring (for example) a 100x100 image
    • Has a Content-Length of several GB (or no Content-Length at all, streaming forever)
  2. Victim calls read_geotiff(url) or any path that lands in _fetch_decode_cog_http_strips with window is None
  3. _check_dimensions(100, 100, samples, max_pixels) passes because the declared raster is small
  4. source.read_all() happily downloads the whole body into memory before _read_strips runs
  5. Result: process OOMs or fills disk via swap

Affected code

  • xrspatial/geotiff/_reader.py line 1219 — _HTTPSource.read_all() has no max_bytes, no Content-Length check, no streaming cap
  • xrspatial/geotiff/_reader.py line 2416-2420 — _fetch_decode_cog_http_strips window is None branch calls read_all() after a pixel-count check that has nothing to do with the on-wire body size

The windowed branch in the same function is already safe: it uses per-strip ranged GETs bounded by _max_tile_bytes_from_env() (default 256 MiB per strip, env override XRSPATIAL_COG_MAX_TILE_BYTES).

Proposed fix

Option (b) from the security report: add a max_bytes parameter to read_all().

  • Validate Content-Length against max_bytes up front and raise OSError if it exceeds
  • Stream the body and abort once max_bytes + 1 bytes have arrived (catches lying servers and missing Content-Length)
  • Caller in _fetch_decode_cog_http_strips computes max_bytes from the strip table: roughly max(offsets[i] + byte_counts[i]) plus a small overhead for the TIFF header

Option (a) — routing the full-image path through the per-strip ranged-GET loop — is cleaner in principle but requires either extracting _read_strips's post-decode logic (sparse strips, LERC masked_fill, predictor, planar layouts) or synthesising a full-file byte buffer from ranged GETs. The byte-budget cap is a smaller, more targeted change for the same threat.

Test plan

  • Tiny declared raster + oversized HTTP body raises before the buffer is allocated
  • Server lies about Content-Length (advertises 100 bytes, sends 100 MB)
  • Server omits Content-Length entirely
  • Legitimate full-image reads still work

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions