[Security] _HTTPSource.read_all() is unbounded — small declared raster can pull huge HTTP body

## Summary

`_HTTPSource.read_all()` in `xrspatial/geotiff/_reader.py` (around line 1219) does not validate `Content-Length` and does not cap the body it pulls down. A crafted GeoTIFF whose header declares a tiny raster (well within `_check_dimensions`) can still be served over HTTP with a multi-gigabyte body. The full-image strip path passes that body straight into `_read_strips`, allocating the whole thing before TIFF parsing has any chance to reject it.

## Attack scenario

1. Attacker hosts a URL that:
   - Returns a valid TIFF header declaring (for example) a 100x100 image
   - Has a `Content-Length` of several GB (or no `Content-Length` at all, streaming forever)
2. Victim calls `read_geotiff(url)` or any path that lands in `_fetch_decode_cog_http_strips` with `window is None`
3. `_check_dimensions(100, 100, samples, max_pixels)` passes because the declared raster is small
4. `source.read_all()` happily downloads the whole body into memory before `_read_strips` runs
5. Result: process OOMs or fills disk via swap

## Affected code

- `xrspatial/geotiff/_reader.py` line 1219 — `_HTTPSource.read_all()` has no `max_bytes`, no Content-Length check, no streaming cap
- `xrspatial/geotiff/_reader.py` line 2416-2420 — `_fetch_decode_cog_http_strips` `window is None` branch calls `read_all()` after a pixel-count check that has nothing to do with the on-wire body size

The windowed branch in the same function is already safe: it uses per-strip ranged GETs bounded by `_max_tile_bytes_from_env()` (default 256 MiB per strip, env override `XRSPATIAL_COG_MAX_TILE_BYTES`).

## Proposed fix

Option (b) from the security report: add a `max_bytes` parameter to `read_all()`.

- Validate `Content-Length` against `max_bytes` up front and raise `OSError` if it exceeds
- Stream the body and abort once `max_bytes + 1` bytes have arrived (catches lying servers and missing Content-Length)
- Caller in `_fetch_decode_cog_http_strips` computes `max_bytes` from the strip table: roughly `max(offsets[i] + byte_counts[i])` plus a small overhead for the TIFF header

Option (a) — routing the full-image path through the per-strip ranged-GET loop — is cleaner in principle but requires either extracting `_read_strips`'s post-decode logic (sparse strips, LERC masked_fill, predictor, planar layouts) or synthesising a full-file byte buffer from ranged GETs. The byte-budget cap is a smaller, more targeted change for the same threat.

## Test plan

- Tiny declared raster + oversized HTTP body raises before the buffer is allocated
- Server lies about `Content-Length` (advertises 100 bytes, sends 100 MB)
- Server omits `Content-Length` entirely
- Legitimate full-image reads still work

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Security] _HTTPSource.read_all() is unbounded — small declared raster can pull huge HTTP body #2051

Summary

Attack scenario

Affected code

Proposed fix

Test plan

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[Security] _HTTPSource.read_all() is unbounded — small declared raster can pull huge HTTP body #2051

Description

Summary

Attack scenario

Affected code

Proposed fix

Test plan

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions