geotiff: .tif.ovr sidecar download bypasses max_cloud_bytes #2121

@brendancol

Summary

The external .tif.ovr sidecar reader added in #2114 downloads the sidecar file
over HTTP or fsspec without applying the max_cloud_bytes byte budget that gates
the base GeoTIFF. A malicious server can serve a tiny base TIFF that passes the
cloud-budget check, plus a multi-gigabyte <base>.tif.ovr sidecar; opening the
file with overview_level >= 1 pulls the full sidecar body into memory and OOMs
the process.

Affected code

xrspatial/geotiff/_sidecar.py:load_sidecar():

  • HTTP / HTTPS path (lines 154-160): calls _HTTPSource(path).read_all() with
    no max_bytes. The _HTTPSource.read_all() docstring at _reader.py:1231
    notes that max_bytes=None "preserves the legacy unbounded behaviour."
  • fsspec path (lines 161-165): calls fsspec.open(path, "rb").read() with no
    size check. By contrast, the base-file path checks fsspec.size() against
    max_cloud_bytes at _reader.py:3239-3260 and raises CloudSizeLimitError
    before any bytes are downloaded.

Local-file mmap (lines 148-153) is safe -- no download happens.

The unbounded download is reached from three production call sites:

  • _reader.py:3287 -- eager CPU read_to_array
  • _backends/gpu.py:311 -- GPU eager read_geotiff_gpu
  • geotiff/__init__.py:238 -- dask metadata helper

Impact

A user who passes max_cloud_bytes=100_000_000 to bound memory still risks OOM
when a sidecar is present. The sidecar is discovered automatically for any HTTP
or fsspec source, and any overview_level >= 1 triggers the download. No user
opt-in is needed to hit the unbounded read.

Fix

Thread max_cloud_bytes from the caller through load_sidecar and apply it on
both transports:

  • HTTP: pass the resolved budget into _HTTPSource(path).read_all(max_bytes=cloud_budget).
    The existing read_all already streams with a +1 overshoot detector when
    max_bytes is set.
  • fsspec: stat the sidecar via fsspec.size() before the read and raise
    CloudSizeLimitError when the declared size exceeds the budget.

max_cloud_bytes=None continues to mean unbounded, matching the base-file
semantics so callers that already gate the read upstream can opt out.
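
The two checks above can be sketched in isolation. This is a minimal illustration, not the project's code: read_capped, read_fs_capped and the stand-in CloudSizeLimitError class are hypothetical names, and the real exception's base class is not shown in this issue. The fs argument is assumed to be any object with fsspec-style size(path) and open(path, "rb") methods. Re-applying the cap after the size check is an extra assumption here, in case the declared size is wrong.

```python
class CloudSizeLimitError(Exception):
    """Stand-in for the project's exception; its real base class is assumed."""


def read_capped(stream, max_bytes=None, chunk_size=1 << 16):
    """Read a binary stream to EOF, buffering at most max_bytes + 1 bytes."""
    if max_bytes is None:
        return stream.read()  # unbounded, matching the legacy behaviour
    buf = bytearray()
    while True:
        # Never request more than one byte past the budget: that single
        # extra byte is what reveals an oversized body ("+1 overshoot").
        chunk = stream.read(min(chunk_size, max_bytes + 1 - len(buf)))
        if not chunk:
            return bytes(buf)
        buf += chunk
        if len(buf) > max_bytes:
            raise OSError(f"sidecar body exceeds max_cloud_bytes={max_bytes}")


def read_fs_capped(fs, path, max_bytes=None):
    """Check the declared size first, then read with the same cap."""
    if max_bytes is not None:
        declared = fs.size(path)
        if declared is not None and declared > max_bytes:
            # Rejected before the file is opened, so nothing is downloaded.
            raise CloudSizeLimitError(
                f"{path}: declared size {declared} exceeds max_cloud_bytes={max_bytes}"
            )
    with fs.open(path, "rb") as f:
        return read_capped(f, max_bytes)
```

With a real fsspec source, fs and the path could come from fsspec.core.url_to_fs(url). How the budget is actually threaded through load_sidecar is up to the fix itself.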

Tests

  • HTTP sidecar exceeding max_cloud_bytes raises either CloudSizeLimitError
    or OSError (from the streaming overshoot detector), depending on whether
    the server sends a Content-Length.
  • fsspec sidecar exceeding max_cloud_bytes raises CloudSizeLimitError before
    any bytes are read.
  • max_cloud_bytes=None preserves the legacy unbounded behaviour.
  • Local-file mmap path is unaffected.
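
The HTTP cases could be exercised without a network along these lines. This is a hedged sketch only: FakeResponse, fetch_capped and the stand-in CloudSizeLimitError are hypothetical, and the real tests would drive load_sidecar and _HTTPSource rather than these helpers. It shows why the exception type depends on Content-Length: a declared length is rejected before the body is touched, while a missing one is only caught once the stream runs one byte over budget.

```python
import io


class CloudSizeLimitError(Exception):
    """Stand-in for the project's exception; its real base class is assumed."""


class FakeResponse:
    """Minimal stand-in for an HTTP response: headers plus a readable body."""

    def __init__(self, body, send_length):
        self.headers = {"Content-Length": str(len(body))} if send_length else {}
        self._body = io.BytesIO(body)
        self.bytes_served = 0  # lets tests see how much was actually read

    def read(self, n=-1):
        chunk = self._body.read(n)
        self.bytes_served += len(chunk)
        return chunk


def fetch_capped(resp, max_bytes):
    """Hypothetical helper: reject on declared length, else stream with a cap."""
    if max_bytes is None:
        return resp.read()
    declared = resp.headers.get("Content-Length")
    if declared is not None and int(declared) > max_bytes:
        raise CloudSizeLimitError(f"declared {declared} > {max_bytes}")
    buf = bytearray()
    while len(buf) <= max_bytes:
        chunk = resp.read(max_bytes + 1 - len(buf))
        if not chunk:
            return bytes(buf)
        buf += chunk
    raise OSError(f"body exceeds max_cloud_bytes={max_bytes}")


def test_declared_length_rejected_before_body_read():
    resp = FakeResponse(b"x" * 1000, send_length=True)
    try:
        fetch_capped(resp, 100)
    except CloudSizeLimitError:
        assert resp.bytes_served == 0
    else:
        raise AssertionError("expected CloudSizeLimitError")


def test_missing_length_caught_by_overshoot():
    resp = FakeResponse(b"x" * 1000, send_length=False)
    try:
        fetch_capped(resp, 100)
    except OSError:
        assert resp.bytes_served == 101  # budget plus the one overshoot byte
    else:
        raise AssertionError("expected OSError")


def test_none_budget_is_unbounded():
    resp = FakeResponse(b"x" * 1000, send_length=False)
    assert fetch_capped(resp, None) == b"x" * 1000
```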

Reference

Found by the geotiff security sweep on 2026-05-19 (Cat 1, HIGH).

Metadata

Labels

  • bug: Something isn't working
  • high-priority
  • input-validation: Input validation and error messages
  • oom: Out-of-memory risk with large datasets
