Apply tile byte cap to local files + SSRF hardening on HTTP geotiff reads #1664

@brendancol

Description

Describe the bug

Hardening report — not an active CVE. Two paper-cuts in the COG reader become real risk when a service accepts user-supplied URLs or paths.

  1. Tile byte cap is HTTP-only. In xrspatial/geotiff/_reader.py:1408-1416 the XRSPATIAL_COG_MAX_TILE_BYTES cap is checked inside _fetch_decode_cog_http_tiles but not inside _read_tiles or _read_strips. A crafted local TIFF with a huge TileByteCounts or StripByteCounts still walks into _decode_strip_or_tile carrying a giant slice. Compressed payloads can be small and decompress (deflate/zstd/lzw) into hundreds of MB or GB, so the decompressor becomes the OOM trigger even though mmap slicing itself is bounded by file size. The current test_local_path_unaffected_by_cap even pins the broken behavior in place by asserting the cap does NOT apply locally — that needs flipping.

  2. SSRF in _HTTPSource. xrspatial/geotiff/_reader.py:406-503 accepts arbitrary URLs with no:

    • scheme allow-list (nothing rejects file://, gopher://, etc. up front)
    • host filtering — localhost, 127.0.0.1, IPv6 loopback ::1, link-local 169.254.0.0/16 (cloud IMDS), and RFC1918 private ranges all reachable
    • explicit connect/read timeouts (urllib falls back to the global socket default, which is no timeout unless one has been set; urllib3 has no per-call timeout in our usage)
    • explicit redirect cap

For a service that takes open_geotiff(url=user_input) this is a textbook SSRF surface. http://169.254.169.254/... probes cloud metadata; http://127.0.0.1:6379/ talks to internal Redis.
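Both risks can be illustrated with the standard library alone. The snippet below is a sketch, not code from xrspatial: it compresses a buffer of zeros (a best case for deflate, chosen only for illustration) to show how small a payload can be relative to what it decompresses into, then asks the `ipaddress` module how it classifies addresses of the kinds listed above. The sample addresses 10.0.0.5 and 8.8.8.8 are arbitrary stand-ins for "an RFC1918 host" and "a public host".

```python
import ipaddress
import zlib

# Risk 1: a small deflate payload can expand enormously on decode.
RAW_SIZE = 16 * 1024 * 1024  # 16 MiB of zeros, illustrative best case
payload = zlib.compress(bytes(RAW_SIZE), 9)
ratio = RAW_SIZE / len(payload)
print(f"{len(payload)} compressed bytes -> {RAW_SIZE} raw bytes (~{ratio:.0f}x)")

# Risk 2: how the stdlib classifies the address families named above.
for text in ("127.0.0.1", "::1", "169.254.169.254", "10.0.0.5", "8.8.8.8"):
    ip = ipaddress.ip_address(text)
    internal = ip.is_loopback or ip.is_link_local or ip.is_private
    print(f"{text:>16}  internal={internal}")
```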

Expected behavior

  • Local tile and strip readers apply the same XRSPATIAL_COG_MAX_TILE_BYTES cap the HTTP path uses. Keep the existing 256 MiB default so legitimate local reads still work; the env var stays the single tuning knob.
  • _HTTPSource.__init__ rejects non-http(s) schemes by default. XRSPATIAL_GEOTIFF_ALLOWED_SCHEMES opts back in.
  • Hostname resolves through socket.getaddrinfo and any loopback, link-local, or RFC1918 private IP is rejected unless XRSPATIAL_GEOTIFF_ALLOW_PRIVATE_HOSTS=1.
  • Explicit timeouts: connect 10s, read 30s. Overridable via XRSPATIAL_GEOTIFF_HTTP_CONNECT_TIMEOUT and XRSPATIAL_GEOTIFF_HTTP_READ_TIMEOUT.
  • Explicit redirect cap of 5.
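The checks listed above could look roughly like the following. This is a minimal sketch under stated assumptions, not the eventual patch: the function names (`check_byte_count`, `allowed_schemes`, `validate_url`, `http_timeouts`) are hypothetical, the cap is assumed to compare the declared TileByteCounts or StripByteCounts entry against the limit, and XRSPATIAL_GEOTIFF_ALLOWED_SCHEMES is assumed to be a comma-separated list. The env var names and default values are the ones proposed in this issue.

```python
import ipaddress
import os
import socket
from urllib.parse import urlsplit

DEFAULT_MAX_TILE_BYTES = 256 * 1024 * 1024  # 256 MiB default from the issue


def check_byte_count(byte_count, index):
    """Reject a declared tile/strip byte count above the cap (hypothetical helper)."""
    cap = int(os.environ.get("XRSPATIAL_COG_MAX_TILE_BYTES", DEFAULT_MAX_TILE_BYTES))
    if byte_count > cap:
        raise ValueError(f"tile/strip {index}: {byte_count} bytes exceeds cap of {cap}")


def allowed_schemes():
    # Assumption: the override is a comma-separated list such as "http,https,s3".
    raw = os.environ.get("XRSPATIAL_GEOTIFF_ALLOWED_SCHEMES", "http,https")
    return {s.strip().lower() for s in raw.split(",") if s.strip()}


def is_blocked_address(addr):
    # Drop any IPv6 scope id ("fe80::1%eth0") before parsing.
    ip = ipaddress.ip_address(addr.split("%", 1)[0])
    return ip.is_loopback or ip.is_link_local or ip.is_private


def validate_url(url, resolve=socket.getaddrinfo):
    """Raise ValueError for a disallowed scheme or an internal host."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in allowed_schemes():
        raise ValueError(f"scheme {scheme!r} is not allowed")
    if not parts.hostname:
        raise ValueError("URL has no hostname")
    if os.environ.get("XRSPATIAL_GEOTIFF_ALLOW_PRIVATE_HOSTS") == "1":
        return  # escape hatch for tests and dev
    port = parts.port or (443 if scheme == "https" else 80)
    # Check every address the name resolves to, not just the first.
    for info in resolve(parts.hostname, port, type=socket.SOCK_STREAM):
        addr = info[4][0]
        if is_blocked_address(addr):
            raise ValueError(f"{parts.hostname} resolves to blocked address {addr}")


def http_timeouts():
    """Return (connect, read) timeouts in seconds, with the proposed defaults."""
    connect = float(os.environ.get("XRSPATIAL_GEOTIFF_HTTP_CONNECT_TIMEOUT", "10"))
    read = float(os.environ.get("XRSPATIAL_GEOTIFF_HTTP_READ_TIMEOUT", "30"))
    return connect, read
```

One caveat with this shape: validating the resolved addresses and then letting the HTTP client resolve the name again leaves a window in which DNS can return a different answer, so a real implementation may want to connect to the address it validated.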

Backward-compatibility note

Code that currently does open_geotiff('http://127.0.0.1:...') (common in tests and dev) will need XRSPATIAL_GEOTIFF_ALLOW_PRIVATE_HOSTS=1. The PR calls this out and ships the escape hatch.
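For test suites that read from a loopback server, one way to scope the opt-in is to set the variable only for the duration of a test. The sketch below assumes the reader consults the environment at call time rather than at import time; `allow_private_hosts` is a hypothetical helper, not part of xrspatial.

```python
import contextlib
import os
from unittest import mock

ENV_KEY = "XRSPATIAL_GEOTIFF_ALLOW_PRIVATE_HOSTS"


@contextlib.contextmanager
def allow_private_hosts():
    """Temporarily opt in to private/loopback hosts; restores the old value on exit."""
    with mock.patch.dict(os.environ, {ENV_KEY: "1"}):
        yield
```

A test would then wrap its `open_geotiff('http://127.0.0.1:...')` call in `with allow_private_hosts():`, leaving the safe default in force everywhere else.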

Additional context

Hardening, not an exploit. Defaults stay safe; advanced users get env-var overrides.

Metadata

    Labels

    bug (Something isn't working), input-validation (Input validation and error messages)
