
geotiff: HTTP COG metadata parsing capped at 64 KiB drops large IFD chains #1718

@brendancol

Describe the bug

_parse_cog_http_meta in xrspatial/geotiff/_reader.py:1536-1547 fetches the first 16 KiB of the COG, calls parse_all_ifds, and retries with a single 64 KiB GET if no IFDs came back. After that the byte buffer is final: any IFD whose offset falls past 64 KiB is silently dropped, and any tag whose offset/count array lives past 64 KiB cannot be resolved.

This works for typical web-tile-sized COGs but fails for:

  • Large tiled COGs where TileOffsets / TileByteCounts arrays land past the cap.
  • Files with deep overview pyramids whose IFD chain (next_ifd_offset) walks past 64 KiB.
  • Files with verbose tag content (long GeoAsciiParams, ICC profiles, GDAL_METADATA XML) that pushes the first IFD's tag values past the cap.
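For a sense of scale on the first case, here is a rough back-of-the-envelope calculation. It assumes a single-plane, classic-TIFF IFD where both TileOffsets and TileByteCounts are stored as 4-byte LONG entries (BigTIFF would use 8); the helper name and the example dimensions are made up for illustration.

```python
import math

HTTP_META_CAP = 64 * 1024  # the 64 KiB window described in this issue


def tile_array_bytes(width, height, tile_w, tile_h, entry_size=4):
    """Bytes taken by TileOffsets plus TileByteCounts for one IFD.

    Assumes one plane (chunky layout) and the same entry size for both
    arrays: 4 for classic TIFF LONG, 8 for BigTIFF LONG8.
    """
    tiles = math.ceil(width / tile_w) * math.ceil(height / tile_h)
    return 2 * tiles * entry_size


# Hypothetical 50,000 x 50,000 px raster with 512 x 512 tiles.
needed = tile_array_bytes(50_000, 50_000, 512, 512)
print(needed, needed > HTTP_META_CAP)  # 76832 True
```

So the two arrays of the full-resolution IFD alone can exceed the cap before any overview IFDs are counted.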

The same files read correctly via the local-file path, which uses ordinary file I/O and does not have a fixed window.

Expected behavior

_parse_cog_http_meta keeps fetching until it has all bytes needed to resolve every IFD in the chain (or fails with a clear error if the file is malformed). Two reasonable implementations:

  1. Grow the buffer in stages (e.g. 16 KiB, 64 KiB, 256 KiB, 1 MiB, 4 MiB) until parse_all_ifds reports the chain is fully resolved.
  2. Read IFD bytes lazily via ranged GETs keyed on each next_ifd_offset, fetching only what is needed for each IFD's header + tag-value arrays.
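Either option boils down to issuing HTTP Range requests for specific byte windows. A minimal sketch of that building block using only the standard library follows; `range_header` and `fetch_range` are hypothetical names, not functions from xrspatial, and the reader's actual HTTP layer may differ.

```python
import urllib.request


def range_header(start, length):
    """Build a Range header for `length` bytes starting at `start`.

    HTTP byte ranges are inclusive on both ends, hence the -1.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    return {"Range": f"bytes={start}-{start + length - 1}"}


def fetch_range(url, start, length, timeout=30):
    """Fetch one byte window; hypothetical helper, not the xrspatial API."""
    req = urllib.request.Request(url, headers=range_header(start, length))
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # 206 Partial Content means the server honoured the range.
        # A plain 200 would mean the whole file is being sent instead.
        if resp.status != 206:
            raise OSError(f"server ignored Range header (status {resp.status})")
        return resp.read()
```

For option 2, each IFD would need at least one such request for its entry table, plus one per out-of-line tag array, so the request count grows with the depth of the overview pyramid.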

Proposed fix

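Approach #1 is simpler and stays compatible with parse_all_ifds's current single-buffer contract. Add a loop that enlarges the fetch window (doubling it, or stepping through staged sizes such as those listed under approach #1) whenever parse_all_ifds returns short, and stop only once the parsed next_ifd_offset chain confirms that every IFD and its out-of-line tag arrays lie inside the buffer. Cap the window at a few MiB to bound worst-case fetches, and raise a clear error if the chain is still unresolved at the cap. Add a test fixture with an IFD chain past 64 KiB to lock the behavior in.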
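A rough sketch of what such a loop could look like, under several assumptions: `chain_is_resolved` is a stand-in for parse_all_ifds (whose real signature and return value are not shown in this issue) and only handles classic little-endian TIFF, and `fetch_range(start, length)` is a hypothetical callable that returns those bytes, or fewer at end of file. The synthetic file at the bottom doubles as the kind of fixture the regression test would need.

```python
import struct

# Byte size of each TIFF 6.0 field type (1=BYTE ... 12=DOUBLE).
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8, 6: 1, 7: 1, 8: 2, 9: 4, 10: 8, 11: 4, 12: 8}


def chain_is_resolved(buf):
    """Stand-in parser: return (ifd_count, complete) for a file prefix."""
    if len(buf) < 8 or buf[:4] != b"II*\x00":
        raise ValueError("not a classic little-endian TIFF")
    (offset,) = struct.unpack_from("<I", buf, 4)
    count, seen = 0, set()
    while offset != 0:
        if offset in seen:
            raise ValueError("IFD chain loops back on itself")
        seen.add(offset)
        if offset + 2 > len(buf):
            return count, False
        (n,) = struct.unpack_from("<H", buf, offset)
        end = offset + 2 + 12 * n + 4  # entry count + entries + next pointer
        if end > len(buf):
            return count, False
        for i in range(n):
            _tag, typ, cnt, val = struct.unpack_from("<HHII", buf, offset + 2 + 12 * i)
            size = TYPE_SIZES.get(typ, 1) * cnt
            # Values larger than 4 bytes live out of line at offset `val`.
            if size > 4 and val + size > len(buf):
                return count, False
        count += 1
        (offset,) = struct.unpack_from("<I", buf, end - 4)
    return count, True


def fetch_until_resolved(fetch_range, start=16 * 1024, cap=4 * 1024 * 1024):
    """Double the window until the chain resolves, the file ends, or cap is hit."""
    buf = bytearray(fetch_range(0, start))
    want = start
    while True:
        count, complete = chain_is_resolved(buf)
        if complete:
            return bytes(buf), count
        if len(buf) < want:
            raise ValueError("IFD chain points past the end of the file")
        if want >= cap:
            raise ValueError(f"IFD chain not resolved within {cap} bytes")
        new_want = min(want * 2, cap)
        buf += fetch_range(len(buf), new_want - len(buf))  # fetch only the delta
        want = new_want


def make_tiff(second_ifd_offset):
    """Synthetic two-IFD file whose second IFD sits at the given offset."""
    entry = struct.pack("<HHII", 256, 4, 1, 1)  # ImageWidth = 1, stored inline
    first = struct.pack("<H", 1) + entry + struct.pack("<I", second_ifd_offset)
    second = struct.pack("<H", 1) + entry + struct.pack("<I", 0)
    head = b"II*\x00" + struct.pack("<I", 8) + first
    return head + b"\x00" * (second_ifd_offset - len(head)) + second


data = make_tiff(100_000)  # second IFD lies well past 64 KiB
calls = []


def fake_fetch(start, length):
    calls.append((start, length))
    return data[start:start + length]


buf, n_ifds = fetch_until_resolved(fake_fetch)
print(n_ifds, len(buf), len(calls))
```

Appending only the missing bytes keeps the total transfer equal to the final window size, rather than re-downloading the prefix on every retry.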

Labels: bug
