Describe the bug
_parse_cog_http_meta in xrspatial/geotiff/_reader.py:1536-1547 fetches the first 16 KiB of the COG, calls parse_all_ifds, and retries with a single 64 KiB GET if no IFDs came back. After that the byte buffer is final: any IFD whose offset falls past 64 KiB is silently dropped, and any tag whose offset/count array lives past 64 KiB cannot be resolved.
This works for typical web-tile-sized COGs but fails for:
- Large tiled COGs where TileOffsets / TileByteCounts arrays land past the cap.
- Files with deep overview pyramids whose IFD chain (next_ifd_offset) walks past 64 KiB.
- Files with verbose tag content (long GeoAsciiParams, ICC profiles, GDAL_METADATA XML) that pushes the first IFD's tag values past the cap.
The same files read correctly via the local-file path, which uses ordinary file I/O and does not have a fixed window.
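For a sense of scale, here is a rough back-of-the-envelope sketch of how large a single TileOffsets array gets. The helper name tile_array_bytes and the example dimensions are illustrative only, and it assumes a single-plane classic TIFF with 4-byte LONG offsets (BigTIFF LONG8 entries would be twice as large):

```python
import math

FETCH_CAP = 64 * 1024  # the fixed window described above


def tile_array_bytes(width, height, tile_size=512, bytes_per_entry=4):
    """Approximate size in bytes of one TileOffsets (or TileByteCounts) array.

    Hypothetical helper: assumes square tiles, one plane, and 4-byte entries.
    """
    tiles_across = math.ceil(width / tile_size)
    tiles_down = math.ceil(height / tile_size)
    return tiles_across * tiles_down * bytes_per_entry


# A 100,000 x 100,000 pixel image with 512 px tiles has 196 x 196 tiles.
large = tile_array_bytes(100_000, 100_000)
print(large, large > FETCH_CAP)  # 153664 True

# A 4,096 x 4,096 pixel image has only 8 x 8 tiles.
small = tile_array_bytes(4_096, 4_096)
print(small, small > FETCH_CAP)  # 256 False
```

So one tag array alone can exceed the window for a large raster, before counting TileByteCounts or any overview IFDs.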
Expected behavior
_parse_cog_http_meta keeps fetching until it has all bytes needed to resolve every IFD in the chain (or fails with a clear error if the file is malformed). Two reasonable implementations:
- Grow the buffer in stages (e.g. 16K, 64K, 256K, 1M, 4M) until parse_all_ifds reports the chain is fully resolved.
- Read IFD bytes lazily via ranged GETs keyed on each next_ifd_offset, fetching only what is needed for each IFD's header + tag-value arrays.
Proposed fix
Approach #1 is simpler and stays compatible with parse_all_ifds's current single-buffer contract. Add a loop that doubles the fetch window when parse_all_ifds returns short and confirm via the parsed next_ifd_offset chain that we have all IFDs. Cap at a few MB to bound worst-case fetches. Add a test fixture with an IFD chain past 64 KiB to lock the behavior in.
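A minimal, self-contained sketch of the growing-window loop, to make the idea concrete. Everything here is an assumption for illustration: required_length, fetch_header, and the fetch_range(start, length) callable are hypothetical names, not the existing xrspatial API, and the chain check is a stand-in for whatever signal parse_all_ifds can expose. It only handles classic TIFF (not BigTIFF).

```python
import struct

# Byte size of each classic TIFF field type (1=BYTE ... 12=DOUBLE).
_TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8, 6: 1,
               7: 1, 8: 2, 9: 4, 10: 8, 11: 4, 12: 8}


def required_length(buf):
    """Return how many leading bytes are needed to resolve the IFD chain.

    If the result is greater than len(buf), the buffer is too short; the
    value is then only a lower bound, since later IFDs are not visible yet.
    """
    if len(buf) < 8:
        return 8
    order = {b"II": "<", b"MM": ">"}.get(bytes(buf[:2]))
    if order is None:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(order + "HI", buf[2:8])
    if magic != 42:
        raise ValueError("this sketch only handles classic TIFF")
    needed, seen = 8, set()
    while ifd_offset != 0:
        if ifd_offset in seen:
            raise ValueError("IFD chain loops back on itself")
        seen.add(ifd_offset)
        if ifd_offset + 2 > len(buf):
            return ifd_offset + 2
        (count,) = struct.unpack_from(order + "H", buf, ifd_offset)
        ifd_end = ifd_offset + 2 + 12 * count + 4
        if ifd_end > len(buf):
            return ifd_end
        needed = max(needed, ifd_end)
        for i in range(count):
            _tag, ftype, n, value = struct.unpack_from(
                order + "HHII", buf, ifd_offset + 2 + 12 * i)
            size = _TYPE_SIZES.get(ftype, 1) * n
            if size > 4:  # value is stored out of line at offset `value`
                needed = max(needed, value + size)
        if needed > len(buf):
            return needed
        (ifd_offset,) = struct.unpack_from(order + "I", buf, ifd_end - 4)
    return needed


def fetch_header(fetch_range, initial=16 * 1024, cap=4 * 1024 * 1024):
    """Double the fetch window until the IFD chain resolves or cap is hit."""
    size = initial
    buf = fetch_range(0, size)
    while True:
        if required_length(buf) <= len(buf):
            return buf
        if len(buf) < size:
            raise ValueError("file ends before the IFD chain resolves")
        if size >= cap:
            raise ValueError(f"IFD chain not resolved within {cap} bytes")
        new_size = min(size * 2, cap)
        buf += fetch_range(size, new_size - size)  # fetch only the new tail
        size = new_size
```

In the real reader, fetch_range would wrap a ranged GET. For a test, a lambda slicing an in-memory bytes object (lambda start, length: data[start:start + length]) is enough to drive a fixture whose second IFD sits past 64 KiB.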