
geotiff: streaming dask writer picks classic TIFF too late near the 4 GB boundary #1785

@brendancol

Description


Describe the bug

write_streaming() in xrspatial/geotiff/_writer.py decides between classic TIFF and BigTIFF using only the uncompressed pixel volume at lines 1652-1658:

uncompressed_bytes = height * width * bytes_per_sample * samples
UINT32_MAX = 0xFFFFFFFF
if bigtiff is not None:
    use_bigtiff = bigtiff
else:
    use_bigtiff = uncompressed_bytes > UINT32_MAX

The eager _assemble_tiff path adds the IFD, strip/tile tables, geo tags, and a per-tile compressed-payload estimate to the raw pixel count before deciding, so it promotes to BigTIFF a little earlier than the streaming path. Streaming uses the bare pixel bound. For a raster just under 4 GiB, the IFD and strip table can therefore push the real file size over UINT32_MAX after classic TIFF has already been selected. The failure surfaces late, as a struct.error / overflow while packing strip offsets as LONGs, well after the writer has committed to a layout.
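The following minimal sketch makes the gap concrete. It uses a single-band float32 raster whose pixels alone fit under UINT32_MAX, written one strip per row. The raster dimensions, the 4 KiB header/IFD allowance, and the choice to place the strip tables ahead of the pixel data are all illustrative assumptions and are not taken from _writer.py. The format itself fixes only one thing here: classic TIFF stores offsets as 4-byte LONGs.

```python
import struct

UINT32_MAX = 0xFFFFFFFF

# Hypothetical raster: single-band float32, sized so the pixel payload
# alone sits just below UINT32_MAX.
bytes_per_sample = 4
samples = 1
width = 32768
row_bytes = width * bytes_per_sample * samples
height = UINT32_MAX // row_bytes  # 32767 rows
uncompressed_bytes = height * row_bytes

# Bare pixel bound, as in write_streaming(): classic TIFF is selected.
use_bigtiff = uncompressed_bytes > UINT32_MAX

# Classic TIFF spends one 4-byte LONG on StripOffsets and one on
# StripByteCounts per strip (here, one strip per row).
strip_table_bytes = height * 2 * 4
header_and_ifd = 8 + 4096  # assumed: 8-byte header plus a rough IFD/geo-tag allowance

# Assumed layout: header, IFD, and strip tables first, then the pixel strips.
last_strip_offset = header_and_ifd + strip_table_bytes + (height - 1) * row_bytes

try:
    # The late packing step that the issue describes as failing.
    struct.pack("<I", last_strip_offset)
    packed = True
except struct.error:
    packed = False
```

Under these assumptions the bare pixel bound selects classic TIFF, yet the last strip's offset no longer fits in a LONG.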

The right fix is not to make the streaming path as accurate as the eager one, since it cannot know the compressed payload up front. Instead, it should reserve a conservative header/IFD overhead and promote when uncompressed_bytes + reserved_overhead >= UINT32_MAX.
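One way to size a conservative reserved_overhead is to add up the classic-TIFF structures that sit outside the pixel data. The 8-byte header and the IFD layout (a 2-byte entry count, 12 bytes per entry, and a 4-byte next-IFD offset) come from the TIFF 6.0 format. The helper name, the default entry count, the geo-tag allowance, and the per-block codec slack are hypothetical values chosen for illustration only.

```python
def estimate_classic_overhead(n_blocks: int,
                              n_ifd_entries: int = 32,
                              tag_data_allowance: int = 64 * 1024,
                              per_block_slack: int = 1024) -> int:
    """Conservative byte budget for everything in a classic TIFF except pixels.

    Hypothetical helper: n_ifd_entries, tag_data_allowance (GeoTIFF tag
    payloads, etc.) and per_block_slack (room for codec expansion) are
    assumed defaults, not values taken from xrspatial.
    """
    header = 8                                # byte order, magic 42, first-IFD offset
    ifd = 2 + 12 * n_ifd_entries + 4          # entry count, 12-byte entries, next-IFD offset
    offset_tables = 2 * 4 * n_blocks          # Strip/TileOffsets + ByteCounts as 4-byte LONGs
    codec_slack = per_block_slack * n_blocks  # incompressible blocks can grow slightly
    return header + ifd + offset_tables + tag_data_allowance + codec_slack
```

The budget grows linearly with the number of strips or tiles. This growth is what pushes a raster that sits just under the bound over it, so a fixed constant on its own may be too small for rasters with many blocks.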

Expected behavior

write_streaming() should add a conservative reserved-overhead constant to uncompressed_bytes before comparing against UINT32_MAX, so that the BigTIFF decision survives the actual IFD layout and codec overhead. Once the overhead is included, the comparison should use >= rather than >, since classic TIFF cannot address offsets equal to UINT32_MAX.
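A minimal sketch of the expected decision follows. It replaces the bare pixel bound shown above with a reserved overhead and a >= comparison, and an explicit bigtiff argument still takes precedence. The function name, the n_blocks parameter, and the 64 KiB fixed allowance are hypothetical. The sketch reserves only header, IFD, and offset-table space, so a codec that can expand data would need additional slack.

```python
UINT32_MAX = 0xFFFFFFFF

# Assumed fixed allowance for the TIFF header, IFD entries, and GeoTIFF
# tag data; the value is illustrative, not taken from xrspatial.
RESERVED_FIXED_OVERHEAD = 64 * 1024


def choose_bigtiff(height, width, bytes_per_sample, samples, n_blocks, bigtiff=None):
    """Hypothetical replacement for the BigTIFF decision in write_streaming()."""
    if bigtiff is not None:
        # An explicit caller choice still wins, as in the current code.
        return bigtiff
    uncompressed_bytes = height * width * bytes_per_sample * samples
    # Classic TIFF spends two 4-byte LONGs per strip/tile (offset + byte count).
    reserved_overhead = RESERVED_FIXED_OVERHEAD + 8 * n_blocks
    # Promote as soon as the estimate reaches UINT32_MAX, not only past it.
    return uncompressed_bytes + reserved_overhead >= UINT32_MAX
```

For the 32767 x 32768 float32 raster with one strip per row, this returns True, whereas the bare pixel bound selects classic TIFF.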

Categories

  • Cat 4 (error handling): late struct.error instead of an up-front BigTIFF promotion
  • Cat 5 (backend inconsistency): eager and streaming write paths disagree on when to promote to BigTIFF

Metadata


Assignees

No one assigned

    Labels

    bug (Something isn't working)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
