
Stream tile writes per dask chunk segment to bound peak memory in to_geotiff #1485

@brendancol

Description

write_streaming() in xrspatial/geotiff/_writer.py claims to stream output one tile-row at a time. For each tile-row it calls dask_data[r0:r1, :].compute(), which materializes the full-width tile-row (tile_size rows by every column) as a single numpy array before any tiles are sliced out.

For a 100,000-pixel-wide raster with tile_size=256 and float32 data, that buffer is ~100 MB per tile-row. At float64 with three bands it is ~600 MB. The function name and docstring imply a much tighter memory ceiling, so users writing wide rasters can run out of memory unexpectedly.
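The figures above follow from width × tile_size × itemsize × bands. A quick check, assuming a full tile-row of tile_size rows and ignoring edge-tile padding (tile_row_buffer_bytes is a hypothetical helper, not part of xrspatial):

```python
def tile_row_buffer_bytes(width, tile_size, itemsize, bands=1):
    """Bytes materialized by one full-width dask_data[r0:r1, :].compute()."""
    return width * tile_size * itemsize * bands


# float32 (4 bytes per pixel), single band
print(tile_row_buffer_bytes(100_000, 256, 4))     # 102400000 (~100 MB)
# float64 (8 bytes per pixel), three bands
print(tile_row_buffer_bytes(100_000, 256, 8, 3))  # 614400000 (~600 MB)
```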

Proposed change

Add streaming_buffer_bytes (default 256 MB) to to_geotiff and pass it through to write_streaming. When a tile-row exceeds the budget, segment it horizontally into chunks that fit. Compute each segment, write its tiles, free the buffer.
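A minimal sketch of how segment boundaries might be planned. Assumptions: plan_segments is a hypothetical helper, the 256 MB default is read as 256 MiB, and segments are cut on tile boundaries so every tile is computed from exactly one segment (the issue does not say whether segments should instead follow the dask chunk layout).

```python
import math


def plan_segments(width, tile_size, itemsize, bands=1,
                  budget_bytes=256 * 1024**2):
    """Split one tile-row into column ranges [c0, c1) that fit the budget.

    Hypothetical helper. Segment widths are whole multiples of tile_size,
    so no tile straddles two segments; the last segment is clipped to width.
    """
    bytes_per_tile = tile_size * tile_size * itemsize * bands
    tiles_across = math.ceil(width / tile_size)
    # Always take at least one tile, even if a single tile exceeds the budget.
    tiles_per_segment = max(1, budget_bytes // bytes_per_tile)
    segments = []
    for t0 in range(0, tiles_across, tiles_per_segment):
        c0 = t0 * tile_size
        c1 = min(width, (t0 + tiles_per_segment) * tile_size)
        segments.append((c0, c1))
    return segments


print(plan_segments(100_000, 256, 4))            # [(0, 100000)]
print(plan_segments(100_000, 256, 8, bands=3))   # three segments
```

Inside write_streaming, each (c0, c1) pair would stand in for the full-width slice, e.g. dask_data[r0:r1, c0:c1].compute() instead of dask_data[r0:r1, :].compute(), assuming a 2-D (y, x) array. With the default budget, a 100,000-pixel-wide float32 single-band raster still gets one segment per tile-row, while float64 with three bands gets three.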

For most rasters a whole tile-row fits in one segment, so behavior is unchanged. Wide rasters get bounded peak memory.

Existing eager / GPU / COG paths are not touched.

Acceptance criteria

  • Round-trip equality on a synthetic dask raster wider than one segment.
  • A regression test with a tight budget (4 MB) on a wide raster completes without OOM.
  • Existing test_streaming_write.py tests still pass.
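The round-trip and tight-budget criteria can be illustrated without xrspatial or dask by a pure-Python simulation. This is only a sketch: stream_tiles and tiles_from_block are hypothetical stand-ins for the tiling logic in write_streaming, and peak memory is tracked as pixels buffered rather than real allocations. A real test would instead call to_geotiff with streaming_buffer_bytes on a dask-backed DataArray and read the file back.

```python
def tiles_from_block(block, r0, c0, tile_size):
    """Cut a 2-D block (list of rows) into tiles keyed by (tile_row, tile_col)."""
    out = {}
    for i in range(0, len(block), tile_size):
        for j in range(0, len(block[0]), tile_size):
            tile = tuple(tuple(row[j:j + tile_size])
                         for row in block[i:i + tile_size])
            out[((r0 + i) // tile_size, (c0 + j) // tile_size)] = tile
    return out


def stream_tiles(pixel, height, width, tile_size, max_tiles_per_segment):
    """Simulated segmented write; returns (tiles, peak_pixels_buffered)."""
    tiles, peak = {}, 0
    seg_width = max_tiles_per_segment * tile_size  # multiple of tile_size
    for r0 in range(0, height, tile_size):
        r1 = min(height, r0 + tile_size)
        for c0 in range(0, width, seg_width):
            c1 = min(width, c0 + seg_width)
            # Stand-in for dask_data[r0:r1, c0:c1].compute().
            block = [[pixel(r, c) for c in range(c0, c1)]
                     for r in range(r0, r1)]
            peak = max(peak, (r1 - r0) * (c1 - c0))
            tiles.update(tiles_from_block(block, r0, c0, tile_size))
            del block  # drop the segment before computing the next one
    return tiles, peak


def pixel(r, c):
    """Synthetic raster: every pixel value is unique."""
    return r * 1000 + c


# 7 tiles cover the full width, so this is one segment per tile-row.
full, full_peak = stream_tiles(pixel, 40, 100, 16, max_tiles_per_segment=7)
# Tight budget: two tiles (32 columns) per segment.
seg, seg_peak = stream_tiles(pixel, 40, 100, 16, max_tiles_per_segment=2)
assert full == seg  # round-trip equality
print(full_peak, seg_peak)  # 1600 512
```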

Metadata

Metadata

Assignees

No one assigned

    Labels

    oom (Out-of-memory risk with large datasets), performance (PR touches performance-sensitive code)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions