Summary
read_vrt(chunks=...) reads the entire VRT mosaic into a numpy array first, then wraps the result in a dask array via .chunk(). The chunks kwarg provides no memory protection. Setting gpu=True, chunks=... is worse: the full mosaic is assembled on host RAM before cupy.asarray() copies it across to the device.
Where
xrspatial/geotiff/__init__.py:3879-3882 calls _read_vrt_internal(...) eagerly:
    arr, vrt = _read_vrt_internal(
        source, window=window, band=band, max_pixels=max_pixels,
        missing_sources=missing_sources,
    )
Then xrspatial/geotiff/__init__.py:4074-4079 chunks the already-materialised array:
    if chunks is not None:
        ...
        result = result.chunk(chunk_dict)
read_geotiff_dask (regular TIFFs) takes a different path that constructs delayed per-tile reads, so the bug is specific to the VRT branch.
Repro
A VRT covering a few terabytes of source data with chunks=(1024, 1024) will OOM before any compute is requested, even though the intent of chunks= is exactly to avoid that.
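A minimal repro sketch; "big_mosaic.vrt" is an illustrative path standing in for a VRT that references terabytes of source rasters:

    from xrspatial.geotiff import read_vrt

    # The chunks kwarg suggests lazy access, but the whole mosaic is read
    # into one numpy array before .chunk() runs, so this OOMs during the
    # read itself, not at compute time.
    mosaic = read_vrt("big_mosaic.vrt", chunks=(1024, 1024))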
Proposed fix
Build a real dask graph in the chunked path: for each destination chunk, schedule a delayed call into _read_vrt_internal with that chunk's destination window. Each task reads only the sources that intersect its window.
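A rough sketch of that graph construction, assuming _read_vrt_internal (the existing eager reader quoted above) accepts a per-chunk destination window and that the window is expressible as a (row_off, col_off, height, width) tuple; the helper names here are hypothetical:

    import dask
    import dask.array as da

    def _read_vrt_chunk(source, window, band, missing_sources):
        # Hypothetical wrapper: reads only the sources intersecting this
        # destination window and returns the block as a plain ndarray.
        arr, _vrt = _read_vrt_internal(
            source, window=window, band=band, missing_sources=missing_sources,
        )
        return arr

    def _build_chunked_vrt(source, shape, dtype, chunks, band, missing_sources):
        # One delayed read per destination chunk; da.block stitches the lazy
        # blocks into a single 2-D dask array without ever materialising the
        # full mosaic.
        height, width = shape
        chunk_h, chunk_w = chunks
        rows = []
        for row_off in range(0, height, chunk_h):
            row = []
            for col_off in range(0, width, chunk_w):
                h = min(chunk_h, height - row_off)
                w = min(chunk_w, width - col_off)
                window = (row_off, col_off, h, w)
                block = dask.delayed(_read_vrt_chunk)(
                    source, window, band, missing_sources,
                )
                row.append(da.from_delayed(block, shape=(h, w), dtype=dtype))
            rows.append(row)
        return da.block(rows)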
Implementation notes:
- A task-count cap is needed. A small chunks value over a multi-million-pixel VRT would otherwise generate millions of tasks and stall the scheduler. Reuse the convention from read_geotiff_dask.
- The eager path (chunks=None) stays as is.
- gpu=True, chunks=... should produce a dask-cupy array where each block is materialised directly on the device, avoiding the host-side mosaic step (see the sketch after this list).
- The vrt.holes book-keeping needs to merge per-task hole lists into the final DataArray attrs.
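For the GPU bullet, one option is to copy each block to the device inside its task so the full mosaic never sits in host RAM; a sketch, reusing the hypothetical _read_vrt_chunk helper from above:

    import cupy
    import dask.array as da

    def _read_vrt_chunk_gpu(source, window, band, missing_sources):
        # Same per-chunk read as the CPU path, but the block is copied to
        # the device inside the task rather than after a host-side mosaic.
        return cupy.asarray(_read_vrt_chunk(source, window, band, missing_sources))

    # When wrapping each delayed block, pass a cupy meta so dask records the
    # backing array type and yields a dask-cupy array:
    #     da.from_delayed(block, shape=(h, w), dtype=dtype,
    #                     meta=cupy.empty((0, 0), dtype=dtype))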
Tests
- Regression: build a VRT covering a region wider than the chunk size, call read_vrt(..., chunks=(64, 64)), assert result.data.dask has multiple tasks before any compute and that no full materialisation occurs (e.g. by checking that the underlying VRT reader is called once per chunk rather than once for the whole mosaic). See the test sketch after this list.
- Cross-backend parity with the eager path: compute the chunked result and compare to the eager result for the same VRT.
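A rough pytest sketch of both tests; tiny_vrt_path is an assumed fixture that writes a small VRT to disk, and mocker comes from pytest-mock. The spy target and call counts depend on how the lazy path is factored, so treat the numbers as illustrative:

    import numpy as np
    import xrspatial.geotiff as geotiff

    def test_read_vrt_chunked_builds_lazy_graph(tiny_vrt_path, mocker):
        # Spy on the eager reader so per-chunk calls can be counted.
        spy = mocker.spy(geotiff, "_read_vrt_internal")

        chunked = geotiff.read_vrt(tiny_vrt_path, chunks=(64, 64))

        # Multiple tasks in the graph before any compute is requested.
        assert len(dict(chunked.data.dask)) > 1

        chunked.compute()

        # One read per destination chunk, not one for the whole mosaic.
        assert spy.call_count == chunked.data.npartitions

    def test_read_vrt_chunked_matches_eager(tiny_vrt_path):
        eager = geotiff.read_vrt(tiny_vrt_path)
        chunked = geotiff.read_vrt(tiny_vrt_path, chunks=(64, 64)).compute()
        np.testing.assert_array_equal(chunked.data, eager.data)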