Summary
read_vrt(chunks=...) reads the entire VRT mosaic into a numpy array first, then wraps the result in a dask array via .chunk(). The chunks kwarg provides no memory protection. Setting gpu=True, chunks=... is worse: the full mosaic is assembled on host RAM before cupy.asarray() copies it across to the device.
Where
xrspatial/geotiff/__init__.py:3879-3882 calls _read_vrt_internal(...) eagerly:
    arr, vrt = _read_vrt_internal(
        source, window=window, band=band, max_pixels=max_pixels,
        missing_sources=missing_sources,
    )
Then xrspatial/geotiff/__init__.py:4074-4079 chunks the already-materialised array:
    if chunks is not None:
        ...
        result = result.chunk(chunk_dict)
read_geotiff_dask (regular TIFFs) takes a different path that constructs delayed per-tile reads, so the bug is specific to the VRT branch.
Repro
A VRT covering a few terabytes of source data with chunks=(1024, 1024) will OOM before any compute is requested, even though the intent of chunks= is exactly to avoid that.
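A minimal repro sketch; "big_mosaic.vrt" is an illustrative path standing in for a VRT that references terabytes of source rasters:

    from xrspatial.geotiff import read_vrt

    # The chunks kwarg suggests lazy access, but the whole mosaic is read
    # into one numpy array before .chunk() runs, so this OOMs during the
    # read itself, not at compute time.
    mosaic = read_vrt("big_mosaic.vrt", chunks=(1024, 1024))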
Proposed fix
Build a real dask graph in the chunked path: for each destination chunk, schedule a delayed call into _read_vrt_internal with that chunk's destination window. Each task reads only the sources that intersect its window.
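A rough sketch of that graph construction, assuming _read_vrt_internal (the existing eager reader quoted above) accepts a per-chunk destination window and that the window is expressible as a (row_off, col_off, height, width) tuple; the helper names here are hypothetical:

    import dask
    import dask.array as da

    def _read_vrt_chunk(source, window, band, missing_sources):
        # Hypothetical wrapper: reads only the sources intersecting this
        # destination window and returns the block as a plain ndarray.
        arr, _vrt = _read_vrt_internal(
            source, window=window, band=band, missing_sources=missing_sources,
        )
        return arr

    def _build_chunked_vrt(source, shape, dtype, chunks, band, missing_sources):
        # One delayed read per destination chunk; da.block stitches the lazy
        # blocks into a single 2-D dask array without ever materialising the
        # full mosaic.
        height, width = shape
        chunk_h, chunk_w = chunks
        rows = []
        for row_off in range(0, height, chunk_h):
            row = []
            for col_off in range(0, width, chunk_w):
                h = min(chunk_h, height - row_off)
                w = min(chunk_w, width - col_off)
                window = (row_off, col_off, h, w)
                block = dask.delayed(_read_vrt_chunk)(
                    source, window, band, missing_sources,
                )
                row.append(da.from_delayed(block, shape=(h, w), dtype=dtype))
            rows.append(row)
        return da.block(rows)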
Implementation notes:
- A task-count cap is needed. A small chunks value over a multi-million-pixel VRT would otherwise generate millions of tasks and stall the scheduler. Reuse the convention from read_geotiff_dask.
- The eager path (chunks=None) stays as is.
- gpu=True, chunks=... should produce a dask-cupy array where each block is materialised directly on the device, avoiding the host-side mosaic step (see the sketch after this list).
- The vrt.holes book-keeping needs to merge per-task hole lists into the final DataArray attrs.
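For the GPU bullet, one option is to copy each block to the device inside its task so the full mosaic never sits in host RAM; a sketch, reusing the hypothetical _read_vrt_chunk helper from above:

    import cupy
    import dask.array as da

    def _read_vrt_chunk_gpu(source, window, band, missing_sources):
        # Same per-chunk read as the CPU path, but the block is copied to
        # the device inside the task rather than after a host-side mosaic.
        return cupy.asarray(_read_vrt_chunk(source, window, band, missing_sources))

    # When wrapping each delayed block, pass a cupy meta so dask records the
    # backing array type and yields a dask-cupy array:
    #     da.from_delayed(block, shape=(h, w), dtype=dtype,
    #                     meta=cupy.empty((0, 0), dtype=dtype))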
Tests
- Regression: build a VRT covering a region wider than the chunk size, call read_vrt(..., chunks=(64, 64)), assert result.data.dask has multiple tasks before any compute and that no full materialisation occurs (e.g. by checking that the underlying VRT reader is called once per chunk rather than once for the whole mosaic). See the test sketch after this list.
- Cross-backend parity with the eager path: compute the chunked result and compare to the eager result for the same VRT.
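A rough pytest sketch of both tests; tiny_vrt_path is an assumed fixture that writes a small VRT to disk, and mocker comes from pytest-mock. The spy target and call counts depend on how the lazy path is factored, so treat the numbers as illustrative:

    import numpy as np
    import xrspatial.geotiff as geotiff

    def test_read_vrt_chunked_builds_lazy_graph(tiny_vrt_path, mocker):
        # Spy on the eager reader so per-chunk calls can be counted.
        spy = mocker.spy(geotiff, "_read_vrt_internal")

        chunked = geotiff.read_vrt(tiny_vrt_path, chunks=(64, 64))

        # Multiple tasks in the graph before any compute is requested.
        assert len(dict(chunked.data.dask)) > 1

        chunked.compute()

        # One read per destination chunk, not one for the whole mosaic.
        assert spy.call_count == chunked.data.npartitions

    def test_read_vrt_chunked_matches_eager(tiny_vrt_path):
        eager = geotiff.read_vrt(tiny_vrt_path)
        chunked = geotiff.read_vrt(tiny_vrt_path, chunks=(64, 64)).compute()
        np.testing.assert_array_equal(chunked.data, eager.data)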