
Make read_vrt chunks mode lazy#1807

Merged: brendancol merged 5 commits into main from issue-1798 on May 13, 2026

Conversation

@brendancol
Contributor

Closes #1798.

Adds a lazy CPU dask VRT path that builds windowed tasks instead of assembling the full mosaic before chunking.

Tested: pytest xrspatial/geotiff/tests/test_read_vrt_lazy_chunks_1798.py

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 13, 2026
@brendancol brendancol requested a review from Copilot May 13, 2026 14:56
Contributor

Copilot AI left a comment


Pull request overview

This PR addresses #1798 by making read_vrt(chunks=...) on CPU genuinely lazy: instead of assembling the full VRT mosaic eagerly and then chunking, it builds a dask array composed of per-window read tasks.

Changes:

  • Added a new CPU dask path for read_vrt(chunks=...) that constructs windowed delayed tasks (_read_vrt_dask).
  • Added _vrt_effective_dtype to infer a stable output dtype for the lazy dask graph.
  • Added regression tests covering value parity vs eager reads and ensuring no source reads/warnings during lazy construction.
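A minimal sketch of the windowed-task idea summarized above, not the PR's actual `_read_vrt_dask`. The reader `fake_read_window`, the `READ_LOG` list, and the `(r0, c0, r1, c1)` window layout are hypothetical stand-ins; only `dask.delayed`, `dask.array.from_delayed`, and `dask.array.block` are real dask calls.

```python
import dask
import dask.array as da
import numpy as np

# Hypothetical stand-in for a windowed VRT read. It logs every call so we can
# see when source reads actually happen.
READ_LOG = []


def fake_read_window(r0, c0, r1, c1):
    READ_LOG.append((r0, c0, r1, c1))
    rows = np.arange(r0, r1).reshape(-1, 1)
    cols = np.arange(c0, c1).reshape(1, -1)
    return (rows * 1000 + cols).astype("float32")


def lazy_mosaic(height, width, ch_h, ch_w, dtype="float32"):
    """Build a dask array with one delayed read task per chunk window."""
    blocks = []
    for r0 in range(0, height, ch_h):
        r1 = min(r0 + ch_h, height)  # clip the last row of chunks
        row_blocks = []
        for c0 in range(0, width, ch_w):
            c1 = min(c0 + ch_w, width)  # clip the last column of chunks
            task = dask.delayed(fake_read_window)(r0, c0, r1, c1)
            # Shape and dtype are declared up front so no read is needed
            # to build the graph.
            row_blocks.append(
                da.from_delayed(task, shape=(r1 - r0, c1 - c0), dtype=dtype)
            )
        blocks.append(row_blocks)
    return da.block(blocks)


arr = lazy_mosaic(5, 7, ch_h=2, ch_w=3)
reads_before_compute = len(READ_LOG)  # graph construction reads nothing
tile = arr[4:5, 6:7].compute()        # reads happen only at compute time
```

Because the output dtype has to be declared before any data is read, a helper that infers a stable dtype ahead of time (the role the overview assigns to `_vrt_effective_dtype`) is what makes this kind of graph construction possible without touching the sources.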

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `xrspatial/geotiff/__init__.py` | Adds a new lazy dask backend for CPU read_vrt(chunks=...) and routes chunked CPU reads through it. |
| `xrspatial/geotiff/tests/test_read_vrt_lazy_chunks_1798.py` | Adds tests for the new lazy chunked VRT read behavior. |


Comment on lines +3839 to +3842
rows = list(range(0, height, ch_h))
cols = list(range(0, width, ch_w))
out_has_band_axis = band is None and n_bands > 1
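
For readers following the snippet above, here is a rough illustration of how chunk origins like `rows` and `cols` could turn into clipped windows and per-chunk shapes. The helper names, the `(r0, c0, r1, c1)` window layout, and the trailing band axis are assumptions made for this sketch, not details confirmed by the diff.

```python
def chunk_windows(height, width, ch_h, ch_w):
    """List (r0, c0, r1, c1) windows that tile a height x width raster."""
    rows = list(range(0, height, ch_h))
    cols = list(range(0, width, ch_w))
    return [
        # min() clips edge chunks so they never run past the raster bounds.
        (r0, c0, min(r0 + ch_h, height), min(c0 + ch_w, width))
        for r0 in rows
        for c0 in cols
    ]


def chunk_shape(window, band, n_bands):
    """Shape a single chunk task would produce for the given window."""
    r0, c0, r1, c1 = window
    out_has_band_axis = band is None and n_bands > 1
    if out_has_band_axis:
        # Assumed layout: band axis last.
        return (r1 - r0, c1 - c0, n_bands)
    return (r1 - r0, c1 - c0)


windows = chunk_windows(height=5, width=7, ch_h=2, ch_w=3)
first_shape = chunk_shape(windows[0], band=0, n_bands=3)      # one band selected
last_shape = chunk_shape(windows[-1], band=None, n_bands=3)   # all bands kept
```

The edge chunks are smaller than `(ch_h, ch_w)` whenever the raster size is not a multiple of the chunk size, which is why each task's shape has to be derived from its own window.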

Comment on lines +3843 to +3851
    @dask.delayed
    def _read_chunk(chunk_window):
        chunk_da = read_vrt(
            source, dtype=dtype, window=chunk_window, band=band,
            chunks=None, gpu=False, max_pixels=max_pixels,
        )
        arr = np.asarray(chunk_da.values)
        if arr.dtype != out_dtype:
            arr = arr.astype(out_dtype)

@brendancol
Contributor Author

@copilot resolve the merge conflicts in this pull request

Contributor

Copilot AI commented May 13, 2026

> @copilot resolve the merge conflicts in this pull request

Resolved and pushed in 7329dd9.

Resolves conflict in xrspatial/geotiff/__init__.py: keeps the
`_read_vrt_dask` dispatch hook from the PR branch. All other
geotiff changes from main (#1791, #1793, #1801, #1802, #1803, #1804,
#1805, #1806) were already integrated into the working tree by the
prior 7329dd9 commit; this merge just records the parent so git
recognises the reconciliation.

PR #1803 forwarded the caller's max_pixels to read_to_array inside
read_vrt's source loop so a tiny VRT output cannot force a huge source
decode (#1796). The output-window check at the source read enforces that
correctly. A separate per-tile dimension check at the same call sites
also consumed the caller's max_pixels, so a caller setting max_pixels as
an output budget (e.g. 10_000) failed the per-tile sanity check on any
normal source whose default tile size is 256x256 (= 65_536 pixels).

Use MAX_PIXELS_DEFAULT for the per-tile dim check at the two call sites
in _read_tiles (local) and _read_tiles_cog_http (HTTP). The output-window
check at the same functions continues to enforce the user-supplied
max_pixels, preserving the #1796 protection.
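
A small sketch of the two checks described in this commit message, assuming simplified helper names and an illustrative value for `MAX_PIXELS_DEFAULT` (the real constant's value is not stated here). It is meant only to show why reusing the caller's budget for the per-tile check rejected ordinary 256x256 tiles.

```python
# Assumed value, for illustration only; the library's real default may differ.
MAX_PIXELS_DEFAULT = 1_000_000_000


def check_read(tile_h, tile_w, out_h, out_w, max_pixels, per_tile_limit):
    """Raise ValueError if either the tile or the output window is too large."""
    # Per-tile sanity check: guards against absurd tile dimensions in a file.
    if tile_h * tile_w > per_tile_limit:
        raise ValueError("tile dimensions exceed the per-tile limit")
    # Output-window check: enforces the caller's budget.
    if out_h * out_w > max_pixels:
        raise ValueError("output window exceeds max_pixels")


def check_before_fix(tile_h, tile_w, out_h, out_w, max_pixels):
    # Behaviour described as the problem: the caller's budget is reused
    # as the per-tile limit.
    check_read(tile_h, tile_w, out_h, out_w, max_pixels, per_tile_limit=max_pixels)


def check_after_fix(tile_h, tile_w, out_h, out_w, max_pixels):
    # Behaviour described as the fix: the per-tile check uses the default,
    # while the output window still honours the caller's budget.
    check_read(tile_h, tile_w, out_h, out_w, max_pixels,
               per_tile_limit=MAX_PIXELS_DEFAULT)


def raises(fn, *args):
    """Return True if fn(*args) raises ValueError."""
    try:
        fn(*args)
    except ValueError:
        return True
    return False


# A 100x100 output (10_000 pixels) read from 256x256 tiles (65_536 pixels).
small_read_failed_before = raises(check_before_fix, 256, 256, 100, 100, 10_000)
small_read_failed_after = raises(check_after_fix, 256, 256, 100, 100, 10_000)
# A 200x200 output (40_000 pixels) still exceeds the 10_000 budget after the fix.
big_read_failed_after = raises(check_after_fix, 256, 256, 200, 200, 10_000)
```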
@brendancol brendancol merged commit 78291a1 into main May 13, 2026
10 checks passed
Copilot AI added a commit that referenced this pull request May 13, 2026
Keep _read_vrt_chunked dispatch (handles gpu=True + chunks=) over the
non-GPU-capable _read_vrt_dask added in #1807. Remove the now-dead
_read_vrt_dask and _vrt_effective_dtype functions that were only
reachable via the superseded dispatch branch.

Auto-merged from main: _vrt.py (VRT resample window inverse #1704 +
XML size cap #1815 + minimal source window #1821), test files
test_read_vrt_lazy_chunks_1798.py, test_vrt_dstrect_resample_cap_1737.py,
test_vrt_resample_window_inverse_1704.py, test_vrt_xml_size_cap_1815.py.

Co-authored-by: brendancol <433221+brendancol@users.noreply.github.com>


Development

Successfully merging this pull request may close these issues.

read_vrt chunks mode materializes the full mosaic before chunking

3 participants