Honor max_pixels for VRT source reads #1803
Merged
Pull request overview
This PR ensures that read_vrt(..., max_pixels=...) enforces the caller’s pixel budget on every VRT SimpleSource materialization, including the underlying source GeoTIFF window reads. It also prevents “max_pixels safety limit” failures from being silently treated as missing-source holes (per #1796).
Changes:
- Forward `max_pixels` into each VRT source `read_to_array(...)` call.
- Re-raise `max_pixels` safety-limit `ValueError`s instead of swallowing them in the lenient “skip missing/unreadable source” fallback path.
- Add a regression test covering the bypass scenario (tiny VRT output with oversized `SrcRect`).
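The first two changes can be sketched roughly as follows. This is a minimal illustration, not the code in `_vrt.py`: `read_sources`, `is_safety_limit_error`, and `fake_reader` are hypothetical names, and the fake reader stands in for the real `read_to_array` call. The predicate mirrors the string check quoted in the review comment on this PR.

```python
def is_safety_limit_error(exc):
    """True for a ValueError whose message mentions 'exceed' and 'safety limit'."""
    msg = str(exc)
    return isinstance(exc, ValueError) and 'exceed' in msg and 'safety limit' in msg


def read_sources(sources, read_source, max_pixels):
    """Read each source leniently, but never hide a pixel-budget failure.

    `read_source` is a hypothetical callable taking (src, max_pixels=...).
    """
    results = []
    for src in sources:
        try:
            # Forward the caller's budget to every source read.
            results.append(read_source(src, max_pixels=max_pixels))
        except Exception as e:
            if is_safety_limit_error(e):
                raise  # budget violations must reach the caller
            results.append(None)  # missing/unreadable source becomes a hole
    return results


def fake_reader(src, max_pixels):
    """Stand-in reader (assumption): every readable source is 4000x4000."""
    if src == 'missing.tif':
        raise FileNotFoundError(src)
    width, height = 4000, 4000
    if width * height > max_pixels:
        raise ValueError(
            f"requested {width * height} pixels exceeds the "
            f"safety limit of {max_pixels}"
        )
    return (width, height)
```

With this shape, a missing file still degrades to a hole, while a read that would blow the budget raises to the caller instead of being skipped.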
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `xrspatial/geotiff/_vrt.py` | Threads max_pixels through VRT source reads and ensures safety-limit errors propagate rather than becoming holes. |
| `xrspatial/geotiff/tests/test_vrt_source_max_pixels_1796.py` | Adds regression coverage that a small VRT cannot force an oversized source-window decode under a tight max_pixels. |
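To make the bypass scenario concrete, here is a small sketch that compares the source window and destination window of a VRT `SimpleSource`. The XML is an invented example (the filename and sizes are assumptions, not the fixture used by the regression test), and `source_window_sizes` is a hypothetical helper; only the standard `SrcRect`/`DstRect` attributes of the VRT format are relied on.

```python
import xml.etree.ElementTree as ET

# Invented example: a 2x2 output whose single source asks for a 20000x20000 window.
VRT_XML = """
<VRTDataset rasterXSize="2" rasterYSize="2">
  <VRTRasterBand dataType="Byte" band="1">
    <SimpleSource>
      <SourceFilename relativeToVRT="1">big.tif</SourceFilename>
      <SourceBand>1</SourceBand>
      <SrcRect xOff="0" yOff="0" xSize="20000" ySize="20000"/>
      <DstRect xOff="0" yOff="0" xSize="2" ySize="2"/>
    </SimpleSource>
  </VRTRasterBand>
</VRTDataset>
"""


def rect_pixels(elem):
    """Pixel count of a SrcRect/DstRect element (sizes may be written as floats)."""
    return int(float(elem.get('xSize'))) * int(float(elem.get('ySize')))


def source_window_sizes(vrt_xml):
    """Return (source_pixels, destination_pixels) for each SimpleSource."""
    root = ET.fromstring(vrt_xml)
    return [
        (rect_pixels(src.find('SrcRect')), rect_pixels(src.find('DstRect')))
        for src in root.iter('SimpleSource')
    ]


for src_px, dst_px in source_window_sizes(VRT_XML):
    print(f"source window: {src_px} px, destination window: {dst_px} px")
```

A budget checked only against the 4-pixel output would pass here even though the source read covers 400,000,000 pixels, which is why the limit has to be applied to the source window as well.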
Comment on lines +879 to +882

    if (isinstance(e, ValueError)
            and 'exceed' in str(e)
            and 'safety limit' in str(e)):
        raise
This was referenced May 13, 2026
brendancol added a commit that referenced this pull request May 13, 2026
Resolves conflict in xrspatial/geotiff/__init__.py: keeps the `_read_vrt_dask` dispatch hook from the PR branch. All other geotiff changes from main (#1791, #1793, #1801, #1802, #1803, #1804, #1805, #1806) were already integrated into the working tree by the prior 7329dd9 commit; this merge just records the parent so git recognises the reconciliation.
brendancol added a commit that referenced this pull request May 13, 2026
PR #1803 forwarded the caller's max_pixels to read_to_array inside read_vrt's source loop so a tiny VRT output cannot force a huge source decode (#1796). The output-window check at the source read enforces that correctly. A separate per-tile dimension check at the same call sites also consumed the caller's max_pixels, so a caller setting max_pixels as an output budget (e.g. 10_000) failed the per-tile sanity check on any normal source whose default tile size is 256x256 (= 65_536 pixels). Use MAX_PIXELS_DEFAULT for the per-tile dim check at the two call sites in _read_tiles (local) and _read_tiles_cog_http (HTTP). The output-window check at the same functions continues to enforce the user-supplied max_pixels, preserving the #1796 protection.
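The distinction between the two checks can be sketched as below. This is an illustration under stated assumptions, not the code in `_read_tiles` or `_read_tiles_cog_http`: `check_dimensions` and `plan_tiled_read` are hypothetical names, and the value given to `MAX_PIXELS_DEFAULT` here is a placeholder, since the real constant's value is not stated in this thread.

```python
MAX_PIXELS_DEFAULT = 1_000_000_000  # placeholder value (assumption)


def check_dimensions(width, height, limit, what):
    """Raise if width*height is over the limit; message mimics the safety-limit wording."""
    n = width * height
    if n > limit:
        raise ValueError(
            f"{what} of {width}x{height} = {n} pixels exceeds the "
            f"safety limit of {limit}"
        )


def plan_tiled_read(tile_w, tile_h, out_w, out_h, max_pixels=MAX_PIXELS_DEFAULT):
    """Validate a tiled read and return the number of output pixels."""
    # Per-tile sanity check: guards against corrupt tile headers, so it uses
    # the library default rather than the caller's output budget.
    check_dimensions(tile_w, tile_h, MAX_PIXELS_DEFAULT, "tile")
    # Output-window check: this is where the caller's budget applies.
    check_dimensions(out_w, out_h, max_pixels, "output window")
    return out_w * out_h


# A 256x256 tile is 65_536 pixels, more than a 10_000-pixel budget, yet a
# 100x100 window read from such a file should still be allowed.
print(plan_tiled_read(256, 256, 100, 100, max_pixels=10_000))
```

If the per-tile check used the caller's `max_pixels` instead, the call above would fail on the tile size alone, which is the regression the commit message describes.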
brendancol added a commit that referenced this pull request May 13, 2026
* Make VRT chunks read lazily (#1798)
* Cap lazy VRT dask task graphs
* Merge origin/main into issue-1798

  Agent-Logs-Url: https://github.com/xarray-contrib/xarray-spatial/sessions/27f4131a-2907-4ca0-bf00-f303ab61f2e9

  Co-authored-by: brendancol <433221+brendancol@users.noreply.github.com>
* geotiff: per-tile dim check uses default cap, not caller budget (#1823)

  PR #1803 forwarded the caller's max_pixels to read_to_array inside read_vrt's source loop so a tiny VRT output cannot force a huge source decode (#1796). The output-window check at the source read enforces that correctly. A separate per-tile dimension check at the same call sites also consumed the caller's max_pixels, so a caller setting max_pixels as an output budget (e.g. 10_000) failed the per-tile sanity check on any normal source whose default tile size is 256x256 (= 65_536 pixels). Use MAX_PIXELS_DEFAULT for the per-tile dim check at the two call sites in _read_tiles (local) and _read_tiles_cog_http (HTTP). The output-window check at the same functions continues to enforce the user-supplied max_pixels, preserving the #1796 protection.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: brendancol <433221+brendancol@users.noreply.github.com>
brendancol added a commit that referenced this pull request May 13, 2026
* geotiff: cap VRT XML read size (closes #1815)
* geotiff: per-tile dim check uses default cap, not caller budget (#1823)

  PR #1803 forwarded the caller's max_pixels to read_to_array inside read_vrt's source loop so a tiny VRT output cannot force a huge source decode (#1796). The output-window check at the source read enforces that correctly. A separate per-tile dimension check at the same call sites also consumed the caller's max_pixels, so a caller setting max_pixels as an output budget (e.g. 10_000) failed the per-tile sanity check on any normal source whose default tile size is 256x256 (= 65_536 pixels). Use MAX_PIXELS_DEFAULT for the per-tile dim check at the two call sites in _read_tiles (local) and _read_tiles_cog_http (HTTP). The output-window check at the same functions continues to enforce the user-supplied max_pixels, preserving the #1796 protection.
brendancol added a commit that referenced this pull request May 13, 2026
* geotiff: reject ambiguous 3D writer inputs (#1812)

  to_geotiff and write_geotiff_gpu used to silently mishandle 3D DataArrays whose leading dim was not in _BAND_DIM_NAMES = ('band', 'bands', 'channel'). The moveaxis that puts (band, y, x) into the on-disk (y, x, band) layout was skipped, the writer kept the leading axis as the spatial y axis, and the round-trip produced a TIFF with silently swapped axes -- on read-back, out[:, :, 0].sum() != arr[0].sum().

  Reject ambiguous 3D layouts at all three writer entry points (eager to_geotiff, dask streaming, write_geotiff_gpu) via the shared _validate_3d_writer_dims helper. Accepted layouts: (band, y, x) or (y, x, band) with band-name aliases bands/channel and spatial-name aliases lat/lon/latitude/longitude/row/col. Anything else raises ValueError with an actionable message (rename the non-spatial dim or transpose).

  Surfaced by the 2026-05-13 metadata propagation sweep.
* geotiff: remove unused _AMBIGUOUS_3D_INPUTS test list (#1820 review)
* geotiff: per-tile dim check uses default cap, not caller budget (#1823)

  PR #1803 forwarded the caller's max_pixels to read_to_array inside read_vrt's source loop so a tiny VRT output cannot force a huge source decode (#1796). The output-window check at the source read enforces that correctly. A separate per-tile dimension check at the same call sites also consumed the caller's max_pixels, so a caller setting max_pixels as an output budget (e.g. 10_000) failed the per-tile sanity check on any normal source whose default tile size is 256x256 (= 65_536 pixels). Use MAX_PIXELS_DEFAULT for the per-tile dim check at the two call sites in _read_tiles (local) and _read_tiles_cog_http (HTTP). The output-window check at the same functions continues to enforce the user-supplied max_pixels, preserving the #1796 protection.
brendancol added a commit that referenced this pull request May 13, 2026
…1817)
* geotiff: apply nodata mask against post-MinIsWhite sentinel (#1809)

  MinIsWhite inversion was running before the sentinel-to-NaN nodata mask on all four read backends. Because the inversion rewrites the sentinel value (uint8 nodata=0 -> 255, float32 nodata=-9999 -> 9999), the post-inversion equality check matched the wrong pixels:

  * stored values that equalled the sentinel survived as iinfo.max - sentinel instead of NaN
  * stored values that happened to equal iinfo.max - sentinel were incorrectly turned into NaN

  Introduces _miniswhite_inverted_nodata() in _reader.py and stashes the inverted sentinel on geo_info._mask_nodata. Every backend (eager numpy, eager GPU, GPU stripped fallback, dask chunk closure) routes its mask through the new field while attrs['nodata'] keeps the original sentinel for write-side round-trip. The dask graph builder picks up the IFD photometric off geo_info via _read_geo_info / _parse_cog_http_meta so the closure nodata is inverted at graph-build time.

  9 regression tests in test_miniswhite_nodata_1809.py cover uint8 with nodata=0, uint16 with nodata=65535, float32 with nodata=-9999 across numpy, dask, and GPU backends, plus no-collision and no-nodata controls.

  Closes #1809
* geotiff: address Copilot review comments on #1817

  Remove unused os/tempfile imports from the MinIsWhite regression test and route the GPU tiled read's CPU-fallback branches through GeoInfo._mask_nodata so a sparse-tile, planar=2 auto-fallback, or final-fallback decode of a MinIsWhite raster masks against the post-inversion sentinel rather than the original.
* geotiff: per-tile dim check uses default cap, not caller budget (#1823)

  PR #1803 forwarded the caller's max_pixels to read_to_array inside read_vrt's source loop so a tiny VRT output cannot force a huge source decode (#1796). The output-window check at the source read enforces that correctly. A separate per-tile dimension check at the same call sites also consumed the caller's max_pixels, so a caller setting max_pixels as an output budget (e.g. 10_000) failed the per-tile sanity check on any normal source whose default tile size is 256x256 (= 65_536 pixels). Use MAX_PIXELS_DEFAULT for the per-tile dim check at the two call sites in _read_tiles (local) and _read_tiles_cog_http (HTTP). The output-window check at the same functions continues to enforce the user-supplied max_pixels, preserving the #1796 protection.
Closes #1796.

Threads max_pixels into each VRT source read and prevents safety-limit errors from being swallowed as missing-source holes.

Tested: pytest xrspatial/geotiff/tests/test_vrt_source_max_pixels_1796.py