Skip to content

Honor max_pixels for VRT source reads#1803

Merged
brendancol merged 2 commits into
mainfrom
issue-1796
May 13, 2026
Merged

Honor max_pixels for VRT source reads#1803
brendancol merged 2 commits into
mainfrom
issue-1796

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #1796.\n\nThreads max_pixels into each VRT source read and prevents safety-limit errors from being swallowed as missing-source holes.\n\nTested: pytest xrspatial/geotiff/tests/test_vrt_source_max_pixels_1796.py

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 13, 2026
@brendancol brendancol requested a review from Copilot May 13, 2026 14:56
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR ensures read_vrt(..., max_pixels=...) enforces the caller’s pixel-budget on every VRT SimpleSource materialization, including the underlying source GeoTIFF window reads, and prevents “max_pixels safety limit” failures from being silently treated as missing-source holes (per #1796).

Changes:

  • Forward max_pixels into each VRT source read_to_array(...) call.
  • Re-raise max_pixels safety-limit ValueErrors instead of swallowing them in the lenient “skip missing/unreadable source” fallback path.
  • Add a regression test covering the bypass scenario (tiny VRT output with oversized SrcRect).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
xrspatial/geotiff/_vrt.py Threads max_pixels through VRT source reads and ensures safety-limit errors propagate rather than becoming holes.
xrspatial/geotiff/tests/test_vrt_source_max_pixels_1796.py Adds regression coverage that a small VRT cannot force an oversized source-window decode under a tight max_pixels.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread xrspatial/geotiff/_vrt.py Outdated
Comment on lines +879 to +882
if (isinstance(e, ValueError)
and 'exceed' in str(e)
and 'safety limit' in str(e)):
raise
@brendancol brendancol merged commit 6cc96d7 into main May 13, 2026
1 of 11 checks passed
brendancol added a commit that referenced this pull request May 13, 2026
Resolves conflict in xrspatial/geotiff/__init__.py: keeps the
`_read_vrt_dask` dispatch hook from the PR branch. All other
geotiff changes from main (#1791, #1793, #1801, #1802, #1803, #1804,
#1805, #1806) were already integrated into the working tree by the
prior 7329dd9 commit; this merge just records the parent so git
recognises the reconciliation.
brendancol added a commit that referenced this pull request May 13, 2026
PR #1803 forwarded the caller's max_pixels to read_to_array inside
read_vrt's source loop so a tiny VRT output cannot force a huge source
decode (#1796). The output-window check at the source read enforces that
correctly. A separate per-tile dimension check at the same call sites
also consumed the caller's max_pixels, so a caller setting max_pixels as
an output budget (e.g. 10_000) failed the per-tile sanity check on any
normal source whose default tile size is 256x256 (= 65_536 pixels).

Use MAX_PIXELS_DEFAULT for the per-tile dim check at the two call sites
in _read_tiles (local) and _read_tiles_cog_http (HTTP). The output-window
check at the same functions continues to enforce the user-supplied
max_pixels, preserving the #1796 protection.
brendancol added a commit that referenced this pull request May 13, 2026
PR #1803 forwarded the caller's max_pixels to read_to_array inside
read_vrt's source loop so a tiny VRT output cannot force a huge source
decode (#1796). The output-window check at the source read enforces that
correctly. A separate per-tile dimension check at the same call sites
also consumed the caller's max_pixels, so a caller setting max_pixels as
an output budget (e.g. 10_000) failed the per-tile sanity check on any
normal source whose default tile size is 256x256 (= 65_536 pixels).

Use MAX_PIXELS_DEFAULT for the per-tile dim check at the two call sites
in _read_tiles (local) and _read_tiles_cog_http (HTTP). The output-window
check at the same functions continues to enforce the user-supplied
max_pixels, preserving the #1796 protection.
brendancol added a commit that referenced this pull request May 13, 2026
PR #1803 forwarded the caller's max_pixels to read_to_array inside
read_vrt's source loop so a tiny VRT output cannot force a huge source
decode (#1796). The output-window check at the source read enforces that
correctly. A separate per-tile dimension check at the same call sites
also consumed the caller's max_pixels, so a caller setting max_pixels as
an output budget (e.g. 10_000) failed the per-tile sanity check on any
normal source whose default tile size is 256x256 (= 65_536 pixels).

Use MAX_PIXELS_DEFAULT for the per-tile dim check at the two call sites
in _read_tiles (local) and _read_tiles_cog_http (HTTP). The output-window
check at the same functions continues to enforce the user-supplied
max_pixels, preserving the #1796 protection.
brendancol added a commit that referenced this pull request May 13, 2026
PR #1803 forwarded the caller's max_pixels to read_to_array inside
read_vrt's source loop so a tiny VRT output cannot force a huge source
decode (#1796). The output-window check at the source read enforces that
correctly. A separate per-tile dimension check at the same call sites
also consumed the caller's max_pixels, so a caller setting max_pixels as
an output budget (e.g. 10_000) failed the per-tile sanity check on any
normal source whose default tile size is 256x256 (= 65_536 pixels).

Use MAX_PIXELS_DEFAULT for the per-tile dim check at the two call sites
in _read_tiles (local) and _read_tiles_cog_http (HTTP). The output-window
check at the same functions continues to enforce the user-supplied
max_pixels, preserving the #1796 protection.
brendancol added a commit that referenced this pull request May 13, 2026
PR #1803 forwarded the caller's max_pixels to read_to_array inside
read_vrt's source loop so a tiny VRT output cannot force a huge source
decode (#1796). The output-window check at the source read enforces that
correctly. A separate per-tile dimension check at the same call sites
also consumed the caller's max_pixels, so a caller setting max_pixels as
an output budget (e.g. 10_000) failed the per-tile sanity check on any
normal source whose default tile size is 256x256 (= 65_536 pixels).

Use MAX_PIXELS_DEFAULT for the per-tile dim check at the two call sites
in _read_tiles (local) and _read_tiles_cog_http (HTTP). The output-window
check at the same functions continues to enforce the user-supplied
max_pixels, preserving the #1796 protection.
brendancol added a commit that referenced this pull request May 13, 2026
PR #1803 forwarded the caller's max_pixels to read_to_array inside
read_vrt's source loop so a tiny VRT output cannot force a huge source
decode (#1796). The output-window check at the source read enforces that
correctly. A separate per-tile dimension check at the same call sites
also consumed the caller's max_pixels, so a caller setting max_pixels as
an output budget (e.g. 10_000) failed the per-tile sanity check on any
normal source whose default tile size is 256x256 (= 65_536 pixels).

Use MAX_PIXELS_DEFAULT for the per-tile dim check at the two call sites
in _read_tiles (local) and _read_tiles_cog_http (HTTP). The output-window
check at the same functions continues to enforce the user-supplied
max_pixels, preserving the #1796 protection.
brendancol added a commit that referenced this pull request May 13, 2026
PR #1803 forwarded the caller's max_pixels to read_to_array inside
read_vrt's source loop so a tiny VRT output cannot force a huge source
decode (#1796). The output-window check at the source read enforces that
correctly. A separate per-tile dimension check at the same call sites
also consumed the caller's max_pixels, so a caller setting max_pixels as
an output budget (e.g. 10_000) failed the per-tile sanity check on any
normal source whose default tile size is 256x256 (= 65_536 pixels).

Use MAX_PIXELS_DEFAULT for the per-tile dim check at the two call sites
in _read_tiles (local) and _read_tiles_cog_http (HTTP). The output-window
check at the same functions continues to enforce the user-supplied
max_pixels, preserving the #1796 protection.
brendancol added a commit that referenced this pull request May 13, 2026
* Make VRT chunks read lazily (#1798)

* Cap lazy VRT dask task graphs

* Merge origin/main into issue-1798

Agent-Logs-Url: https://github.com/xarray-contrib/xarray-spatial/sessions/27f4131a-2907-4ca0-bf00-f303ab61f2e9

Co-authored-by: brendancol <433221+brendancol@users.noreply.github.com>

* geotiff: per-tile dim check uses default cap, not caller budget (#1823)

PR #1803 forwarded the caller's max_pixels to read_to_array inside
read_vrt's source loop so a tiny VRT output cannot force a huge source
decode (#1796). The output-window check at the source read enforces that
correctly. A separate per-tile dimension check at the same call sites
also consumed the caller's max_pixels, so a caller setting max_pixels as
an output budget (e.g. 10_000) failed the per-tile sanity check on any
normal source whose default tile size is 256x256 (= 65_536 pixels).

Use MAX_PIXELS_DEFAULT for the per-tile dim check at the two call sites
in _read_tiles (local) and _read_tiles_cog_http (HTTP). The output-window
check at the same functions continues to enforce the user-supplied
max_pixels, preserving the #1796 protection.

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: brendancol <433221+brendancol@users.noreply.github.com>
brendancol added a commit that referenced this pull request May 13, 2026
* geotiff: cap VRT XML read size (closes #1815)

* geotiff: per-tile dim check uses default cap, not caller budget (#1823)

PR #1803 forwarded the caller's max_pixels to read_to_array inside
read_vrt's source loop so a tiny VRT output cannot force a huge source
decode (#1796). The output-window check at the source read enforces that
correctly. A separate per-tile dimension check at the same call sites
also consumed the caller's max_pixels, so a caller setting max_pixels as
an output budget (e.g. 10_000) failed the per-tile sanity check on any
normal source whose default tile size is 256x256 (= 65_536 pixels).

Use MAX_PIXELS_DEFAULT for the per-tile dim check at the two call sites
in _read_tiles (local) and _read_tiles_cog_http (HTTP). The output-window
check at the same functions continues to enforce the user-supplied
max_pixels, preserving the #1796 protection.
brendancol added a commit that referenced this pull request May 13, 2026
* geotiff: reject ambiguous 3D writer inputs (#1812)

to_geotiff and write_geotiff_gpu used to silently mishandle 3D
DataArrays whose leading dim was not in _BAND_DIM_NAMES = ('band',
'bands', 'channel'). The moveaxis that puts (band, y, x) into the
on-disk (y, x, band) layout was skipped, the writer kept the leading
axis as the spatial y axis, and the round-trip produced a TIFF with
silently swapped axes -- on read-back, out[:, :, 0].sum() !=
arr[0].sum().

Reject ambiguous 3D layouts at all three writer entry points (eager
to_geotiff, dask streaming, write_geotiff_gpu) via the shared
_validate_3d_writer_dims helper. Accepted layouts: (band, y, x) or
(y, x, band) with band-name aliases bands/channel and spatial-name
aliases lat/lon/latitude/longitude/row/col. Anything else raises
ValueError with an actionable message (rename the non-spatial dim
or transpose).

Surfaced by the 2026-05-13 metadata propagation sweep.

* geotiff: remove unused _AMBIGUOUS_3D_INPUTS test list (#1820 review)

* geotiff: per-tile dim check uses default cap, not caller budget (#1823)

PR #1803 forwarded the caller's max_pixels to read_to_array inside
read_vrt's source loop so a tiny VRT output cannot force a huge source
decode (#1796). The output-window check at the source read enforces that
correctly. A separate per-tile dimension check at the same call sites
also consumed the caller's max_pixels, so a caller setting max_pixels as
an output budget (e.g. 10_000) failed the per-tile sanity check on any
normal source whose default tile size is 256x256 (= 65_536 pixels).

Use MAX_PIXELS_DEFAULT for the per-tile dim check at the two call sites
in _read_tiles (local) and _read_tiles_cog_http (HTTP). The output-window
check at the same functions continues to enforce the user-supplied
max_pixels, preserving the #1796 protection.
brendancol added a commit that referenced this pull request May 13, 2026
…1817)

* geotiff: apply nodata mask against post-MinIsWhite sentinel (#1809)

MinIsWhite inversion was running before the sentinel-to-NaN nodata
mask on all four read backends. Because the inversion rewrites the
sentinel value (uint8 nodata=0 -> 255, float32 nodata=-9999 -> 9999),
the post-inversion equality check matched the wrong pixels:

* stored values that equalled the sentinel survived as iinfo.max -
  sentinel instead of NaN
* stored values that happened to equal iinfo.max - sentinel were
  incorrectly turned into NaN

Introduces _miniswhite_inverted_nodata() in _reader.py and stashes the
inverted sentinel on geo_info._mask_nodata. Every backend (eager numpy,
eager GPU, GPU stripped fallback, dask chunk closure) routes its mask
through the new field while attrs['nodata'] keeps the original sentinel
for write-side round-trip. The dask graph builder picks up the IFD
photometric off geo_info via _read_geo_info / _parse_cog_http_meta so
the closure nodata is inverted at graph-build time.

9 regression tests in test_miniswhite_nodata_1809.py cover uint8 with
nodata=0, uint16 with nodata=65535, float32 with nodata=-9999 across
numpy, dask, and GPU backends, plus no-collision and no-nodata controls.

Closes #1809

* geotiff: address Copilot review comments on #1817

Remove unused os/tempfile imports from the MinIsWhite regression test
and route the GPU tiled read's CPU-fallback branches through
GeoInfo._mask_nodata so a sparse-tile, planar=2 auto-fallback, or
final-fallback decode of a MinIsWhite raster masks against the
post-inversion sentinel rather than the original.

* geotiff: per-tile dim check uses default cap, not caller budget (#1823)

PR #1803 forwarded the caller's max_pixels to read_to_array inside
read_vrt's source loop so a tiny VRT output cannot force a huge source
decode (#1796). The output-window check at the source read enforces that
correctly. A separate per-tile dimension check at the same call sites
also consumed the caller's max_pixels, so a caller setting max_pixels as
an output budget (e.g. 10_000) failed the per-tile sanity check on any
normal source whose default tile size is 256x256 (= 65_536 pixels).

Use MAX_PIXELS_DEFAULT for the per-tile dim check at the two call sites
in _read_tiles (local) and _read_tiles_cog_http (HTTP). The output-window
check at the same functions continues to enforce the user-supplied
max_pixels, preserving the #1796 protection.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

VRT source reads do not honor max_pixels safety limit

2 participants