read_vrt: SrcRect/DstRect scaling not applied to source array placement#1699
Merged
Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
This PR fixes xrspatial.geotiff._vrt.read_vrt so VRT sources with differing <SrcRect> and <DstRect> sizes are properly nearest-neighbour resampled before being placed into the destination buffer, matching GDAL SimpleSource semantics and avoiding broadcast errors / “holes”.
Changes:
- Add
_resample_nearesthelper and updateread_vrtto resample whenSrcRectandDstRectsizes differ. - Parse and record
<ComplexSource><ResampleAlg>in the VRT parser (informational for now; still uses nearest). - Add regression tests covering downsample/upsample/non-integer scaling, windowed reads, band mixes, and nodata behavior (issue #1694).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
xrspatial/geotiff/_vrt.py |
Implements SrcRect/DstRect-aware resampling (nearest) during source placement; parses ResampleAlg. |
xrspatial/geotiff/tests/test_vrt_scaled_rects_1694.py |
New regression tests validating scaled-rect placement, window behavior, and nodata preservation. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| from __future__ import annotations | ||
|
|
||
| import numpy as np | ||
| import pytest |
Comment on lines
+396
to
+414
| src_h, src_w = src_arr.shape[:2] | ||
| if src_h == out_h and src_w == out_w: | ||
| return src_arr | ||
| # Fast paths for integer ratios -- common when a VRT downsamples by | ||
| # 2x / 4x or upsamples by an integer factor. ``np.repeat`` is a | ||
| # cheaper nearest-neighbour upsample than the gather below. | ||
| if (out_h % src_h == 0 and out_w % src_w == 0 | ||
| and out_h >= src_h and out_w >= src_w): | ||
| ry = out_h // src_h | ||
| rx = out_w // src_w | ||
| return np.repeat(np.repeat(src_arr, ry, axis=0), rx, axis=1) | ||
| if (src_h % out_h == 0 and src_w % out_w == 0 | ||
| and src_h >= out_h and src_w >= out_w): | ||
| sy = src_h // out_h | ||
| sx = src_w // out_w | ||
| # Sample the centre of each output pixel's source block (offset | ||
| # by half a block, floored) so the integer-ratio downsample | ||
| # agrees with the general non-integer path for the same shapes. | ||
| return src_arr[sy // 2::sy, sx // 2::sx][:out_h, :out_w] |
Comment on lines
+535
to
+550
| # destination cell, so we read the full source rect, apply | ||
| # nodata masking, resample to ``(dr.y_size, dr.x_size)`` with | ||
| # nearest-neighbour (matching GDAL's SimpleSource semantics), | ||
| # and *then* take the clip subwindow. Reading the whole | ||
| # source rect first keeps the resample math local to the | ||
| # source/destination shapes; trying to resample a windowed | ||
| # source slice independently introduces fence-post errors at | ||
| # the clip boundary and breaks the integer-ratio fast paths. | ||
| # See issue #1694. | ||
| needs_resample = (sr.y_size != dr.y_size | ||
| or sr.x_size != dr.x_size) | ||
| if needs_resample: | ||
| read_r0 = sr.y_off | ||
| read_c0 = sr.x_off | ||
| read_r1 = sr.y_off + sr.y_size | ||
| read_c1 = sr.x_off + sr.x_size |
read_vrt previously copied a decoded source array directly into the destination buffer with no resample step. When SrcRect.size != DstRect.size that produced a broadcast error (downsample) or left holes in the destination (upsample) because only the top-left SrcRect-sized region was written. Read the full source rect, apply nodata masking, then nearest-neighbour resample to the DstRect shape before clipping into the window. Matches GDAL's SimpleSource semantics. ResampleAlg is parsed off ComplexSource for future higher-quality modes; nearest is used regardless for now. Fixes #1694
e04d13e to
63acb7d
Compare
* ``_resample_nearest`` now rejects empty source arrays up front. A SimpleSource with ``SrcRect xSize=0`` or ``ySize=0`` -- or a windowed read that clamps to an empty slice -- would otherwise reach the integer-ratio fast paths and divide / modulo by zero on ``src_h``/``src_w``. Raising a clear ``ValueError`` surfaces the bad VRT instead of an opaque ``ZeroDivisionError``. * Added a parametrized regression test covering the (0,5), (5,0), and (0,0) source shapes. * Filed issue #1704 for the read-only-the-window-subset optimization the reviewer flagged on the ``needs_resample`` branch. Left a ``TODO(#1704)`` at the full-SrcRect read site referencing it.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
read_vrtdid not resample source pixel data when a source band's<SrcRect>size differed from its<DstRect>size. The decoded source array was assigned directly into the destination buffer with no resample step, so downsampling raisedValueError: could not broadcast input array from shape (S,S) into shape (D,D)and upsampling left holes -- only the top-leftsr.x_size/sr.y_sizepixels of each destination cell were written.When
sr.size != dr.size, the fix reads the full source rect, applies nodata masking, nearest-neighbour resamples to(dr.y_size, dr.x_size), then clips into the window. Matches GDALSimpleSourcesemantics.<ComplexSource><ResampleAlg>is parsed off the XML so a follow-up can add bilinear / cubic, but nearest is used regardless for now.What was broken
xrspatial/geotiff/_vrt.py:464-472computed a source-coordinate window fromscale_x = sr.x_size / dr.x_sizeandscale_y = sr.y_size / dr.y_size. It then calledread_to_array(window=...), which returned an array sized to the source window, and copied it directly into the destination slice at L577-583 with no resample step. The "size mismatch" guard at L573-575 comparedsrc_arr.shapeagainstout_r1 - out_r0whereout_r1 = out_r0 + src_arr.shape[0], so the guard never tripped.Reproducer:
What's fixed
A new
_resample_nearesthelper. Integer-ratio paths short-circuit tonp.repeat(upsample) or strided slicing (downsample); non-integer ratios use a per-axis nearest-index gather. The centre-of-output-pixel rule (floor((out_idx + 0.5) * src_size / out_size)) matches GDAL's nearest-neighbour rounding.read_vrtnow reads the full source rect whensr.size != dr.size, runs nodata masking, resamples to(dr.y_size, dr.x_size), then clips to the window subregion. Same-size sources still use windowed reads as before -- the windowed path is cheaper and the resample is only invoked when the source rect actually needs scaling.Nodata masking runs before the resample so the sentinel survives nearest sampling. NaN flows through the float-masking branch; the integer-into-float promotion runs against the source-shaped array and the resulting NaN propagates through the resample.
Test plan
pytest xrspatial/geotiff/tests/test_vrt_scaled_rects_1694.py-- 7 new tests passpytest xrspatial/geotiff/tests/test_vrt_*.py-- 96 passed, no VRT regressionspytest xrspatial/geotiff/tests/-- 1696 passed, 3 failures are a pre-existing matplotlib/paletteRecursionErrorunrelated to VRTNew tests cover downsample 4x4 -> 2x2 (no broadcast error, nearest-neighbour values match), upsample 2x2 -> 4x4 (each source pixel repeated 2x2, no holes), non-integer scale 3x3 -> 2x2 (general index-mapping path), per-band scale mix (band 1 downsampled, band 2 native),
window=(0,0,1,1)on a downsampled source (returns the right pixel), nodata preservation across a 4x4 -> 2x2 resample with all-sentinel samples (all NaN), and a mixed sentinel / valid source -> mixed NaN / valid destination case.Fixes #1694