Skip to content

read_vrt: SrcRect/DstRect scaling not applied to source array placement#1699

Merged
brendancol merged 2 commits into
mainfrom
fix-vrt-scaled-rects-2026-05-12
May 12, 2026
Merged

read_vrt: SrcRect/DstRect scaling not applied to source array placement#1699
brendancol merged 2 commits into
mainfrom
fix-vrt-scaled-rects-2026-05-12

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

@brendancol brendancol commented May 12, 2026

read_vrt did not resample source pixel data when a source band's <SrcRect> size differed from its <DstRect> size. The decoded source array was assigned directly into the destination buffer with no resample step, so downsampling raised ValueError: could not broadcast input array from shape (S,S) into shape (D,D) and upsampling left holes -- only the top-left sr.x_size/sr.y_size pixels of each destination cell were written.

When sr.size != dr.size, the fix reads the full source rect, applies nodata masking, nearest-neighbour resamples to (dr.y_size, dr.x_size), then clips into the window. Matches GDAL SimpleSource semantics. <ComplexSource><ResampleAlg> is parsed off the XML so a follow-up can add bilinear / cubic, but nearest is used regardless for now.

What was broken

xrspatial/geotiff/_vrt.py:464-472 computed a source-coordinate window from scale_x = sr.x_size / dr.x_size and scale_y = sr.y_size / dr.y_size. It then called read_to_array(window=...), which returned an array sized to the source window, and copied it directly into the destination slice at L577-583 with no resample step. The "size mismatch" guard at L573-575 compared src_arr.shape against out_r1 - out_r0 where out_r1 = out_r0 + src_arr.shape[0], so the guard never tripped.

Reproducer:

import numpy as np, tempfile, os
from xrspatial.geotiff._writer import write
from xrspatial.geotiff._vrt import read_vrt

with tempfile.TemporaryDirectory() as tmp:
    arr = np.arange(16, dtype=np.uint16).reshape(4, 4)
    src_path = os.path.join(tmp, 'src.tif')
    write(arr, src_path, compression='none', tiled=False)

    vrt_path = os.path.join(tmp, 'down.vrt')
    with open(vrt_path, 'w') as f:
        f.write(f'''<VRTDataset rasterXSize="2" rasterYSize="2">
  <GeoTransform>0.0, 2.0, 0.0, 0.0, 0.0, -2.0</GeoTransform>
  <VRTRasterBand dataType="UInt16" band="1">
    <SimpleSource>
      <SourceFilename relativeToVRT="0">{src_path}</SourceFilename>
      <SourceBand>1</SourceBand>
      <SrcRect xOff="0" yOff="0" xSize="4" ySize="4"/>
      <DstRect xOff="0" yOff="0" xSize="2" ySize="2"/>
    </SimpleSource>
  </VRTRasterBand>
</VRTDataset>''')

    read_vrt(vrt_path)
    # ValueError: could not broadcast input array from shape (4,4) into shape (2,2)

What's fixed

A new _resample_nearest helper. Integer-ratio paths short-circuit to np.repeat (upsample) or strided slicing (downsample); non-integer ratios use a per-axis nearest-index gather. The centre-of-output-pixel rule (floor((out_idx + 0.5) * src_size / out_size)) matches GDAL's nearest-neighbour rounding.

read_vrt now reads the full source rect when sr.size != dr.size, runs nodata masking, resamples to (dr.y_size, dr.x_size), then clips to the window subregion. Same-size sources still use windowed reads as before -- the windowed path is cheaper and the resample is only invoked when the source rect actually needs scaling.

Nodata masking runs before the resample so the sentinel survives nearest sampling. NaN flows through the float-masking branch; the integer-into-float promotion runs against the source-shaped array and the resulting NaN propagates through the resample.

Test plan

  • pytest xrspatial/geotiff/tests/test_vrt_scaled_rects_1694.py -- 7 new tests pass
  • All 7 new tests fail before the fix (verified by stash-and-rerun)
  • pytest xrspatial/geotiff/tests/test_vrt_*.py -- 96 passed, no VRT regressions
  • pytest xrspatial/geotiff/tests/ -- 1696 passed, 3 failures are a pre-existing matplotlib/palette RecursionError unrelated to VRT

New tests cover downsample 4x4 -> 2x2 (no broadcast error, nearest-neighbour values match), upsample 2x2 -> 4x4 (each source pixel repeated 2x2, no holes), non-integer scale 3x3 -> 2x2 (general index-mapping path), per-band scale mix (band 1 downsampled, band 2 native), window=(0,0,1,1) on a downsampled source (returns the right pixel), nodata preservation across a 4x4 -> 2x2 resample with all-sentinel samples (all NaN), and a mixed sentinel / valid source -> mixed NaN / valid destination case.

Fixes #1694

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 12, 2026
@brendancol brendancol requested a review from Copilot May 12, 2026 18:12
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes xrspatial.geotiff._vrt.read_vrt so VRT sources with differing <SrcRect> and <DstRect> sizes are properly nearest-neighbour resampled before being placed into the destination buffer, matching GDAL SimpleSource semantics and avoiding broadcast errors / “holes”.

Changes:

  • Add _resample_nearest helper and update read_vrt to resample when SrcRect and DstRect sizes differ.
  • Parse and record <ComplexSource><ResampleAlg> in the VRT parser (informational for now; still uses nearest).
  • Add regression tests covering downsample/upsample/non-integer scaling, windowed reads, band mixes, and nodata behavior (issue #1694).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
xrspatial/geotiff/_vrt.py Implements SrcRect/DstRect-aware resampling (nearest) during source placement; parses ResampleAlg.
xrspatial/geotiff/tests/test_vrt_scaled_rects_1694.py New regression tests validating scaled-rect placement, window behavior, and nodata preservation.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

from __future__ import annotations

import numpy as np
import pytest
Comment thread xrspatial/geotiff/_vrt.py
Comment on lines +396 to +414
src_h, src_w = src_arr.shape[:2]
if src_h == out_h and src_w == out_w:
return src_arr
# Fast paths for integer ratios -- common when a VRT downsamples by
# 2x / 4x or upsamples by an integer factor. ``np.repeat`` is a
# cheaper nearest-neighbour upsample than the gather below.
if (out_h % src_h == 0 and out_w % src_w == 0
and out_h >= src_h and out_w >= src_w):
ry = out_h // src_h
rx = out_w // src_w
return np.repeat(np.repeat(src_arr, ry, axis=0), rx, axis=1)
if (src_h % out_h == 0 and src_w % out_w == 0
and src_h >= out_h and src_w >= out_w):
sy = src_h // out_h
sx = src_w // out_w
# Sample the centre of each output pixel's source block (offset
# by half a block, floored) so the integer-ratio downsample
# agrees with the general non-integer path for the same shapes.
return src_arr[sy // 2::sy, sx // 2::sx][:out_h, :out_w]
Comment thread xrspatial/geotiff/_vrt.py
Comment on lines +535 to +550
# destination cell, so we read the full source rect, apply
# nodata masking, resample to ``(dr.y_size, dr.x_size)`` with
# nearest-neighbour (matching GDAL's SimpleSource semantics),
# and *then* take the clip subwindow. Reading the whole
# source rect first keeps the resample math local to the
# source/destination shapes; trying to resample a windowed
# source slice independently introduces fence-post errors at
# the clip boundary and breaks the integer-ratio fast paths.
# See issue #1694.
needs_resample = (sr.y_size != dr.y_size
or sr.x_size != dr.x_size)
if needs_resample:
read_r0 = sr.y_off
read_c0 = sr.x_off
read_r1 = sr.y_off + sr.y_size
read_c1 = sr.x_off + sr.x_size
read_vrt previously copied a decoded source array directly into the
destination buffer with no resample step. When SrcRect.size !=
DstRect.size that produced a broadcast error (downsample) or left
holes in the destination (upsample) because only the top-left
SrcRect-sized region was written.

Read the full source rect, apply nodata masking, then nearest-neighbour
resample to the DstRect shape before clipping into the window. Matches
GDAL's SimpleSource semantics. ResampleAlg is parsed off ComplexSource
for future higher-quality modes; nearest is used regardless for now.

Fixes #1694
* ``_resample_nearest`` now rejects empty source arrays up front. A
  SimpleSource with ``SrcRect xSize=0`` or ``ySize=0`` -- or a windowed
  read that clamps to an empty slice -- would otherwise reach the
  integer-ratio fast paths and divide / modulo by zero on
  ``src_h``/``src_w``.  Raising a clear ``ValueError`` surfaces the bad
  VRT instead of an opaque ``ZeroDivisionError``.
* Added a parametrized regression test covering the (0,5), (5,0), and
  (0,0) source shapes.
* Filed issue #1704 for the read-only-the-window-subset optimization
  the reviewer flagged on the ``needs_resample`` branch.  Left a
  ``TODO(#1704)`` at the full-SrcRect read site referencing it.
@brendancol brendancol merged commit ed8b22f into main May 12, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

read_vrt: SrcRect/DstRect scaling not applied to source array placement

2 participants