Description
open_geotiff(path, window=...) on the eager (numpy) path produces a confusing CoordinateValidationError when the window extends past the source extent in any direction. read_to_array correctly clamps the window to file bounds and returns a smaller array, but the eager code path in open_geotiff uses the unclamped window indices to build the y/x coordinate arrays. The resulting coord arrays have a different length than the returned data, so xarray refuses to construct the DataArray.
This affects both negative starts (e.g. window=(-5, -5, 5, 5)) and partial out-of-bounds windows at the right/bottom edges (e.g. window=(5, 5, 15, 15) on a 10x10 raster).
The dask path (read_geotiff_dask) rejects out-of-bounds windows with a clear ValueError("window=... is outside the source extent ..."). The eager path silently lets the bad window through to xarray.
Reproduction
import os
import tempfile

import numpy as np
import xarray as xr
from xrspatial.geotiff import to_geotiff, open_geotiff

arr = np.arange(100, dtype=np.float32).reshape(10, 10)
da = xr.DataArray(arr, dims=['y', 'x'], coords={'y': np.arange(10), 'x': np.arange(10)},
                  attrs={'transform': (1.0, 0.0, 0.0, 0.0, -1.0, 10.0)})

with tempfile.NamedTemporaryFile(suffix='.tif', delete=False) as f:
    path = f.name
to_geotiff(da, path)

# Eager path: out-of-bounds window
try:
    result = open_geotiff(path, window=(5, 5, 15, 15))
    print('shape:', result.shape)
except Exception as e:
    print(f'Eager error: {type(e).__name__}: {e}')

# Dask path: same window
try:
    result = open_geotiff(path, window=(5, 5, 15, 15), chunks=4)
    print('shape:', result.shape)
except Exception as e:
    print(f'Dask error: {type(e).__name__}: {e}')

os.unlink(path)
Output:
Eager error: CoordinateValidationError: conflicting sizes for dimension 'y': length 5 on the data but length 10 on coordinate 'y'
Dask error: ValueError: window=(5, 5, 15, 15) is outside the source extent (10x10) or has non-positive size.
Root cause
In xrspatial/geotiff/__init__.py around lines 562-572:
if window is not None:
    r0, c0, r1, c1 = window  # NOT clamped
    t = geo_info.transform
    if geo_info.raster_type == RASTER_PIXEL_IS_POINT:
        full_x = np.arange(c0, c1, dtype=np.float64) * t.pixel_width + t.origin_x
        full_y = np.arange(r0, r1, dtype=np.float64) * t.pixel_height + t.origin_y
    else:
        full_x = np.arange(c0, c1, dtype=np.float64) * t.pixel_width + t.origin_x + t.pixel_width * 0.5
        full_y = np.arange(r0, r1, dtype=np.float64) * t.pixel_height + t.origin_y + t.pixel_height * 0.5
    coords = {'y': full_y, 'x': full_x}
read_to_array already clamped its window to (max(0, r0), max(0, c0), min(height, r1), min(width, c1)) and returned an array of the clamped size. The coord arrays here use the unclamped r0/r1/c0/c1, producing arrays whose length differs from the data.
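The mismatch can be shown with numpy alone. The sketch below is not the library code: it imitates the clamping described above with plain slicing and rebuilds the coordinate arrays from the unclamped indices, using the 10x10 raster and window from the reproduction. The transform values (pixel size 1.0 and -1.0, origin (0.0, 10.0)) are taken from the reproduction's `transform` attribute, and pixel-is-area registration is assumed.

```python
import numpy as np

height, width = 10, 10
r0, c0, r1, c1 = 5, 5, 15, 15  # window from the reproduction

# Stand-in for the file contents.
source = np.arange(height * width, dtype=np.float32).reshape(height, width)

# Clamp the window to the file bounds, as read_to_array is described to do.
cr0, cc0 = max(0, r0), max(0, c0)
cr1, cc1 = min(height, r1), min(width, c1)
data = source[cr0:cr1, cc0:cc1]

# Build coordinates from the UNCLAMPED indices, as the eager path does.
pixel_width, pixel_height = 1.0, -1.0  # assumed from the reproduction transform
origin_x, origin_y = 0.0, 10.0
full_x = np.arange(c0, c1, dtype=np.float64) * pixel_width + origin_x + pixel_width * 0.5
full_y = np.arange(r0, r1, dtype=np.float64) * pixel_height + origin_y + pixel_height * 0.5

print('data shape:', data.shape)                    # (5, 5)
print('coord lengths:', len(full_y), len(full_x))   # 10 10
```

The data has 5 rows while the y coordinate has 10 entries, which is the size conflict xarray reports.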
Fix scope
Match the dask path's pre-check: raise ValueError up front for out-of-bounds windows on the eager path, with the same message format. Alternatively, clamp the window before computing coords. The dask path's behavior is the documented contract; the eager path should match.
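Both options could look roughly like the sketch below. `check_window` and `clamp_window` are hypothetical helper names, not existing xrspatial functions, and the message text is copied from the dask error shown in the output above. Whether the extent is printed as height x width or width x height is an assumption; the 10x10 example cannot distinguish the two.

```python
def check_window(window, height, width):
    """Hypothetical pre-check: reject windows that leave the source extent."""
    r0, c0, r1, c1 = window
    out_of_bounds = r0 < 0 or c0 < 0 or r1 > height or c1 > width
    non_positive = r1 <= r0 or c1 <= c0
    if out_of_bounds or non_positive:
        # Message format mirrors the dask error quoted in this report.
        raise ValueError(
            f"window={window} is outside the source extent "
            f"({height}x{width}) or has non-positive size."
        )
    return r0, c0, r1, c1


def clamp_window(window, height, width):
    """Hypothetical alternative: clamp the window before building coords."""
    r0, c0, r1, c1 = window
    return max(0, r0), max(0, c0), min(height, r1), min(width, c1)


# An in-bounds window passes through unchanged.
print(check_window((0, 0, 10, 10), 10, 10))

# The window from the reproduction is rejected up front.
try:
    check_window((5, 5, 15, 15), 10, 10)
except ValueError as e:
    print(e)

# Clamping instead would keep coords and data the same length.
print(clamp_window((-5, -5, 5, 5), 10, 10))
```

Raising keeps the two backends consistent with each other. Clamping avoids the xarray error but still leaves the eager and dask paths disagreeing, and a window that lies entirely outside the raster would clamp to an empty or inverted range that needs its own handling.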
Categories
- Cat 3: Off-by-one / boundary handling in neighbourhood (window) operations
- Cat 5: Backend inconsistency -- eager vs dask paths produce different errors on the same input
Severity
MEDIUM