open_geotiff eager path windowed read coord/data shape mismatch on out-of-bounds windows #1634

@brendancol

Description

open_geotiff(path, window=...) on the eager (numpy) path produces a confusing CoordinateValidationError when the window extends past the source extent in any direction. read_to_array correctly clamps the window to file bounds and returns a smaller array, but the eager code path in open_geotiff uses the unclamped window indices to build the y/x coordinate arrays. The resulting coord arrays have a different length than the returned data, so xarray refuses to construct the DataArray.

This affects both negative starts (e.g. window=(-5, -5, 5, 5)) and partial out-of-bounds windows at the right/bottom edges (e.g. window=(5, 5, 15, 15) on a 10x10 raster).

The dask path (read_geotiff_dask) rejects out-of-bounds windows with a clear ValueError("window=... is outside the source extent ..."). The eager path silently lets the bad window through to xarray.

Reproduction

import numpy as np
import tempfile, os
import xarray as xr
from xrspatial.geotiff import to_geotiff, open_geotiff

arr = np.arange(100, dtype=np.float32).reshape(10, 10)
da = xr.DataArray(arr, dims=['y', 'x'], coords={'y': np.arange(10), 'x': np.arange(10)},
                  attrs={'transform': (1.0, 0.0, 0.0, 0.0, -1.0, 10.0)})

with tempfile.NamedTemporaryFile(suffix='.tif', delete=False) as f:
    path = f.name
to_geotiff(da, path)

# Eager path: out-of-bounds window
try:
    result = open_geotiff(path, window=(5, 5, 15, 15))
    print('shape:', result.shape)
except Exception as e:
    print(f'Eager error: {type(e).__name__}: {e}')

# Dask path: same window
try:
    result = open_geotiff(path, window=(5, 5, 15, 15), chunks=4)
    print('shape:', result.shape)
except Exception as e:
    print(f'Dask error: {type(e).__name__}: {e}')

os.unlink(path)

Output:

Eager error: CoordinateValidationError: conflicting sizes for dimension 'y': length 5 on the data but length 10 on coordinate 'y'
Dask error: ValueError: window=(5, 5, 15, 15) is outside the source extent (10x10) or has non-positive size.

Root cause

In xrspatial/geotiff/__init__.py around lines 562-572:

if window is not None:
    r0, c0, r1, c1 = window  # NOT clamped
    t = geo_info.transform
    if geo_info.raster_type == RASTER_PIXEL_IS_POINT:
        full_x = np.arange(c0, c1, dtype=np.float64) * t.pixel_width + t.origin_x
        full_y = np.arange(r0, r1, dtype=np.float64) * t.pixel_height + t.origin_y
    else:
        full_x = np.arange(c0, c1, dtype=np.float64) * t.pixel_width + t.origin_x + t.pixel_width * 0.5
        full_y = np.arange(r0, r1, dtype=np.float64) * t.pixel_height + t.origin_y + t.pixel_height * 0.5
    coords = {'y': full_y, 'x': full_x}

By this point read_to_array has already clamped its window to (max(0, r0), max(0, c0), min(height, r1), min(width, c1)) and returned an array of the clamped size. The coord arrays here are built from the unclamped r0/r1/c0/c1, so their lengths differ from the data's along any axis where clamping took effect.
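
The length mismatch can be checked without touching a file. The following is a minimal sketch in plain Python, not the library's code: clamp_window and axis_lengths are hypothetical helpers that mirror the clamping expression quoted above, and the 10x10 extent is the one from the reproduction.

```python
def clamp_window(window, height, width):
    """Clamp (r0, c0, r1, c1) to the raster bounds.

    Hypothetical helper: mirrors the clamping the issue attributes to
    read_to_array, it is not the library's implementation.
    """
    r0, c0, r1, c1 = window
    return max(0, r0), max(0, c0), min(height, r1), min(width, c1)


def axis_lengths(window):
    """Return (rows, cols) spanned by a window with r1 > r0 and c1 > c0."""
    r0, c0, r1, c1 = window
    return r1 - r0, c1 - c0


height, width = 10, 10  # raster size from the reproduction
for window in [(5, 5, 15, 15), (-5, -5, 5, 5)]:
    # Shape of the data after clamping to the file bounds.
    data_shape = axis_lengths(clamp_window(window, height, width))
    # Lengths that np.arange(r0, r1) and np.arange(c0, c1) would have.
    coord_shape = axis_lengths(window)
    print(window, "data:", data_shape, "coords:", coord_shape)
```

For both windows the clamped data is 5x5 while the coordinate arrays would have length 10, which matches the sizes reported in the CoordinateValidationError.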

Fix scope

Match the dask path's pre-check: raise ValueError up front for out-of-bounds windows on the eager path, with the same message format. Alternatively, clamp the window before computing coords. The dask path's behavior is the documented contract; the eager path should match.
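
A rough sketch of what the up-front check could look like. validate_window is a hypothetical name, not an existing function in xrspatial, and the message text is copied from the dask output above; the assumption that the extent is printed as height x width cannot be confirmed from a 10x10 example.

```python
def validate_window(window, height, width):
    """Reject windows that leave the source extent or are empty.

    Hypothetical helper sketching the proposed eager-path pre-check.
    Assumes window is (r0, c0, r1, c1) with exclusive end indices.
    """
    r0, c0, r1, c1 = window
    in_bounds = r0 >= 0 and c0 >= 0 and r1 <= height and c1 <= width
    positive_size = r1 > r0 and c1 > c0
    if not (in_bounds and positive_size):
        # Message format copied from the dask path's error; the
        # height-then-width order of the extent is an assumption.
        raise ValueError(
            f"window={window} is outside the source extent "
            f"({height}x{width}) or has non-positive size."
        )
    return r0, c0, r1, c1


# Usage: an in-bounds window passes through, the reproduction's does not.
print(validate_window((2, 2, 8, 8), 10, 10))
try:
    validate_window((5, 5, 15, 15), 10, 10)
except ValueError as e:
    print(f"ValueError: {e}")
```

Running this check before both the read and the coordinate construction would give the eager path the same error as the dask path for the same input. If clamping is preferred instead, the coords would need to be built from the clamped indices rather than the raw window.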

Categories

  • Cat 3: Off-by-one / boundary handling in neighbourhood (window) operations
  • Cat 5: Backend inconsistency -- eager vs dask paths produce different errors on the same input

Severity

MEDIUM
