Summary
Reading an unsigned-integer TIFF whose GDAL_NODATA tag holds a negative sentinel (e.g. -9999) raises OverflowError instead of masking the nodata pixels to NaN. The -9999 sentinel is common on unsigned-integer rasters because many GDAL pipelines write the tag as a string without checking it against the dtype's range. Both rasterio and GDAL promote the array to float and mask in this case; xarray-spatial fails on the cast.
Reproduction
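import numpy as np
import xarray as xr
from xrspatial.geotiff import to_geotiff, open_geotiff

# Small uint16 raster; no pixel can ever hold the value -9999.
arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.uint16)
da = xr.DataArray(arr, dims=['y', 'x'])

# Write the file with an out-of-range nodata sentinel in GDAL_NODATA.
to_geotiff(da, '/tmp/test.tif', crs=4326, nodata=-9999)

# Reading it back fails while casting the sentinel to the array dtype.
open_geotiff('/tmp/test.tif')
# OverflowError: Python integer -9999 out of bounds for uint16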
Affected backends
The same cast pattern lives in four places, so every backend hits the OverflowError:
xrspatial/geotiff/__init__.py ~line 557: numpy eager read (open_geotiff)
xrspatial/geotiff/__init__.py ~line 625: _apply_nodata_mask_gpu (cupy read)
xrspatial/geotiff/__init__.py ~line 1528: _delayed_read_window (dask read)
xrspatial/geotiff/_reader.py: _resolve_masked_fill and _sparse_fill_value (LERC masked fill + sparse tile fill)
Each does arr.dtype.type(int(nodata)) or dtype.type(int(v)) with no range check.
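The missing range check could look something like the sketch below. The helper name `nodata_fits_dtype` is hypothetical and does not exist in xarray-spatial; it only illustrates comparing the sentinel against `np.iinfo` bounds before attempting the cast.

```python
import numpy as np


def nodata_fits_dtype(nodata, dtype):
    """Hypothetical helper: True if `nodata` can be cast to `dtype` safely."""
    dtype = np.dtype(dtype)
    if dtype.kind not in "iu":
        # Assumption: only integer dtypes need a range check here.
        return True
    value = float(nodata)
    # NaN, inf and fractional sentinels cannot be stored in an integer dtype.
    if not value.is_integer():
        return False
    info = np.iinfo(dtype)
    return info.min <= int(value) <= info.max


# The sentinel from the reproduction does not fit uint16, but fits int16.
print(nodata_fits_dtype(-9999, np.uint16))
print(nodata_fits_dtype(-9999, np.int16))
```

Guarding each of the call sites above with a check like this would let them skip the dtype.type(int(nodata)) cast when it cannot succeed.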
Expected behavior
For any integer dtype with an out-of-range nodata sentinel, promote to float64 and skip the sentinel comparison (no value in the file can match anyway). Keep attrs['nodata'] set to the user-declared value so write round-trips preserve it. This matches rasterio's behavior.
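A minimal sketch of that behavior on a plain NumPy array, assuming a hypothetical function `mask_nodata` (not part of xarray-spatial, and not the actual fix): it always promotes to float64, and only compares against the sentinel when the sentinel is representable in the source dtype.

```python
import numpy as np


def mask_nodata(arr, nodata):
    """Hypothetical sketch: promote to float64 and set nodata pixels to NaN."""
    out = arr.astype(np.float64)
    value = float(nodata)
    if arr.dtype.kind in "iu":
        info = np.iinfo(arr.dtype)
        in_range = value.is_integer() and info.min <= int(value) <= info.max
        if not in_range:
            # No stored pixel can equal the sentinel, so skip the comparison.
            return out
        # Compare in the original integer dtype; the cast is now safe.
        mask = arr == arr.dtype.type(int(value))
    else:
        mask = arr == value
    out[mask] = np.nan
    return out


arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.uint16)
promoted = mask_nodata(arr, -9999)  # out-of-range sentinel: nothing masked
masked = mask_nodata(arr, 3)        # in-range sentinel: one pixel becomes NaN
```

In a real reader the declared sentinel would still be stored in attrs['nodata'] unchanged, which this sketch leaves out.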
Severity
HIGH (Cat 2 NaN propagation). This crashes the read of a class of TIFFs that are widespread in practice. Behavior matches across numpy / cupy / dask, so it is a shared accuracy bug rather than a backend divergence.