geotiff: guard int(nodata) on NaN/Inf GDAL_NODATA strings (#1774)#1778
Conversation
There was a problem hiding this comment.
Pull request overview
This PR fixes a GeoTIFF read crash when integer rasters declare a non-finite GDAL_NODATA string (e.g., "nan", "inf"), by guarding int(nodata) casts with np.isfinite(nodata) across the eager (numpy), dask, and GPU masking paths. It adds regression tests to ensure NaN/Inf nodata sentinels become a no-op for integer rasters while preserving attrs['nodata'] for round-trips.
Changes:
- Guard integer nodata casts (
int(nodata)) withnp.isfinite(nodata)inopen_geotiff,_apply_nodata_mask_gpu,read_geotiff_daskdtype resolution, and_delayed_read_window. - Add a new test module covering NaN/Inf nodata strings across eager, dask, and GPU backends, plus a finite-sentinel non-regression check.
- Update the internal sweep tracking CSV entry for geotiff accuracy pass #21 / issue #1774.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
xrspatial/geotiff/__init__.py |
Adds np.isfinite gating around integer nodata masking/dtype resolution to avoid int(NaN/Inf) crashes across eager/dask/GPU paths. |
xrspatial/geotiff/tests/test_nodata_nan_int_1774.py |
New regression tests for NaN/Inf nodata strings on integer TIFFs across eager, dask, and GPU paths. |
.claude/sweep-accuracy-state.csv |
Records the accuracy-sweep status entry noting the fix for #1774. |
Comments suppressed due to low confidence (3)
xrspatial/geotiff/init.py:990
- Same as the eager path:
int(nodata)will silently truncate fractional sentinels (e.g. nodata=3.5 -> 3) and may mask valid pixels for integer arrays. Consider requiringfloat(nodata).is_integer()(or equivalent) in addition tonp.isfinitebefore theint(...)cast, so fractional GDAL_NODATA strings behave as a no-op for integer dtypes.
if not np.isfinite(nodata):
return arr_gpu
nodata_int = int(nodata)
info = np.iinfo(arr_dtype)
if not (info.min <= nodata_int <= info.max):
xrspatial/geotiff/init.py:2074
effective_dtypepromotion usesnp.isfinite(nodata)and thenint(nodata). If the file's nodata tag is a fractional string (e.g. "3.5" on a uint16 file),int(3.5)truncates to 3 and can incorrectly promote/mask against the wrong sentinel. Consider adding an integerness check (e.g.float(nodata).is_integer()) alongsidenp.isfinitebefore casting to int.
if (nodata is not None
and file_dtype.kind in ('u', 'i')
and np.isfinite(nodata)):
try:
_nd_int = int(nodata)
_info = np.iinfo(file_dtype)
if _info.min <= _nd_int <= _info.max:
effective_dtype = np.dtype('float64')
xrspatial/geotiff/init.py:2322
- This branch now gates on
np.isfinite(nodata)but still doesnodata_int = int(nodata). Fractional sentinels (e.g. "3.5") would be truncated to 3 and could incorrectly mask integer pixels. Consider also checkingfloat(nodata).is_integer()(or equivalent) before the int cast so fractional GDAL_NODATA strings are treated as a no-op for integer arrays.
elif arr.dtype.kind in ('u', 'i') and np.isfinite(nodata):
# Out-of-range sentinels (e.g. uint16 + nodata=-9999)
# cannot match any pixel; skip the cast that would
# otherwise raise OverflowError and leave arr unchanged.
# Non-finite sentinels ("NaN" / "Inf" GDAL_NODATA strings)
# also cannot match an integer pixel and would raise
# ValueError on ``int(nodata)``; the ``np.isfinite`` gate
# mirrors ``_resolve_masked_fill`` in ``_reader.py``
# (#1774).
nodata_int = int(nodata)
info = np.iinfo(arr.dtype)
if info.min <= nodata_int <= info.max:
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if np.isfinite(nodata): | ||
| nodata_int = int(nodata) | ||
| info = np.iinfo(arr.dtype) | ||
| if info.min <= nodata_int <= info.max: | ||
| mask = arr == arr.dtype.type(nodata_int) |
open_geotiff / read_geotiff_dask / _apply_nodata_mask_gpu used to crash with ValueError: cannot convert float NaN to integer when reading an integer TIFF whose GDAL_NODATA tag was the string "nan" / "inf" / "-inf". _geotags.py:extract_geo_info parses the tag through float(nodata_str) so a "nan" tag surfaces as Python NaN; the integer mask code then called int(nodata) without checking finiteness. Three sites in xrspatial/geotiff/__init__.py needed the gate (eager numpy, _apply_nodata_mask_gpu, _delayed_read_window) plus the read_geotiff_dask effective_dtype branch. Sibling helpers _resolve_masked_fill and _sparse_fill_value in _reader.py already guard with not math.isnan(v) and not math.isinf(v), so this is the unfinished pass of #1581. A non-finite sentinel on an integer file cannot match any pixel value, so the mask is a no-op and the file dtype is preserved; attrs['nodata'] still carries the raw NaN/Inf sentinel so a write round-trip keeps the original GDAL_NODATA tag. 15 regression tests in test_nodata_nan_int_1774.py cover the eager numpy path (3 NaN string variants + 6 Inf string variants), the dask path (NaN + Inf), the GPU helper (NaN + Inf + finite regression guard), and the in-range finite sentinel regression guard on the eager path. All 2023 existing geotiff tests still pass.
Copilot review on #1778 flagged that the np.isfinite(nodata) guard added for NaN/Inf sentinels still lets a fractional sentinel through to int(nodata). A "3.5" GDAL_NODATA on a uint16 file would truncate to 3 and silently mask real pixel value 3. Pair the np.isfinite check with float(nodata).is_integer() at all four sites (open_geotiff eager path, _apply_nodata_mask_gpu, read_geotiff_dask effective_dtype, _delayed_read_window). Matches the existing _writer.py / _vrt.py pattern used for #1564 and #1616 (VRT fractional NoDataValue on integer bands stays a no-op). Add 5 regression tests: fractional NaN-like parametrize (3 variants), truncation-aliasing guard ("30.5" must not mask pixel value 30), dask path no-op, and a GPU helper no-op.
6f5c5eb to
1dd393f
Compare
|
Good catch. Pushed The new check matches the existing |
Summary
Closes #1774.
open_geotiff/read_geotiff_dask/_apply_nodata_mask_gpuused to crash withValueError: cannot convert float NaN to integerwhen reading an integer TIFF whoseGDAL_NODATAtag was the string"nan"/"inf"/"-inf"._geotags.py:extract_geo_infoparses the tag throughfloat(nodata_str), so a"nan"tag surfaces as Python NaN; three sites inxrspatial/geotiff/__init__.pythen calledint(nodata)without checking finiteness.The fix gates each
int(nodata)cast onnp.isfinite(nodata), mirroring the_resolve_masked_fill/_sparse_fill_valuehelpers in_reader.py(the unfinished pass of #1581). A non-finite sentinel on an integer file cannot match any pixel value, so the mask is a no-op and the file dtype is preserved;attrs['nodata']still carries the raw NaN/Inf sentinel so a write round-trip keeps the original GDAL_NODATA tag.xrspatial/geotiff/__init__.py: gateint(nodata)onnp.isfinite(nodata)at the eager numpy path (open_geotiff), the GPU helper (_apply_nodata_mask_gpu), and the dask delayed reader (_delayed_read_window). Theread_geotiff_daskeffective-dtype branch already usedtry/exceptbut is tightened with the same gate for readability.xrspatial/geotiff/tests/test_nodata_nan_int_1774.py: 15 regression tests across all three backends and the finite-sentinel regression guard.Test plan
test_nodata_nan_int_1774.py(3 NaN variants + 6 Inf variants on the eager numpy path, in-range finite sentinel still masks, dask NaN + Inf, GPU NaN + Inf + finite).test_nodata_out_of_range_1581.py,test_nodata_attr_aliases_1582.py,test_nodata_no_extra_copy_1553.py,test_gpu_nodata_1542.py).test_predictor2_big_endian_gpu_1517.pyare unrelated -- they referencexrspatial.geotiff.read_to_arraywhich was hidden from the public namespace in geotiff: read_to_array leaks into public namespace but is not in __all__ or docs #1708. 3 pre-existing matplotlib palette failures intest_features.pyare also unrelated.)