
geotiff: guard int(nodata) on NaN/Inf GDAL_NODATA strings (#1774)#1778

Merged
brendancol merged 2 commits into main from deep-sweep-accuracy-geotiff-2026-05-13
May 13, 2026

Conversation

@brendancol
Contributor

Summary

Closes #1774.

open_geotiff / read_geotiff_dask / _apply_nodata_mask_gpu used to crash with ValueError: cannot convert float NaN to integer when reading an integer TIFF whose GDAL_NODATA tag was the string "nan" / "inf" / "-inf". _geotags.py:extract_geo_info parses the tag through float(nodata_str), so a "nan" tag surfaces as Python NaN; three sites in xrspatial/geotiff/__init__.py then called int(nodata) without checking finiteness.
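
The failure mode described above can be reproduced without any TIFF file. The following is a minimal sketch in plain Python; `try_int` is a hypothetical helper written for this illustration and is not part of xrspatial. In CPython, `int()` on a NaN raises `ValueError`, while `int()` on an infinity raises `OverflowError`.

```python
def try_int(value):
    """Hypothetical helper: return int(value), or the name of the exception raised."""
    try:
        return int(value)
    except (ValueError, OverflowError) as exc:
        return type(exc).__name__


# Parse each tag string the same way the summary describes: float(nodata_str).
for tag in ("nan", "inf", "-inf", "0"):
    parsed = float(tag)
    print(f"{tag!r} -> float {parsed!r} -> int: {try_int(parsed)}")
```

Only the finite tag survives the `int()` cast, which is why the cast needs a finiteness check in front of it.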

The fix gates each int(nodata) cast on np.isfinite(nodata), mirroring the _resolve_masked_fill / _sparse_fill_value helpers in _reader.py (the unfinished pass of #1581). A non-finite sentinel on an integer file cannot match any pixel value, so the mask is a no-op and the file dtype is preserved; attrs['nodata'] still carries the raw NaN/Inf sentinel so a write round-trip keeps the original GDAL_NODATA tag.

  • xrspatial/geotiff/__init__.py: gate int(nodata) on np.isfinite(nodata) at the eager numpy path (open_geotiff), the GPU helper (_apply_nodata_mask_gpu), and the dask delayed reader (_delayed_read_window). The read_geotiff_dask effective-dtype branch already used try/except but is tightened with the same gate for readability.
  • xrspatial/geotiff/tests/test_nodata_nan_int_1774.py: 15 regression tests across all three backends and the finite-sentinel regression guard.
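
The gating logic described above might look roughly like the sketch below. `nodata_mask_int` is a hypothetical stand-alone function, not the code in xrspatial/geotiff/__init__.py; it assumes an integer NumPy array and a sentinel that has already been parsed to a Python float.

```python
import numpy as np


def nodata_mask_int(arr, nodata):
    """Hypothetical helper: boolean mask of nodata pixels in an integer array."""
    no_match = np.zeros(arr.shape, dtype=bool)
    # A NaN/Inf sentinel cannot equal any integer pixel, so masking is a no-op.
    if nodata is None or not np.isfinite(nodata):
        return no_match
    nodata_int = int(nodata)
    info = np.iinfo(arr.dtype)
    # An out-of-range sentinel (e.g. -9999 on uint16) cannot match either.
    if not (info.min <= nodata_int <= info.max):
        return no_match
    return arr == arr.dtype.type(nodata_int)


arr = np.array([[0, 1], [2, 65535]], dtype=np.uint16)
print(nodata_mask_int(arr, float("nan")).any())  # NaN sentinel: nothing masked
print(nodata_mask_int(arr, 65535.0).sum())       # finite in-range sentinel still masks
```

This sketch covers only the finiteness and range gates. Note that `int()` truncates a fractional sentinel such as 3.5, a case the sketch does not handle.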

Test plan

  • 15 new tests in test_nodata_nan_int_1774.py (3 NaN variants + 6 Inf variants on the eager numpy path, in-range finite sentinel still masks, dask NaN + Inf, GPU NaN + Inf + finite).
  • All 34 pre-existing nodata-related tests still pass (test_nodata_out_of_range_1581.py, test_nodata_attr_aliases_1582.py, test_nodata_no_extra_copy_1553.py, test_gpu_nodata_1542.py).
  • All 2023 other geotiff tests still pass. (7 pre-existing failures in test_predictor2_big_endian_gpu_1517.py are unrelated: they reference xrspatial.geotiff.read_to_array, which was hidden from the public namespace in #1708, "geotiff: read_to_array leaks into public namespace but is not in __all__ or docs". 3 pre-existing matplotlib palette failures in test_features.py are also unrelated.)

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 13, 2026
@brendancol brendancol requested a review from Copilot May 13, 2026 12:47
Contributor

Copilot AI left a comment

Pull request overview

This PR fixes a GeoTIFF read crash when integer rasters declare a non-finite GDAL_NODATA string (e.g., "nan", "inf"), by guarding int(nodata) casts with np.isfinite(nodata) across the eager (numpy), dask, and GPU masking paths. It adds regression tests to ensure NaN/Inf nodata sentinels become a no-op for integer rasters while preserving attrs['nodata'] for round-trips.

Changes:

  • Guard integer nodata casts (int(nodata)) with np.isfinite(nodata) in open_geotiff, _apply_nodata_mask_gpu, read_geotiff_dask dtype resolution, and _delayed_read_window.
  • Add a new test module covering NaN/Inf nodata strings across eager, dask, and GPU backends, plus a finite-sentinel non-regression check.
  • Update the internal sweep tracking CSV entry for geotiff accuracy pass #21 / issue #1774.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| xrspatial/geotiff/__init__.py | Adds np.isfinite gating around integer nodata masking/dtype resolution to avoid int(NaN/Inf) crashes across eager/dask/GPU paths. |
| xrspatial/geotiff/tests/test_nodata_nan_int_1774.py | New regression tests for NaN/Inf nodata strings on integer TIFFs across eager, dask, and GPU paths. |
| .claude/sweep-accuracy-state.csv | Records the accuracy-sweep status entry noting the fix for #1774. |
Comments suppressed due to low confidence (3)

xrspatial/geotiff/__init__.py:990

  • Same as the eager path: int(nodata) will silently truncate fractional sentinels (e.g. nodata=3.5 -> 3) and may mask valid pixels for integer arrays. Consider requiring float(nodata).is_integer() (or equivalent) in addition to np.isfinite before the int(...) cast, so fractional GDAL_NODATA strings behave as a no-op for integer dtypes.
        if not np.isfinite(nodata):
            return arr_gpu
        nodata_int = int(nodata)
        info = np.iinfo(arr_dtype)
        if not (info.min <= nodata_int <= info.max):

xrspatial/geotiff/__init__.py:2074

  • effective_dtype promotion uses np.isfinite(nodata) and then int(nodata). If the file's nodata tag is a fractional string (e.g. "3.5" on a uint16 file), int(3.5) truncates to 3 and can incorrectly promote/mask against the wrong sentinel. Consider adding an integerness check (e.g. float(nodata).is_integer()) alongside np.isfinite before casting to int.
    if (nodata is not None
            and file_dtype.kind in ('u', 'i')
            and np.isfinite(nodata)):
        try:
            _nd_int = int(nodata)
            _info = np.iinfo(file_dtype)
            if _info.min <= _nd_int <= _info.max:
                effective_dtype = np.dtype('float64')

xrspatial/geotiff/__init__.py:2322

  • This branch now gates on np.isfinite(nodata) but still does nodata_int = int(nodata). Fractional sentinels (e.g. "3.5") would be truncated to 3 and could incorrectly mask integer pixels. Consider also checking float(nodata).is_integer() (or equivalent) before the int cast so fractional GDAL_NODATA strings are treated as a no-op for integer arrays.
            elif arr.dtype.kind in ('u', 'i') and np.isfinite(nodata):
                # Out-of-range sentinels (e.g. uint16 + nodata=-9999)
                # cannot match any pixel; skip the cast that would
                # otherwise raise OverflowError and leave arr unchanged.
                # Non-finite sentinels ("NaN" / "Inf" GDAL_NODATA strings)
                # also cannot match an integer pixel and would raise
                # ValueError on ``int(nodata)``; the ``np.isfinite`` gate
                # mirrors ``_resolve_masked_fill`` in ``_reader.py``
                # (#1774).
                nodata_int = int(nodata)
                info = np.iinfo(arr.dtype)
                if info.min <= nodata_int <= info.max:
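
The integerness check suggested in the comments above can be sketched for the dtype-resolution case as follows. `resolve_effective_dtype` is a hypothetical function for illustration, not the actual read_geotiff_dask code. It assumes the sentinel is already a Python float, and it passes non-integer file dtypes through unchanged, which is a simplification.

```python
import numpy as np


def resolve_effective_dtype(file_dtype, nodata):
    """Hypothetical helper: promote to float64 only if the sentinel can match a pixel."""
    file_dtype = np.dtype(file_dtype)
    if nodata is None or file_dtype.kind not in ("u", "i"):
        return file_dtype  # simplification: only integer files are considered here
    # NaN/Inf and fractional sentinels cannot equal an integer pixel value.
    if not np.isfinite(nodata) or not float(nodata).is_integer():
        return file_dtype
    info = np.iinfo(file_dtype)
    if info.min <= int(nodata) <= info.max:
        # The sentinel can occur in the data, so masked pixels need room for NaN.
        return np.dtype("float64")
    return file_dtype


for sentinel in (0.0, 3.5, float("nan"), -9999.0):
    print(sentinel, "->", resolve_effective_dtype("uint16", sentinel))
```

With both checks in place, a "3.5" tag on a uint16 file leaves the dtype as uint16 instead of promoting against the truncated value 3.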

Comment thread xrspatial/geotiff/__init__.py Outdated
Comment on lines +906 to +910
if np.isfinite(nodata):
    nodata_int = int(nodata)
    info = np.iinfo(arr.dtype)
    if info.min <= nodata_int <= info.max:
        mask = arr == arr.dtype.type(nodata_int)
open_geotiff / read_geotiff_dask / _apply_nodata_mask_gpu used to crash
with ValueError: cannot convert float NaN to integer when reading an
integer TIFF whose GDAL_NODATA tag was the string "nan" / "inf" / "-inf".
_geotags.py:extract_geo_info parses the tag through float(nodata_str) so
a "nan" tag surfaces as Python NaN; the integer mask code then called
int(nodata) without checking finiteness.

Three sites in xrspatial/geotiff/__init__.py needed the gate (eager
numpy, _apply_nodata_mask_gpu, _delayed_read_window) plus the
read_geotiff_dask effective_dtype branch. Sibling helpers
_resolve_masked_fill and _sparse_fill_value in _reader.py already
guard with not math.isnan(v) and not math.isinf(v), so this is the
unfinished pass of #1581.

A non-finite sentinel on an integer file cannot match any pixel value,
so the mask is a no-op and the file dtype is preserved; attrs['nodata']
still carries the raw NaN/Inf sentinel so a write round-trip keeps the
original GDAL_NODATA tag.

15 regression tests in test_nodata_nan_int_1774.py cover the eager
numpy path (3 NaN string variants + 6 Inf string variants), the dask
path (NaN + Inf), the GPU helper (NaN + Inf + finite regression
guard), and the in-range finite sentinel regression guard on the eager
path. All 2023 existing geotiff tests still pass.
Copilot review on #1778 flagged that the np.isfinite(nodata) guard added
for NaN/Inf sentinels still lets a fractional sentinel through to
int(nodata). A "3.5" GDAL_NODATA on a uint16 file would truncate to 3
and silently mask real pixel value 3.

Pair the np.isfinite check with float(nodata).is_integer() at all four
sites (open_geotiff eager path, _apply_nodata_mask_gpu,
read_geotiff_dask effective_dtype, _delayed_read_window). Matches the
existing _writer.py / _vrt.py pattern used for #1564 and #1616 (VRT
fractional NoDataValue on integer bands stays a no-op).

Add 5 regression tests: fractional NaN-like parametrize (3 variants),
truncation-aliasing guard ("30.5" must not mask pixel value 30), dask
path no-op, and a GPU helper no-op.
@brendancol brendancol force-pushed the deep-sweep-accuracy-geotiff-2026-05-13 branch from 6f5c5eb to 1dd393f Compare May 13, 2026 13:03
@brendancol
Contributor Author

Good catch. Pushed 1dd393f adding float(nodata).is_integer() alongside the np.isfinite gate at all four sites. Without it, a fractional GDAL_NODATA="3.5" on a uint16 file would truncate via int(3.5) == 3 and silently mask real pixel value 3.

The new check matches the existing _writer.py:280-282 and _vrt.py:827-829 pattern that handles the same case on the VRT side (#1564 / #1616). Added 5 regression tests, including a "30.5 must not mask pixel 30" truncation-aliasing guard so the integerness gate cannot regress to a bare np.isfinite.
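
The truncation-aliasing case can be illustrated with a small stand-alone sketch. The helper names `int_sentinel_or_none` and `mask_nodata` are hypothetical and do not appear in xrspatial; the sketch assumes an integer NumPy array and a sentinel already parsed to a Python float.

```python
import numpy as np


def int_sentinel_or_none(nodata, dtype):
    """Hypothetical helper: the sentinel as an int if it can match a pixel, else None."""
    if nodata is None or not np.isfinite(nodata):
        return None  # NaN/Inf: int() would raise
    if not float(nodata).is_integer():
        return None  # fractional: int() would truncate, e.g. 30.5 -> 30
    value = int(nodata)
    info = np.iinfo(dtype)
    return value if info.min <= value <= info.max else None


def mask_nodata(arr, nodata):
    """Hypothetical helper: boolean nodata mask; all False when the sentinel cannot match."""
    sentinel = int_sentinel_or_none(nodata, arr.dtype)
    if sentinel is None:
        return np.zeros(arr.shape, dtype=bool)
    return arr == arr.dtype.type(sentinel)


arr = np.array([29, 30, 31], dtype=np.uint16)
print(mask_nodata(arr, 30.5))  # fractional sentinel: pixel 30 must stay unmasked
print(mask_nodata(arr, 30.0))  # integral sentinel: pixel 30 is masked
```

Keeping the three checks in one helper is a design choice of this sketch: it makes it harder for a single call site to fall back to a bare finiteness check.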

@brendancol brendancol merged commit 94382b5 into main May 13, 2026
10 of 11 checks passed

Labels

performance PR touches performance-sensitive code

Development

Successfully merging this pull request may close these issues.

geotiff: int(nan) crash on integer TIFF with GDAL_NODATA="nan"

2 participants