geotiff: add mask_nodata kwarg to preserve integer source dtypes (closes #2052)#2058

Merged: brendancol merged 1 commit into main from issue-2052 on May 18, 2026

@brendancol
Contributor

Closes #2052.

Problem

open_geotiff(path, dtype="uint16") on a uint16 file whose nodata
sentinel matched real pixels raised ValueError. The masking block
promoted the array to float64 before the dtype= cast ran, and the
cast then rejected the float-to-int conversion.

The docstring promised "Pass dtype=... to keep the source dtype",
but the most common integer-nodata case never reached the cast with
its source dtype intact.
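
The failure mode can be reproduced with plain NumPy. The following is a minimal sketch, not the reader's actual code: `mask_sentinel` and `cast_checked` are hypothetical stand-ins for the masking block and the dtype= cast, and the assumption that the cast rejects every float-to-integer conversion with ValueError is inferred from the description above.

```python
import numpy as np


def mask_sentinel(arr, sentinel):
    """Hypothetical masking step: replace sentinel pixels with NaN.

    An integer array cannot hold NaN, so a match forces promotion to float64.
    """
    if sentinel is None:
        return arr
    hits = arr == sentinel
    if not hits.any():
        # No pixel matches the sentinel: the source dtype survives.
        return arr
    out = arr.astype(np.float64)
    out[hits] = np.nan
    return out


def cast_checked(arr, dtype):
    """Hypothetical dtype= cast that refuses float-to-integer conversions."""
    target = np.dtype(dtype)
    if arr.dtype.kind == "f" and target.kind in "iu":
        raise ValueError(f"cannot cast {arr.dtype} to {target}")
    return arr.astype(target)


pixels = np.array([[1, 2], [65535, 4]], dtype=np.uint16)

# Masking runs first, so the cast sees float64 instead of uint16.
masked = mask_sentinel(pixels, 65535)

try:
    cast_checked(masked, "uint16")
except ValueError as exc:
    print("cast rejected:", exc)
```

Skipping `mask_sentinel` leaves the array as uint16, so the same cast succeeds.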

Fix

Add mask_nodata: bool = True to open_geotiff, read_geotiff_dask,
read_geotiff_gpu, and read_vrt. When mask_nodata=False:

  • skip the sentinel-to-NaN promotion;
  • keep the source dtype (or honour dtype=);
  • attrs['nodata'] still carries the raw sentinel so callers can
    mask themselves.

Default stays True, so existing callers see no change.
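
With the opt-out, masking becomes the caller's job. The sketch below shows one way to do that with NumPy. The `values` array and `attrs` dict are stand-ins for what a reader would return with mask_nodata=False; the import path in the comment is an assumption and has not been checked against the package.

```python
import numpy as np

# Assumed real-world call (import path not verified):
#   from xrspatial.geotiff import open_geotiff
#   da = open_geotiff(path, dtype="uint16", mask_nodata=False)
#   values, attrs = da.values, da.attrs
# Stand-in data so the sketch runs without a file on disk:
values = np.array([[1, 2], [65535, 4]], dtype=np.uint16)
attrs = {"nodata": 65535}

# Option 1: a boolean validity mask; the pixel array itself stays uint16.
valid = values != attrs["nodata"]
mean_valid = values[valid].mean()

# Option 2: a NumPy masked array, which also keeps the integer dtype.
masked = np.ma.masked_equal(values, attrs["nodata"])

print(values.dtype, masked.dtype, int(valid.sum()))
```

Both options avoid the float64 promotion, which matters when the raster is large or when downstream code expects integer pixels.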

The eager reader's existing local named mask_nodata (the resolved
sentinel value used for comparison) is renamed to nodata_sentinel
to free up the kwarg name. Docstrings on all four readers document
the new kwarg and the dtype= interaction.

Threading

The kwarg flows through every public reader entry point, including:

  • open_geotiff dispatcher (forwards to all backends)
  • read_geotiff_dask (gates the effective_dtype promotion and the
    per-chunk _delayed_read_window mask)
  • read_geotiff_gpu (gates _apply_nodata_mask_gpu on the eager
    path; gates the chunked GDS declared_dtype calculation and the
    per-chunk task)
  • read_vrt (gates _apply_integer_sentinel_mask on eager and
    chunked paths)
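
Lazy backends have to declare an output dtype before any pixel is read, which is why the kwarg must gate the declared dtype as well as the per-chunk mask. A rough sketch of that decision follows. The function name `declared_dtype` and its exact rules are illustrative assumptions, not the library's implementation.

```python
import numpy as np


def declared_dtype(source_dtype, nodata, mask_nodata=True, dtype=None):
    """Hypothetical helper: pick the dtype a lazy graph should advertise."""
    src = np.dtype(source_dtype)
    if dtype is not None:
        # An explicit dtype= request wins in this sketch.
        return np.dtype(dtype)
    if mask_nodata and nodata is not None and src.kind in "iu":
        # Integer pixels may become NaN, so the graph must advertise float64.
        return np.dtype(np.float64)
    # Opt-out, no sentinel, or a float source: the source dtype survives.
    return src


print(declared_dtype("uint16", 65535))                     # promoted
print(declared_dtype("uint16", 65535, mask_nodata=False))  # source dtype kept
```

If the declared dtype and the per-chunk result disagreed, the lazy array would report one dtype and compute another, so both sites need the same gate.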

The pre-existing kwarg-order canonical-list test (#1935) picks up
the new entry.
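
A canonical-order check of this kind can be written with `inspect.signature`. The sketch below only illustrates the idea: the stub readers, the canonical list, and the `chunks` kwarg are hypothetical and do not reproduce the contents of the #1935 test.

```python
import inspect

# Hypothetical canonical ordering of the kwargs shared by the readers.
CANONICAL_ORDER = ["dtype", "mask_nodata", "chunks"]


def eager_reader_stub(path, *, dtype=None, mask_nodata=True):
    """Stand-in for an eager reader signature."""


def lazy_reader_stub(path, *, dtype=None, mask_nodata=True, chunks=512):
    """Stand-in for a chunked reader signature."""


def misordered_stub(path, *, mask_nodata=True, dtype=None):
    """Stand-in for a signature that breaks the canonical order."""


def kwargs_in_canonical_order(func, canonical):
    """Return True if the shared kwargs appear in canonical relative order."""
    params = inspect.signature(func).parameters
    present = [name for name in params if name in canonical]
    expected = [name for name in canonical if name in present]
    return present == expected


for reader in (eager_reader_stub, lazy_reader_stub, misordered_stub):
    print(reader.__name__, kwargs_in_canonical_order(reader, CANONICAL_ORDER))
```

Comparing relative order rather than exact position lets each reader omit kwargs it does not support while still catching drift between signatures.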

Tests

10 new tests in test_mask_nodata_kwarg_2052.py, covering:

  • regression repro: dtype="uint16" raised before, now works with
    mask_nodata=False
  • default mask_nodata=True still promotes to float64 + NaN
  • no-match case: both modes return the same uint16 array
  • float file: NaN nodata is a no-op either way
  • dtype="uint32" integer-to-integer cast works with the opt-out
  • dask path: integer source dtype survives with mask_nodata=False
  • dask path: default still promotes
  • dask path: dtype= + mask_nodata=False round-trip

Test plan

  • pytest xrspatial/geotiff/tests/test_mask_nodata_kwarg_2052.py -x
  • pytest xrspatial/geotiff/tests/ -k "nodata or dtype" -q
  • Full geotiff sweep: only pre-existing failures
    (test_predictor2_big_endian_gpu_1517, test_size_param_validation_gpu_vrt_1776)
    remain, and both reproduce on main.

…dtypes (#2052)

open_geotiff(path, dtype="uint16") used to raise on a uint16 file
whose nodata sentinel matched real pixels. The masking block promoted
the array to float64 before the dtype= cast ran, and the cast then
rejected float-to-int. The docstring promised "Pass dtype=... to keep
the source dtype", but for the common integer-nodata case that
contract was unreachable.

Add mask_nodata: bool = True to open_geotiff, read_geotiff_dask,
read_geotiff_gpu, and read_vrt. mask_nodata=False skips the
sentinel-to-NaN step (and the float64 promotion that comes with it)
so the source dtype survives. attrs['nodata'] still carries the raw
sentinel either way, so downstream code can mask explicitly.

The existing local variable named mask_nodata in the eager reader is
renamed to nodata_sentinel to free up the name. Docstrings on all
four public readers document the new kwarg and its interaction with
dtype=. Default behaviour is unchanged.

The kwarg threads through every public reader entry point, including
the dask graph declared-dtype calculation, the GPU chunked GDS path,
and the VRT chunked path. test_reader_kwarg_order_1935 picks up the
new kwarg in its canonical-order list.
github-actions bot added the performance label (PR touches performance-sensitive code) on May 18, 2026
brendancol merged commit 959ed0e into main on May 18, 2026
7 checks passed
Labels

performance (PR touches performance-sensitive code)

Development

Successfully merging this pull request may close these issues.

[Bug] open_geotiff(dtype=...) fails when integer nodata sentinel matches pixels