Describe the bug
_set_nodata_attrs in xrspatial/geotiff/_attrs.py:473 sets attrs['masked_nodata'] based purely on the final array dtype:
attrs['masked_nodata'] = bool(np.dtype(array_dtype).kind == 'f')
When a float raster with a non-NaN nodata sentinel is read with mask_nodata=False, the eager path skips the sentinel-to-NaN replacement (xrspatial/geotiff/__init__.py:575), the optional dtype cast doesn't change the kind (__init__.py:616), and _set_nodata_attrs is then called with the float dtype (__init__.py:624). The result: the buffer still holds literal sentinel values like -9999, but attrs['masked_nodata'] says True. Anything downstream that trusts the attr ("NaN means missing, sentinels have been replaced") treats -9999 pixels as valid data.
The dask, GPU, and GPU+dask paths follow the same dtype-only pattern. VRT inlines float NaN-masking unconditionally so its dtype-driven attr happens to match buffer state.
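To make the downstream impact concrete, here is a minimal sketch of a consumer that trusts the attr. `valid_mean` is a hypothetical helper written for this illustration, not part of xrspatial; it uses plain NumPy and a hand-built attrs dict so it runs without any GeoTIFF I/O.

```python
import numpy as np


def valid_mean(values, attrs):
    """Hypothetical consumer: mean over valid pixels, trusting attrs['masked_nodata']."""
    if attrs.get('masked_nodata'):
        # Attr claims sentinels were already replaced, so only NaN is skipped.
        return float(np.nanmean(values))
    nodata = attrs.get('nodata')
    if nodata is None:
        return float(np.mean(values))
    # Attr says the buffer still holds the sentinel, so filter it explicitly.
    return float(np.mean(values[values != nodata]))


# Same pixel values as the repro; the buffer still holds literal -9999.
values = np.array([[1.0, 2.0, -9999.0], [4.0, -9999.0, 6.0]], dtype=np.float64)

# Current behavior: attr says True although nothing was masked.
wrong = valid_mean(values, {'nodata': -9999.0, 'masked_nodata': True})
# Proposed behavior: attr says False, so the consumer filters the sentinel.
right = valid_mean(values, {'nodata': -9999.0, 'masked_nodata': False})

print(wrong, right)
```

With the incorrect attr, the two sentinel pixels are averaged in as data and the mean comes out strongly negative; with the corrected attr, the consumer gets 3.25, the mean of the four valid pixels.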
Repro
import numpy as np
import xarray as xr
from xrspatial.geotiff import to_geotiff, open_geotiff
src = xr.DataArray(
    np.array([[1.0, 2.0, -9999.0], [4.0, -9999.0, 6.0]], dtype=np.float32),
    coords={'y': np.array([0.5, 1.5]), 'x': np.array([0.5, 1.5, 2.5])},
    dims=('y', 'x'),
    attrs={'nodata': -9999.0},
)
to_geotiff(src, 'with_nodata.tif')
out = open_geotiff('with_nodata.tif', mask_nodata=False)
print(out.values) # [[1, 2, -9999], [4, -9999, 6]] -- literal
print(out.attrs['masked_nodata']) # True -- WRONG
Expected behavior
attrs['masked_nodata'] should be True iff the reader actually replaced sentinel pixels with NaN (or the buffer is NaN-aware as a result). With mask_nodata=False, the function did not mask, so the attr should be False.
Fix
Thread an explicit masked: bool argument through _set_nodata_attrs and have every read path compute it from the actual masking decision instead of inferring from dtype. For the eager / dask / GPU paths the rule is masked = mask_nodata AND final_dtype.kind == 'f'. For VRT the inline NaN-masking on float sources runs regardless of mask_nodata, so the existing dtype-driven rule stays correct there.
Seven call sites need the update: __init__.py:624, _backends/dask.py:328, _backends/gpu.py:426 / :745 / :1214, _backends/vrt.py:323 / :684.
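A rough sketch of the rule for the eager / dask / GPU paths follows. The real signature of `_set_nodata_attrs` is not shown in this issue, so `compute_masked` and `set_nodata_attrs` below are hypothetical stand-ins with assumed parameters; the only part taken from the proposal is the expression `mask_nodata and final_dtype.kind == 'f'`.

```python
import numpy as np


def compute_masked(mask_nodata, final_dtype):
    """Proposed rule: masked only if masking was requested AND the buffer is float."""
    return bool(mask_nodata) and np.dtype(final_dtype).kind == 'f'


def set_nodata_attrs(attrs, nodata, masked):
    """Hypothetical stand-in for _set_nodata_attrs taking an explicit `masked` flag.

    The caller decides `masked` from what it actually did to the buffer,
    instead of this function inferring it from the array dtype.
    """
    if nodata is not None:
        attrs['nodata'] = nodata
    attrs['masked_nodata'] = bool(masked)
    return attrs


# Float raster read with mask_nodata=False: buffer keeps -9999, attr must be False.
unmasked = set_nodata_attrs({}, -9999.0, compute_masked(False, np.float32))

# Float raster read with mask_nodata=True: sentinels became NaN, attr is True.
masked = set_nodata_attrs({}, -9999.0, compute_masked(True, np.float32))

# Integer raster cannot hold NaN, so the attr is False even when masking is requested.
integer = set_nodata_attrs({}, 0, compute_masked(True, np.int16))
```

Keeping the decision at the call site means each read path reports its own buffer state, which is also why the VRT path can keep passing its dtype-driven value unchanged.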
Related
attrs['masked_nodata'] to split "declared sentinel" from "NaN-masked" semantics.