Bug: attrs['masked_nodata'] reports True when masking was disabled #2092

@brendancol

Describe the bug

_set_nodata_attrs in xrspatial/geotiff/_attrs.py:473 sets attrs['masked_nodata'] based purely on the final array dtype:

attrs['masked_nodata'] = bool(np.dtype(array_dtype).kind == 'f')

When a float raster with a non-NaN nodata sentinel is read with mask_nodata=False, the eager path skips the sentinel-to-NaN replacement (xrspatial/geotiff/__init__.py:575), the optional dtype cast doesn't change the kind (__init__.py:616), and _set_nodata_attrs is then called with the float dtype (__init__.py:624). The result: the buffer still holds literal sentinel values like -9999, but attrs['masked_nodata'] says True. Anything downstream that trusts the attr ("NaN means missing, sentinels have been replaced") treats -9999 pixels as valid data.

The dask, GPU, and GPU+dask paths follow the same dtype-only pattern. VRT inlines float NaN-masking unconditionally so its dtype-driven attr happens to match buffer state.
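For readers without the library at hand, here is a library-free model of the eager sequence described above: skip the replacement, apply the optional cast, then set the attr from dtype. `simulate_eager_read` and `dtype_only_flag` are hypothetical names invented for this illustration; only the one-line dtype rule is taken from the quoted source line.

```python
import numpy as np


def dtype_only_flag(array_dtype):
    # Same expression as the line quoted from _set_nodata_attrs.
    return bool(np.dtype(array_dtype).kind == 'f')


def simulate_eager_read(raw, nodata, mask_nodata, dtype=None):
    """Hypothetical stand-in for the eager path; not the library's code."""
    data = raw.copy()
    # Step 1: sentinel-to-NaN replacement, skipped when mask_nodata=False.
    if mask_nodata and data.dtype.kind == 'f':
        data[data == nodata] = np.nan
    # Step 2: optional dtype cast (float -> float keeps kind 'f').
    if dtype is not None:
        data = data.astype(dtype)
    # Step 3: attr derived from dtype alone, ignoring what step 1 did.
    attrs = {'nodata': nodata, 'masked_nodata': dtype_only_flag(data.dtype)}
    return data, attrs


raw = np.array([[1.0, 2.0, -9999.0], [4.0, -9999.0, 6.0]], dtype=np.float32)
data, attrs = simulate_eager_read(raw, nodata=-9999.0, mask_nodata=False,
                                  dtype=np.float64)
sentinels_left = int((data == -9999.0).sum())
print(attrs['masked_nodata'], sentinels_left)  # True 2
```

The flag comes out True even though both sentinel pixels are still in the buffer, which is the mismatch this issue is about.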

Repro

import numpy as np
import xarray as xr
from xrspatial.geotiff import to_geotiff, open_geotiff

src = xr.DataArray(
    np.array([[1.0, 2.0, -9999.0], [4.0, -9999.0, 6.0]], dtype=np.float32),
    coords={'y': np.array([0.5, 1.5]), 'x': np.array([0.5, 1.5, 2.5])},
    dims=('y', 'x'),
    attrs={'nodata': -9999.0},
)
to_geotiff(src, 'with_nodata.tif')

out = open_geotiff('with_nodata.tif', mask_nodata=False)
print(out.values)               # [[1, 2, -9999], [4, -9999, 6]] -- literal
print(out.attrs['masked_nodata'])  # True -- WRONG

Expected behavior

attrs['masked_nodata'] should be True iff the reader actually replaced sentinel pixels with NaN (or the buffer is NaN-aware as a result). With mask_nodata=False, the function did not mask, so the attr should be False.
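One way to state the expectation as a checkable invariant is sketched below. `attr_matches_buffer` is a hypothetical helper, not part of xrspatial, and it checks only the necessary direction: if the attr claims masking happened, no pixel may still equal the declared sentinel.

```python
import numpy as np


def attr_matches_buffer(values, attrs):
    """Return False when masked_nodata is True but sentinel pixels remain.

    Hypothetical consistency check for illustration only.
    """
    nodata = attrs.get('nodata')
    if nodata is None or not attrs.get('masked_nodata', False):
        # Nothing is claimed, so there is nothing to contradict.
        return True
    if np.isnan(nodata):
        # A NaN sentinel is already "masked" by definition.
        return True
    return not bool((np.asarray(values) == nodata).any())


values = np.array([[1.0, 2.0, -9999.0], [4.0, -9999.0, 6.0]], dtype=np.float32)

# Current behaviour with mask_nodata=False: attr says True, buffer disagrees.
buggy_ok = attr_matches_buffer(values, {'nodata': -9999.0, 'masked_nodata': True})
# Expected behaviour: attr says False, so the literal sentinels are consistent.
fixed_ok = attr_matches_buffer(values, {'nodata': -9999.0, 'masked_nodata': False})
```

A check like this could serve as a regression assertion in a test that reads the repro file with `mask_nodata=False`.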

Fix

Thread an explicit masked: bool argument through _set_nodata_attrs and have every read path compute it from the actual masking decision instead of inferring from dtype. For the eager / dask / GPU paths the rule is masked = mask_nodata AND final_dtype.kind == 'f'. For VRT the inline NaN-masking on float sources runs regardless of mask_nodata, so the existing dtype-driven rule stays correct there.
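A minimal sketch of the rule, assuming the helper only needs to write the flag. The real `_set_nodata_attrs` signature is not shown in this issue, so `set_nodata_attrs_sketch`, `compute_masked`, and `compute_masked_vrt` are hypothetical names and the parameter layout is an assumption.

```python
import numpy as np


def compute_masked(mask_nodata, final_dtype):
    # Eager / dask / GPU rule: masking was requested AND the result is float.
    return bool(mask_nodata and np.dtype(final_dtype).kind == 'f')


def compute_masked_vrt(final_dtype):
    # VRT masks float sources inline regardless of mask_nodata, so the
    # dtype-driven rule stays as it is.
    return bool(np.dtype(final_dtype).kind == 'f')


def set_nodata_attrs_sketch(attrs, masked):
    # Hypothetical stand-in: the caller passes the masking decision
    # explicitly instead of the helper inferring it from dtype.
    attrs['masked_nodata'] = bool(masked)
    return attrs


# Float raster read with mask_nodata=False: the flag is now False.
attrs = set_nodata_attrs_sketch({}, compute_masked(False, np.float32))
print(attrs['masked_nodata'])  # False
```

Keeping the rule in one small function would let each call site pass its own `mask_nodata` and final dtype without repeating the expression.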

Seven call sites need the update: __init__.py:624, _backends/dask.py:328, _backends/gpu.py:426 / :745 / :1214, _backends/vrt.py:323 / :684.
