Summary
read_geotiff_dask declares the output dtype as float64 for integer rasters paired with an in-range nodata sentinel, but per-chunk dtype handling in _delayed_read_window only promotes a chunk to float64 when that chunk actually contains a sentinel pixel. When the sentinel falls in a non-first chunk, dask preallocates the concatenated output from the first chunk's dtype (uint16), then casts subsequent float64 chunks back to uint16, replacing NaN with 0 and emitting RuntimeWarning: invalid value encountered in cast.
Net effect: the declared float64 array silently becomes uint16 at compute time, and nodata pixels become 0 instead of NaN. The numpy eager path is correct.
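The lossy step can be reproduced with NumPy alone. The sketch below is not dask's concatenation code; it only assumes, as described above, that the output buffer takes its dtype from the first chunk and that later chunks are cast into it. The chunk values are made up for illustration.

```python
import numpy as np

# First chunk: no sentinel hit, so it stays at the file's integer dtype.
first = np.arange(1, 5, dtype=np.uint16)
# Later chunk: sentinel hit, so it was promoted to float64 with NaN.
later = np.array([5.0, 6.0, np.nan, np.nan])

# Assumed behaviour: the output is allocated from the first chunk's dtype.
out = np.empty(first.size + later.size, dtype=first.dtype)
out[:first.size] = first
with np.errstate(invalid="ignore"):  # silence "invalid value encountered in cast"
    out[first.size:] = later  # float64 -> uint16; NaN cannot be represented

print(out.dtype)  # uint16
```

The integer result of casting NaN is not defined by NumPy and depends on the platform, so the sketch does not print the nodata slots; the report above observed 0.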
Repro
import numpy as np, tempfile, os
from xrspatial.geotiff import open_geotiff
from xrspatial.geotiff._writer import write
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, 't.tif')
    arr = np.arange(64, dtype=np.uint16).reshape(8, 8) + 1
    arr[6:8, 6:8] = 65535  # sentinel only in bottom-right
    write(arr, path, nodata=65535, compression='none', tiled=False)
    eager = open_geotiff(path)
    dk = open_geotiff(path, chunks=4)
    r = dk.compute()
    print(eager.dtype, np.isnan(eager.values).sum())  # float64 4
    print(dk.dtype, r.dtype, (r.values[6:8, 6:8] == 0).all())  # float64 uint16 True
Root cause
In xrspatial/geotiff/__init__.py:
- read_geotiff_dask computes effective_dtype = float64 for masked-int rasters (line 1510 onward) and declares each dask block with dtype=target_dtype (line 1654).
- _delayed_read_window (line 1714) only calls arr.astype(np.float64) inside the if mask.any(): branch; when no sentinel pixel is in the chunk, arr stays at the file's integer dtype.
- The per-chunk arr.astype(target_dtype) cast only runs when the user passed an explicit dtype= kwarg (the caller threads target_dtype=target_dtype if dtype is not None else None, line 1650), so the float promotion is not enforced on chunks the mask missed.
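The mismatch can be illustrated without the library. In this sketch, read_chunk_conditional is a hypothetical stand-in for the masking step described above, not the actual body of _delayed_read_window; it promotes a chunk only when its mask hits.

```python
import numpy as np

def read_chunk_conditional(arr, nodata):
    """Hypothetical stand-in: promote to float64 only if the sentinel is present."""
    mask = arr == nodata
    if mask.any():
        arr = arr.astype(np.float64)  # promotion happens only on this branch
        arr[mask] = np.nan
    return arr

clean = np.array([[1, 2], [3, 4]], dtype=np.uint16)          # no sentinel
dirty = np.array([[5, 6], [65535, 65535]], dtype=np.uint16)  # sentinel present

print(read_chunk_conditional(clean, 65535).dtype)  # uint16
print(read_chunk_conditional(dirty, 65535).dtype)  # float64
```

Two chunks of the same raster come back with different dtypes, while the graph has declared float64 for both.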
Proposed fix
Always cast the chunk to the resolved effective_dtype (float64 for masked-int paths) before returning, regardless of whether that chunk's mask hit. Thread the effective dtype unconditionally through _delayed_read_window. The optional out-of-range guard already preserved at line 1511 keeps integer-dtype output when the sentinel can never match.
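A minimal sketch of the proposed behaviour, under the assumption that the resolved dtype is passed to every chunk read. The function read_chunk_always_cast and its parameters are hypothetical names for illustration, not the library's API or the final patch.

```python
import numpy as np

def read_chunk_always_cast(arr, nodata, effective_dtype):
    """Hypothetical sketch: mask the sentinel, then always cast to effective_dtype."""
    mask = arr == nodata
    out = arr.astype(effective_dtype)  # runs whether or not the mask hit
    # Only write NaN when the resolved dtype can represent it.
    if mask.any() and np.issubdtype(out.dtype, np.floating):
        out[mask] = np.nan
    return out

clean = np.array([[1, 2], [3, 4]], dtype=np.uint16)          # no sentinel
dirty = np.array([[5, 6], [65535, 65535]], dtype=np.uint16)  # sentinel present

a = read_chunk_always_cast(clean, 65535, np.float64)
b = read_chunk_always_cast(dirty, 65535, np.float64)
merged = np.concatenate([a, b])

print(a.dtype, b.dtype, merged.dtype)  # float64 float64 float64
print(int(np.isnan(merged).sum()))     # 2
```

With every chunk at the declared dtype, combining them no longer involves a float-to-integer cast, so the NaN pixels survive.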
Scope
Categories: 4 (dtype/nodata semantics), 5 (backend-inconsistent metadata: eager numpy and GPU promote correctly; dask path does not).
Severity: HIGH -- silent NaN -> 0 conversion in real masked rasters.