Describe the bug
When a VRT XML declares dataType="Float32" or "Float64" but the underlying source GeoTIFFs are integer with an in-range GDAL_NODATA sentinel, read_vrt leaves the sentinel pixels as literal float values instead of NaN-masking them. The result is a float array whose attrs['nodata'] advertises 65535.0, but pixels at the sentinel positions still contain 65535.0 rather than np.nan. NaN-aware downstream code (np.isnan, xr.where, slope/aspect, statistics) sees those pixels as valid data and gets wrong answers.
Repro
import numpy as np, tempfile, os
from xrspatial.geotiff import read_vrt
from xrspatial.geotiff._writer import write
band0 = np.array([[1, 2], [3, 65535]], dtype=np.uint16)
with tempfile.TemporaryDirectory() as td:
p0 = os.path.join(td, 'b0.tif')
write(band0, p0, nodata=65535, compression='none', tiled=False)
vrt = os.path.join(td, 'mismatch.vrt')
with open(vrt, 'w') as f:
f.write(f'''<VRTDataset rasterXSize="2" rasterYSize="2">
<GeoTransform>0.0, 1.0, 0.0, 0.0, 0.0, -1.0</GeoTransform>
<VRTRasterBand dataType="Float32" band="1">
<NoDataValue>65535</NoDataValue>
<SimpleSource>
<SourceFilename relativeToVRT="0">{p0}</SourceFilename>
<SourceBand>1</SourceBand>
<SrcRect xOff="0" yOff="0" xSize="2" ySize="2"/>
<DstRect xOff="0" yOff="0" xSize="2" ySize="2"/>
</SimpleSource>
</VRTRasterBand>
</VRTDataset>''')
r = read_vrt(vrt)
print(r.dtype) # float32
print(r.attrs['nodata']) # 65535.0
print(r.values[1, 1]) # 65535.0 -- should be nan
print(np.isnan(r.values[1, 1])) # False
Expected behavior
r.values[1, 1] should be np.nan after the read, matching the eager and dask paths for single-file int rasters with an in-range sentinel. attrs['nodata'] should still surface the original sentinel for downstream code that needs to round-trip on write.
Root cause
_vrt.read_vrt in xrspatial/geotiff/_vrt.py NaN-masks src_arr only when src_arr.dtype.kind == 'f' (lines 371-374). When the source is integer with a sentinel, the int sentinel pixels are assigned into the float result buffer at lines 393-398, where the cast turns them into literal float sentinel values. The outer read_vrt in xrspatial/geotiff/__init__.py then runs its int-to-float promotion only when arr.dtype.kind in ('u', 'i') (line 3059), so a float-typed VRT result never gets its sentinel pixels masked.
Int source + int (or wider int) dataType works fine because the outer promotion fires. Float source files get masked inline. Only the int source + float dataType combination leaks the sentinel.
Scope
- numpy eager (
read_vrt(...)): broken
- dask (
read_vrt(..., chunks=...)): broken; chunking happens after the eager decode so the bug propagates
- gpu (
read_vrt(..., gpu=True)): broken; cupy upload happens after the eager decode
- write side: unaffected
Suggested fix
In _vrt.read_vrt, NaN-mask integer source arrays before placing them into the float result. Use the same in-range gating the outer read_vrt applies so out-of-range sentinels stay no-op rather than tripping OverflowError.
Describe the bug
When a VRT XML declares
dataType="Float32"or"Float64"but the underlying source GeoTIFFs are integer with an in-rangeGDAL_NODATAsentinel,read_vrtleaves the sentinel pixels as literal float values instead of NaN-masking them. The result is a float array whoseattrs['nodata']advertises65535.0, but pixels at the sentinel positions still contain65535.0rather thannp.nan. NaN-aware downstream code (np.isnan,xr.where, slope/aspect, statistics) sees those pixels as valid data and gets wrong answers.Repro
Expected behavior
r.values[1, 1]should benp.nanafter the read, matching the eager and dask paths for single-file int rasters with an in-range sentinel.attrs['nodata']should still surface the original sentinel for downstream code that needs to round-trip on write.Root cause
_vrt.read_vrtinxrspatial/geotiff/_vrt.pyNaN-maskssrc_arronly whensrc_arr.dtype.kind == 'f'(lines 371-374). When the source is integer with a sentinel, the int sentinel pixels are assigned into the floatresultbuffer at lines 393-398, where the cast turns them into literal float sentinel values. The outerread_vrtinxrspatial/geotiff/__init__.pythen runs its int-to-float promotion only whenarr.dtype.kind in ('u', 'i')(line 3059), so a float-typed VRT result never gets its sentinel pixels masked.Int source + int (or wider int) dataType works fine because the outer promotion fires. Float source files get masked inline. Only the int source + float dataType combination leaks the sentinel.
Scope
read_vrt(...)): brokenread_vrt(..., chunks=...)): broken; chunking happens after the eager decode so the bug propagatesread_vrt(..., gpu=True)): broken; cupy upload happens after the eager decodeSuggested fix
In
_vrt.read_vrt, NaN-mask integer source arrays before placing them into the float result. Use the same in-range gating the outerread_vrtapplies so out-of-range sentinels stay no-op rather than trippingOverflowError.