geotiff: read_vrt silently drops SimpleSource <NODATA>0</NODATA> #1655

@brendancol

Summary

xrspatial.geotiff._vrt.read_vrt silently treats <NODATA>0</NODATA> on a SimpleSource as if the element were absent. Pixels equal to 0.0 in the source file are returned as valid data instead of being masked to NaN. The root cause is the `or` truthiness fallback at line 370 of _vrt.py:

src_nodata = src.nodata or nodata

Python evaluates `0.0 or nodata` to `nodata` because 0.0 is falsy, so a <NODATA>0</NODATA> declared on a SimpleSource is replaced by the band-level <NoDataValue> (or by None if there isn't one).
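The falsy-zero behavior can be checked in isolation. The snippet below is a minimal sketch, not the reader's actual code: `or_fallback` and the band-level value of -9999.0 are hypothetical stand-ins that only mirror the expression quoted above.

```python
# Hypothetical band-level sentinel, chosen only for illustration.
band_nodata = -9999.0

def or_fallback(src_nodata, band_nodata):
    """Mirror of the `src.nodata or nodata` expression quoted above."""
    return src_nodata or band_nodata

results = {
    "zero float": or_fallback(0.0, band_nodata),      # 0.0 is falsy, so it is dropped
    "zero int": or_fallback(0, band_nodata),          # same for integer 0
    "nonzero": or_fallback(255.0, band_nodata),       # truthy values pass through
    "absent": or_fallback(None, band_nodata),         # the intended fallback case
    "zero, no band value": or_fallback(0.0, None),    # sentinel lost entirely
}
for label, value in results.items():
    print(f"{label}: {value!r}")
```

Only the "nonzero" case keeps the per-source value; both zero cases are indistinguishable from "absent".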

The in-code comment acknowledges the quirk:

# ``src.nodata or nodata`` is kept for backward compatibility but
# intentionally treats ``0.0`` as unset (a long-standing quirk of this reader).

But the resulting behavior is silently wrong for any VRT whose sources use 0.0 as their sentinel (a common convention for unsigned imagery, where 0 marks "no data").

Reproduction

import numpy as np
import tempfile, os
from xrspatial.geotiff._writer import write
from xrspatial.geotiff._geotags import GeoTransform
from xrspatial.geotiff._vrt import read_vrt

tmp = tempfile.mkdtemp()
arr = np.array([[1.0, 0.0, 3.0, 0.0]], dtype=np.float32)
src = os.path.join(tmp, "src.tif")
write(arr, src, geo_transform=GeoTransform(0, 0, 1, -1), crs_epsg=4326,
      compression='none', tiled=False)

vrt_xml = f'''<VRTDataset rasterXSize="4" rasterYSize="1">
  <SRS>EPSG:4326</SRS>
  <GeoTransform>0.0, 1.0, 0.0, 0.0, 0.0, -1.0</GeoTransform>
  <VRTRasterBand dataType="Float32" band="1">
    <SimpleSource>
      <SourceFilename relativeToVRT="0">{src}</SourceFilename>
      <SourceBand>1</SourceBand>
      <SrcRect xOff="0" yOff="0" xSize="4" ySize="1"/>
      <DstRect xOff="0" yOff="0" xSize="4" ySize="1"/>
      <NODATA>0.0</NODATA>
    </SimpleSource>
  </VRTRasterBand>
</VRTDataset>
'''
vrt_path = os.path.join(tmp, "test.vrt")
with open(vrt_path, 'w') as f:
    f.write(vrt_xml)

result, _ = read_vrt(vrt_path)
# Expected: [[1.0, nan, 3.0, nan]]
# Actual:   [[1.0, 0.0, 3.0, 0.0]]
print(result)
print(f"NaN count: {np.isnan(result).sum()}, expected: 2")

Severity

Medium. Silent data corruption occurs only when (a) <NODATA> on the SimpleSource is 0 (in any equivalent spelling, such as 0 or 0.0) and (b) the band either has no <NoDataValue> of its own or declares a band-level sentinel other than 0. Datasets that declare 0 at both levels fall back to the band-level value and look correct, which hides the bug from most existing tests.

Suggested fix

Replace the `or` truthiness shortcut with an explicit None check so legitimate 0.0 sentinels survive:

src_nodata = src.nodata if src.nodata is not None else nodata

Apply the same change in the integer branch a few lines down, and add a regression test that exercises the <NODATA>0</NODATA> case.
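As a rough illustration of what such a regression test needs to cover, here is a self-contained sketch using only the standard library. `parse_source_nodata` and `resolve_nodata` are hypothetical helpers written for this example and are not functions in _vrt.py; a real test would go through read_vrt with a file on disk, as in the reproduction above.

```python
import xml.etree.ElementTree as ET

def parse_source_nodata(simple_source_xml):
    """Return the <NODATA> value as a float, or None if the element is absent."""
    elem = ET.fromstring(simple_source_xml)
    text = elem.findtext("NODATA")  # None when there is no <NODATA> child
    return float(text) if text is not None else None

def resolve_nodata(src_nodata, band_nodata):
    """Prefer the per-source sentinel; fall back only when it is truly unset."""
    return src_nodata if src_nodata is not None else band_nodata

def test_zero_source_nodata_survives():
    with_zero = "<SimpleSource><NODATA>0</NODATA></SimpleSource>"
    without = "<SimpleSource><SourceBand>1</SourceBand></SimpleSource>"
    # A declared 0 must win over any band-level value, including a missing one.
    assert resolve_nodata(parse_source_nodata(with_zero), -9999.0) == 0.0
    assert resolve_nodata(parse_source_nodata(with_zero), None) == 0.0
    # An absent element must still fall back to the band-level value.
    assert resolve_nodata(parse_source_nodata(without), -9999.0) == -9999.0
    assert resolve_nodata(parse_source_nodata(without), None) is None

test_zero_source_nodata_survives()
```

The four assertions cover the two cases that matter for this bug (0 with and without a band-level value) plus the two fallback cases that should keep working after the change.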
