Skip to content

geotiff: raise on unknown VRT dataType; support UInt64/Int64 (#1783)#1789

Merged
brendancol merged 2 commits into
mainfrom
issue-1783
May 13, 2026
Merged

geotiff: raise on unknown VRT dataType; support UInt64/Int64 (#1783)#1789
brendancol merged 2 commits into
mainfrom
issue-1783

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #1783.

Summary

  • xrspatial/geotiff/_vrt.py resolved <VRTRasterBand dataType="..."> with _DTYPE_MAP.get(name, np.float32). Any unknown name fell back to Float32 with no warning. Complex types (CInt16, CInt32, CFloat32, CFloat64) silently dropped the imaginary component, 64-bit integer types lost precision, and typos produced wrong values.
  • Added UInt64 -> np.uint64 and Int64 -> np.int64 to _DTYPE_MAP (GDAL 3.5+ names, round-trip cleanly through numpy).
  • Split the resolution in parse_vrt: a missing dataType attribute still defaults to Float32 (GDAL's documented default, no behaviour change). An attribute that is present but not in _DTYPE_MAP now raises ValueError naming the band, the offending dataType, and the supported set. The message also flags complex types as unsupported.

Test plan

  • pytest -xvs xrspatial/geotiff/tests/test_vrt_dtype_1783.py (11 tests pass)
  • pytest xrspatial/geotiff/tests/ -k vrt (261 pass, the only deselected failure on this worktree is test_size_param_validation_gpu_vrt_1776::test_tile_size_positive_works which is pre-existing and unrelated -- broken by the tile_size-multiple-of-16 requirement merged in geotiff: require tile_size to be a multiple of 16 (#1767) #1781)
  • CInt16 / CInt32 / CFloat32 / CFloat64 each raise with a message naming the band, the dataType, and "complex"
  • Garbage and Flaot32 (typo) raise
  • UInt64 round-trips 2**64 - 1 losslessly; Int64 round-trips iinfo(int64).min / max
  • A <VRTRasterBand> with no dataType attribute still reads as Float32

Unknown ``<VRTRasterBand dataType="...">`` values used to fall back to
Float32 via ``_DTYPE_MAP.get(name, np.float32)``. Complex types
(CInt16, CInt32, CFloat32, CFloat64) silently lost the imaginary
component, 64-bit integer types lost precision, and typos produced
wrong values.

* Add ``UInt64`` -> ``np.uint64`` and ``Int64`` -> ``np.int64`` to
  ``_DTYPE_MAP`` (GDAL 3.5+ names, lossless round-trip).
* In ``parse_vrt``, distinguish a missing ``dataType`` attribute
  (still defaults to ``Float32`` per the GDAL spec) from one that is
  present but unsupported. The latter raises ``ValueError`` naming
  the band, the offending dataType, and the supported types, and
  notes that complex types are explicitly unsupported.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 13, 2026
@brendancol brendancol requested a review from Copilot May 13, 2026 13:36
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens VRT dtype handling in the GeoTIFF VRT reader to prevent silent data corruption when a VRT declares an unsupported or misspelled VRTRasterBand dataType, and extends dtype support to 64-bit integers.

Changes:

  • Add UInt64 and Int64 to the VRT dataType → NumPy dtype mapping.
  • In parse_vrt, keep the GDAL-default behavior for missing dataType, but raise ValueError when dataType is present and unknown/unsupported (including complex types).
  • Add regression tests covering: complex dtype rejection, garbage/typo rejection, UInt64/Int64 round-trips, and the missing-attribute default.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
xrspatial/geotiff/_vrt.py Adds 64-bit integer dtype support and changes dataType resolution to raise on unknown/unsupported values.
xrspatial/geotiff/tests/test_vrt_dtype_1783.py New regression tests validating the new dtype resolution behavior and 64-bit integer support.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread xrspatial/geotiff/_vrt.py Outdated
Comment on lines 310 to 312
dtype = np.dtype(_DTYPE_MAP[dtype_name])
nodata_str = _text(band_elem, 'NoDataValue')
nodata = float(nodata_str) if nodata_str else None
Comment thread xrspatial/geotiff/_vrt.py Outdated
return _xml_quoteattr(str(value))

# Lazy imports to avoid circular dependency
# Lazy imports to avoid circular dependency.
Addresses Copilot review feedback on PR #1789. Now that UInt64 / Int64
are supported VRT dataTypes, parsing <NoDataValue> as float silently
loses precision for large integer sentinels. 2**64-1 rounds up to
2**64 in float64, which then fails the iinfo range check in the
downstream masking pipeline and the mask never fires; the sentinel
survives in the output as a 1.8e19 float that looks like real data.

* New helper _parse_band_nodata(text, dtype) in _vrt.py parses
  integer-band sentinels as int and falls back to float only when
  the value is non-integer or out of range for the dtype. Used for
  both <NoDataValue> at the band level and <NODATA> on each source.
* _Source / _VRTBand .nodata typed as Union[int, float, None].
* _sentinel_for_dtype in __init__.py grows a fast int path so an int
  sentinel does not go through float() and lose precision at the
  64-bit extremes.
* Integer-source-to-float-band masking branch in _vrt.read_vrt
  accepts int sentinels directly.
* Removed the stale "Lazy imports to avoid circular dependency"
  comment above _DTYPE_MAP (it is a dtype map, not imports).
* Tests in test_vrt_dtype_1783.py cover UInt64 max, INT64_MIN,
  Int32 -9999, Float32 nan, Float64 scientific notation, and an
  end-to-end masking test that puts 2**64-1 in the data and
  verifies the result is NaN.
@brendancol brendancol merged commit 735320c into main May 13, 2026
10 of 11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

geotiff: VRT unknown dataType silently maps to float32

2 participants