A round-trip of a no-georef TIFF through open_geotiff -> to_geotiff -> open_geotiff silently injects a synthetic attrs['transform'] and switches the y/x coord dtype from int64 to float64.
The read side documents the no-georef contract: integer pixel coords [0, 1, 2, ...] with int64 dtype, no attrs['transform'] (#1710, #1753). The write side does not honour the same signal -- _coords_to_transform infers a unit GeoTransform(origin_x=-0.5, origin_y=-0.5, pixel_width=1.0, pixel_height=1.0) from [0, 1, 2, ...] coords and writes it as real ModelPixelScale / ModelTiepoint tags. On the next read the file looks like a georeferenced raster and the no-georef branch never fires.
Reproduction
import numpy as np
import tempfile, os
import tifffile
from xrspatial.geotiff import open_geotiff, to_geotiff
with tempfile.TemporaryDirectory() as td:
src = os.path.join(td, "src.tif")
arr = np.arange(20, dtype=np.float32).reshape(4, 5)
tifffile.imwrite(src, arr, photometric='minisblack', planarconfig='contig')
da = open_geotiff(src)
print(da.coords['y'].dtype, da.coords['y'].values)
# int64 [0 1 2 3]
print('transform' in da.attrs)
# False
out = os.path.join(td, "out.tif")
to_geotiff(da, out, compression='none')
da2 = open_geotiff(out)
print(da2.coords['y'].dtype, da2.coords['y'].values)
# float64 [0. 1. 2. 3.]
print(da2.attrs.get('transform'))
# (1.0, 0.0, -0.5, 0.0, 1.0, -0.5)
Impact
- Cat 2 (coords): coord dtype silently changes between input and output (HIGH per sweep criteria).
- Cat 1 (attrs): writer adds an inferred
transform attr the input never had (MEDIUM).
- Downstream code that branches on
'transform' in da.attrs to detect georef status will mis-classify the round-tripped file as georeferenced.
Cause
xrspatial/geotiff/_writers/eager.py:519, _writers/eager.py:839, _writers/gpu.py:350 all fall back to _coords_to_transform(data) when attrs.get('transform') is missing. _coords_to_transform (_coords.py:235) reads da.coords[xdim].values, computes pixel_width = float(x[1] - x[0]), and returns GeoTransform(...) -- with no check for the integer-dtype signal that the read side uses to mean "no georef".
Affects every writer path: to_geotiff eager, to_geotiff dask streaming (via the same fallback), to_geotiff(vrt=...) per-tile, write_geotiff_gpu. The GPU dask+cupy writer goes through the same helper.
Proposed fix
_coords_to_transform returns None when either spatial coord array has an integer dtype (which only happens from the read-side no-georef fallback). Float coord arrays continue to produce a GeoTransform. Callers already handle None correctly: write() skips the geokey emission when geo_transform is None.
A round-trip of a no-georef TIFF through
open_geotiff->to_geotiff->open_geotiffsilently injects a syntheticattrs['transform']and switches the y/x coord dtype fromint64tofloat64.The read side documents the no-georef contract: integer pixel coords
[0, 1, 2, ...]withint64dtype, noattrs['transform'](#1710, #1753). The write side does not honour the same signal --_coords_to_transforminfers a unitGeoTransform(origin_x=-0.5, origin_y=-0.5, pixel_width=1.0, pixel_height=1.0)from[0, 1, 2, ...]coords and writes it as realModelPixelScale/ModelTiepointtags. On the next read the file looks like a georeferenced raster and the no-georef branch never fires.Reproduction
Impact
transformattr the input never had (MEDIUM).'transform' in da.attrsto detect georef status will mis-classify the round-tripped file as georeferenced.Cause
xrspatial/geotiff/_writers/eager.py:519,_writers/eager.py:839,_writers/gpu.py:350all fall back to_coords_to_transform(data)whenattrs.get('transform')is missing._coords_to_transform(_coords.py:235) readsda.coords[xdim].values, computespixel_width = float(x[1] - x[0]), and returnsGeoTransform(...)-- with no check for the integer-dtype signal that the read side uses to mean "no georef".Affects every writer path:
to_geotiffeager,to_geotiffdask streaming (via the same fallback),to_geotiff(vrt=...)per-tile,write_geotiff_gpu. The GPU dask+cupy writer goes through the same helper.Proposed fix
_coords_to_transformreturnsNonewhen either spatial coord array has an integer dtype (which only happens from the read-side no-georef fallback). Float coord arrays continue to produce aGeoTransform. Callers already handleNonecorrectly:write()skips the geokey emission whengeo_transform is None.