Bug: integer spatial coords silently strip georef on write #2087

@brendancol

Describe the bug

A DataArray with integer-spaced x/y coords (e.g. x=[100,101,102], y=[200,199]) writes through to_geotiff with no transform tags and reads back as pixel coords x=[0,1,2], y=[0,1] with no georef. The projection metadata is silently lost.

The cause is coords_to_transform at xrspatial/geotiff/_coords.py:353, which returns None for any integer x or y dtype. The fail-closed validator at _coords.py:272 mirrors that, so the writer doesn't raise either. Both branches were intended as a no-georef sentinel for files round-tripped through open_geotiff (which emits np.arange(N, dtype=int64) for x/y when the source file has no GeoTIFF transform tags, #1710 / #1753 / #1949). The sentinel is too broad: integer dtype on its own catches both the read-side arange placeholder and any user-authored projected grid that happens to use integer-spaced coords.
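To illustrate why the sentinel is too broad, here is a minimal sketch. The helper name `looks_like_no_georef_broad` is hypothetical and only mimics the "integer dtype means no georef" rule described above; it is not the actual code in `_coords.py`.

```python
import numpy as np

def looks_like_no_georef_broad(coord):
    # Hypothetical stand-in for the current rule: integer dtype alone
    # is taken to mean "no georef".
    return np.issubdtype(np.asarray(coord).dtype, np.integer)

# Placeholder of the kind the reader emits for files without transform tags.
reader_placeholder = np.arange(3, dtype=np.int64)
# User-authored projected grid that happens to use integer-spaced coords.
user_grid = np.array([100, 101, 102], dtype=np.int64)

print(looks_like_no_georef_broad(reader_placeholder))  # True
print(looks_like_no_georef_broad(user_grid))           # True
```

Both inputs are classified the same way, so the dtype check alone cannot tell the placeholder from a real grid.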

Repro

```python
import numpy as np
import xarray as xr
from xrspatial.geotiff import to_geotiff, open_geotiff

da = xr.DataArray(
    np.zeros((2, 3), dtype=np.float32),
    coords={'y': np.array([200, 199]), 'x': np.array([100, 101, 102])},
    dims=('y', 'x'),
)
to_geotiff(da, 'out.tif')

out = open_geotiff('out.tif')
print(out.coords['x'].values)  # [0 1 2] -- georef lost
print(out.coords['y'].values)  # [0 1]
print(out.attrs.get('transform'))  # None
```

Expected behavior

Either the projected grid round-trips with its coords intact, or the write raises so the caller knows the georef is being dropped. Silent loss of the transform is the worst outcome.

Fix

Tighten the sentinel to match exactly what the reader emits: dtype is int64 AND np.array_equal(coord, np.arange(len(coord), dtype=int64)). Only that pattern is treated as no-georef. Anything else (user-authored [100,101,102], subset [2,3,4], subsampled [0,2,4], non-uniform [1,2,5]) falls through to the existing float-coord path, which either synthesizes a real transform from the spacing or raises NonUniformCoordsError.
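A minimal sketch of the tightened predicate, assuming a standalone helper (the name `is_reader_placeholder` is hypothetical; the real change would go inline at the two `_coords.py` sites):

```python
import numpy as np

def is_reader_placeholder(coord):
    """Hypothetical helper: True only for the exact int64 arange placeholder."""
    coord = np.asarray(coord)
    if coord.ndim != 1 or coord.dtype != np.int64:
        return False
    return bool(np.array_equal(coord, np.arange(len(coord), dtype=np.int64)))

cases = {
    "reader placeholder": np.arange(3, dtype=np.int64),
    "user-authored": np.array([100, 101, 102], dtype=np.int64),
    "subset": np.array([2, 3, 4], dtype=np.int64),
    "subsampled": np.array([0, 2, 4], dtype=np.int64),
    "non-uniform": np.array([1, 2, 5], dtype=np.int64),
}
for name, coord in cases.items():
    print(name, is_reader_placeholder(coord))
```

Only the first case returns True. The other four would fall through to the float-coord path, where the non-uniform one is expected to raise.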

Two _coords.py sites need the change: the validator at :272 and coords_to_transform at :353.

Trade-off

Users who subset or subsample a no-georef DataArray and write it will get a file with a real transform where they previously got a no-georef file. Coord values round-trip exactly; only the dtype flips int→float on subsequent reads. That's a behavior change, but a more honest one: the subset has a well-defined origin offset that should be preserved.
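A rough sketch of that round trip for a subset `x=[2,3,4]`, using plain NumPy rather than the library. It assumes coords are pixel centers and that the origin is stored as the left edge of the first pixel; that convention is an assumption for illustration, not taken from the xrspatial source.

```python
import numpy as np

x = np.array([2, 3, 4], dtype=np.int64)  # subset of a no-georef arange

res = float(x[1] - x[0])          # uniform spacing becomes the pixel width
origin = float(x[0]) - res / 2.0  # left edge of first pixel (assumed convention)

# Rebuild pixel-center coords from origin and resolution, as a reader might.
x_read = origin + (np.arange(len(x)) + 0.5) * res

print(x_read)                     # [2. 3. 4.]
print(np.array_equal(x, x_read))  # True
print(x.dtype, x_read.dtype)      # int64 float64
```

The values match exactly, and the origin offset (1.5 here) is what distinguishes the subset from a fresh no-georef file.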

Related

Labels: bug, input-validation
