Skip to content

geotiff: tighten no-georef sentinel for integer coords (#2087)#2091

Merged
brendancol merged 2 commits into
mainfrom
issue-2087
May 19, 2026
Merged

geotiff: tighten no-georef sentinel for integer coords (#2087)#2091
brendancol merged 2 commits into
mainfrom
issue-2087

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

  • Tighten the writer's no-georef sentinel from "any integer dtype on either axis" to the exact reader pattern: int64 ascending contiguous-step-1 arange on both axes.
  • User-authored integer-coord projected grids like x=[100,101,102], y=[200,199] no longer silently lose their transform. They get a real transform written, or raise NonUniformCoordsError.
  • The legitimate no-georef round-trip is preserved. The reader's np.arange(start, stop, dtype=int64) placeholder still matches on both axes, including for windowed reads.

Background

coords_to_transform at xrspatial/geotiff/_coords.py:353 returned None whenever either x or y had an integer dtype. The fail-closed validator at _coords.py:272 mirrored that. The intent was to round-trip the read-side np.arange(N, dtype=int64) placeholder that coords_from_pixel_geometry emits for files with no GeoTIFF transform tags (#1710, #1753, #1949). The dtype check was too broad: any user-authored projected grid with integer-spaced coords matched and lost its georef silently.

Repro from the original report

da = xr.DataArray(
    np.zeros((2, 3), dtype=np.float32),
    coords={'y': np.array([200, 199]), 'x': np.array([100, 101, 102])},
    dims=('y', 'x'),
)
to_geotiff(da, 'out.tif')
out = open_geotiff('out.tif')
# Before: out.coords['x'] == [0, 1, 2], out.attrs['transform'] is None
# After:  out.coords['x'] == [100.0, 101.0, 102.0], transform present

Sentinel logic

Both call sites now route through a single _is_no_georef_sentinel(coord) helper:

coord.dtype == np.int64
  AND len(coord) >= 1
  AND np.array_equal(coord, np.arange(coord[0], coord[0] + len(coord), dtype=int64))

Both axes must match for the no-georef path to fire. Mixed dtypes (one int64-arange, one float), non-int64 int dtypes (int32, int16, uint32), descending coords, non-unit step, and non-uniform spacing all fall through.

Trade-offs

  • A user who authors a DataArray whose coords match the reader pattern exactly (int64 ascending step 1 on both axes, no attrs['transform'] set) still gets no-georef treatment. Documented in the helper docstring; the explicit attrs['transform'] escape hatch still works.
  • A user who subsamples (da.isel(x=slice(None, None, 2))) or coarsens a no-georef array no longer falls under the sentinel. The resulting step-2 int64 array gets a real transform written. Coord values round-trip exactly; only the dtype flips on subsequent reads.

Backend coverage

All four writer backends (numpy, dask, cupy, dask+cupy) call coords_to_transform and the validator through the same code path. The fix lives in _coords.py so every backend gets it.

Closes #2087.

Test plan

  • pytest xrspatial/geotiff/tests/test_int_coord_sentinel_2087.py -- 17 new cases pass
  • pytest xrspatial/geotiff/tests/test_no_georef_writer_round_trip_1949.py -- 13 pass; 5 cases that pinned the old over-broad sentinel were updated to the new contract
  • pytest xrspatial/geotiff/tests/ -k "coord or georef or transform or roundtrip" -- 513 pass, no new failures

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 18, 2026
brendancol added a commit that referenced this pull request May 19, 2026
Self-review nits on PR #2091:
- Remove the unused ``import io`` from
  test_int_coord_sentinel_2087.py.
- Rename
  ``test_user_authored_ascending_int_y_writes_real_transform`` to
  ``test_both_axes_ascending_int64_step1_trade_off``. The old name
  suggested the test verified a real transform was written, but the
  body actually pins the documented trade-off corner where both
  axes match the read-side arange pattern and the sentinel
  resolves to no-georef. The new name and comment line up with
  that intent.
The writer's no-georef sentinel treated any integer dtype on either
x or y as the read-side ``np.arange(N, dtype=int64)`` placeholder,
which also caught user-authored projected grids with integer-spaced
coords. ``x=[100,101,102], y=[200,199]`` wrote with no transform
tags and read back as pixel coords with no georef -- silent
geospatial corruption.

Tighten the sentinel to the exact reader pattern: ``int64``
ascending contiguous-step-1 arange, on both axes. Anything else
(descending, non-unit step, non-uniform spacing, non-``int64`` int
dtypes) falls through to the existing float-coord path, which
either synthesises a real transform or raises
``NonUniformCoordsError``.

Both ``_coords.py:272`` (validator) and ``_coords.py:353``
(``coords_to_transform``) now route through a single
``_is_no_georef_sentinel`` helper.

Coord values round-trip exactly through the new path; dtype flips
int->float on subsequent reads because the file carries a real
transform. The legitimate no-georef round-trip
(``np.arange(N, dtype=int64)`` on both axes, including windowed
reads that start at non-zero) is preserved -- both axes still
match the sentinel.

The five tests in test_no_georef_writer_round_trip_1949.py that
pinned the old over-broad sentinel are updated to reflect the new
behaviour. Adds test_int_coord_sentinel_2087.py with 17 cases
covering the sentinel helper, user-authored int grids, mixed
dtypes, non-uniform spacing, and windowed-read round-trips.
Self-review nits on PR #2091:
- Remove the unused ``import io`` from
  test_int_coord_sentinel_2087.py.
- Rename
  ``test_user_authored_ascending_int_y_writes_real_transform`` to
  ``test_both_axes_ascending_int64_step1_trade_off``. The old name
  suggested the test verified a real transform was written, but the
  body actually pins the documented trade-off corner where both
  axes match the read-side arange pattern and the sentinel
  resolves to no-georef. The new name and comment line up with
  that intent.
@brendancol brendancol merged commit 11f7343 into main May 19, 2026
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: integer spatial coords silently strip georef on write

1 participant