Skip to content

geotiff: finish migration from integer coord sentinel to explicit no-georef attr #2133

@brendancol

Description

@brendancol

Background

The geotiff round-trip used to detect "this file has no GeoTIFF transform tags" by looking at coord shape: int64 ascending step-1 on both y/x axes meant "no georef" and the writer skipped transform synthesis. That heuristic was tightened in #2087 and then replaced in #2120 (commit 34cc415) with an explicit attr attrs['_xrspatial_no_georef'] = True.

This issue tracks the rest of the migration: one validator still infers no-georef from coord dtype, the shape helper is now test-only, and the round-trip test matrix could pin a couple more edge cases.

What 34cc415 already did

  • Added _NO_GEOREF_KEY = '_xrspatial_no_georef' in xrspatial/geotiff/_geotags.py.
  • Stamped the marker on every read path that emits placeholder coords:
    • CPU reader: _populate_attrs_from_geo_info in xrspatial/geotiff/_attrs.py (else branch when geo_info.transform is None).
    • VRT reader: both eager and chunked no-transform branches in xrspatial/geotiff/_backends/vrt.py.
    • GPU / dask readers route through _populate_attrs_from_geo_info so they pick the marker up for free.
  • Switched the writer-side detection to _has_no_georef_marker(da) in two places in xrspatial/geotiff/_coords.py:
    • coords_to_transform (return None instead of synthesising a fake unit transform).
    • require_transform_for_georeferenced (skip the fail-closed guard).
  • Kept _is_no_georef_sentinel in the module as a diagnostic but wired it out of the writer path.
  • Added test_no_georef_marker_2120.py and updated test_int_coord_sentinel_2087.py and test_no_georef_writer_round_trip_1949.py to pin the new contract.

What is still fragile

_check_write_non_uniform_coords in _validation.py

The validator at line 755 skips its uniformity check whenever a coord has integer dtype, with the comment "those coords are the 0..N-1 fallback the no-georef reader emits". After #2120 that exemption is wrong for the same reason the writer's old shape check was wrong: a user-authored int-coord grid (the exact case #2087 fixed) now bypasses uniformity validation. A user with int coords on a non-uniform grid will silently write a misrepresented transform.

The validator should gate the exemption on _has_no_georef_marker(da), not on dtype.kind. Float coords with the marker set should also be exempt for symmetry with the writer.

_is_no_georef_sentinel is dead on the production path

It only survives because test_int_coord_sentinel_2087.py still references it as a predicate. Two options:

  1. Drop the helper and rewrite the tests to check _has_no_georef_marker instead.
  2. Move the helper into the test file as a local fixture.

I lean toward option 1. The helper's docstring already says it is test-only, and keeping public-module dead code makes it easier for someone to plug it back into a write path by accident.

Attribute name and the georef_status proposal

A separate issue is being filed to propose a richer attrs['georef_status'] enum ('georef' | 'no_georef' | 'rotated' | ...) that would replace ad-hoc booleans across the geotiff module. If that lands first, the implementation here should consume georef_status == 'no_georef' rather than the dedicated _xrspatial_no_georef flag, and _has_no_georef_marker should be the single point of indirection.

If this issue lands first, keep _xrspatial_no_georef as an underscore-prefixed internal contract and migrate it under the georef_status umbrella later. Either way, callers should never read _xrspatial_no_georef directly outside the geotiff module.

Read paths to audit

Each path that emits placeholder y/x coords must stamp the marker. 34cc415 covered the three known paths; double-check that no new reader has appeared since:

  • xrspatial/geotiff/_attrs.py: _populate_attrs_from_geo_info else branch.
  • xrspatial/geotiff/_backends/vrt.py: read_vrt and _read_vrt_chunked no-transform branches.

GPU (_backends/gpu.py) and dask (_backends/dask.py) readers both call _populate_attrs_from_geo_info, so they inherit the marker. A unit test that opens a no-georef fixture through each backend and asserts the marker is present would catch any future regression.

Write paths to audit

Each writer must consult the marker only, never coord shape:

  • xrspatial/geotiff/_writers/eager.py: to_geotiff (line 565-568) and the VRT per-tile write (line 946-948). Both call _coords_to_transform(data), which already checks the marker.
  • xrspatial/geotiff/_writers/gpu.py: write_geotiff_gpu (line 425-427). Same path.
  • xrspatial/geotiff/_writer.py low-level write accepts a pre-computed geo_transform, so the marker decision happens upstream.

The validator fix above is the missing piece.

Round-trip tests to add

  • A user grid with int coords on a non-uniform spacing should raise NonUniformCoordsError even though the dtype is int. Today it silently writes a misrepresented transform via the validator exemption.
  • A no-georef file opened through each of the four backends (CPU eager, CPU dask, GPU eager, GPU dask) carries the marker.
  • Marker survives a da.copy() / da.assign_attrs(...) chain. xarray attr-propagation defaults sometimes drop underscore-prefixed keys, so this is worth pinning explicitly.
  • Marker survives a to_geotiff(da, path) followed by open_geotiff(path) on a hand-built no-georef array (already covered by test_marker_on_user_grid_skips_transform_synthesis in geotiff: to_geotiff silently strips georef on int64 step-1 user coords #2120; add a corresponding 3D test).

Parallel with PR #2127 / commit 478dbd5

PR #2127 threaded the masked_nodata decision through _set_nodata_attrs so the writer reads an explicit attr instead of inferring intent from dtype. Same pattern applies here: the no-georef decision lives on attrs[_NO_GEOREF_KEY], the writer reads that attr only, and the validator should follow suit.

Both attrs share an invariant worth documenting in docs/source/user_guide/attrs_contract.rst: they are read-side-stamped, internal, and is True identity-checked so a stray third-party stamp cannot flip behaviour.

Acceptance criteria

  • _check_write_non_uniform_coords checks the marker, not dtype.
  • _is_no_georef_sentinel removed from _coords.py; tests rewritten to use _has_no_georef_marker or a local predicate.
  • New test: user int-coord grid with non-uniform spacing raises NonUniformCoordsError.
  • New test: marker is present after reading a no-georef fixture through each backend.
  • Decision recorded on the georef_status issue about which attr is canonical long-term.

Follow-up to #2120.

Metadata

Metadata

Assignees

No one assigned

    Labels

    apiAPI design and consistencyenhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions