geotiff: canonical georef_status attribute to disambiguate CRS vs transform presence #2136

@brendancol

Problem

The reader currently carries three independent metadata signals and downstream code has to reconcile them by hand:

  • attrs['crs'] / attrs['crs_wkt'] from CRS-related GeoKeys
  • attrs['transform'] from ModelPixelScale / ModelTiepoint / ModelTransformationTag
  • attrs[_NO_GEOREF_KEY] (= '_xrspatial_no_georef'), stamped by _populate_attrs_from_geo_info in xrspatial/geotiff/_attrs.py when no transform tags are present

A raster can have a CRS but no transform, a transform but no CRS, both, or neither. Consumers today have to infer the state from the presence or absence of three keys plus a sentinel attr, and that combination does not map 1:1 onto what the reader actually saw:

  1. _populate_attrs_from_geo_info (_attrs.py:381-394) drops attrs['transform'] and stamps attrs[_NO_GEOREF_KEY] = True when has_georef=False. But has_georef=False is also what _extract_transform(..., allow_rotated=True) returns for a rotated ModelTransformationTag (_geotags.py:73-101; see #2116, "geotiff: honour allow_rotated for rotated ModelTransformationTag"), so the rotated-but-dropped case and the truly-no-transform case look identical downstream.
  2. attrs['crs_wkt'] can still be present with neither attrs['crs'] (no EPSG resolve) nor attrs['transform'] (no transform tags). The VRT path handles this explicitly in _backends/vrt.py:284-289: stamp _NO_GEOREF_KEY, then set crs_wkt. Same shape on disk, different downstream meaning.
  3. _resolve_crs_to_wkt (_crs.py:181-269) accepts a transform-less write path but does not surface that the source had no transform; the writer ends up reconstructing georef from coords (_coords.py:296-331), which is exactly the case _NO_GEOREF_KEY was added to suppress (see #2120, "geotiff: to_geotiff silently strips georef on int64 step-1 user coords", and #2124, "geotiff: move no-georef signal off coord shape onto attrs marker").
  4. Round-trip: the no-georef marker is preserved on write (per tests in tests/test_int_coord_sentinel_2087.py and tests/test_no_georef_writer_round_trip_1949.py), but the rotated-dropped case has no equivalent marker. A reader that opens a file written by to_geotiff from a rotated-dropped DataArray gets an axis-aligned identity transform back with no signal of the original rotation.
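
To make items 1 and 4 concrete, here is a minimal sketch of the hand reconciliation a consumer has to do today. The helper legacy_state and the two sample attrs dicts are hypothetical and written only for illustration; the key names (crs, crs_wkt, transform, _xrspatial_no_georef) are the ones quoted above.

```python
NO_GEOREF_KEY = "_xrspatial_no_georef"  # sentinel key name quoted in this issue


def legacy_state(attrs):
    """Hypothetical consumer-side guess built only from today's keys."""
    has_crs = attrs.get("crs") is not None or attrs.get("crs_wkt") is not None
    has_transform = "transform" in attrs and not attrs.get(NO_GEOREF_KEY, False)
    if has_transform:
        return "full" if has_crs else "transform_only"
    return "crs_only" if has_crs else "none"


# Illustrative attrs (assumed, not captured from a real read): a file with
# GeoKeys but no transform tags, and a rotated file whose transform was
# dropped on read. Both end up with the same keys.
no_transform_attrs = {"crs": 4326, NO_GEOREF_KEY: True}
rotated_dropped_attrs = {"crs": 4326, NO_GEOREF_KEY: True}

# The two sources collapse to the same answer; the rotation is invisible.
same = legacy_state(no_transform_attrs) == legacy_state(rotated_dropped_attrs)
```

With only these keys there is no way for legacy_state to return anything like rotated_dropped, which is the gap the proposal below is meant to close.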

Proposal

One canonical attr that encodes the five distinct states the reader can land in:

attrs['georef_status'] = (
    'full'             # CRS resolved + axis-aligned transform present
  | 'transform_only'   # transform present, no CRS (or unparseable CRS)
  | 'crs_only'         # CRS present, no transform tags at all
  | 'none'             # neither CRS nor transform
  | 'rotated_dropped'  # transform tags were present but carried rotation/shear, dropped under allow_rotated=True
)

The reader and writer already distinguish these five cases internally. Expose them as one attr instead of forcing consumers to reconstruct the state from the union of crs, crs_wkt, transform, and _NO_GEOREF_KEY.
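
One way to pin the five values down for type checkers and runtime validation is a typing.Literal alias. This is only a sketch: the names GeorefStatus, GEOREF_STATUSES, and validate_georef_status are assumptions for illustration and are not part of the existing codebase.

```python
from typing import Literal, get_args

# The five proposed values, spelled exactly as in the proposal above.
GeorefStatus = Literal[
    "full", "transform_only", "crs_only", "none", "rotated_dropped"
]

# Runtime view of the same set, derived from the alias so the two cannot drift.
GEOREF_STATUSES = frozenset(get_args(GeorefStatus))


def validate_georef_status(value):
    """Return value unchanged if it is one of the five states, else raise."""
    if not isinstance(value, str) or value not in GEOREF_STATUSES:
        raise ValueError(
            f"georef_status must be one of {sorted(GEOREF_STATUSES)}, got {value!r}"
        )
    return value
```

Plain strings keep the attr trivially serialisable, which matters if attrs are written out to formats that only store basic types.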

What each state means

  • full: spatial ops can run. CRS resolves to either EPSG or parseable WKT, attrs['transform'] is an axis-aligned 6-tuple, coords are real geo coords.
  • transform_only: pixel geometry is real but unprojected. Spatial ops that compare across rasters should refuse. Reprojection helpers should refuse.
  • crs_only: no pixel geometry. coords are int64 placeholders. Any op that needs georef should refuse. The CRS attr is preserved for record-keeping.
  • none: neither is known. Equivalent to a plain image. Spatial ops should treat it the same as crs_only for refusal purposes.
  • rotated_dropped: the source had a real rotated ModelTransformationTag but the reader dropped it under allow_rotated=True (see #2116, "geotiff: honour allow_rotated for rotated ModelTransformationTag"). The rotated 6-tuple is preserved on geo_info.transform.rotated_affine and may be surfaced on attrs['rotated_affine'], but attrs['transform'] is absent. Spatial ops should refuse with a clearer message than the current "no transform" path gives.

Where the attr is set

Every read path that currently goes through _populate_attrs_from_geo_info or builds attrs inline:

Read path | File | State decision
Eager numpy | __init__.py:633 via _populate_attrs_from_geo_info | derived from geo_info.has_georef, geo_info.crs_epsg, geo_info.crs_wkt, and the rotated_affine marker
Dask | _backends/dask.py:342 | same helper, same decision
GPU (chunked, eager, tile) | _backends/gpu.py:437, 816, 1369 | same helper
VRT (eager + dask) | _backends/vrt.py:275-289, 699-718 | inline; needs the same five-state decision
HTTP / BytesIO | flow through the eager + dask paths above | covered transitively

Push the state computation into _populate_attrs_from_geo_info so the four backends and the two VRT branches all derive it from the same inputs. The VRT inline path then either calls the helper or imports a smaller _compute_georef_status(geo_info) leaf.

Downstream consumers

Functions that currently gate on 'transform' in attrs, attrs.get('crs'), or both should switch to gating on attrs['georef_status']:

  • to_geotiff and the GPU / VRT writers (_writers/eager.py:300, 575; _writers/gpu.py:320, 433; __init__.py:385-405 docs): a crs_only or none array writes without georef; transform_only writes without CRS; rotated_dropped should either refuse or require an explicit opt-in argument so the writer cannot silently emit an axis-aligned file from a rotated-dropped array.
  • transform_from_attr and coords_to_transform in _coords.py:242-331: currently raise on non-zero rotation; should also refuse cleanly when georef_status is crs_only or none rather than relying on the absence of attrs['transform'].
  • Any spatial-op caller in the wider library that needs a real transform (slope, aspect, hillshade, reproject) can gate on georef_status == 'full' for a clean refusal path.
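
The gating described in this list could be sketched as two small guards. Both function names, the allow_rotated_dropped parameter, and the error wording are hypothetical; the proposal leaves open whether the writer refuses outright or takes an opt-in argument, and this sketch picks the opt-in variant only to show the shape.

```python
_REFUSAL_REASONS = {
    "transform_only": "pixel geometry is present but there is no CRS",
    "crs_only": "a CRS is recorded but there is no transform",
    "none": "neither CRS nor transform is known",
    "rotated_dropped": "the source transform carried rotation/shear and was dropped",
}


def require_full_georef(attrs, op_name):
    """Raise ValueError unless attrs['georef_status'] is 'full'."""
    status = attrs.get("georef_status")
    if status == "full":
        return
    reason = _REFUSAL_REASONS.get(status, "georef_status is missing or unrecognised")
    raise ValueError(
        f"{op_name} needs georef_status == 'full' (got {status!r}): {reason}"
    )


def check_writable(attrs, allow_rotated_dropped=False):
    """Refuse to write a rotated-dropped array unless the caller opts in."""
    if attrs.get("georef_status") == "rotated_dropped" and not allow_rotated_dropped:
        raise ValueError(
            "refusing to write an axis-aligned file from a rotated_dropped "
            "array; pass allow_rotated_dropped=True to override"
        )
```

A spatial op would call require_full_georef(agg.attrs, "slope") before touching coords, so every refusal carries the state that caused it.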

Backward compatibility

  • Keep emitting attrs['_xrspatial_no_georef'] and the existing attrs['transform'] / attrs['crs'] / attrs['crs_wkt'] keys unchanged. The new attr is additive.
  • Bump _ATTRS_CONTRACT_VERSION in _attrs.py:173 from 2 to 3; document the new key in the contract docstring at the top of _attrs.py.
  • Writers preserve attrs['georef_status'] on round-trip but do not rely on it: the existing decision logic stays as the source of truth for what gets written. The attr is for consumers, not for the writer's own state machine.
  • XRSPATIAL_GEOTIFF_STRICT=1 callers get the same refusals they get today; the error message can cite georef_status.

Tests

A new tests/test_georef_status_<issue>.py with a fixture matrix:

Fixture | Expected georef_status
Standard EPSG-tagged GeoTIFF | full
GeoTIFF with ModelPixelScale + ModelTiepoint but no GeoKeys | transform_only
GeoTIFF with GeoKeys but no transform tags (the _NO_GEOREF_KEY fixture from tests/test_int_coord_sentinel_2087.py) | crs_only
Plain-image TIFF (no GeoKeys, no transform tags) | none
Rotated ModelTransformationTag, allow_rotated=True (per tests/test_allow_rotated_geotiff_2115.py) | rotated_dropped
Same rotated file, default allow_rotated=False | raises (matches today)
VRT with CRS and transform | full
VRT with CRS, no geo_transform element | crs_only
Round-trip: write each of the five states, re-read, assert georef_status matches | n/a (round-trip stability check)

Extend test_attrs_contract_canonical_1984.py to include georef_status in the canonical key list and bump the contract version assertion.

Relation to recent work

Out of scope

  • Changing the writer's decision logic. Read-side attr only.
  • Re-projection. Spatial ops can gate on the attr but no reproject implementation here.
  • Vertical CRS. georef_status covers horizontal CRS + horizontal transform; vertical metadata keeps round-tripping through crs_wkt.
