Skip to content

geotiff: add canonical attrs['georef_status'] (#2136)#2145

Open
brendancol wants to merge 4 commits into
mainfrom
worktree-agent-a3b6e3087373c6751
Open

geotiff: add canonical attrs['georef_status'] (#2136)#2145
brendancol wants to merge 4 commits into
mainfrom
worktree-agent-a3b6e3087373c6751

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

One canonical attrs['georef_status'] so downstream code can branch on a single value instead of reconciling crs, crs_wkt, transform, and _xrspatial_no_georef by hand. Five values cover what the reader can actually see:

  • full -- CRS resolved plus axis-aligned transform.
  • transform_only -- transform present, no CRS.
  • crs_only -- CRS present, no transform tags.
  • none -- neither.
  • rotated_dropped -- rotated ModelTransformationTag dropped under allow_rotated=True. Previously indistinguishable from none via public attrs.

Changes:

  • Two helpers in _attrs.py: _compute_georef_status (takes a GeoInfo) and _compute_georef_status_from_parts (for the VRT branches that don't build a GeoInfo). All read sites -- eager, dask, GPU, VRT-eager, VRT-chunked -- funnel through one decision rule.
  • _ATTRS_CONTRACT_VERSION goes 2 to 3. The contract docstring at the top of _attrs.py documents the new key.
  • Additive only. crs / crs_wkt / transform / _xrspatial_no_georef keep their pre-v3 semantics so existing consumers don't change.

Backend coverage

  • numpy eager: yes (via _populate_attrs_from_geo_info)
  • cupy / GPU: yes (same helper)
  • dask + numpy: yes (same helper)
  • dask + cupy: yes (same helper)
  • VRT eager and VRT chunked: yes (inline via _compute_georef_status_from_parts)

Test plan

  • test_georef_status_2136.py: per-state reads across numpy, dask, GPU (skipped without CUDA), and both VRT branches, plus a parametrised round-trip test for the four writable states. rotated_dropped stays read-only because to_geotiff doesn't emit rotated ModelTransformationTag; the issue calls this out under Out of Scope.
  • test_attrs_contract_canonical_1984.py: georef_status added to the canonical key list, pinned to full for the canonical fixture.
  • test_attrs_contract_version_1984.py and test_attrs_contract_passthrough_1984.py: switched to tracking _ATTRS_CONTRACT_VERSION rather than the literal 2, so the next bump only touches the constant.
  • Full geotiff suite locally: 4276 passed, 25 skipped.

Closes #2136.

Read paths emit one canonical attr that encodes the five distinct
states the reader can land in when CRS and transform tags are
combined: full, transform_only, crs_only, none, rotated_dropped.

Without this attr, downstream code has to reconcile crs / crs_wkt /
transform / _xrspatial_no_georef by hand, and two distinct on-disk
situations (rotated-with-allow_rotated and truly-no-transform) end
up indistinguishable via the public attrs.

* Stamped via _populate_attrs_from_geo_info, covering eager numpy,
  dask, and the GPU read paths.
* Stamped inline in both VRT branches (eager and chunked) through a
  shared _compute_georef_status_from_parts helper so all six call
  sites share one decision rule.
* Bumps _ATTRS_CONTRACT_VERSION 2 -> 3.
* Additive: crs, crs_wkt, transform, and the _xrspatial_no_georef
  marker keep their pre-v3 semantics so existing consumers keep
  working.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 19, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: canonical attrs['georef_status'] (#2136)

No blockers. A few small things worth addressing in this PR rather than a follow-up.

Suggestions

  • xrspatial/geotiff/tests/test_georef_status_2136.py:50 -- _ROTATED_M is imported but never referenced. Drop it from the import (or actually use it to pin the matrix the test exercises).
  • GEOREF_STATUS_* constants in _attrs.py -- The issue frames georef_status as a downstream-consumer signal, but the five constants live in private _attrs. If you want callers to branch on them rather than the string literals, re-export from xrspatial.geotiff. Otherwise document that comparing against the literal strings is the expected API.

Nits

  • xrspatial/geotiff/_attrs.py:391 -- Docstring says "The four backends (eager, dask, GPU)" but there are three distinct GPU call sites in gpu.py (chunked, eager, tile) on top of eager numpy + dask. Either drop the count or list the five paths.
  • _GEOREF_STATUS_VALUES frozenset -- Declared but unused. If it stays, expose it so downstream code can validate; otherwise drop it.
  • _backends/vrt.py:286, 717 -- bool(vrt.crs_wkt) is currently safe because _text returns None (never "") for missing/empty SRS, but _compute_georef_status uses crs_wkt is not None. Align the two checks so a future change to _text cannot drift the decision.

What looks good

  • One decision function plus a booleans-only twin for the VRT branches that don't have a GeoInfo. Clean separation.
  • Rotated-dropped detection keys off transform.rotated_affine, not has_georef, so a future change to how has_georef is set cannot leak a true-no-transform file into the rotated bucket.
  • Pre-branch stamping means changes to which optional attrs get emitted can't shift the status value.
  • Tests cover the unit decision, the five reader states across numpy / dask / GPU / VRT-eager / VRT-chunked, and round-trip stability for the four writable states. The rotated_dropped write-side asymmetry is documented inline.
  • Contract version assertions now track _ATTRS_CONTRACT_VERSION instead of literals; the next bump only touches the constant.

Checklist

  • Decision table matches the issue
  • All read paths funnel through one of the two helpers
  • Edge cases: empty crs_wkt handled (via _text returning None)
  • No materialization concerns (attrs work only)
  • Docstrings accurate modulo the count typo

- Expose GEOREF_STATUS_* constants and GEOREF_STATUS_VALUES through
  xrspatial.geotiff. Downstream code can now branch on the constants
  via the public package surface; previously the only path was an
  import from the private _attrs module.
- Align the VRT branches with _compute_georef_status by checking
  ``vrt.crs_wkt is not None`` rather than ``bool(vrt.crs_wkt)``. The
  VRT XML parser returns None (not "") for missing/empty SRS today;
  pinning the rule defends the alignment if the parser ever changes.
- Drop the unused _ROTATED_M import in the test module.
- Fix the _compute_georef_status docstring to enumerate the read call
  sites instead of an out-of-date count.
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up: review findings addressed

All suggestions and nits from the previous review pass are addressed in 89283ce.

Fixed

  • Public constants exposed. GEOREF_STATUS_FULL, GEOREF_STATUS_TRANSFORM_ONLY, GEOREF_STATUS_CRS_ONLY, GEOREF_STATUS_NONE, GEOREF_STATUS_ROTATED_DROPPED, and the new GEOREF_STATUS_VALUES frozenset are now re-exported from xrspatial.geotiff and added to __all__. Downstream code can from xrspatial.geotiff import GEOREF_STATUS_FULL instead of reaching into the private _attrs module. The test_features.py public-API frozen list grew by six names to match.
  • VRT bool(vrt.crs_wkt) -> vrt.crs_wkt is not None in both VRT branches. Matches the _compute_georef_status GeoInfo helper exactly; the comment notes why the alignment matters even though _text returns None today.
  • Unused _ROTATED_M import dropped from the test module.
  • _compute_georef_status docstring rewritten to enumerate the actual call sites (eager numpy, dask, three GPU paths, two VRT helpers) instead of the stale "four backends" count.
  • _GEOREF_STATUS_VALUES renamed to GEOREF_STATUS_VALUES since it is now part of the public surface, with a docstring comment.

Added

  • test_public_constants_reexported: pins the public re-exports so a future refactor cannot quietly drop one of the six names.

Verified

Full geotiff test suite: 4277 passed, 25 skipped.

…087373c6751

# Conflicts:
#	xrspatial/geotiff/_attrs.py
…087373c6751

# Conflicts:
#	xrspatial/geotiff/_attrs.py
#	xrspatial/geotiff/_backends/vrt.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

geotiff: canonical georef_status attribute to disambiguate CRS vs transform presence

1 participant