Skip to content

geotiff: drop attrs['crs'] on rotated reads (#2122)#2125

Merged
brendancol merged 6 commits into
mainfrom
issue-2122
May 19, 2026
Merged

geotiff: drop attrs['crs'] on rotated reads (#2122)#2125
brendancol merged 6 commits into
mainfrom
issue-2122

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

Closes #2122.

  • _populate_attrs_from_geo_info now drops attrs['crs'] and attrs['crs_wkt'] on rotated reads opened with allow_rotated=True, matching the public docstring contract. The gate keys on geo_info.transform.rotated_affine (set by the geotag parser when a rotated ModelTransformationTag is consumed under the opt-in), so general no-georef reads still surface CRS attrs when the file has one declared but no transform tags.
  • VRT eager and chunked backends gain the same gate, keyed on the GDAL GeoTransform's b / d rotation terms.
  • Updated docs/source/user_guide/attrs_contract.rst and the _attrs.py module docstring to reflect the rotated-read exception.

Backend coverage

Backend Status
numpy (eager) covered + tested
dask covered + tested
cupy xfail in the test until the GPU CPU-fallback path forwards allow_rotated (sibling finding tracked separately)
dask + cupy covered + tested (routes through the dask path)
VRT eager covered + tested
VRT chunked covered + tested

Test plan

  • pytest xrspatial/geotiff/tests/test_allow_rotated_no_crs_2122.py: 7 pass + 1 xfail.
  • pytest xrspatial/geotiff/tests/ -k "not gpu and not cuda": 3511 pass, 17 skipped, 1 xfailed (no regressions).
  • Reproduction in the issue now yields da.attrs.get('crs') is None and 'crs_wkt' not in da.attrs.
  • Axis-aligned reads still emit crs / crs_wkt / transform.
  • to_geotiff(arr, path, crs=32610) (no coords) still round-trips the CRS through a read.

The ``open_geotiff`` docstring promises ``attrs['crs']`` is dropped
on rotated reads opened with ``allow_rotated=True``, so downstream
code that uses ``'crs' in da.attrs`` as the "this raster is
georeferenced" signal does not treat a no-georef pixel grid as
projected. Before this commit, ``_populate_attrs_from_geo_info``
only gated ``attrs['transform']`` on the ``has_georef`` flag and
kept emitting ``crs`` / ``crs_wkt`` from the GeoKey block, so the
documented contract failed in practice.

The marker for "rotated read opted in via allow_rotated=True" is
``geo_info.transform.rotated_affine`` -- the geotag parser sets the
rotated 6-tuple on that field when it sees a rotated
``ModelTransformationTag`` under the opt-in. Gate the CRS attrs on
the absence of that marker. General no-georef reads (axis-aligned
rasters that simply lack transform tags, e.g. arrays written with
``to_geotiff(..., crs=NNN)`` and no coords) still surface ``crs`` /
``crs_wkt`` because the CRS is meaningful even without an embedded
transform; only the rotated case is misleading.

Mirror the same gate on the eager and chunked VRT backends, keyed
on the GDAL GeoTransform's ``b`` / ``d`` rotation terms. The VRT
inline attrs build had the same flaw on both code paths.

Tests cover numpy, dask, cupy (xfail until the GPU CPU-fallback
forwards allow_rotated -- sibling finding), dask+cupy, VRT eager,
and VRT chunked. Regression guards confirm axis-aligned reads
still emit ``crs`` / ``crs_wkt`` / ``transform`` and that the
writer's ``crs=`` kwarg path round-trips cleanly when no coords
are supplied.

Closes #2122.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 19, 2026
CI failed with ModuleNotFoundError because the fast lane does not
install tifffile. Switch the two `import tifffile` sites in
`test_allow_rotated_no_crs_2122.py` to
`pytest.importorskip("tifffile")`, matching the convention used by
other geotiff tests that depend on tifffile for fixture writes.

Only affects the VRT and aligned-fixture tests; the
`_write_rotated_tiff_with_crs` helper has no tifffile dependency.
# Conflicts:
#	xrspatial/geotiff/_backends/vrt.py
Same fix as PR #2140: the three VRT tests in
test_masked_nodata_attr_2092 import tifffile inside _write_int_vrt,
but tifffile is not in the [tests] extras. Use pytest.importorskip
to match the pattern in test_wkt_only_crs_warning_1768 and friends.

Carries the fix on this branch so CI goes green without waiting on
#2140 to merge.
@brendancol
Copy link
Copy Markdown
Contributor Author

PR Review: geotiff: drop attrs['crs'] on rotated reads (#2122)

The fix closes the documented gap where attrs['crs'] and attrs['crs_wkt'] leaked through on rotated reads opened with allow_rotated=True. Test coverage on the four GeoTIFF backends plus both VRT entry points is good. One internal-consistency gap and a couple of smaller items below.

Blockers (must fix before merge)

None

Suggestions (should fix, not blocking)

  • xrspatial/geotiff/_backends/vrt.py:264-268, 727-731: on rotated VRT reads under allow_rotated=True, the coords path keeps calling _coords_from_pixel_geometry with the default has_georef=True, so the returned DataArray carries float projected coords computed from the axis-aligned interpretation of the rotated transform. The eager GeoTIFF path emits integer pixel coords for the same case. I verified empirically: open_geotiff(rotated.vrt, allow_rotated=True).coords['x'].dtype is float64 with values [0.5, 1.5, 2.5, 3.5], while the eager GeoTIFF path returns int64 [0, 1, 2, 3]. The attrs_contract.rst text the PR adds says "the in-memory array is a pixel grid with integer coords"; the VRT path violates that. Pass has_georef=False (or equivalent) through _coords_from_pixel_geometry when _vrt_is_rotated so both paths produce the same shape.
  • xrspatial/geotiff/_backends/vrt.py:283, 739: when the VRT is rotated, neither the eager nor chunked path stamps attrs[_NO_GEOREF_KEY] = True. The eager GeoTIFF path stamps the marker via _populate_attrs_from_geo_info whenever has_georef=False, including the rotated case. Downstream code that branches on the marker (writer round-trip detection in _coords.py:54, test_no_georef_writer_round_trip_1949.py) treats a rotated-read DataArray differently depending on which backend produced it. Stamping _NO_GEOREF_KEY on the rotated VRT branch would close the gap.
  • xrspatial/geotiff/tests/test_allow_rotated_no_crs_2122.py:285-312: the VRT rotated tests check that crs, crs_wkt, transform are absent but do not check coords['x'].dtype or the _NO_GEOREF_KEY stamp. The GeoTIFF tests (lines 137-139, 154-155) do check coord dtype. Adding the coord-dtype and marker assertions on the VRT tests would pin the contract symmetrically across backends and catch the consistency gap above.

Nits (optional improvements)

  • xrspatial/geotiff/_backends/vrt.py:293-295, 721-723: _vrt_is_rotated is computed via gt[2] != 0.0 or gt[4] != 0.0, exact float comparison. The GeoTIFF geotag parser at _geotags.py:671-675 uses a scaled tolerance (1e-12 * scale) to absorb floating-point noise from ModelTransformation tags. The existing rotation validator in _validation.py:886 already uses exact == 0.0, so the PR is consistent with current VRT policy, but a noise term like 1e-16 from XML round-tripping would now trigger the no-georef branch on VRT while the GeoTIFF side would absorb it. Worth tracking, not worth blocking.
  • xrspatial/geotiff/_backends/vrt.py:284-295, 714-723: the _vrt_is_rotated block and the comment that explains it are duplicated verbatim across the eager and chunked branches. A small private helper (e.g. _gt_is_rotated(gt) next to _gdal_geotransform_to_affine_tuple) would let both sites share one definition. Cosmetic only.
  • xrspatial/geotiff/tests/test_allow_rotated_no_crs_2122.py:166-184: xfail(strict=True, raises=NotImplementedError) is the right call for the GPU sibling-finding, but the test name test_cupy_rotated_read_drops_crs reads as a positive assertion. Renaming to something like test_cupy_rotated_read_drops_crs_xfail_gpu_fallback would make the intent obvious in CI output.

What looks good

  • The gate keys on geo_info.transform.rotated_affine rather than the bare has_georef flag, so the writer's crs= kwarg path (general no-georef, axis-aligned) still surfaces crs and crs_wkt. The inline comment at _attrs.py:371-380 is clear about that distinction.
  • Docs and module docstring both updated to match the new contract. Round-trip stability for the writer's crs= kwarg is exercised by an explicit regression test (test_axis_aligned_read_still_emits_crs).
  • The dask+CuPy test case correctly notes that the dask path forwards allow_rotated and bypasses the eager GPU CPU-fallback.
  • Tifffile import migrated to pytest.importorskip so the fast lane is not broken.

Checklist

  • Correctness: gate logic correct on the non-VRT path
  • Backend completeness: VRT eager + chunked drop attrs but emit projected float coords (Suggestion 1)
  • Float/NaN handling: untouched
  • Edge cases: axis-aligned regression covered; writer crs= kwarg case covered
  • Dask path: covered + tested
  • CuPy path: xfail with strict raises gate
  • Tests use unique tmp paths
  • Reference values asserted, not just "doesn't crash"
  • VRT tests do not assert coord dtype / no-georef marker (Suggestion 3)
  • Public docstring + contract doc updated
  • No premature .values / .compute() in the new code

Follow-up on review of PR #2125: both VRT entry points (eager and
chunked) drop crs / crs_wkt / transform on rotated reads, but still
emit float projected coords and skip the _xrspatial_no_georef marker.
The eager non-VRT rotated path emits int64 pixel coords and stamps
the marker, so a downstream consumer saw the two backends disagree on
the same input.

Compute _vrt_is_rotated before the coord block in the eager VRT path,
then thread has_georef=not _vrt_is_rotated into _coords_from_pixel_geometry
in both VRT paths. Stamp _NO_GEOREF_KEY=True on the rotated branch in
both paths, matching the no-transform arm and the eager non-VRT path
in _attrs.py.

Extend test_vrt_eager_rotated_read_drops_crs and
test_vrt_chunked_rotated_read_drops_crs to assert int64 x/y dtype and
_NO_GEOREF_KEY=True so the parity gap is pinned.
@brendancol
Copy link
Copy Markdown
Contributor Author

Review follow-up

Pushed ab85158 to address all three suggestions:

  • VRT eager + chunked rotated reads now pass has_georef=False into _coords_from_pixel_geometry, so da.x.dtype == da.y.dtype == np.int64. Matches the eager non-VRT path.
  • VRT eager + chunked rotated branches now stamp attrs[_NO_GEOREF_KEY] = True. Aligns with _attrs.py:414 and the no-transform arm.
  • test_vrt_eager_rotated_read_drops_crs and test_vrt_chunked_rotated_read_drops_crs now assert the int64 coord dtype and the _NO_GEOREF_KEY marker, so the parity gap is pinned.

Full geotiff suite still passes locally (664 / 1 xfailed). Waiting on CI.

@brendancol brendancol merged commit 24b014c into main May 19, 2026
4 checks passed
brendancol added a commit that referenced this pull request May 20, 2026
#2146 (nodata lifecycle) and #2150 (tier surface) had landed cleanly on
main without touching _populate_attrs_from_geo_info, but #2125's
"drop attrs['crs'] on rotated reads" introduced a _vrt_is_rotated gate
on the VRT branches that our pre-merge dataclass-based VRT code did not
honour. Resolved by:

* _attrs.py: kept the GeoTIFFMetadata shim. The dataclass already
  produces the same attrs surface as the legacy field-by-field path.
* _backends/vrt.py: kept the dataclass build on both VRT branches and
  threaded _vrt_is_rotated through to drop crs/crs_wkt and set
  has_georef=False on rotated reads (so metadata_to_attrs stamps the
  no-georef marker).

Verified test_allow_rotated_no_crs_2122, test_geotiff_metadata_2139,
test_nodata_lifecycle_attrs_2135, test_supported_features_tiers_2137,
test_backend_parity_matrix, and the broader VRT/attrs/nodata suite
(1309 tests) all pass.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

geotiff: open_geotiff(allow_rotated=True) leaks attrs['crs'] despite docstring promising it is dropped

1 participant