Problem
The reader currently carries three independent metadata signals and downstream code has to reconcile them by hand:
attrs['crs'] / attrs['crs_wkt'] from CRS-related GeoKeys
attrs['transform'] from ModelPixelScale / ModelTiepoint / ModelTransformationTag
attrs[_NO_GEOREF_KEY] (= '_xrspatial_no_georef'), stamped by _populate_attrs_from_geo_info in xrspatial/geotiff/_attrs.py when no transform tags are present
A raster can have CRS but no transform, transform but no CRS, both, or neither. Consumers today have to infer the state from absence-or-presence of three keys plus a sentinel attr, and the mapping is not 1:1 with what the reader actually saw:
_populate_attrs_from_geo_info (_attrs.py:381-394) drops attrs['transform'] and stamps attrs[_NO_GEOREF_KEY] = True when has_georef=False. But has_georef=False is also what _extract_transform(..., allow_rotated=True) returns for a rotated ModelTransformationTag (_geotags.py:73-101, per #2116, "geotiff: honour allow_rotated for rotated ModelTransformationTag"), so the rotated-but-dropped case and the truly-no-transform case look identical downstream.
attrs['crs_wkt'] can still be present with neither attrs['crs'] (no EPSG resolve) nor attrs['transform'] (no transform tags). The VRT path handles this explicitly in _backends/vrt.py:284-289: stamp _NO_GEOREF_KEY, then set crs_wkt. Same shape on disk, different downstream meaning.
_resolve_crs_to_wkt (_crs.py:181-269) accepts a transform-less write path but does not surface that the source had no transform; the writer ends up reconstructing georef from coords (_coords.py:296-331), which is exactly the case _NO_GEOREF_KEY was added to suppress (#2120, "geotiff: to_geotiff silently strips georef on int64 step-1 user coords"; #2124, "geotiff: move no-georef signal off coord shape onto attrs marker").
Round-trip: the no-georef marker is preserved on write (per tests in tests/test_int_coord_sentinel_2087.py and tests/test_no_georef_writer_round_trip_1949.py), but the rotated-dropped case has no equivalent marker. A reader that opens a file written by to_geotiff from a rotated-dropped DataArray gets an axis-aligned identity transform back with no signal of the original rotation.
Proposal
One canonical attr that encodes the five distinct states the reader can land in:
attrs['georef_status'] = (
    'full'               # CRS resolved + axis-aligned transform present
    | 'transform_only'   # transform present, no CRS (or unparseable CRS)
    | 'crs_only'         # CRS present, no transform tags at all
    | 'none'             # neither CRS nor transform
    | 'rotated_dropped'  # transform tags were present but carried rotation/shear, dropped under allow_rotated=True
)
The reader and writer already distinguish these five cases internally. Expose them as one attr instead of forcing consumers to reconstruct the state from the union of crs, crs_wkt, transform, and _NO_GEOREF_KEY.
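As a rough illustration, the five values could be pinned down with a typing.Literal alias plus a small validator, so a typo in a consumer fails loudly. The names GeorefStatus, GEOREF_STATUSES and validate_georef_status are hypothetical; the proposal fixes only the string values.

```python
from typing import Literal, get_args

# Hypothetical alias: the proposal names the five strings, not a Python type.
GeorefStatus = Literal[
    "full",
    "transform_only",
    "crs_only",
    "none",
    "rotated_dropped",
]

# Tuple of the allowed strings, derived from the alias so the two stay in sync.
GEOREF_STATUSES = get_args(GeorefStatus)


def validate_georef_status(value: object) -> str:
    """Return value unchanged if it is one of the five proposed states."""
    if not isinstance(value, str) or value not in GEOREF_STATUSES:
        raise ValueError(
            f"unknown georef_status {value!r}; expected one of {GEOREF_STATUSES}"
        )
    return value
```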
What each state means
full: spatial ops can run. CRS resolves to either EPSG or parseable WKT, attrs['transform'] is an axis-aligned 6-tuple, coords are real geo coords.
transform_only: pixel geometry is real but unprojected. Spatial ops that compare across rasters should refuse. Reprojection helpers should refuse.
crs_only: no pixel geometry. coords are int64 placeholders. Any op that needs georef should refuse. The CRS attr is preserved for record-keeping.
none: neither is known. Equivalent to a plain image. Spatial ops should treat it the same as crs_only for refusal purposes.
rotated_dropped: the source had a real rotated ModelTransformationTag but the reader dropped it under allow_rotated=True (issue #2116, "geotiff: honour allow_rotated for rotated ModelTransformationTag"). The rotated 6-tuple is preserved on geo_info.transform.rotated_affine and may be surfaced on attrs['rotated_affine'], but attrs['transform'] is absent. Spatial ops should refuse with a clearer message than the current "no transform" path gives.
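The refusal behaviour described for each state could look roughly like the sketch below. It operates on a plain attrs dict; require_full_georef and the message wording are hypothetical. It refuses everything except full, which is stricter than the transform_only description above requires for ops that stay within one raster.

```python
# Hypothetical per-state refusal messages; wording is illustrative only.
_REFUSAL_REASONS = {
    "transform_only": "pixel geometry is present but no CRS was resolved",
    "crs_only": "a CRS is recorded but the source had no transform tags",
    "none": "the source carried neither a CRS nor a transform",
    "rotated_dropped": (
        "the source transform carried rotation/shear and was dropped on read"
    ),
}


def require_full_georef(attrs: dict, op_name: str) -> None:
    """Raise ValueError unless attrs['georef_status'] is 'full'."""
    status = attrs.get("georef_status")
    if status == "full":
        return
    # Unknown or missing status falls back to a generic message.
    reason = _REFUSAL_REASONS.get(status, f"georef_status is {status!r}")
    raise ValueError(f"{op_name} needs a fully georeferenced raster: {reason}")
```

A caller such as slope or reproject would invoke this once at entry, so the rotated-dropped case gets its own message instead of the generic "no transform" one.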
Where the attr is set
Every read path that currently goes through _populate_attrs_from_geo_info or builds attrs inline:
| Read path | File | State decision |
| --- | --- | --- |
| Eager numpy | __init__.py:633 via _populate_attrs_from_geo_info | derived from geo_info.has_georef, geo_info.crs_epsg, geo_info.crs_wkt, and the rotated_affine marker |
| Dask | _backends/dask.py:342 | same helper, same decision |
| GPU (chunked, eager, tile) | _backends/gpu.py:437, 816, 1369 | same helper |
| VRT (eager + dask) | _backends/vrt.py:275-289, 699-718 | inline; needs the same five-state decision |
| HTTP / BytesIO | flow through the eager + dask paths above; covered transitively | |
Push the state computation into _populate_attrs_from_geo_info so the four backends and the two VRT branches all derive it from the same inputs. The VRT inline path then either calls the helper or imports a smaller _compute_georef_status(geo_info) leaf.
Downstream consumers
Functions that currently gate on 'transform' in attrs, attrs.get('crs'), or both should switch to gating on attrs['georef_status']:
to_geotiff and the GPU / VRT writers (_writers/eager.py:300, 575; _writers/gpu.py:320, 433; __init__.py:385-405 docs): a crs_only or none array writes without georef; transform_only writes without CRS; rotated_dropped should either refuse or require an explicit opt-in argument so the writer cannot silently emit an axis-aligned file from a rotated-dropped array.
transform_from_attr and coords_to_transform in _coords.py:242-331: currently raise on non-zero rotation; should also refuse cleanly when georef_status is crs_only or none rather than relying on the absence of attrs['transform'].
Any spatial-op caller in the wider library that needs a real transform (slope, aspect, hillshade, reproject) can gate on georef_status == 'full' for a clean refusal path.
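The writer-side opt-in for rotated_dropped could be sketched as a guard that runs before the existing write logic. The function name and the allow_rotated_dropped keyword are hypothetical; the proposal says only that the writer should refuse or require an explicit opt-in. The guard decides only whether to refuse and leaves the choice of what gets written to the existing logic.

```python
def guard_rotated_dropped(attrs: dict, *, allow_rotated_dropped: bool = False) -> None:
    """Refuse to write a rotated-dropped array unless the caller opts in.

    Hypothetical helper: every other status passes through untouched.
    """
    if attrs.get("georef_status") != "rotated_dropped":
        return
    if allow_rotated_dropped:
        return
    raise ValueError(
        "georef_status is 'rotated_dropped': writing would emit an "
        "axis-aligned file and lose the source rotation; pass "
        "allow_rotated_dropped=True to write anyway"
    )
```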
Backward compatibility
Keep emitting attrs['_xrspatial_no_georef'] and the existing attrs['transform'] / attrs['crs'] / attrs['crs_wkt'] keys unchanged. The new attr is additive.
Bump _ATTRS_CONTRACT_VERSION in _attrs.py:173 from 2 to 3; document the new key in the contract docstring at the top of _attrs.py.
Writers preserve attrs['georef_status'] on round-trip but do not rely on it: the existing decision logic stays as the source of truth for what gets written. The attr is for consumers, not for the writer's own state machine.
XRSPATIAL_GEOTIFF_STRICT=1 callers get the same refusals they get today; the error message can cite georef_status.
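Because the new key is additive, a consumer may still meet arrays produced before it existed. The sketch below, built around a hypothetical helper georef_status_or_legacy, prefers the new attr and otherwise infers a status from the legacy keys. The legacy branch can never return rotated_dropped, which is the gap the proposal is meant to close.

```python
_NO_GEOREF_KEY = "_xrspatial_no_georef"  # existing sentinel key named above


def georef_status_or_legacy(attrs: dict) -> str:
    """Return attrs['georef_status'] if set, else infer it from legacy keys."""
    status = attrs.get("georef_status")
    if status is not None:
        return status
    has_crs = attrs.get("crs") is not None or bool(attrs.get("crs_wkt"))
    # The sentinel wins over a stray transform key.
    has_transform = "transform" in attrs and not attrs.get(_NO_GEOREF_KEY, False)
    if has_transform:
        return "full" if has_crs else "transform_only"
    # Legacy attrs cannot tell rotated-dropped apart from no-transform.
    return "crs_only" if has_crs else "none"
```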
Tests
A new tests/test_georef_status_<issue>.py with a fixture matrix:
| Fixture | Expected georef_status |
| --- | --- |
| Standard EPSG-tagged GeoTIFF | full |
| GeoTIFF with ModelPixelScale + ModelTiepoint but no GeoKeys | transform_only |
| GeoTIFF with GeoKeys but no transform tags (the _NO_GEOREF_KEY fixture from tests/test_int_coord_sentinel_2087.py) | crs_only |
| […] | none |
| […] ModelTransformationTag, allow_rotated=True (per tests/test_allow_rotated_geotiff_2115.py) | rotated_dropped |
| […] allow_rotated=False | full |
| […] geo_transform element | crs_only |
| […] | georef_status matches |

Extend test_attrs_contract_canonical_1984.py to include georef_status in the canonical key list and bump the contract version assertion.
Relation to recent work
[…] (… _set_nodata_attrs): same pattern this proposal extends. Replace inferred-from-shape signals with an explicit attr the writer reads back. georef_status does for georef what masked_nodata did for nodata.
#2124 (no-georef signal off coord shape onto attrs marker): introduced _NO_GEOREF_KEY because the writer was guessing georef state from coord shape. This generalises the marker: instead of one boolean for "no transform tags", a five-valued attr that also distinguishes crs_only and rotated_dropped.
#2116 (allow_rotated honoured for rotated ModelTransformationTag): created the rotated-dropped state that currently shares an attrs shape with truly-no-transform. This gives that state its own identifier.
Out of scope
[…] georef_status covers horizontal CRS + horizontal transform; vertical metadata keeps round-tripping through crs_wkt.