geotiff: validate multiple ModelTiepoints#2118
Conversation
ModelTiepointTag can carry many (I, J, K, X, Y, Z) tuples. _extract_transform sliced only tiepoint[0:6] and dropped the rest. Files that repeat the same affine at every corner happened to read correctly; files that use the tag to encode a GCP warp silently fabricated an axis-aligned affine from the first tuple and downstream spatial ops then trusted wrong coordinates. Add _validate_tiepoint_consistency. For every extra tuple, predict its world coords from the inferred affine and compare against the declared values within a tolerance scaled to the pixel size. If they match, the read proceeds. If they disagree, raise NotImplementedError that names the GCP case so users know to rectify with gdalwarp first. Runs on both the ModelTiepoint + ModelPixelScale path and the tiepoint-only fallback.
brendancol
left a comment
There was a problem hiding this comment.
Self-review of PR #2118.
Coverage
_validate_tiepoint_consistencyfires on both theModelTiepoint + ModelPixelScalepath and the tiepoint-only fallback.- Tolerance is
1e-6 * max(|sx|, |sy|, 1.0)so files in different units are treated consistently. - The sign convention matches the rest of
_extract_transform: y decreases as row index increases, sopredicted_y = origin_y - j * sy. - Truncated tiepoints (fewer than 12 elements) are treated as a single tuple via integer division -- no crash on malformed input.
Risk areas worth a closer look
- The error message points users at
gdalwarpto rectify GCP-warp files. That is the right escape hatch but assumes the user has GDAL installed; for users on pure-pip stacks the message should still be actionable since the error tells them why the file is rejected. - Tolerance is fixed at 1e-6 of pixel size. A future opt-in for fitting a least-squares affine to the GCPs would be a natural follow-up, but is intentionally out of scope here -- this PR is about failing closed.
- The check is per-tuple, so a 12-element tiepoint (two tuples) costs one extra arithmetic comparison. No effect on the single-tuple hot path.
Failure mode
- Files that disagree raise
NotImplementedError, matching the rotation path's existing pattern. No silent wrong-coord behaviour remains for multi-tiepoint files.
Self-review on #2118 noted the message pointed only at gdalwarp, which assumes the user has GDAL installed. Mention rasterio.warp.reproject and a generic GIS-tool fallback so users on pure-pip stacks still have an actionable path forward.
brendancol
left a comment
There was a problem hiding this comment.
Follow-up to my self-review.
Error message scope -> broadened.
Updated the NotImplementedError text to mention rasterio.warp.reproject and "any GIS tool that resamples GCP files to an affine raster" alongside gdalwarp, so users on pure-pip stacks still have a concrete path forward. New commit: c39231b.
Fixed tolerance -> deferred.
1e-6 * max(|sx|, |sy|, 1.0) is the same factor _extract_transform's rotation check already uses for floating-point noise. A least-squares affine fit on the GCP set would be a real feature (and would change the failure mode from "raise" to "approximate") -- worth tracking separately rather than slipping into a fail-closed PR.
Per-tuple cost -> verified negligible.
Each extra tuple adds four float ops (two multiplies, two subtracts) plus two comparisons. The single-tuple hot path short-circuits via if n <= 1: return at the top of the helper, so axis-aligned files with one tiepoint pay nothing.
brendancol
left a comment
There was a problem hiding this comment.
Re-review with suggestions and nits.
Suggestions (should fix, not blocking)
-
Tolerance comment is wrong.
_geotags.py:514-517says "matches the rotation check in_extract_transform", but the rotation check uses1e-12 * scale(raw matrix off-diagonal noise) while this helper uses1e-6 * max(|sx|, |sy|, 1.0)(world-coord residuals scaled to pixel size). Different precision regimes for different signals. Update the comment so the next reader does not "harmonise" them. -
1e-6is loose for high-precision geodetic files. A 1e-6 tolerance on a 1-degree pixel is ~11 cm; on a 30 cm pixel it is 0.3 µm. For surveying or high-precision geodesy a 1e-6 relative residual on a degree-based pixel may be looser than the user expects, and they would prefer a tighter tolerance to catch GCP files that are nearly but not exactly affine. Expose the tolerance as a private constant (or a kwarg) so a caller can tighten it for their workflow. -
The unit-scale fallback path treats
sx=sy=1.0as ground truth for the consistency check._geotags.py:579-595(the tiepoint-without-scale branch) infers an origin from tuple 0 and then_validate_tiepoint_consistencycompares predictions atsx=1, sy=1. For a real multi-tiepoint file without aModelPixelScale, this will almost always exceed the 1e-6 tolerance and raise. That is the right outcome -- such a file is almost certainly malformed -- but the error message blames the GCP case, which may mislead. Consider distinguishing "missing ModelPixelScale + multiple tiepoints" from a true GCP warp.
Nits (optional)
-
tp_i = tiepoint[base]at_geotags.py:521could readtiepoint[base + 0]for symmetry withtiepoint[base + 1],tiepoint[base + 3], etc. Pure cosmetic. -
Sign-convention check spans two paths.
origin_y = tp_y + tp_j * syin_extract_transformandpredicted_y = origin_y - tp_j * syin_validate_tiepoint_consistency. The asymmetry is correct (one inverts the other) but a one-line comment in_validate_tiepoint_consistencycross-referencing the convention would save a future reader the derivation. -
Long error message. The 9-line
NotImplementedErrorbody is helpful but unusual. A primary line + a follow-up "Rectify with gdalwarp / rasterio.warp" hint on a separate raise / log call would be easier to wrap in IDE / notebook output.
What looks good
- Helper covers both
ModelTiepoint + ModelPixelScaleand the unit-scale fallback path, so neither branch can fabricate a wrong affine from the first tuple. - 9-case unit test sweep (single, consistent multi, inconsistent, tolerance scaling, y-sign, missing scale, truncated) is thorough.
- Default behaviour for single-tuple files is a no-op via
if n <= 1: return, so axis-aligned files pay nothing. - Broadened error message (commit c39231b) names
rasterio.warp.reprojectalongsidegdalwarp.
- Fix the misleading "matches the rotation check" comment: that check is 1e-12 * scale on raw matrix off-diagonals, while this one is 1e-6 * max(|sx|, |sy|, 1.0) on world-coord residuals -- different signals, different regimes. - Promote the tolerance factor to module-level ``_TIEPOINT_CONSISTENCY_REL_TOL`` and expose a ``rel_tol`` kwarg so surveying / high-precision geodetic callers can tighten it. - Distinguish the unit-scale fallback case from a real GCP warp in the error message via a ``scale_source`` kwarg; a multi-tiepoint file without ``ModelPixelScale`` is almost certainly malformed, not a deliberate GCP set, and the message now says so. - Read ``tiepoint[base + 0]`` for symmetry with ``base + 1`` / ``base + 3``. - Add a cross-reference comment in the helper explaining that the ``predicted_y = origin_y - tp_j * sy`` form is the inverse of the ``origin_y = tp_y + tp_j * sy`` form in ``_extract_transform``. - Split the ``NotImplementedError`` body into primary / cause / hint lines so notebook and IDE consumers can wrap it. - New tests cover the custom ``rel_tol`` path; the unit-scale fallback test now asserts the missing-scale wording rather than the GCP wording.
Closes #2117.
Summary
_validate_tiepoint_consistencyin_geotags.py. WhenModelTiepointTagcarries more than one(I, J, K, X, Y, Z)tuple, the helper predicts each extra tuple's world coords from the affine inferred from tuple 0 and compares with the declared values within a tolerance scaled to pixel size.NotImplementedErrorwith a message that points users atgdalwarpto rectify first, rather than silently producing wrong coordinates.ModelTiepoint + ModelPixelScaleand the tiepoint-only fallback path.Backend coverage
_extract_transform, so the check fires for numpy, cupy, dask+numpy, and dask+cupy reads alike.Test plan
test_multi_tiepoint_validation_2117.py: covers single-tuple unchanged, consistent multi-tuple passing, inconsistent multi-tuple raising, tolerance scaling with pixel size, the y-axis sign convention, the tiepoint-without-scale branch, and a short (truncated) tiepoint falling back to single-tuple semantics.xrspatial/geotiff/tests/suite: 4027 passed, 25 skipped.