Skip to content

geotiff: validate multiple ModelTiepoints#2118

Merged
brendancol merged 3 commits into
mainfrom
issue-2117
May 19, 2026
Merged

geotiff: validate multiple ModelTiepoints#2118
brendancol merged 3 commits into
mainfrom
issue-2117

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #2117.

Summary

  • Added _validate_tiepoint_consistency in _geotags.py. When ModelTiepointTag carries more than one (I, J, K, X, Y, Z) tuple, the helper predicts each extra tuple's world coords from the affine inferred from tuple 0 and compares with the declared values within a tolerance scaled to pixel size.
  • Multi-tiepoint files whose tuples agree continue to read. Files whose tuples disagree (the GCP-warp case) now raise NotImplementedError with a message that points users at gdalwarp to rectify first, rather than silently producing wrong coordinates.
  • Runs on both ModelTiepoint + ModelPixelScale and the tiepoint-only fallback path.

Backend coverage

  • All backends share _extract_transform, so the check fires for numpy, cupy, dask+numpy, and dask+cupy reads alike.

Test plan

  • test_multi_tiepoint_validation_2117.py: covers single-tuple unchanged, consistent multi-tuple passing, inconsistent multi-tuple raising, tolerance scaling with pixel size, the y-axis sign convention, the tiepoint-without-scale branch, and a short (truncated) tiepoint falling back to single-tuple semantics.
  • Full xrspatial/geotiff/tests/ suite: 4027 passed, 25 skipped.

ModelTiepointTag can carry many (I, J, K, X, Y, Z) tuples. _extract_transform
sliced only tiepoint[0:6] and dropped the rest. Files that repeat the
same affine at every corner happened to read correctly; files that use
the tag to encode a GCP warp silently fabricated an axis-aligned
affine from the first tuple and downstream spatial ops then trusted
wrong coordinates.

Add _validate_tiepoint_consistency. For every extra tuple, predict its
world coords from the inferred affine and compare against the
declared values within a tolerance scaled to the pixel size. If they
match, the read proceeds. If they disagree, raise NotImplementedError
that names the GCP case so users know to rectify with gdalwarp first.
Runs on both the ModelTiepoint + ModelPixelScale path and the
tiepoint-only fallback.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 19, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self-review of PR #2118.

Coverage

  • _validate_tiepoint_consistency fires on both the ModelTiepoint + ModelPixelScale path and the tiepoint-only fallback.
  • Tolerance is 1e-6 * max(|sx|, |sy|, 1.0) so files in different units are treated consistently.
  • The sign convention matches the rest of _extract_transform: y decreases as row index increases, so predicted_y = origin_y - j * sy.
  • Truncated tiepoints (fewer than 12 elements) are treated as a single tuple via integer division -- no crash on malformed input.

Risk areas worth a closer look

  • The error message points users at gdalwarp to rectify GCP-warp files. That is the right escape hatch but assumes the user has GDAL installed; for users on pure-pip stacks the message should still be actionable since the error tells them why the file is rejected.
  • Tolerance is fixed at 1e-6 of pixel size. A future opt-in for fitting a least-squares affine to the GCPs would be a natural follow-up, but is intentionally out of scope here -- this PR is about failing closed.
  • The check is per-tuple, so a 12-element tiepoint (two tuples) costs one extra arithmetic comparison. No effect on the single-tuple hot path.

Failure mode

  • Files that disagree raise NotImplementedError, matching the rotation path's existing pattern. No silent wrong-coord behaviour remains for multi-tiepoint files.

Self-review on #2118 noted the message pointed only at gdalwarp, which
assumes the user has GDAL installed. Mention rasterio.warp.reproject
and a generic GIS-tool fallback so users on pure-pip stacks still have
an actionable path forward.
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up to my self-review.

Error message scope -> broadened.
Updated the NotImplementedError text to mention rasterio.warp.reproject and "any GIS tool that resamples GCP files to an affine raster" alongside gdalwarp, so users on pure-pip stacks still have a concrete path forward. New commit: c39231b.

Fixed tolerance -> deferred.
1e-6 * max(|sx|, |sy|, 1.0) is the same factor _extract_transform's rotation check already uses for floating-point noise. A least-squares affine fit on the GCP set would be a real feature (and would change the failure mode from "raise" to "approximate") -- worth tracking separately rather than slipping into a fail-closed PR.

Per-tuple cost -> verified negligible.
Each extra tuple adds four float ops (two multiplies, two subtracts) plus two comparisons. The single-tuple hot path short-circuits via if n <= 1: return at the top of the helper, so axis-aligned files with one tiepoint pay nothing.

Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-review with suggestions and nits.

Suggestions (should fix, not blocking)

  • Tolerance comment is wrong. _geotags.py:514-517 says "matches the rotation check in _extract_transform", but the rotation check uses 1e-12 * scale (raw matrix off-diagonal noise) while this helper uses 1e-6 * max(|sx|, |sy|, 1.0) (world-coord residuals scaled to pixel size). Different precision regimes for different signals. Update the comment so the next reader does not "harmonise" them.

  • 1e-6 is loose for high-precision geodetic files. A 1e-6 tolerance on a 1-degree pixel is ~11 cm; on a 30 cm pixel it is 0.3 µm. For surveying or high-precision geodesy a 1e-6 relative residual on a degree-based pixel may be looser than the user expects, and they would prefer a tighter tolerance to catch GCP files that are nearly but not exactly affine. Expose the tolerance as a private constant (or a kwarg) so a caller can tighten it for their workflow.

  • The unit-scale fallback path treats sx=sy=1.0 as ground truth for the consistency check. _geotags.py:579-595 (the tiepoint-without-scale branch) infers an origin from tuple 0 and then _validate_tiepoint_consistency compares predictions at sx=1, sy=1. For a real multi-tiepoint file without a ModelPixelScale, this will almost always exceed the 1e-6 tolerance and raise. That is the right outcome -- such a file is almost certainly malformed -- but the error message blames the GCP case, which may mislead. Consider distinguishing "missing ModelPixelScale + multiple tiepoints" from a true GCP warp.

Nits (optional)

  • tp_i = tiepoint[base] at _geotags.py:521 could read tiepoint[base + 0] for symmetry with tiepoint[base + 1], tiepoint[base + 3], etc. Pure cosmetic.

  • Sign-convention check spans two paths. origin_y = tp_y + tp_j * sy in _extract_transform and predicted_y = origin_y - tp_j * sy in _validate_tiepoint_consistency. The asymmetry is correct (one inverts the other) but a one-line comment in _validate_tiepoint_consistency cross-referencing the convention would save a future reader the derivation.

  • Long error message. The 9-line NotImplementedError body is helpful but unusual. A primary line + a follow-up "Rectify with gdalwarp / rasterio.warp" hint on a separate raise / log call would be easier to wrap in IDE / notebook output.

What looks good

  • Helper covers both ModelTiepoint + ModelPixelScale and the unit-scale fallback path, so neither branch can fabricate a wrong affine from the first tuple.
  • 9-case unit test sweep (single, consistent multi, inconsistent, tolerance scaling, y-sign, missing scale, truncated) is thorough.
  • Default behaviour for single-tuple files is a no-op via if n <= 1: return, so axis-aligned files pay nothing.
  • Broadened error message (commit c39231b) names rasterio.warp.reproject alongside gdalwarp.

- Fix the misleading "matches the rotation check" comment: that check
  is 1e-12 * scale on raw matrix off-diagonals, while this one is
  1e-6 * max(|sx|, |sy|, 1.0) on world-coord residuals -- different
  signals, different regimes.
- Promote the tolerance factor to module-level
  ``_TIEPOINT_CONSISTENCY_REL_TOL`` and expose a ``rel_tol`` kwarg so
  surveying / high-precision geodetic callers can tighten it.
- Distinguish the unit-scale fallback case from a real GCP warp in the
  error message via a ``scale_source`` kwarg; a multi-tiepoint file
  without ``ModelPixelScale`` is almost certainly malformed, not a
  deliberate GCP set, and the message now says so.
- Read ``tiepoint[base + 0]`` for symmetry with ``base + 1`` /
  ``base + 3``.
- Add a cross-reference comment in the helper explaining that the
  ``predicted_y = origin_y - tp_j * sy`` form is the inverse of the
  ``origin_y = tp_y + tp_j * sy`` form in ``_extract_transform``.
- Split the ``NotImplementedError`` body into primary / cause / hint
  lines so notebook and IDE consumers can wrap it.
- New tests cover the custom ``rel_tol`` path; the unit-scale fallback
  test now asserts the missing-scale wording rather than the GCP wording.
@brendancol brendancol merged commit f2e191d into main May 19, 2026
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Multiple ModelTiepoints are silently collapsed to the first tiepoint

1 participant