geotiff: golden corpus layout + manifest schema (#1930, phase 1.1) #1992
Merged
brendancol merged 2 commits on May 16, 2026
… phase 1.1)

First piece of the golden-corpus parity work in xarray-contrib#1930. Adds the directory, a YAML manifest that pins down what a fixture is, and a deterministic generator that rebuilds the .tif files from it.

The manifest covers every dimension the issue body lists: tiled vs stripped, byte order, planar config, dtype range, compression (none/deflate/lzw/lerc/jpeg/packbits) with predictor 1/2/3, nodata sentinels (int, NaN, miniswhite), overviews (internal and .ovr), CRS variants (EPSG, WKT, citation-only), GDAL_METADATA, and free-form extra_tags. The schema is enforced by generate.validate().

The generator iterates fixtures in declared order, emits files in sorted-id order, seeds randomness per fixture, and normalises mtimes to a fixed epoch so re-runs are byte-stable. One canonical example fixture rides along so the schema gets exercised.

No real .tif files yet (Phase 2). No oracle harness (Phase 1 PR 2 is in flight in parallel). No backends wired (Phase 3).

rasterio and pyyaml are not in install_requires today. The generator imports both lazily and the smoke test uses importorskip, so minimal environments still pass. The tests extra can be amended when Phase 2 needs real writes in CI.
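The determinism recipe described in this commit message (sorted-id output order, a per-fixture seed, and a fixed mtime) can be sketched roughly as follows. This is a minimal illustration, not the PR's generate.py: the helper names, the SHA-256 seed derivation, the epoch value, and the .bin placeholder payload are all assumptions made for the example, and no rasterio write is involved.

```python
import hashlib
import os
import random
from pathlib import Path

# Assumed value (2000-01-01T00:00:00Z); the PR does not state which epoch it uses.
FIXED_EPOCH = 946684800


def seed_for(fixture_id: str) -> int:
    """Derive a stable seed from the fixture id (hypothetical scheme).

    hashlib is used instead of the built-in hash(), which is salted
    per process for strings and would break reproducibility.
    """
    digest = hashlib.sha256(fixture_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def write_fixture(fixture_id: str, out_dir: Path) -> Path:
    """Write placeholder bytes for one fixture and pin its mtime."""
    rng = random.Random(seed_for(fixture_id))
    payload = bytes(rng.randrange(256) for _ in range(64))
    path = out_dir / f"{fixture_id}.bin"
    path.write_bytes(payload)
    # Set both atime and mtime so reruns leave identical timestamps.
    os.utime(path, (FIXED_EPOCH, FIXED_EPOCH))
    return path


def generate_all(fixture_ids, out_dir: Path):
    """Emit fixtures in sorted-id order, independent of input order."""
    return [write_fixture(fid, out_dir) for fid in sorted(fixture_ids)]
```

Because each fixture owns its seed, adding or reordering manifest entries in this sketch does not change the bytes of any other fixture.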
Address self-review of xarray-contrib#1992:

* validator now enforces predictor 3 -> float dtype and predictor 2 -> integer dtype. Catching the wrong pairing here gives a clear ManifestError instead of a confusing rasterio write-time failure.
* validator type-checks external_overview as bool.
* _rasterio_kwargs now threads compression_level (zlevel / zstd_level / jpeg_quality depending on codec) and max_z_error (LERC only). They were documented in the manifest header but ignored by the generator.
* README no longer claims fixtures/ is gitignored; Phase 2 commits the real .tif files there.
* Iterable import moved from typing to collections.abc.
* generate() accepts a manifest_path so load_manifest's path argument is actually reachable.

Three new parametrised validator tests cover the new schema checks.
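As an illustration of the schema checks and the kwargs threading listed above, here is a hedged sketch. The ManifestError class below is a stand-in, the manifest keys (id, dtype, predictor, compression) and the codec-to-key mapping are assumptions, and the function names are hypothetical; only external_overview, compression_level, max_z_error and the zlevel / zstd_level / jpeg_quality names come from the commit message.

```python
class ManifestError(ValueError):
    """Stand-in for the PR's ManifestError; the real class may differ."""


FLOAT_DTYPES = {"float32", "float64"}
INT_DTYPES = {"uint8", "int8", "uint16", "int16", "uint32", "int32"}

# Assumed mapping from codec name to the level keyword it accepts.
LEVEL_KEY = {"deflate": "zlevel", "zstd": "zstd_level", "jpeg": "jpeg_quality"}


def check_fixture(fixture: dict) -> None:
    """Reject predictor/dtype mismatches and a non-bool external_overview."""
    fid = fixture.get("id", "<unknown>")
    predictor = fixture.get("predictor", 1)
    dtype = fixture["dtype"]
    if predictor == 3 and dtype not in FLOAT_DTYPES:
        raise ManifestError(f"{fid}: predictor 3 needs a float dtype, got {dtype}")
    if predictor == 2 and dtype not in INT_DTYPES:
        raise ManifestError(f"{fid}: predictor 2 needs an integer dtype, got {dtype}")
    if "external_overview" in fixture and not isinstance(
        fixture["external_overview"], bool
    ):
        raise ManifestError(f"{fid}: external_overview must be a bool")


def level_kwargs(fixture: dict) -> dict:
    """Translate compression_level / max_z_error into writer keywords."""
    kwargs = {}
    codec = fixture.get("compression", "none")
    level = fixture.get("compression_level")
    if level is not None and codec in LEVEL_KEY:
        kwargs[LEVEL_KEY[codec]] = level
    # max_z_error only applies to LERC, so it is dropped for other codecs.
    if codec == "lerc" and fixture.get("max_z_error") is not None:
        kwargs["max_z_error"] = fixture["max_z_error"]
    return kwargs


# Usage: a mismatched pairing fails at validation time, before any write.
try:
    check_fixture({"id": "bad", "dtype": "uint8", "predictor": 3})
except ManifestError as exc:
    message = str(exc)
```

Failing at validation time keeps the error attached to the fixture id, which is the point the commit message makes about avoiding a confusing write-time failure.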
Self-review (PR #1992)
Read through the PR after opening and pushed a fix-up (…)
Suggestions (fixed)
Nits (fixed)
Considered, not changed
Test plan
Summary
First PR of the Phase 1 split for #1930. Lands the corpus layout and
the manifest schema, nothing else.
From the implementation plan in #1930:
What lands
* manifest.yaml -- schema documented in the header comment, one canonical example fixture covering every field, version: 1.
* generate.py -- deterministic generator. CLI flags: --dry-run, --only <id>, --output-dir. generate.validate() is the authoritative schema check.
* README.md -- how to regenerate, what each phase will add.
* test_golden_corpus_manifest_1930.py -- smoke test. Parses the manifest, validates every entry, runs the generator in dry-run mode, and checks the error messages a contributor is most likely to hit.
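For readers unfamiliar with the flag set, the three CLI flags named above could be wired with argparse along these lines. This is a sketch only: build_parser and plan are hypothetical names, the default output directory and the one-.tif-per-id naming are assumptions, and the real generate.py may be organised quite differently.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Parser exposing the three flags the PR description names."""
    parser = argparse.ArgumentParser(prog="generate")
    parser.add_argument("--dry-run", action="store_true",
                        help="print planned outputs without writing files")
    parser.add_argument("--only", metavar="ID", default=None,
                        help="restrict the run to a single fixture id")
    # Default directory is an assumption made for this sketch.
    parser.add_argument("--output-dir", type=Path, default=Path("fixtures"),
                        help="directory that receives the generated files")
    return parser


def plan(fixture_ids, args: argparse.Namespace):
    """Return the output paths a run would produce, in sorted-id order."""
    ids = sorted(fixture_ids)
    if args.only is not None:
        if args.only not in ids:
            raise SystemExit(f"unknown fixture id: {args.only}")
        ids = [args.only]
    return [args.output_dir / f"{fid}.tif" for fid in ids]
```

A dry run would then amount to parsing the arguments, calling plan(), and printing each path instead of writing it.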
Out of scope
* Oracle harness (_oracle.py). Sibling PR for Phase 1 PR 2 (geotiff: golden corpus oracle harness (#1930, phase 1.2) #1991).
* Real .tif files. Phase 2 PRs add them in batches.
* Adding pyyaml / rasterio to setup.cfg's tests extra. The smoke test uses importorskip until Phase 2 needs real writes in CI.
Test plan
* pytest xrspatial/geotiff/tests/test_golden_corpus_manifest_1930.py -- 12 tests, all pass locally.
* python -m xrspatial.geotiff.tests.golden_corpus.generate --dry-run -- exits 0, prints planned outputs.
* python -m xrspatial.geotiff.tests.golden_corpus.generate --output-dir /tmp/x -- writes the example fixture end-to-end with normalised mtime; byte-stable across reruns.
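One way to check the byte-stability claim by hand is to run the generator into two different output directories and compare a content-and-mtime snapshot of each. The helper below is a hypothetical sketch and is not part of the PR; it only assumes the generator writes regular files under the directory passed to --output-dir.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict:
    """Map each relative file path to (sha256 hex digest, integer mtime)."""
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            key = path.relative_to(root).as_posix()
            result[key] = (digest, int(path.stat().st_mtime))
    return result


def same_output(dir_a: Path, dir_b: Path) -> bool:
    """True when both directories hold identical files with identical mtimes."""
    return snapshot(dir_a) == snapshot(dir_b)
```

For example, after running the generator with --output-dir /tmp/x and again with --output-dir /tmp/y, same_output(Path("/tmp/x"), Path("/tmp/y")) would be expected to return True if the run is byte-stable.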