geotiff: add formal backend parity test matrix #2132

@brendancol

Summary

xrspatial.geotiff exposes one logical read API (open_geotiff) that fans out to four backend modules (eager numpy in __init__.py, _backends/dask.py, _backends/gpu.py, _backends/vrt.py) and several source types (local path, HTTP, fsspec URI, BytesIO). Parity between those paths is asserted today, but the assertions are scattered across 15 or more files keyed to past bug numbers. Add a single matrix that names every supported (backend, source) pair and asserts the same set of fields on every cell.

Where parity is checked today

The existing parity files each pin a slice of the contract for one historical bug:

  • xrspatial/geotiff/tests/test_backend_parity_matrix.py: already a table-driven harness, but currently parametrized over only numpy and dask+numpy. It has the assert_parity helper, the _FixtureSpec dataclass, and the _materialise / _coord_view helpers ready to extend. Its docstring names it as the "single source of truth" for fixture-vs-backend parity going forward.
  • xrspatial/geotiff/tests/test_backend_pixel_parity_matrix_1813.py: 4-backend pixel-byte parity across dtype/compression/layout combinations, with its own _BACKENDS list and _materialise helper.
  • xrspatial/geotiff/tests/test_attrs_parity_1548.py: 4-backend attrs key/value parity for one tifffile-written fixture.
  • xrspatial/geotiff/tests/test_backend_kwarg_parity_1561.py: kwargs reach each backend (dispatcher does not silently drop them).
  • xrspatial/geotiff/tests/test_signature_parity_1631.py: signature parity between open_geotiff and the explicit read_geotiff_* entry points.
  • xrspatial/geotiff/tests/test_miniswhite_backend_parity_1797.py: MinIsWhite photometric handling parity.
  • xrspatial/geotiff/tests/test_vrt_backend_coverage_2026_05_11.py: VRT GPU + dask+GPU coverage.
  • xrspatial/geotiff/tests/test_bytesio_source.py: BytesIO round-trip.
  • xrspatial/geotiff/tests/test_cog_http_*.py, test_http_*.py: HTTP COG behaviour, mostly read-side effects.

A backend change (for example #2127's _set_nodata_attrs rewire) has to be cross-checked against ~10 separate files to know whether it kept the contract. No single file says "this fixture is read by these 8 paths and they all return the same thing."

What the matrix should cover

The matrix is (fixture, backend) -> read DataArray, plus a small set of error fixtures.

Backends (8)

| id | dispatch | source type |
| --- | --- | --- |
| numpy | open_geotiff(path) | local path |
| dask+numpy | open_geotiff(path, chunks=N) | local path |
| gpu | open_geotiff(path, gpu=True) | local path |
| dask+gpu | open_geotiff(path, gpu=True, chunks=N) | local path |
| vrt-eager | open_geotiff(vrt_path) | local .vrt |
| vrt-dask | open_geotiff(vrt_path, chunks=N) | local .vrt |
| http-cog | open_geotiff('http://...') | HTTP URL |
| fsspec-memory | open_geotiff('memory://...') or BytesIO | fsspec / IO |

GPU rows skip via the existing _gpu_available() predicate from test_backend_pixel_parity_matrix_1813.py. HTTP rows use pytest-httpserver (already a dev dep, see test_cog_http_concurrent.py). fsspec rows use memory:// URIs and io.BytesIO. The BytesIO row also asserts the file-like rejection path for gpu=True / chunks=N separately, since those combinations are documented ValueError cases in open_geotiff.
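A minimal sketch of how the eight rows and their skip conditions could be written down as data. The `BackendRow` class, the `requires` field, and `skip_reason` are hypothetical names for illustration, not the shape of the real _BACKENDS list. The import check for `cupy` is only a stand-in for the existing _gpu_available() predicate, and `N = 256` is an arbitrary value for the `chunks=N` placeholder.

```python
from dataclasses import dataclass, field
from importlib.util import find_spec
from typing import Optional

N = 256  # arbitrary illustrative chunk size; the table only says chunks=N


@dataclass
class BackendRow:
    """One row of the backend table (hypothetical shape, not the real _BACKENDS)."""
    id: str
    source: str                                  # "path", "vrt", "http", or "fsspec"
    kwargs: dict = field(default_factory=dict)   # extra kwargs passed to open_geotiff
    requires: Optional[str] = None               # importable module the row depends on


BACKEND_ROWS = [
    BackendRow("numpy", "path"),
    BackendRow("dask+numpy", "path", {"chunks": N}),
    BackendRow("gpu", "path", {"gpu": True}, requires="cupy"),
    BackendRow("dask+gpu", "path", {"gpu": True, "chunks": N}, requires="cupy"),
    BackendRow("vrt-eager", "vrt"),
    BackendRow("vrt-dask", "vrt", {"chunks": N}),
    BackendRow("http-cog", "http", requires="pytest_httpserver"),
    BackendRow("fsspec-memory", "fsspec", requires="fsspec"),
]


def skip_reason(row):
    """Return a skip message if the row's dependency is not importable, else None."""
    if row.requires and find_spec(row.requires) is None:
        return f"{row.id}: {row.requires} is not installed"
    return None


for row in BACKEND_ROWS:
    print(row.id, "->", skip_reason(row) or "runs")
```

In a real test module the skip message would presumably feed a pytest skip marker on the parametrized cell rather than being printed.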

Fixtures (initial set)

Reuse the _FixtureSpec dataclass in xrspatial/geotiff/tests/test_backend_parity_matrix.py. Start with:

Assertions

The existing assert_parity in test_backend_parity_matrix.py already covers most fields. Concrete checks per cell:

  1. _assert_pixels_equal(ref, actual): byte-equal for integer dtype, np.array_equal(equal_nan=True) for float.
  2. da.dims == spec.expected_dims and da.shape == ref.shape.
  3. Per-axis coord values + dtype: _coord_view(ref, axis).tobytes() == _coord_view(da, axis).tobytes() and ref_c.dtype == actual_c.dtype.
  4. da.attrs.get(k) == ref.attrs.get(k) for a fixed canonical subset (raster_type, transform, crs, crs_wkt, nodata, masked_nodata).
  5. da.dtype == spec.dtype (catches silent upcast that the reference read would also exhibit).
  6. attrs['nodata'] semantics: present iff the fixture declares one; sentinel value matches; masked_nodata flag reflects whether the mask was applied (see #2092, "attrs['masked_nodata'] reports True when masking was disabled").
  7. Error fixtures: pytest.raises(ExpectedExc, match=expected_msg) runs on every backend that supports the source type.
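Checks 4 and 6 operate on plain attrs dicts, so they can be sketched without reading any file. The helpers below are hypothetical and are not the existing assert_parity. They assume that a missing masked_nodata key counts as False and that a NaN sentinel should compare equal to another NaN, which plain `==` would not do.

```python
import math

CANONICAL_ATTRS = ("raster_type", "transform", "crs", "crs_wkt",
                   "nodata", "masked_nodata")


def _same(a, b):
    """Equality that treats two float NaNs as equal (plain == would not)."""
    if isinstance(a, float) and isinstance(b, float):
        if math.isnan(a) and math.isnan(b):
            return True
    return a == b


def attrs_mismatches(ref_attrs, actual_attrs, keys=CANONICAL_ATTRS):
    """Check 4: return {key: (ref, actual)} for canonical keys that differ."""
    out = {}
    for key in keys:
        ref_value, actual_value = ref_attrs.get(key), actual_attrs.get(key)
        if not _same(ref_value, actual_value):
            out[key] = (ref_value, actual_value)
    return out


def nodata_problems(attrs, declared_nodata, mask_applied):
    """Check 6: list violations of the nodata / masked_nodata contract."""
    problems = []
    if declared_nodata is None and "nodata" in attrs:
        problems.append("nodata present but fixture declares none")
    elif declared_nodata is not None and "nodata" not in attrs:
        problems.append("nodata missing")
    elif declared_nodata is not None and not _same(attrs["nodata"], declared_nodata):
        problems.append("nodata sentinel differs")
    # Assumption: an absent masked_nodata key is treated as False.
    if bool(attrs.get("masked_nodata", False)) != mask_applied:
        problems.append("masked_nodata does not reflect whether masking ran")
    return problems


# A read that reports masked_nodata=True although masking was disabled:
print(nodata_problems({"nodata": -9999, "masked_nodata": True}, -9999, False))
```

Returning the list of problems instead of asserting on the first one lets a failing cell report every broken field at once.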

Helper location

Extend xrspatial/geotiff/tests/test_backend_parity_matrix.py instead of creating a new _parity_matrix.py. The dataclass, materialise helper, and assert_parity already live there with a docstring naming the file as single source of truth. Adding the remaining 6 backend entries to _BACKENDS and the fixtures above to _FIXTURES is mechanical.

The cross-backend helpers in xrspatial/tests/general_checks.py (general_output_checks, assert_numpy_equals_dask_numpy, assert_numpy_equals_cupy, assert_numpy_equals_dask_cupy) are designed for pure-function-of-DataArray operators and assume the input and output share attrs, dims, and coords. They do not fit here because the matrix compares different reads of the same file rather than an op applied across backends. The matrix stays in the geotiff test tree.

Migration

Keep the bug-numbered files as named regression markers. Do not delete them. They should keep their narrow per-bug assertions and stop accumulating new general parity cases. New parity assertions land in the matrix.

Concretely:

  • test_backend_parity_matrix.py: extend _BACKENDS and _FIXTURES, add the error-fixture sub-matrix.
  • test_backend_pixel_parity_matrix_1813.py: stays. Locks the dtype/compression/layout sweep specific to the #1813 refactor ("geotiff: split __init__.py into per-backend modules with shared validation").
  • test_attrs_parity_1548.py, test_backend_kwarg_parity_1561.py, test_miniswhite_backend_parity_1797.py, test_vrt_backend_coverage_2026_05_11.py: stay as named regression markers.
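One possible shape for the error-fixture sub-matrix, as a rough sketch: each error fixture names the source type it can be served from, and only backends with that source type get a cell. `ErrorFixture`, `error_cells`, `raises_matching`, and the `corrupt-header` fixture with its message are all hypothetical. `raises_matching` is a stdlib stand-in for `pytest.raises(ExpectedExc, match=expected_msg)`, which also applies `re.search` to the exception text.

```python
import re
from dataclasses import dataclass

# Backend id -> source type, copied from the backend table in this issue.
BACKEND_SOURCES = {
    "numpy": "path", "dask+numpy": "path", "gpu": "path", "dask+gpu": "path",
    "vrt-eager": "vrt", "vrt-dask": "vrt",
    "http-cog": "http", "fsspec-memory": "fsspec",
}


@dataclass
class ErrorFixture:
    """Hypothetical spec for one error case."""
    id: str
    source: str          # source type the broken input is served from
    expected_exc: type
    expected_msg: str    # regular expression matched against str(exception)


def error_cells(fixtures, backend_sources=BACKEND_SOURCES):
    """Pair each error fixture with every backend that supports its source type."""
    return [(fx.id, backend)
            for fx in fixtures
            for backend, source in backend_sources.items()
            if source == fx.source]


def raises_matching(call, expected_exc, expected_msg):
    """Return True if call() raises expected_exc with a matching message."""
    try:
        call()
    except expected_exc as exc:
        return re.search(expected_msg, str(exc)) is not None
    return False  # nothing was raised


# Hypothetical fixture; the id and message are placeholders.
fixtures = [ErrorFixture("corrupt-header", "path", ValueError, "header")]
print(error_cells(fixtures))
```

With this example the fixture expands to four cells, one per local-path backend, and no cell is created for the VRT, HTTP, or fsspec rows.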

Out of scope

  • Writer parity (to_geotiff / write_geotiff_gpu / write_vrt). The writer matrix is a separate follow-up; test_writer_matrix.py is currently the single place that covers it.
  • Performance / throughput assertions.
  • Golden-corpus byte-equality (covered by test_golden_corpus_*_1930.py).
  • A new top-level helper module. The existing file is the home.
  • Property-based / fuzz coverage (test_fuzz_hypothesis_1661.py already does that for one slice).

Acceptance

  • xrspatial/geotiff/tests/test_backend_parity_matrix.py parametrizes over all 8 backends and at least the 7 fixtures listed above.
  • pytest xrspatial/geotiff/tests/test_backend_parity_matrix.py -v reports one cell per (fixture, backend) pair, with GPU / HTTP / fsspec cells skipping cleanly when their deps are absent.
  • Adding a new backend or fixture is one row appended to _BACKENDS or _FIXTURES; no other file edits.
  • The existing per-bug parity files remain green and unchanged.
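The cell count implied by the first two acceptance bullets can be checked with a short sketch. The backend ids come from the table in this issue; the fixture ids are placeholders, and `cell_ids` is a hypothetical helper that joins ids with a hyphen.

```python
from itertools import product

BACKEND_IDS = ["numpy", "dask+numpy", "gpu", "dask+gpu",
               "vrt-eager", "vrt-dask", "http-cog", "fsspec-memory"]
# Placeholder ids standing in for the seven fixtures.
FIXTURE_IDS = [f"fixture_{i}" for i in range(1, 8)]


def cell_ids(fixture_ids, backend_ids):
    """One id per (fixture, backend) pair, e.g. 'fixture_1-numpy'."""
    return [f"{fx}-{be}" for fx, be in product(fixture_ids, backend_ids)]


cells = cell_ids(FIXTURE_IDS, BACKEND_IDS)
print(len(cells))  # 7 fixtures x 8 backends = 56 cells

# Appending one backend row adds one cell per fixture and touches nothing else.
print(len(cell_ids(FIXTURE_IDS, BACKEND_IDS + ["new-backend"])))  # 63
```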

Labels: api (API design and consistency), enhancement (New feature or request)
