Skip to content

geotiff: tier the public surface; mark experimental codecs/features opt-in #2137

@brendancol

Description

@brendancol

Why

The xrspatial.geotiff module has grown a lot of surface area. Some of it is the boring core (read a local .tif, write a .tif). Some of it is genuinely impressive but risky: VRT mosaics, COG, cloud URLs, GPU encode/decode via nvCOMP and nvJPEG, JPEG-in-TIFF, JPEG2000, LERC, GDAL metadata pass-through, external .tif.ovr sidecars, rotated ModelTransformationTag files.

A new user reading from xrspatial.geotiff import open_geotiff, to_geotiff cannot tell from the public surface which combinations are the supported "happy path" and which are research-grade. Every codec sits flat in _VALID_COMPRESSIONS. Every kwarg sits flat in the docstring. The one feature we have already gated behind an explicit opt-in is allow_internal_only_jpeg on to_geotiff (see xrspatial/geotiff/tests/test_to_geotiff_allow_internal_only_jpeg_parity.py), and that posture is the right one: the writer accepts JPEG only when the caller asks for it by name and emits GeoTIFFFallbackWarning so they know the output will not round-trip through libtiff / GDAL / rasterio.

This issue proposes extending that posture to the rest of the module: tier the features, document the tiers, and put an opt-in or warning in front of anything beyond the stable core. Nothing is removed.

Current feature inventory

From xrspatial/geotiff/__init__.py, _writers/eager.py, _writers/gpu.py, _backends/{gpu,dask,vrt}.py, _compression.py, _reader.py, _sidecar.py:

Read entry points: open_geotiff, read_geotiff_gpu, read_geotiff_dask, read_vrt.
Write entry points: to_geotiff, write_geotiff_gpu, write_vrt.

Codec set in _attrs.py:_VALID_COMPRESSIONS: none, deflate, lzw, jpeg, packbits, zstd, lz4, jpeg2000 / j2k, lerc.

Other capabilities: COG (cog=True), overview pyramids (overview_levels=), external .tif.ovr sidecars (_sidecar.py), VRT mosaics, fsspec / cloud URIs (s3://, gs://, az://, abfs://, memory://), HTTP range reads, GPU nvCOMP / nvJPEG / nvJPEG2K, BigTIFF, rotated transforms (allow_rotated=True), unparseable CRS (allow_unparseable_crs=True), GDAL metadata pass-through and extra_tags.

Proposed tiering

Tier 1: Stable

The path a new user should be on. Local file in, local file out, common codec, axis-aligned grid.

  • open_geotiff(path) and to_geotiff(da, path) on a local file system path.
  • Codecs none, deflate, zstd, lzw, packbits (lossless integer + float, byte-for-byte round-trip).
  • 2D and 3D rasters, axis-aligned transforms, EPSG CRS in attrs['crs'].
  • window=, band=, dtype=, nodata=, tiled=, tile_size=, predictor=.
  • chunks= for dask out-of-core reads on a local path.

Tier 2: Supported but advanced

Works, is tested, but the caller should know what they are signing up for.

  • VRT mosaics (read_vrt, write_vrt). Cross-source nodata, missing backing files, partial mosaics, per-band metadata disagreement.
  • COG with overviews (cog=True, overview_levels=, overview_resampling=).
  • External .tif.ovr sidecars.
  • BigTIFF (bigtiff=True).
  • fsspec cloud URIs (s3://, gs://, az://, ...) with the max_cloud_bytes budget.
  • HTTP / HTTPS range reads on a .tif URL.
  • Float predictor (predictor=3).
  • allow_rotated=True (read-only; write path drops the rotation silently).
  • allow_unparseable_crs=True.

Tier 3: Experimental / opt-in

Works in our tests, no claim about external interop or numerical parity across backends.

  • lerc codec. Lossy when max_z_error > 0; the bound is on per-pixel absolute error, not on downstream analytics.
  • jpeg2000 / j2k codec. CPU path uses glymur, GPU path uses nvJPEG2K, the two are not byte-for-byte equal.
  • lz4 codec. Listed in _VALID_COMPRESSIONS but rarely seen in the wild; reader support across GDAL versions is uneven.
  • GPU read (read_geotiff_gpu, open_geotiff(gpu=True)) and GPU write (write_geotiff_gpu, to_geotiff(gpu=True)). Requires cupy + numba CUDA, optionally nvCOMP / nvJPEG / nvJPEG2K; on_gpu_failure='strict' vs 'auto' controls fallback.
  • GDAL metadata XML round-trip and extra_tags pass-through.

Tier 4: Internal-only

Already gated behind an explicit opt-in. Used as the template for the rest.

  • allow_internal_only_jpeg=True on to_geotiff / write_geotiff_gpu. The encode emits self-contained JFIF tiles and skips the JPEGTables tag, so the output is unreadable by libtiff / GDAL / rasterio and only round-trips through this library's reader. Already emits GeoTIFFFallbackWarning. See write_geotiff_gpu emits JPEG TIFFs that other readers reject #1845.

Opt-in mechanism

Recommend per-feature flags rather than a single experimental=True kwarg. A single flag flips too much at once and hides which feature actually mattered when something goes wrong. The JPEG-internal-only flag is the model.

For Tier 3 codecs, the proposal is to reject the codec name in _VALID_COMPRESSIONS unless an allow_experimental_codecs=True kwarg is set on the writer, and emit GeoTIFFFallbackWarning when the opt-in fires (same warning class as the JPEG path). The rejection message names the kwarg, mirroring how the JPEG rejection points at allow_internal_only_jpeg.

For Tier 3 GPU paths, the existing gpu=True kwarg is already the explicit opt-in. The change is in the docstring: mark the GPU path Experimental, list the optional libraries, and link the parity-matrix issue.

For Tier 2 features, no kwarg gate is added. The docstring gets a short "Advanced" marker block at the top of the relevant parameter, naming the specific failure mode (e.g. cloud reads can run up cost, VRT can return partial mosaics under missing_sources='warn', rotated transforms drop on write).

No env vars. They are hard to discover and hard to test.

Docstring posture

Each public entry point gets a tier marker block right after the summary line. Each parameter that crosses tier boundaries gets a one-line marker:

  • Tier 1: no marker.
  • Tier 2: Advanced: <one-line caveat> at the start of the parameter doc.
  • Tier 3: Experimental: <one-line caveat> at the start of the parameter doc.
  • Tier 4: Internal-only: <one-line caveat> at the start of the parameter doc.

The compression docstring in _writers/eager.py already calls out lerc as lossy and jpeg as internal-only. Extending that pattern to the other codecs is the bulk of the change.

Deprecation plan

None. This issue is documentation plus opt-in flags only. The wire-level codec set is unchanged. Existing callers of lerc / j2k / lz4 will see a ValueError the first time they hit the new gate, with a message naming allow_experimental_codecs=True so they can fix it in one line. That is the same break shape allow_internal_only_jpeg already has, and we accepted it there.

If a future release wants to drop a codec entirely, it goes through the normal DeprecationWarning -> remove cycle. Tiering does not skip that.

Acceptance criteria

  1. A module-level xrspatial.geotiff.SUPPORTED_FEATURES constant (dict-of-tuples or a small dataclass) that enumerates every feature with its tier. The test suite imports it. Example shape:

    SUPPORTED_FEATURES = {
        'codec.zstd': 'stable',
        'codec.deflate': 'stable',
        'codec.lzw': 'stable',
        'codec.packbits': 'stable',
        'codec.none': 'stable',
        'codec.lerc': 'experimental',
        'codec.jpeg2000': 'experimental',
        'codec.j2k': 'experimental',
        'codec.lz4': 'experimental',
        'codec.jpeg': 'internal_only',
        'reader.local_file': 'stable',
        'reader.fsspec': 'advanced',
        'reader.http': 'advanced',
        'reader.vrt': 'advanced',
        'reader.sidecar_ovr': 'advanced',
        'reader.gpu': 'experimental',
        'reader.allow_rotated': 'advanced',
        'reader.allow_unparseable_crs': 'advanced',
        'writer.cog': 'advanced',
        'writer.overviews': 'advanced',
        'writer.bigtiff': 'advanced',
        'writer.gpu': 'experimental',
        'writer.gdal_metadata_xml': 'experimental',
        'writer.extra_tags': 'experimental',
    }
  2. A new allow_experimental_codecs: bool = False kwarg on to_geotiff and write_geotiff_gpu. Setting compression= to a Tier 3 codec without the flag raises ValueError whose message names the flag. Setting it with the flag emits GeoTIFFFallbackWarning once per call.

  3. The JPEG opt-in keeps its own flag name (allow_internal_only_jpeg). Internal-only is a stricter tier than experimental; the two should not collapse into one switch.

  4. Docstrings for open_geotiff, to_geotiff, read_geotiff_gpu, write_geotiff_gpu, read_vrt, write_vrt, and read_geotiff_dask carry a tier marker at the top and per-parameter tier markers where they cross tiers.

  5. A test in xrspatial/geotiff/tests/ walks SUPPORTED_FEATURES, checks that every Tier 3 entry has a matching ValueError rejection on the default writer call, and checks that every Tier 4 entry rejects without its specific opt-in flag.

  6. A short table in the geotiff user guide notebook (examples/user_guide/39_GeoTIFF_IO.ipynb) generated from SUPPORTED_FEATURES, so the documentation cannot drift from the code.

Tie-in: parity matrix (#2132)

The backend parity test matrix in #2132 needs a definition of which features parity is required for. Tier 1 is the answer. Tier 2 features get parity tests where the feature applies to multiple backends (e.g. cloud URIs are CPU-only, sidecar ovr is CPU-only, COG cuts across CPU and GPU writers). Tier 3 features only carry a single-backend test, since cross-backend numerical parity is not a claim Tier 3 makes. The SUPPORTED_FEATURES constant lets the parity matrix iterate the tier set programmatically rather than hard-coding the list.

Out of scope

  • Removing any codec or capability.
  • Changing the wire format.
  • Renaming any public function.
  • Adding new codecs.

Metadata

Metadata

Assignees

No one assigned

    Labels

    apiAPI design and consistencyenhancementNew feature or request

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions