Why
The `xrspatial.geotiff` module has grown a lot of surface area. Some of it is the boring core (read a local `.tif`, write a `.tif`). Some of it is genuinely impressive but risky: VRT mosaics, COG, cloud URLs, GPU encode/decode via nvCOMP and nvJPEG, JPEG-in-TIFF, JPEG2000, LERC, GDAL metadata pass-through, external `.tif.ovr` sidecars, rotated `ModelTransformationTag` files.
A new user reading `from xrspatial.geotiff import open_geotiff, to_geotiff` cannot tell from the public surface which combinations are the supported "happy path" and which are research-grade. Every codec sits flat in `_VALID_COMPRESSIONS`. Every kwarg sits flat in the docstring. The one feature we have already gated behind an explicit opt-in is `allow_internal_only_jpeg` on `to_geotiff` (see `xrspatial/geotiff/tests/test_to_geotiff_allow_internal_only_jpeg_parity.py`), and that posture is the right one: the writer accepts JPEG only when the caller asks for it by name and emits `GeoTIFFFallbackWarning` so they know the output will not round-trip through libtiff / GDAL / rasterio.
This issue proposes extending that posture to the rest of the module: tier the features, document the tiers, and put an opt-in or warning in front of anything beyond the stable core. Nothing is removed.
Current feature inventory
From `xrspatial/geotiff/__init__.py`, `_writers/eager.py`, `_writers/gpu.py`, `_backends/{gpu,dask,vrt}.py`, `_compression.py`, `_reader.py`, `_sidecar.py`:
- Read entry points: `open_geotiff`, `read_geotiff_gpu`, `read_geotiff_dask`, `read_vrt`.
- Write entry points: `to_geotiff`, `write_geotiff_gpu`, `write_vrt`.
- Codec set in `_attrs.py:_VALID_COMPRESSIONS`: `none`, `deflate`, `lzw`, `jpeg`, `packbits`, `zstd`, `lz4`, `jpeg2000` / `j2k`, `lerc`.
- Other capabilities: COG (`cog=True`), overview pyramids (`overview_levels=`), external `.tif.ovr` sidecars (`_sidecar.py`), VRT mosaics, fsspec / cloud URIs (`s3://`, `gs://`, `az://`, `abfs://`, `memory://`), HTTP range reads, GPU nvCOMP / nvJPEG / nvJPEG2K, BigTIFF, rotated transforms (`allow_rotated=True`), unparseable CRS (`allow_unparseable_crs=True`), GDAL metadata pass-through and `extra_tags`.
Proposed tiering
Tier 1: Stable
The path a new user should be on. Local file in, local file out, common codec, axis-aligned grid.
- `open_geotiff(path)` and `to_geotiff(da, path)` on a local file system path.
- Codecs: `none`, `deflate`, `zstd`, `lzw`, `packbits` (lossless integer + float, byte-for-byte round-trip).
- `attrs['crs']`.
- `window=`, `band=`, `dtype=`, `nodata=`, `tiled=`, `tile_size=`, `predictor=`.
- `chunks=` for dask out-of-core reads on a local path.
Tier 2: Supported but advanced
Works, is tested, but the caller should know what they are signing up for.
- VRT mosaics (`read_vrt`, `write_vrt`). Cross-source nodata, missing backing files, partial mosaics, per-band metadata disagreement.
- COG and overview pyramids (`cog=True`, `overview_levels=`, `overview_resampling=`).
- External `.tif.ovr` sidecars.
- BigTIFF (`bigtiff=True`).
- Cloud URIs (`s3://`, `gs://`, `az://`, ...) with the `max_cloud_bytes` budget.
- HTTP range reads against a `.tif` URL.
- Floating-point predictor (`predictor=3`).
- `allow_rotated=True` (read-only; write path drops the rotation silently).
- `allow_unparseable_crs=True`.
Tier 3: Experimental / opt-in
Works in our tests, no claim about external interop or numerical parity across backends.
- `lerc` codec. Lossy when `max_z_error > 0`; the bound is on per-pixel absolute error, not on downstream analytics.
- `jpeg2000` / `j2k` codec. CPU path uses glymur, GPU path uses nvJPEG2K; the two are not byte-for-byte equal.
- `lz4` codec. Listed in `_VALID_COMPRESSIONS` but rarely seen in the wild; reader support across GDAL versions is uneven.
- GPU read (`read_geotiff_gpu`, `open_geotiff(gpu=True)`) and GPU write (`write_geotiff_gpu`, `to_geotiff(gpu=True)`). Requires cupy + numba CUDA, optionally nvCOMP / nvJPEG / nvJPEG2K; `on_gpu_failure='strict'` vs `'auto'` controls fallback.
- GDAL metadata XML round-trip and `extra_tags` pass-through.
Tier 4: Internal-only
Already gated behind an explicit opt-in. Used as the template for the rest.
- `allow_internal_only_jpeg=True` on `to_geotiff` / `write_geotiff_gpu`. The encode emits self-contained JFIF tiles and skips the JPEGTables tag, so the output is unreadable by libtiff / GDAL / rasterio and only round-trips through this library's reader. Already emits `GeoTIFFFallbackWarning`. See #1845 ("write_geotiff_gpu emits JPEG TIFFs that other readers reject").
Opt-in mechanism
Recommend per-feature flags rather than a single `experimental=True` kwarg. A single flag flips too much at once and hides which feature actually mattered when something goes wrong. The JPEG-internal-only flag is the model.
For Tier 3 codecs, the proposal is to reject the codec name in `_VALID_COMPRESSIONS` unless an `allow_experimental_codecs=True` kwarg is set on the writer, and emit `GeoTIFFFallbackWarning` when the opt-in fires (same warning class as the JPEG path). The rejection message names the kwarg, mirroring how the JPEG rejection points at `allow_internal_only_jpeg`.
For Tier 3 GPU paths, the existing `gpu=True` kwarg is already the explicit opt-in. The change is in the docstring: mark the GPU path Experimental, list the optional libraries, and link the parity-matrix issue.
For Tier 2 features, no kwarg gate is added. The docstring gets a short "Advanced" marker block at the top of the relevant parameter, naming the specific failure mode (e.g. cloud reads can run up cost, VRT can return partial mosaics under `missing_sources='warn'`, rotated transforms drop on write).
No env vars. They are hard to discover and hard to test.
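
The Tier 3 codec gate described in this section could look roughly like the sketch below. It is only an illustration: the helper name `check_compression`, the `_STABLE_CODECS` / `_EXPERIMENTAL_CODECS` sets, and the locally defined `GeoTIFFFallbackWarning` stand-in are assumptions made for this example, and the real writer may structure the check differently.

```python
import warnings


class GeoTIFFFallbackWarning(UserWarning):
    """Local stand-in for the library's warning class (assumption)."""


# Hypothetical split of the codec set by tier, following the proposal.
# 'jpeg' is deliberately absent: it keeps its own allow_internal_only_jpeg gate.
_STABLE_CODECS = frozenset({"none", "deflate", "lzw", "packbits", "zstd"})
_EXPERIMENTAL_CODECS = frozenset({"lerc", "jpeg2000", "j2k", "lz4"})


def check_compression(compression, *, allow_experimental_codecs=False):
    """Return the lower-cased codec name, or raise if it is gated."""
    name = compression.lower()
    if name in _STABLE_CODECS:
        return name
    if name in _EXPERIMENTAL_CODECS:
        if not allow_experimental_codecs:
            # The message names the kwarg so the fix is a one-line change.
            raise ValueError(
                f"compression={name!r} is experimental; "
                "pass allow_experimental_codecs=True to opt in."
            )
        # Opt-in accepted: warn once per call, same class as the JPEG path.
        warnings.warn(
            f"compression={name!r} is experimental; no claim is made about "
            "external interop.",
            GeoTIFFFallbackWarning,
            stacklevel=2,
        )
        return name
    raise ValueError(f"compression={name!r} is not handled by this sketch")
```

With this shape, `check_compression("lerc")` raises a `ValueError` that names the flag, while `check_compression("lerc", allow_experimental_codecs=True)` returns the codec name and emits one `GeoTIFFFallbackWarning`. Keeping the gate in a single helper would let both `to_geotiff` and `write_geotiff_gpu` share the same rejection message.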
Docstring posture
Each public entry point gets a tier marker block right after the summary line. Each parameter that crosses tier boundaries gets a one-line marker:
- Tier 1: no marker.
- Tier 2: `Advanced: <one-line caveat>` at the start of the parameter doc.
- Tier 3: `Experimental: <one-line caveat>` at the start of the parameter doc.
- Tier 4: `Internal-only: <one-line caveat>` at the start of the parameter doc.
The compression docstring in `_writers/eager.py` already calls out `lerc` as lossy and `jpeg` as internal-only. Extending that pattern to the other codecs is the bulk of the change.
Deprecation plan
None. This issue is documentation plus opt-in flags only. The wire-level codec set is unchanged. Existing callers of `lerc` / `j2k` / `lz4` will see a `ValueError` the first time they hit the new gate, with a message naming `allow_experimental_codecs=True` so they can fix it in one line. That is the same break shape `allow_internal_only_jpeg` already has, and we accepted it there.
If a future release wants to drop a codec entirely, it goes through the normal `DeprecationWarning` -> remove cycle. Tiering does not skip that.
Acceptance criteria
- A module-level `xrspatial.geotiff.SUPPORTED_FEATURES` constant (dict-of-tuples or a small dataclass) that enumerates every feature with its tier. The test suite imports it.
- A new `allow_experimental_codecs: bool = False` kwarg on `to_geotiff` and `write_geotiff_gpu`. Setting `compression=` to a Tier 3 codec without the flag raises `ValueError` whose message names the flag. Setting it with the flag emits `GeoTIFFFallbackWarning` once per call.
- The JPEG opt-in keeps its own flag name (`allow_internal_only_jpeg`). Internal-only is a stricter tier than experimental; the two should not collapse into one switch.
- Docstrings for `open_geotiff`, `to_geotiff`, `read_geotiff_gpu`, `write_geotiff_gpu`, `read_vrt`, `write_vrt`, and `read_geotiff_dask` carry a tier marker at the top and per-parameter tier markers where they cross tiers.
- A test in `xrspatial/geotiff/tests/` walks `SUPPORTED_FEATURES`, checks that every Tier 3 entry has a matching `ValueError` rejection on the default writer call, and checks that every Tier 4 entry rejects without its specific opt-in flag.
- A short table in the geotiff user guide notebook (`examples/user_guide/39_GeoTIFF_IO.ipynb`) generated from `SUPPORTED_FEATURES`, so the documentation cannot drift from the code.
Tie-in: parity matrix (#2132)
The backend parity test matrix in #2132 needs a definition of which features require parity. Tier 1 is the answer. Tier 2 features get parity tests where the feature applies to multiple backends (e.g. cloud URIs are CPU-only, sidecar ovr is CPU-only, COG cuts across CPU and GPU writers). Tier 3 features only carry a single-backend test, since cross-backend numerical parity is not a claim Tier 3 makes. The `SUPPORTED_FEATURES` constant lets the parity matrix iterate the tier set programmatically rather than hard-coding the list.
Out of scope