Reason or Problem
The geotiff module has several broad except Exception: return None handlers that swallow errors silently. When something goes wrong (a corrupt VRT source, a missing nvCOMP library, a CUDA error during decode), the caller sees a None return or a working CPU fallback with no clue that the fast path failed. Regressions stay invisible in CI, and "GPU read" turns into "sometimes GPU, sometimes CPU".
Concrete sites flagged by audit:
xrspatial/geotiff/__init__.py:92. _wkt_to_epsg() swallows any error and returns None. A broken pyproj install or a non-string input both look like "EPSG unknown".
xrspatial/geotiff/_geotags.py:242. _epsg_to_wkt() has the same pattern. Round-trip WKT generation can fail without anyone noticing.
xrspatial/geotiff/_vrt.py:359. VRT source read failures continue past the dead tile. A partial mosaic comes back with a hole and no warning, masking partial-mosaic corruption.
xrspatial/geotiff/_gpu_decode.py:944, 950, 1054, 1173, 1350, 1494, 1625, 2411, 2600, 2745. GDS, nvCOMP batch decode, nvJPEG, nvJPEG2000, predictor decode, and Adler checksum stages all return None on any exception with zero diagnostics.
read_geotiff_gpu(on_gpu_failure='auto') (the default) already warns on the two top-level GPU stage failures, but the per-helper return-None paths above run underneath those stages and never emit anything.
Proposal
Two changes:
-
Replace each broad except Exception: return None at the cited lines with a narrower exception list and a warnings.warn(...). The warning category is RuntimeWarning for transient failures (_gpu_decode.py helpers) and UserWarning for metadata helpers (_wkt_to_epsg, _epsg_to_wkt, _vrt.py source skip). Warning messages include the original exception type and message so users can tell pyproj-missing from pyproj-broken-input.
-
Add an XRSPATIAL_GEOTIFF_STRICT=1 environment variable. When set, the same code paths re-raise instead of warning. A small helper _geotiff_strict_mode() reads the env var on each call. CI flips the var to fail loudly on any silent fallback.
Design:
def _geotiff_strict_mode() -> bool:
import os
return os.environ.get('XRSPATIAL_GEOTIFF_STRICT', '').lower() in ('1', 'true', 'yes')
def _wkt_to_epsg(wkt_or_proj: str) -> int | None:
try:
from pyproj import CRS
return CRS.from_user_input(wkt_or_proj).to_epsg()
except Exception as e:
if _geotiff_strict_mode():
raise
warnings.warn(
f"_wkt_to_epsg failed ({type(e).__name__}: {e}); returning None.",
UserWarning,
stacklevel=2,
)
return None
For read_geotiff_gpu(on_gpu_failure='auto') the existing two-stage warnings stay. The new env var also forces the GPU stages to raise even when the caller did not pass on_gpu_failure='strict', so CI can opt into strict mode globally.
Usage:
XRSPATIAL_GEOTIFF_STRICT=1 pytest xrspatial/geotiff/tests/
In default (unset) mode the behaviour is unchanged: warnings appear, fallbacks still happen.
Value: Silent failures become visible. CI catches GPU-path regressions without flipping every test to on_gpu_failure='strict' individually.
Stakeholders and Impacts
- Users running
read_geotiff_gpu see a RuntimeWarning when the GDS, nvCOMP, nvJPEG, or nvJPEG2000 helper returns None. Existing on_gpu_failure='auto' callers see one extra warning per failed helper.
- VRT readers see a
UserWarning when a source tile is skipped. Mosaics with missing tiles surface immediately.
- No public-API breakage. The env var is opt-in.
Drawbacks
More warnings in the default log stream for setups where pyproj is missing or where GDS is intentionally unavailable. Warnings are at module-helper granularity, so a deep stack of failures can emit several warnings per call.
Alternatives
- A
strict= kwarg on every public read/write function. Rejected because the failure sites are deep inside helpers; threading a kwarg through five layers is more invasive than a process-level env var.
- A
logging.getLogger call instead of warnings.warn. Rejected because the project already uses warnings.warn for the existing two GPU-stage fallback messages, and CI captures warnings via filterwarnings.
Unresolved Questions
Should _epsg_to_wkt and _wkt_to_epsg raise ImportError in strict mode when pyproj is missing, or fall back to whatever the caller's crs_wkt path produces?
Additional Notes or Context
Cited line numbers are from current main (post-#1653).
Reason or Problem
The geotiff module has several broad
except Exception: return Nonehandlers that swallow errors silently. When something goes wrong (a corrupt VRT source, a missing nvCOMP library, a CUDA error during decode), the caller sees aNonereturn or a working CPU fallback with no clue that the fast path failed. Regressions stay invisible in CI, and "GPU read" turns into "sometimes GPU, sometimes CPU".Concrete sites flagged by audit:
xrspatial/geotiff/__init__.py:92._wkt_to_epsg()swallows any error and returns None. A broken pyproj install or a non-string input both look like "EPSG unknown".xrspatial/geotiff/_geotags.py:242._epsg_to_wkt()has the same pattern. Round-trip WKT generation can fail without anyone noticing.xrspatial/geotiff/_vrt.py:359. VRT source read failurescontinuepast the dead tile. A partial mosaic comes back with a hole and no warning, masking partial-mosaic corruption.xrspatial/geotiff/_gpu_decode.py:944, 950, 1054, 1173, 1350, 1494, 1625, 2411, 2600, 2745. GDS, nvCOMP batch decode, nvJPEG, nvJPEG2000, predictor decode, and Adler checksum stages all return None on any exception with zero diagnostics.read_geotiff_gpu(on_gpu_failure='auto')(the default) already warns on the two top-level GPU stage failures, but the per-helper return-None paths above run underneath those stages and never emit anything.Proposal
Two changes:
Replace each broad
except Exception: return Noneat the cited lines with a narrower exception list and awarnings.warn(...). The warning category isRuntimeWarningfor transient failures (_gpu_decode.pyhelpers) andUserWarningfor metadata helpers (_wkt_to_epsg,_epsg_to_wkt,_vrt.pysource skip). Warning messages include the original exception type and message so users can tell pyproj-missing from pyproj-broken-input.Add an
XRSPATIAL_GEOTIFF_STRICT=1environment variable. When set, the same code paths re-raise instead of warning. A small helper_geotiff_strict_mode()reads the env var on each call. CI flips the var to fail loudly on any silent fallback.Design:
For
read_geotiff_gpu(on_gpu_failure='auto')the existing two-stage warnings stay. The new env var also forces the GPU stages to raise even when the caller did not passon_gpu_failure='strict', so CI can opt into strict mode globally.Usage:
In default (unset) mode the behaviour is unchanged: warnings appear, fallbacks still happen.
Value: Silent failures become visible. CI catches GPU-path regressions without flipping every test to
on_gpu_failure='strict'individually.Stakeholders and Impacts
read_geotiff_gpusee aRuntimeWarningwhen the GDS, nvCOMP, nvJPEG, or nvJPEG2000 helper returns None. Existingon_gpu_failure='auto'callers see one extra warning per failed helper.UserWarningwhen a source tile is skipped. Mosaics with missing tiles surface immediately.Drawbacks
More warnings in the default log stream for setups where pyproj is missing or where GDS is intentionally unavailable. Warnings are at module-helper granularity, so a deep stack of failures can emit several warnings per call.
Alternatives
strict=kwarg on every public read/write function. Rejected because the failure sites are deep inside helpers; threading a kwarg through five layers is more invasive than a process-level env var.logging.getLoggercall instead ofwarnings.warn. Rejected because the project already useswarnings.warnfor the existing two GPU-stage fallback messages, and CI captures warnings viafilterwarnings.Unresolved Questions
Should
_epsg_to_wktand_wkt_to_epsgraiseImportErrorin strict mode when pyproj is missing, or fall back to whatever the caller'scrs_wktpath produces?Additional Notes or Context
Cited line numbers are from current main (post-#1653).