Skip to content

Replace silent except-return-None in geotiff with typed exceptions + strict mode #1662

@brendancol

Description

@brendancol

Reason or Problem

The geotiff module has several broad except Exception: return None handlers that swallow errors silently. When something goes wrong (a corrupt VRT source, a missing nvCOMP library, a CUDA error during decode), the caller sees a None return or a working CPU fallback with no clue that the fast path failed. Regressions stay invisible in CI, and "GPU read" turns into "sometimes GPU, sometimes CPU".

Concrete sites flagged by audit:

  • xrspatial/geotiff/__init__.py:92. _wkt_to_epsg() swallows any error and returns None. A broken pyproj install or a non-string input both look like "EPSG unknown".
  • xrspatial/geotiff/_geotags.py:242. _epsg_to_wkt() has the same pattern. Round-trip WKT generation can fail without anyone noticing.
  • xrspatial/geotiff/_vrt.py:359. VRT source read failures continue past the dead tile. A partial mosaic comes back with a hole and no warning, masking partial-mosaic corruption.
  • xrspatial/geotiff/_gpu_decode.py:944, 950, 1054, 1173, 1350, 1494, 1625, 2411, 2600, 2745. GDS, nvCOMP batch decode, nvJPEG, nvJPEG2000, predictor decode, and Adler checksum stages all return None on any exception with zero diagnostics.
  • read_geotiff_gpu(on_gpu_failure='auto') (the default) already warns on the two top-level GPU stage failures, but the per-helper return-None paths above run underneath those stages and never emit anything.

Proposal

Two changes:

  1. Replace each broad except Exception: return None at the cited lines with a narrower exception list and a warnings.warn(...). The warning category is RuntimeWarning for transient failures (_gpu_decode.py helpers) and UserWarning for metadata helpers (_wkt_to_epsg, _epsg_to_wkt, _vrt.py source skip). Warning messages include the original exception type and message so users can tell pyproj-missing from pyproj-broken-input.

  2. Add an XRSPATIAL_GEOTIFF_STRICT=1 environment variable. When set, the same code paths re-raise instead of warning. A small helper _geotiff_strict_mode() reads the env var on each call. CI flips the var to fail loudly on any silent fallback.

Design:

def _geotiff_strict_mode() -> bool:
    import os
    return os.environ.get('XRSPATIAL_GEOTIFF_STRICT', '').lower() in ('1', 'true', 'yes')


def _wkt_to_epsg(wkt_or_proj: str) -> int | None:
    try:
        from pyproj import CRS
        return CRS.from_user_input(wkt_or_proj).to_epsg()
    except Exception as e:
        if _geotiff_strict_mode():
            raise
        warnings.warn(
            f"_wkt_to_epsg failed ({type(e).__name__}: {e}); returning None.",
            UserWarning,
            stacklevel=2,
        )
        return None

For read_geotiff_gpu(on_gpu_failure='auto') the existing two-stage warnings stay. The new env var also forces the GPU stages to raise even when the caller did not pass on_gpu_failure='strict', so CI can opt into strict mode globally.

Usage:

XRSPATIAL_GEOTIFF_STRICT=1 pytest xrspatial/geotiff/tests/

In default (unset) mode the behaviour is unchanged: warnings appear, fallbacks still happen.

Value: Silent failures become visible. CI catches GPU-path regressions without flipping every test to on_gpu_failure='strict' individually.

Stakeholders and Impacts

  • Users running read_geotiff_gpu see a RuntimeWarning when the GDS, nvCOMP, nvJPEG, or nvJPEG2000 helper returns None. Existing on_gpu_failure='auto' callers see one extra warning per failed helper.
  • VRT readers see a UserWarning when a source tile is skipped. Mosaics with missing tiles surface immediately.
  • No public-API breakage. The env var is opt-in.

Drawbacks

More warnings in the default log stream for setups where pyproj is missing or where GDS is intentionally unavailable. Warnings are at module-helper granularity, so a deep stack of failures can emit several warnings per call.

Alternatives

  • A strict= kwarg on every public read/write function. Rejected because the failure sites are deep inside helpers; threading a kwarg through five layers is more invasive than a process-level env var.
  • A logging.getLogger call instead of warnings.warn. Rejected because the project already uses warnings.warn for the existing two GPU-stage fallback messages, and CI captures warnings via filterwarnings.

Unresolved Questions

Should _epsg_to_wkt and _wkt_to_epsg raise ImportError in strict mode when pyproj is missing, or fall back to whatever the caller's crs_wkt path produces?

Additional Notes or Context

Cited line numbers are from current main (post-#1653).

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or requestgpuCuPy / CUDA GPU supportinput-validationInput validation and error messages

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions