
Geotiff polish: validation, caching caps, parallelism thresholds, memory guards #1488

@brendancol

Description


Polish items from the geotiff audit. Nothing's broken, but there are ten small ergonomic and safety items worth bundling:

  • C-1: to_geotiff only validates compression once it reaches _compression_tag, so the traceback is uglier than it needs to be. Validate at the top.
  • C-2: Both read_vrt and read_geotiff_dask check for a .vrt path and can dispatch to each other. Trace the call path, then either consolidate the check in one place or leave the defensive check as is.
  • C-5: write_vrt(**kwargs) doesn't list valid kwargs in the docstring. Spell them out.
  • C-6: Document that predictor=True and predictor=2 are equivalent, False/0/1 is none, and 3 is the fp predictor (float dtypes only).
  • C-7: When tiled=False, tile_size is silently ignored. Warn or document.
  • P-3: _mmap_cache has no eviction. Add a soft cap (default 32 entries), configurable via XRSPATIAL_GEOTIFF_MMAP_CACHE_SIZE, with LRU eviction via an OrderedDict.
  • P-4: Decode parallelism gate is compression != 1 and n_tiles > 4. Drop the compression filter and lower to n_tiles > 1 and tile_pixels > 64*1024. Uncompressed multi-tile reads still benefit from parallel memcpy.
  • P-5: read_geotiff_dask chunk auto-scaling can silently produce up to a million tasks. Lower the cap to 50,000 tasks and raise ValueError instead of rescaling.
  • P-6: _gpu_decode.py has several cupy.empty() and cupy.zeros() calls that can OOM the GPU. Add a _check_gpu_memory(required_bytes) helper.
  • P-9: COG auto-overview generation has no limit on the number of overview levels. Cap it at 8.

Approach

One PR, ten items, one minimal test per item. No public API changes other than the new env var and the extra ValueError on extreme dask reads.

Risks

  • P-4: small/single-tile reads could regress if the threshold is wrong.
  • P-5: raising instead of auto-scaling is a behavior change. Note in changelog.
