Description
Polish items from the geotiff audit. Nothing's broken, but ten small ergonomic and safety items worth bundling:
- C-1:
to_geotiff only validates compression once it reaches _compression_tag, so the traceback is uglier than it needs to be. Validate at the top.
- C-2: Both
read_vrt and read_geotiff_dask check for .vrt and can dispatch to each other. Trace the path and either consolidate or just leave the defensive check in place.
- C-5:
write_vrt(**kwargs) doesn't list valid kwargs in the docstring. Spell them out.
- C-6: Document that
predictor=True and predictor=2 are equivalent, False/0/1 is none, and 3 is the fp predictor (float dtypes only).
- C-7: When
tiled=False, tile_size is silently ignored. Warn or document.
- P-3:
_mmap_cache has no eviction. Add a soft cap (default 32) configurable via XRSPATIAL_GEOTIFF_MMAP_CACHE_SIZE. Use OrderedDict for LRU.
- P-4: Decode parallelism gate is
compression != 1 and n_tiles > 4. Drop the compression filter and lower to n_tiles > 1 and tile_pixels > 64*1024. Uncompressed multi-tile reads still benefit from parallel memcpy.
- P-5:
read_geotiff_dask chunk auto-scaling can silently produce up to a million tasks. Lower the cap to 50,000 and raise ValueError instead of rescaling.
- P-6:
_gpu_decode.py has several cupy.empty() and cupy.zeros() calls that can OOM the GPU. Add a _check_gpu_memory(required_bytes) helper.
- P-9: COG auto-overview generation has no upper level limit. Cap at 8.
Approach
One PR, ten items, one minimal test per item. No public API changes other than the new env var and the extra ValueError on extreme dask reads.
Risks
- P-4: small/single-tile reads could regress if the threshold is wrong.
- P-5: raising instead of auto-scaling is a behavior change. Note in changelog.
Description
Polish items from the geotiff audit. Nothing's broken, but ten small ergonomic and safety items worth bundling:
to_geotiffonly validatescompressiononce it reaches_compression_tag, so the traceback is uglier than it needs to be. Validate at the top.read_vrtandread_geotiff_daskcheck for.vrtand can dispatch to each other. Trace the path and either consolidate or just leave the defensive check in place.write_vrt(**kwargs)doesn't list valid kwargs in the docstring. Spell them out.predictor=Trueandpredictor=2are equivalent,False/0/1is none, and3is the fp predictor (float dtypes only).tiled=False,tile_sizeis silently ignored. Warn or document._mmap_cachehas no eviction. Add a soft cap (default 32) configurable viaXRSPATIAL_GEOTIFF_MMAP_CACHE_SIZE. UseOrderedDictfor LRU.compression != 1 and n_tiles > 4. Drop the compression filter and lower ton_tiles > 1 and tile_pixels > 64*1024. Uncompressed multi-tile reads still benefit from parallel memcpy.read_geotiff_daskchunk auto-scaling can silently produce up to a million tasks. Lower the cap to 50,000 and raise ValueError instead of rescaling._gpu_decode.pyhas severalcupy.empty()andcupy.zeros()calls that can OOM the GPU. Add a_check_gpu_memory(required_bytes)helper.Approach
One PR, ten items, one minimal test per item. No public API changes other than the new env var and the extra ValueError on extreme dask reads.
Risks