Author of Proposal: @brendan
Reason or Problem
open_geotiff and to_geotiff work well but lack a few explicit knobs for
controlling memory and compression behaviour:
-
No way to downcast on read. A float64 DEM loaded as float32 would use half
the memory, but callers have to do the cast themselves after the full-precision
array is already allocated.
-
Compression codecs (deflate, zstd, lz4) support tunable levels internally,
but the public API hardcodes defaults. Users can't trade write speed for file
size or vice versa.
-
to_geotiff calls .compute() on dask inputs, pulling the entire raster
into RAM before writing. For large rasters this is a dealbreaker. Writing to
a VRT of tiled GeoTIFFs would let each dask chunk compute and flush
independently.
Proposal
Design:
Three new controls, all opt-in with unchanged defaults:
-
dtype parameter on open_geotiff -- casts each tile/strip to the target
dtype during decode. For the numba LZW path, the cast happens per-value so
the tile buffer is never allocated at native dtype. Other codecs cast one
tile at a time after decompression.
-
compression_level parameter on to_geotiff -- integer passed through to
the codec. deflate 1-9 (default 6), zstd 1-22 (default 3), lz4 0-16
(default 0). Codecs without level support ignore it silently.
-
VRT output triggered by .vrt extension on the output path. Creates a sibling
{stem}_tiles/ directory with one GeoTIFF per dask chunk (or per tile_size
slice for numpy input). The VRT index uses relative paths so the whole thing
is portable. cog=True with .vrt raises ValueError.
Full spec: docs/superpowers/specs/2026-03-30-geotiff-perf-controls-design.md
Usage:
# Half the memory on read
dem = open_geotiff('big_dem.tif', dtype='float32')
# Same with dask -- each chunk is float32
dem = open_geotiff('big_dem.tif', dtype='float32', chunks=512)
# Fast compression (level 1) for scratch files
to_geotiff(data, 'scratch.tif', compression_level=1)
# Max compression for archival
to_geotiff(data, 'archive.tif', compression='zstd', compression_level=22)
# Stream a dask array to disk without materializing it
to_geotiff(dask_da, 'output.vrt', compression='zstd')
# produces: output.vrt + output_tiles/tile_0000_0000.tif, ...
Value: Gives users direct control over the memory/speed/size tradeoffs
that matter most when working with large rasters. No hidden magic.
Stakeholders and Impacts
Additive changes to open_geotiff and to_geotiff. Existing callers are
unaffected (all new parameters default to current behaviour). The write()
function in _writer.py and compress() in _compression.py gain a
compression_level pass-through.
Drawbacks
dtype on read adds a validation surface (float-to-int is rejected).
- VRT output creates a directory of files instead of a single file, which is
less convenient for simple workflows.
Alternatives
- Streaming write to a monolithic TIFF (offset patching). More complex and
tracked separately.
- Automatic dtype inference based on data range. Too magical for this project's
philosophy.
Unresolved Questions
None. See the full spec for details.
Author of Proposal: @brendan
Reason or Problem
open_geotiffandto_geotiffwork well but lack a few explicit knobs forcontrolling memory and compression behaviour:
No way to downcast on read. A float64 DEM loaded as float32 would use half
the memory, but callers have to do the cast themselves after the full-precision
array is already allocated.
Compression codecs (deflate, zstd, lz4) support tunable levels internally,
but the public API hardcodes defaults. Users can't trade write speed for file
size or vice versa.
to_geotiffcalls.compute()on dask inputs, pulling the entire rasterinto RAM before writing. For large rasters this is a dealbreaker. Writing to
a VRT of tiled GeoTIFFs would let each dask chunk compute and flush
independently.
Proposal
Design:
Three new controls, all opt-in with unchanged defaults:
dtypeparameter onopen_geotiff-- casts each tile/strip to the targetdtype during decode. For the numba LZW path, the cast happens per-value so
the tile buffer is never allocated at native dtype. Other codecs cast one
tile at a time after decompression.
compression_levelparameter onto_geotiff-- integer passed through tothe codec. deflate 1-9 (default 6), zstd 1-22 (default 3), lz4 0-16
(default 0). Codecs without level support ignore it silently.
VRT output triggered by
.vrtextension on the output path. Creates a sibling{stem}_tiles/directory with one GeoTIFF per dask chunk (or per tile_sizeslice for numpy input). The VRT index uses relative paths so the whole thing
is portable.
cog=Truewith.vrtraises ValueError.Full spec:
docs/superpowers/specs/2026-03-30-geotiff-perf-controls-design.mdUsage:
Value: Gives users direct control over the memory/speed/size tradeoffs
that matter most when working with large rasters. No hidden magic.
Stakeholders and Impacts
Additive changes to
open_geotiffandto_geotiff. Existing callers areunaffected (all new parameters default to current behaviour). The
write()function in
_writer.pyandcompress()in_compression.pygain acompression_levelpass-through.Drawbacks
dtypeon read adds a validation surface (float-to-int is rejected).less convenient for simple workflows.
Alternatives
tracked separately.
philosophy.
Unresolved Questions
None. See the full spec for details.