Skip to content

Add dtype, compression_level, and VRT output to geotiff I/O #1083

@brendancol

Description

@brendancol

Author of Proposal: @brendan

Reason or Problem

open_geotiff and to_geotiff work well but lack a few explicit knobs for
controlling memory and compression behaviour:

  1. No way to downcast on read. A float64 DEM loaded as float32 would use half
    the memory, but callers have to do the cast themselves after the full-precision
    array is already allocated.

  2. Compression codecs (deflate, zstd, lz4) support tunable levels internally,
    but the public API hardcodes defaults. Users can't trade write speed for file
    size or vice versa.

  3. to_geotiff calls .compute() on dask inputs, pulling the entire raster
    into RAM before writing. For large rasters this is a dealbreaker. Writing to
    a VRT of tiled GeoTIFFs would let each dask chunk compute and flush
    independently.

Proposal

Design:

Three new controls, all opt-in with unchanged defaults:

  • dtype parameter on open_geotiff -- casts each tile/strip to the target
    dtype during decode. For the numba LZW path, the cast happens per-value so
    the tile buffer is never allocated at native dtype. Other codecs cast one
    tile at a time after decompression.

  • compression_level parameter on to_geotiff -- integer passed through to
    the codec. deflate 1-9 (default 6), zstd 1-22 (default 3), lz4 0-16
    (default 0). Codecs without level support ignore it silently.

  • VRT output triggered by .vrt extension on the output path. Creates a sibling
    {stem}_tiles/ directory with one GeoTIFF per dask chunk (or per tile_size
    slice for numpy input). The VRT index uses relative paths so the whole thing
    is portable. cog=True with .vrt raises ValueError.

Full spec: docs/superpowers/specs/2026-03-30-geotiff-perf-controls-design.md

Usage:

# Half the memory on read
dem = open_geotiff('big_dem.tif', dtype='float32')

# Same with dask -- each chunk is float32
dem = open_geotiff('big_dem.tif', dtype='float32', chunks=512)

# Fast compression (level 1) for scratch files
to_geotiff(data, 'scratch.tif', compression_level=1)

# Max compression for archival
to_geotiff(data, 'archive.tif', compression='zstd', compression_level=22)

# Stream a dask array to disk without materializing it
to_geotiff(dask_da, 'output.vrt', compression='zstd')
# produces: output.vrt + output_tiles/tile_0000_0000.tif, ...

Value: Gives users direct control over the memory/speed/size tradeoffs
that matter most when working with large rasters. No hidden magic.

Stakeholders and Impacts

Additive changes to open_geotiff and to_geotiff. Existing callers are
unaffected (all new parameters default to current behaviour). The write()
function in _writer.py and compress() in _compression.py gain a
compression_level pass-through.

Drawbacks

  • dtype on read adds a validation surface (float-to-int is rejected).
  • VRT output creates a directory of files instead of a single file, which is
    less convenient for simple workflows.

Alternatives

  • Streaming write to a monolithic TIFF (offset patching). More complex and
    tracked separately.
  • Automatic dtype inference based on data range. Too magical for this project's
    philosophy.

Unresolved Questions

None. See the full spec for details.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or requestperformancePR touches performance-sensitive code

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions