
perf(geotiff): _write_vrt_tiled uses synchronous scheduler, defeating parallel tile writes #1714

@brendancol

Description


Reason or Problem

_write_vrt_tiled in xrspatial/geotiff/__init__.py (line 1708) writes a dask-backed DataArray to a directory of tiled GeoTIFFs by building one dask.delayed task per tile, then executing them all with:

dask.compute(*delayed_tasks, scheduler='synchronous')

The synchronous scheduler runs every task one at a time on the current thread. Tile writes are independent (different chunks, different output files, no shared mutable state inside _write_single_tile), so all of that available parallelism is left on the table.
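
For orientation, here is a minimal runnable sketch of that structure; the stub writer and the tile list are placeholders, since only _write_single_tile and the compute call are quoted from the real code:

    import tempfile

    import dask
    import numpy as np

    def _write_single_tile_stub(chunk, tile_path):
        # Stand-in for _write_single_tile: a fresh output file per tile, no shared state.
        np.asarray(chunk).tofile(tile_path)

    chunks = [np.zeros((256, 256), dtype=np.float32) for _ in range(4)]

    with tempfile.TemporaryDirectory() as out_dir:
        # One dask.delayed task per tile, as in _write_vrt_tiled...
        delayed_tasks = [
            dask.delayed(_write_single_tile_stub)(chunk, f'{out_dir}/tile_{i}.bin')
            for i, chunk in enumerate(chunks)
        ]
        # ...then every task runs serially on the calling thread.
        dask.compute(*delayed_tasks, scheduler='synchronous')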

Microbench on a single 16-thread machine, 4096x4096 float32 random data, chunks=256, zstd compression, 256 output tiles:

Scheduler                   Wall time
synchronous (current)       0.49 s
threads (monkey-patched)    0.33 s

That is a ~33% reduction with zero correctness risk on this path. The gain grows with tile count and with codec cost (zstd level 9 and LERC spend more CPU per tile).
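
The measurement above used the real GeoTIFF path with the threaded scheduler monkey-patched in. For a dependency-light reproduction of the scheduler effect alone, a sketch like the following (zlib standing in for zstd as a GIL-releasing codec, raw files standing in for GeoTIFF tiles) should show the same shape of result:

    import tempfile
    import time
    import zlib
    from pathlib import Path

    import dask
    import numpy as np

    # 256 independent 256x256 float32 tiles, compressed and written per task.
    rng = np.random.default_rng(0)
    tiles = [rng.random((256, 256), dtype=np.float32) for _ in range(256)]

    def write_tile(tile, path):
        # zlib releases the GIL during compression, like zstd/LZW in the real path.
        Path(path).write_bytes(zlib.compress(tile.tobytes(), level=9))

    for scheduler in ('synchronous', 'threads'):
        with tempfile.TemporaryDirectory() as out_dir:
            tasks = [
                dask.delayed(write_tile)(tile, f'{out_dir}/tile_{i}.bin')
                for i, tile in enumerate(tiles)
            ]
            start = time.perf_counter()
            dask.compute(*tasks, scheduler=scheduler)
            print(f'{scheduler:>12}: {time.perf_counter() - start:.2f} s')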

Proposal

Switch to the threaded scheduler explicitly:

dask.compute(*delayed_tasks, scheduler='threads')

Considerations:

  1. _write_single_tile opens a fresh file path per tile and never mutates shared Python state, so threading is safe.
  2. zstd/zlib/LZW release the GIL during compression, so threading delivers real parallelism on the compression stage.
  3. File-write concurrency is bounded by the dask thread pool default (the CPU count on a typical box) and can be capped further via num_workers; see the sketch after this list. Local filesystems handle that level of concurrency fine; the OS write-back cache coalesces the I/O.
  4. On dask+cupy data, each thread calls chunk_data.get() independently. cupy releases the GIL during the device-to-host copy, so threading is still safe.
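
On the concurrency bound in point 3: if a particular filesystem ever needs fewer simultaneous writers, the threaded scheduler accepts an explicit num_workers cap. A minimal illustration with stand-in tasks:

    import dask

    @dask.delayed
    def write_stub(i):
        # Stand-in for a per-tile write; each task touches only its own output.
        return i

    delayed_tasks = [write_stub(i) for i in range(256)]

    # Threaded scheduler, pool capped at 4 workers (the default is the CPU count).
    dask.compute(*delayed_tasks, scheduler='threads', num_workers=4)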

If a future caller wants to write to a slow networked filesystem and serialise writes intentionally, the cleanest escape hatch is to read the scheduler from dask's config instead of hard-coding it, so that DASK_SCHEDULER=synchronous (or dask.config.set(scheduler='synchronous')) overrides the threaded default. Note that an explicit scheduler='threads' kwarg would otherwise take precedence over the environment.
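
A minimal sketch of that config-respecting variant (stand-in tasks; the real delayed_tasks come from the tile loop above):

    import dask

    delayed_tasks = [dask.delayed(pow)(i, 2) for i in range(8)]  # stand-in tasks

    # Default to 'threads', but let DASK_SCHEDULER / dask.config.set(scheduler=...) override it.
    scheduler = dask.config.get('scheduler', default=None) or 'threads'
    dask.compute(*delayed_tasks, scheduler=scheduler)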

Acceptance criteria

  • _write_vrt_tiled uses scheduler='threads'.
  • Existing tests in xrspatial/geotiff/tests/test_vrt_tiled_metadata_1606.py, test_polish_1488.py, and the dask-backed VRT write paths continue to pass.
  • Microbench (to_geotiff(da, 'out.vrt', compression='zstd') on a 4096x4096, chunks=256 dask DataArray) shows a wall-time reduction comparable to the ~33% measured above; a runnable sketch of this benchmark follows this list.
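
A sketch of that acceptance microbench, assuming to_geotiff is importable from xrspatial.geotiff with the signature quoted in the criterion (run once on the current code and once on the patched code to compare):

    import time

    import dask.array as darr
    import xarray as xr

    from xrspatial.geotiff import to_geotiff  # module/function as referenced in this issue

    # 4096x4096 float32 random data in 256x256 chunks -> 256 output tiles, zstd-compressed.
    data = xr.DataArray(
        darr.random.random((4096, 4096), chunks=256).astype('float32'),
        dims=('y', 'x'),
        name='bench',
    )

    start = time.perf_counter()
    to_geotiff(data, 'out.vrt', compression='zstd')
    print(f'wall time: {time.perf_counter() - start:.2f} s')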

Context

Found via a deep-sweep performance audit on 2026-05-12. Cat 2 (Dask chunking): a synchronous scheduler on an embarrassingly parallel write loop.

Original code from #1083 / #1085 (May 2025) used the synchronous scheduler without a documented rationale; the comment block above the call does not mention threading. It looks like a defensive default that is no longer needed.


Labels: enhancement (New feature or request), performance (PR touches performance-sensitive code)
