perf(geotiff): parallelise strip decode in _read_strips and _fetch_decode_cog_http_strips #2100

@brendancol

Summary

Two strip-decode paths run their per-strip codec calls in a Python for-loop while the matching tile paths use a ThreadPoolExecutor gated on _PARALLEL_DECODE_PIXEL_THRESHOLD (64K pixels). Codec decode for deflate, zstd and LZW releases the GIL, so the tile paths overlap C-level work across cores; the strip paths leave that parallelism on the table.

Pattern matches issue #1980 (the previous audit fixed the HTTP tile path in #1981).

Locations

  1. xrspatial/geotiff/_reader.py _read_strips at ~L1972 -- local-file strip decode. Tile counterpart _read_tiles at ~L2146 already parallelises.
  2. xrspatial/geotiff/_reader.py _fetch_decode_cog_http_strips at ~L2670 -- HTTP COG strip decode. Tile counterpart _fetch_decode_cog_http_tiles at ~L2898 already parallelises (fixed in #1981).

Proposed fix

Mirror the tile-path gate in both strip paths:

  • Compute strip_pixels = width * rps, where rps is the rows-per-strip count.
  • When n_strips > 1 and strip_pixels >= _PARALLEL_DECODE_PIXEL_THRESHOLD, run _decode_strip_or_tile calls via a ThreadPoolExecutor with min(n_strips, os.cpu_count() or 4) workers.
  • Keep the placement loop sequential to avoid contending writes into the output buffer.
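The three steps above can be sketched as follows. This is a minimal, stdlib-only illustration and not the code in _reader.py: `decode_strip` is a hypothetical stand-in for `_decode_strip_or_tile` (plain deflate via zlib), `read_strips` and its signature are invented for the example, the threshold value of 64 * 1024 is an assumption based on the "64K pixels" wording, and the data is treated as 8-bit single-band so one pixel is one byte.

```python
import os
import zlib
from concurrent.futures import ThreadPoolExecutor

# Assumed value: the issue only says "64K pixels".
PARALLEL_DECODE_PIXEL_THRESHOLD = 64 * 1024


def decode_strip(raw: bytes) -> bytes:
    """Hypothetical stand-in for _decode_strip_or_tile (plain deflate)."""
    return zlib.decompress(raw)


def read_strips(compressed, width, rps, height):
    """Decode 8-bit single-band strips into one row-major buffer."""
    n_strips = len(compressed)
    strip_pixels = width * rps

    if n_strips > 1 and strip_pixels >= PARALLEL_DECODE_PIXEL_THRESHOLD:
        # Gate passed: decode strips concurrently.
        workers = min(n_strips, os.cpu_count() or 4)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # map() yields results in input order, so decoded[i] is strip i.
            decoded = list(pool.map(decode_strip, compressed))
    else:
        # Small or single-strip reads stay on the sequential path.
        decoded = [decode_strip(raw) for raw in compressed]

    # Placement stays sequential: a single writer into the output buffer.
    out = bytearray(width * height)
    for i, strip in enumerate(decoded):
        start = i * strip_pixels
        # The last strip may hold fewer than rps rows, so use its real length.
        out[start:start + len(strip)] = strip
    return bytes(out)
```

Collecting the decoded strips in input order keeps the placement loop identical for both branches, so the threaded path changes only where the codec work runs.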

Why MEDIUM

Most real-world strip layouts (width >= 1024, rps >= 64) clear the 64K-pixel gate per strip, so the speedup applies to nearly every multi-strip read. Codec choice matters: deflate/zstd/LZW release the GIL during decompression, so their decode calls run concurrently across threads; uncompressed strips still pay a numpy frombuffer + copy cost, which the threaded path can overlap.
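The gate arithmetic can be checked directly. The helper below is hypothetical (`clears_gate` does not exist in the codebase), and the threshold value of 64 * 1024 is again an assumption drawn from the "64K pixels" wording.

```python
# Assumed value: the issue only says "64K pixels".
PARALLEL_DECODE_PIXEL_THRESHOLD = 64 * 1024


def clears_gate(width: int, rps: int, n_strips: int) -> bool:
    """Return True when a strip layout would take the threaded path."""
    strip_pixels = width * rps
    return n_strips > 1 and strip_pixels >= PARALLEL_DECODE_PIXEL_THRESHOLD


# width=1024, rps=64 gives 65536 pixels per strip: exactly at the gate.
boundary_case = clears_gate(1024, 64, n_strips=8)
# width=512, rps=64 gives 32768 pixels per strip: below the gate.
narrow_case = clears_gate(512, 64, n_strips=8)
# A single strip never takes the threaded path, whatever its size.
single_strip_case = clears_gate(4096, 64, n_strips=1)
```

Under these assumptions, the smallest layout named above (width 1024, rps 64) sits exactly on the threshold, so anything wider or taller per strip also qualifies.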
