geotiff: jit packbits_compress #2049

@brendancol

Description

Reason or Problem

xrspatial.geotiff._compression.packbits_compress (lines 1021-1051) mirrors the decode case: a pure-Python double while loop over the input bytes. The outer loop scans for runs of 3+ identical bytes; the inner loop accumulates literal spans up to 128 bytes. Output goes into a bytearray.

Every other codec helper in _compression.py is @ngjit (see #2048 for the decode-side gap and the full list). PackBits compress is the only writer-side codec helper that runs in pure Python.

Proposal

Same shape as #2048: a jit kernel against preallocated uint8 buffers, with a bytes-in/bytes-out wrapper.

import numpy as np

@ngjit
def _packbits_encode_kernel(src, src_len, dst, dst_cap):
    # outer: detect run >= 3 of identical bytes
    # inner: accumulate literal run, break on next run start
    # returns out_pos
    ...  # kernel body still to be written

def packbits_compress(data: bytes) -> bytes:
    src = np.frombuffer(data, dtype=np.uint8)
    # Conservative upper bound on the encoded size (at most 2 bytes out per byte in).
    dst = np.empty(2 * len(src) + 1, dtype=np.uint8)
    n = _packbits_encode_kernel(src, len(src), dst, len(dst))
    return dst[:n].tobytes()

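The 2x + 1 sizing is a safe but loose upper bound, not a figure taken from the TIFF spec. A literal packet carries up to 128 bytes behind a single header byte, so input where no byte equals its neighbour grows by roughly one byte per 128, not one per two. For an encoder that only emits repeat packets for runs of 3+, the tight bound should be len(src) + ceil(len(src) / 128); 2x + 1 simply leaves extra headroom.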
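The kernel outlined above could be filled in roughly as follows. This is a sketch under stated assumptions, not the code in _compression.py: it is written as a plain Python function using only scalar indexing so that it should be straightforward to decorate with the project's @ngjit, but that decoration is left off here so the block runs without numba. The -1 overflow return value and the empty-input guard are choices made for this illustration.

```python
import numpy as np


def _packbits_encode_kernel(src, src_len, dst, dst_cap):
    """PackBits-encode src[:src_len] into dst; return bytes written, or -1
    if dst is too small. Scalar indexing only (assumed jit-friendly)."""
    i = 0    # read position in src
    out = 0  # write position in dst
    while i < src_len:
        # Length of the run of identical bytes starting at i, capped at 128.
        run = 1
        while i + run < src_len and run < 128 and src[i + run] == src[i]:
            run += 1
        if run >= 3:
            # Repeat packet: header is -(run - 1) as int8, i.e. 257 - run as uint8.
            if out + 2 > dst_cap:
                return -1
            dst[out] = 257 - run
            dst[out + 1] = src[i]
            out += 2
            i += run
            continue
        # Literal packet: extend until a run of 3 starts, 128 bytes, or end of input.
        start = i
        i += 1
        while i < src_len and i - start < 128:
            if i + 2 < src_len and src[i] == src[i + 1] and src[i] == src[i + 2]:
                break
            i += 1
        n = i - start
        if out + 1 + n > dst_cap:
            return -1
        dst[out] = n - 1  # literal header: count - 1, in 0..127
        for k in range(n):
            dst[out + 1 + k] = src[start + k]
        out += 1 + n
    return out


def packbits_compress(data: bytes) -> bytes:
    if not data:
        return b""
    src = np.frombuffer(data, dtype=np.uint8)
    # Same conservative sizing as the proposal above.
    dst = np.empty(2 * len(src) + 1, dtype=np.uint8)
    n = _packbits_encode_kernel(src, len(src), dst, len(dst))
    if n < 0:
        raise ValueError("PackBits output buffer too small")
    return dst[:n].tobytes()
```

With this greedy strategy, 300 bytes with no equal neighbours encode to three literal packets (128 + 128 + 44 bytes plus three headers), which is len(src) + ceil(len(src) / 128) and well under the 2x + 1 buffer.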

Value

Only relevant for to_geotiff(..., compression='packbits'). The codec is rarely chosen for new files (deflate and LZW dominate), but every strip or tile written runs through this loop. For users who do pick PackBits, the encode step is the single CPU bottleneck on the write path.

Stakeholders and Impacts

  • _compression_tag in _writer.py dispatches here. No interface change.
  • Existing tests/test_compression_* round-trip cases cover the new path.
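If extra coverage were wanted beyond the existing round-trip cases, a randomized check along these lines could be dropped in. Everything here is hypothetical: packbits_decode_reference, literal_only_compress and check_roundtrip are illustration names, not functions from xrspatial, and the real tests may be organised differently. The literal-only encoder is only a stand-in so the block runs on its own; the intent is to pass the jitted packbits_compress instead.

```python
import random


def packbits_decode_reference(data: bytes) -> bytes:
    """Straightforward PackBits decoder used as the round-trip oracle."""
    out = bytearray()
    i = 0
    while i < len(data):
        header = data[i]
        i += 1
        if header < 128:
            # Literal packet: copy the next header + 1 bytes.
            count = header + 1
            out += data[i:i + count]
            i += count
        elif header > 128:
            # Repeat packet: next byte repeated 257 - header times.
            out += bytes([data[i]]) * (257 - header)
            i += 1
        # header == 128 is a no-op.
    return bytes(out)


def literal_only_compress(data: bytes) -> bytes:
    """Stand-in encoder (hypothetical): valid PackBits made of literal packets only."""
    out = bytearray()
    for start in range(0, len(data), 128):
        chunk = data[start:start + 128]
        out.append(len(chunk) - 1)
        out += chunk
    return bytes(out)


def check_roundtrip(compress, trials=200, seed=0):
    """Assert decode(compress(x)) == x for random inputs; return the trial count."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randrange(0, 600)
        # Three-symbol alphabet so runs of 3+ occur often.
        data = bytes(rng.choice(b"\x00\x01\xff") for _ in range(n))
        assert packbits_decode_reference(compress(data)) == data
    return trials


check_roundtrip(literal_only_compress)
```

The input lengths deliberately straddle 128 and 256 so that the literal and repeat packet caps are both exercised.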

Drawbacks

The worst-case output buffer (2 * len(src) + 1) is generous. For very large strips this means a temporary uint8 array of 2x strip size. Acceptable; strips are at most a few MB.

Alternatives

Leave in Python. Writers picking PackBits are rare enough that this is firmly low-priority.

Refs #2048 (decode-side companion).
