geotiff: jit packbits_decompress #2048

@brendancol

Reason or Problem

xrspatial.geotiff._compression.packbits_decompress (lines 981-1018) is a pure-Python while loop over the compressed byte stream. It copies literal runs into a bytearray and expands replicate runs with out.extend(bytes([src[i]]) * (1 - n)), where n is the negative header byte. Every other inner-loop codec helper in the same module is @ngjit:

  • _lzw_decode_kernel, _lzw_encode_kernel
  • All four _predictor_decode_u{8,16,32,64} variants
  • All four _predictor_encode_u{8,16,32,64} variants
  • _fp_predictor_decode_row / _fp_predictor_decode_rows
  • _fp_predictor_encode_row / _fp_predictor_encode_rows

PackBits decode is the only non-native codec helper that still runs in pure Python on the read path.
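
For reference, a minimal sketch of the loop shape described above. It is reconstructed from the description and the standard PackBits rules, not copied from the module, so the name packbits_decode_reference is hypothetical and the decompression-bomb guard is left out:

```python
def packbits_decode_reference(data: bytes) -> bytes:
    """Pure-Python PackBits decode (illustrative reconstruction, no size guard)."""
    src = data
    out = bytearray()
    i = 0
    while i < len(src):
        # Interpret the header byte as a signed int8.
        n = src[i] - 256 if src[i] > 127 else src[i]
        i += 1
        if n >= 0:
            # Literal run: copy the next n + 1 bytes verbatim.
            out.extend(src[i:i + n + 1])
            i += n + 1
        elif n != -128:
            # Replicate run: repeat the next byte 1 - n times.
            if i >= len(src):
                break  # truncated stream, nothing left to repeat
            out.extend(bytes([src[i]]) * (1 - n))
            i += 1
        # n == -128 is a no-op header and is skipped.
    return bytes(out)
```

Each header byte costs one interpreter round trip plus a bytearray extend, which is the per-iteration overhead a jitted kernel would remove.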

Proposal

Split the function into a JIT-compiled kernel and a public wrapper, matching the LZW pattern:

@ngjit
def _packbits_decode_kernel(src, src_len, dst, dst_cap):
    # integer cursor, bounded-write semantics, returns out_pos
    ...

def packbits_decompress(data: bytes, expected_size: int = 0) -> bytes:
    src = np.frombuffer(data, dtype=np.uint8)
    cap = _max_output_with_margin(expected_size) or len(data) * 128
    dst = np.empty(cap, dtype=np.uint8)
    n = _packbits_decode_kernel(src, len(src), dst, cap)
    if expected_size and n > _max_output_with_margin(expected_size):
        raise ValueError("packbits decode exceeded expected size...")
    return bytes(dst[:n])

The bytes-in / bytes-out signature stays the same. The decompression-bomb guard at lines 1012-1017 stays: the kernel never writes past dst_cap and returns the output count; the wrapper raises ValueError when that count exceeds the allowed size.
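
A minimal sketch of how such a kernel could look. It assumes "bounded-write semantics" means the output cursor keeps advancing past dst_cap while writes stop at it, so the wrapper can detect overflow from the returned count. numba.njit stands in for the module's ngjit (with a no-op fallback so the sketch runs without numba), and the names _packbits_decode_sketch, packbits_decompress_sketch and max_output are hypothetical:

```python
import numpy as np

try:
    from numba import njit  # stand-in for the module's ngjit decorator
except ImportError:
    njit = lambda func: func  # plain-Python fallback so the sketch still runs


@njit
def _packbits_decode_sketch(src, src_len, dst, dst_cap):
    i = 0
    out_pos = 0
    while i < src_len:
        header = int(src[i])  # unsigned header byte, 0..255
        i += 1
        if header < 128:
            # Literal run of header + 1 bytes, clamped to what is left.
            count = header + 1
            if i + count > src_len:
                count = src_len - i
            for k in range(count):
                if out_pos + k < dst_cap:
                    dst[out_pos + k] = src[i + k]
            i += count
            out_pos += count
        elif header > 128:
            # Replicate run: 257 - header copies (1 - n for signed n).
            if i >= src_len:
                break
            count = 257 - header
            value = src[i]
            i += 1
            for k in range(count):
                if out_pos + k < dst_cap:
                    dst[out_pos + k] = value
            out_pos += count
        # header == 128 is a no-op.
        if out_pos > dst_cap:
            return out_pos  # overflow: stop early, the wrapper raises
    return out_pos


def packbits_decompress_sketch(data, max_output):
    src = np.frombuffer(data, dtype=np.uint8)
    dst = np.empty(max_output, dtype=np.uint8)
    n = _packbits_decode_sketch(src, len(src), dst, max_output)
    if n > max_output:
        raise ValueError("packbits decode exceeded expected size")
    return dst[:n].tobytes()
```

Returning a count larger than dst_cap keeps the error handling in the wrapper, so the kernel itself never has to raise.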

Value

PackBits is uncommon in modern Sentinel/Landsat imagery but lives on in legacy scanned imagery and some mil-spec products. For files that use it, every strip or tile decode runs this loop. Other codec helpers in this module saw 5-50x speedups when jitted (issue #1713 documents the unpack_bits numpy pass; the LZW and predictor variants saw similar gains).

Stakeholders and Impacts

  • _decode_strip_or_tile in _reader.py consumes the bytes result. No interface change.
  • Existing PackBits round-trip tests in tests/test_compression_* cover the new path with no edits.

Drawbacks

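The bytes -> uint8 array -> bytes round trip adds overhead at the boundaries: np.frombuffer is a zero-copy view, but allocating dst and the final bytes(dst[:n]) copy scale with output size. On a tiny strip this can wash out the per-iteration savings; benchmark on a 256x256 tile before and after.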
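
One possible harness for that benchmark, using only the standard library. The helper names (make_run_payload, make_literal_payload, best_time) are hypothetical, and the payloads are synthetic 8-bit 256x256 tiles rather than real imagery:

```python
import os
import timeit

TILE_BYTES = 256 * 256  # one 8-bit 256x256 tile


def make_run_payload(value=0):
    """PackBits stream of replicate runs that decodes to a constant tile."""
    # Header 0x81 means "repeat the next byte 128 times".
    return bytes([0x81, value]) * (TILE_BYTES // 128)


def make_literal_payload():
    """PackBits stream of literal runs that decodes to random bytes."""
    raw = os.urandom(TILE_BYTES)
    # Header 0x7F means "copy the next 128 bytes verbatim".
    return b"".join(
        b"\x7f" + raw[i:i + 128] for i in range(0, TILE_BYTES, 128)
    )


def best_time(decode, payload, number=20, repeat=5):
    """Best per-call wall time in seconds for decode(payload)."""
    decode(payload)  # warm-up, so a first-call JIT compile is not timed
    timings = timeit.repeat(lambda: decode(payload), number=number, repeat=repeat)
    return min(timings) / number
```

Calling best_time(packbits_decompress, make_run_payload()) and best_time(packbits_decompress, make_literal_payload()) on both branches would cover the run-heavy and literal-heavy extremes.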

Alternatives

Leave it in Python. PackBits is rarely on the hot path for typical scientific imagery, so this is a low-traffic win.
