geotiff: jit packbits_decompress#2083

Merged
brendancol merged 3 commits into main from issue-2048 on May 19, 2026
@brendancol
Contributor

Summary

  • Split packbits_decompress into a numba @ngjit kernel (_packbits_decode_kernel) plus a thin bytes-in / bytes-out wrapper, matching the LZW codec pattern in the same module.
  • The kernel walks the PackBits state machine over a uint8 array with bounded writes. The wrapper sizes the output buffer via the existing _max_output_with_margin cap and turns a cap-exceeded sentinel into the same ValueError the previous Python path raised, so the decompression-bomb guard still fires on adversarial input.
  • On a 256x256 mixed-content tile, decode drops from ~0.33 ms to ~0.04 ms per call (~7.5x). Light strips see less benefit since the bytes <-> uint8 array boundary copies dominate there.
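The kernel/wrapper split described above can be sketched roughly as follows. This is not the code from this PR: it is a dependency-free illustration in plain Python, so the numba `@ngjit` decorator and the uint8 array conversion are left out. The names `_decode_kernel_sketch`, `packbits_decompress_sketch`, and `CAP_EXCEEDED` are hypothetical, the integer rounding of the cap is an assumption, and so is the handling of truncated input (the sketch simply stops at the end of the stream).

```python
CAP_EXCEEDED = -1  # hypothetical sentinel; the real kernel's value is not shown in this PR text


def _decode_kernel_sketch(src, dst):
    """Walk the PackBits state machine over src, writing into dst.

    Returns the number of bytes written, or CAP_EXCEEDED if a run would
    write past len(dst). Every write is bounds-checked before it happens.
    """
    n_src = len(src)
    cap = len(dst)
    i = 0
    out = 0
    while i < n_src:
        header = src[i]
        i += 1
        if header < 128:
            # Literal run: copy the next header + 1 bytes (1..128).
            count = header + 1
            if count > n_src - i:
                count = n_src - i  # assumption: truncated literal copies what is left
            if out + count > cap:
                return CAP_EXCEEDED
            for k in range(count):
                dst[out + k] = src[i + k]
            i += count
            out += count
        elif header > 128:
            # Replicate run: repeat the next byte 257 - header times (2..128).
            if i >= n_src:
                break  # assumption: truncated replicate run ends decoding
            count = 257 - header
            if out + count > cap:
                return CAP_EXCEEDED
            value = src[i]
            i += 1
            for k in range(count):
                dst[out + k] = value
            out += count
        # header == 128 (signed -128) is the no-op sentinel: skip it.
    return out


def packbits_decompress_sketch(data: bytes, expected_size: int) -> bytes:
    """Thin bytes-in / bytes-out wrapper around the kernel sketch."""
    # Cap as described in this PR (expected_size * 1.05 + 1); int() is an assumption.
    cap = int(expected_size * 1.05) + 1
    dst = bytearray(cap)
    n_written = _decode_kernel_sketch(data, dst)
    if n_written == CAP_EXCEEDED:
        raise ValueError(f"PackBits output exceeded cap of {cap} bytes")
    return bytes(dst[:n_written])


# The canonical 15-byte example decodes to 24 bytes.
_wiki = bytes.fromhex("FE AA 02 80 00 2A FD AA 03 80 00 2A 22 F7 AA")
assert len(packbits_decompress_sketch(_wiki, 24)) == 24
```

Returning a sentinel rather than raising inside the kernel keeps the hot loop free of exception handling, which is presumably why the wrapper is the layer that raises `ValueError`.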

Backends

PackBits is a CPU-only TIFF codec on the reader's bytes path. No GPU or dask code paths change.

Test plan

  • pytest xrspatial/geotiff/tests/test_packbits_jit_2048.py (20 new tests: empty input, single-byte literal/replicate, max 128-byte runs, 128-byte literal/replicate boundary, -128 no-op sentinel, Wikipedia canonical example, truncated streams, bomb cap)
  • pytest xrspatial/geotiff/tests/test_compression.py test_features.py test_decompression_caps.py test_packbits_jit_2048.py (193 passed, 4 skipped)
  • pytest xrspatial/geotiff/tests/test_fuzz_hypothesis_1661.py (15 passed; PackBits is round-tripped as part of the lossless-codec fuzz pass)

Closes #2048

Split packbits_decompress into a numba @ngjit kernel
(_packbits_decode_kernel) and a thin bytes-in / bytes-out wrapper,
matching the LZW codec pattern in the same module. The kernel walks the
PackBits state machine over a uint8 array with bounded writes; the
wrapper sizes the output buffer using the existing decompression-bomb
cap (_max_output_with_margin) and turns a cap-exceeded sentinel into the
same ValueError the previous Python path raised.

On a 256x256 mixed-content tile this is ~7.5x faster (~0.33 ms -> ~0.04
ms per decode).

New tests in test_packbits_jit_2048.py cover empty input, single-byte
runs, max literal run (128 bytes), max replicate run (128 bytes via
header 0x81), the 128-byte literal/replicate boundary, the -128 no-op
sentinel, the Wikipedia canonical example, truncated streams, and the
bomb-cap round-trip. Existing PackBits coverage in test_features.py and
test_decompression_caps.py still passes unchanged.
@github-actions (bot) added the performance label (PR touches performance-sensitive code) on May 18, 2026
…2048)

Rename the kernel-return local in packbits_decompress from `n` to
`n_written` so the bytes-written value isn't reusing the signed-header
name from the old loop. The cap-exceeded ValueError now spells out that
the produced count crossed `cap` (cap = expected_size * 1.05 + 1) rather
than only naming the cap, which makes the message readable without
having to reread the docstring.

Add a parametrised test that pins the cap boundary at three points:
the exact-cap-equals-decode case (must pass), the one-byte-over case
(must raise), and a tiny expected_size=1 boundary. These would catch
an off-by-one regression in either direction. Existing 20 tests still
pass; total is now 23.
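The three boundary points could be exercised along these lines. This is a sketch, not the test added in this commit: it uses a plain loop instead of a pytest parametrisation so it runs without dependencies, and `cap_for`, `decode_replicate_runs`, and `replicate_stream` are hypothetical stand-ins. The stand-in decoder handles replicate runs only. The integer rounding of the cap, and treating the `expected_size=1` case as an exact-cap pass, are both assumptions.

```python
def cap_for(expected_size: int) -> int:
    # Cap as described above (expected_size * 1.05 + 1); int() is an assumption.
    return int(expected_size * 1.05) + 1


def decode_replicate_runs(data: bytes, cap: int) -> bytes:
    """Stand-in decoder: handles replicate headers (0x81..0xFF) only."""
    out = bytearray()
    for i in range(0, len(data) - 1, 2):
        header, value = data[i], data[i + 1]
        if header <= 128:
            raise NotImplementedError("stand-in handles replicate headers only")
        count = 257 - header
        if len(out) + count > cap:
            # One byte over the cap must raise; landing exactly on it must not.
            raise ValueError(
                f"PackBits output of {len(out) + count} bytes exceeds cap of {cap}"
            )
        out.extend(bytes([value]) * count)
    return bytes(out)


def replicate_stream(n: int) -> bytes:
    """Encode n copies (2..128) of byte 0x07 as a single replicate run."""
    return bytes([257 - n, 0x07])


# (expected_size, bytes produced by the stream, should decoding succeed?)
BOUNDARY_CASES = [
    (50, 53, True),   # produced == cap: must pass
    (50, 54, False),  # produced == cap + 1: must raise
    (1, 2, True),     # tiny expected_size, produced == cap
]


def run_boundary_cases():
    """Return one bool per case: True when the case behaved as expected."""
    results = []
    for expected_size, produced, should_pass in BOUNDARY_CASES:
        cap = cap_for(expected_size)
        try:
            decoded = decode_replicate_runs(replicate_stream(produced), cap)
            ok = should_pass and len(decoded) == produced
        except ValueError:
            ok = not should_pass
        results.append(ok)
    return results
```

Pinning both sides of the boundary means that changing `>` to `>=` in the cap check, or the reverse, fails at least one case.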
@brendancol merged commit f1e1990 into main on May 19, 2026
4 of 5 checks passed