
geotiff: jit packbits_compress (#2049) #2081

Merged: brendancol merged 2 commits into main from issue-2049 on May 19, 2026



@brendancol (Contributor)

Closes #2049.

Summary

  • Rewrites packbits_compress to dispatch into an @ngjit kernel that writes into a preallocated np.uint8 buffer, matching the shape of lzw_compress / _lzw_encode_kernel.
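  • The output buffer is sized at 2 * src_len + 1 bytes (the bound cited from the TIFF spec). This is a safe overestimate rather than the exact worst case: the encoder packs up to 128 bytes under a single literal header, so the tight bound is src_len + ceil(src_len / 128) + 1.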
  • No public signature change. The wrapper still takes bytes and returns bytes.
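For readers who haven't seen the lzw_compress pattern, here is a minimal sketch of the kernel/wrapper split. It assumes numba is installed (falling back to plain Python otherwise) and treats `ngjit` as a hypothetical stand-in for the project's decorator. `_packbits_encode_kernel` and `_flush_literals` are hypothetical names, and the "runs of 3 or more become replicate packets" threshold is an assumption, not something the PR states.

```python
import numpy as np

try:
    from numba import njit

    # Hypothetical stand-in for the project's @ngjit decorator.
    ngjit = njit(nogil=True)
except ImportError:  # fall back to plain Python so the sketch still runs
    def ngjit(func):
        return func


@ngjit
def _flush_literals(src, start, stop, dst, o):
    # Emit src[start:stop] as literal packets of at most 128 bytes each:
    # a header byte (count - 1, in 0..127) followed by the bytes themselves.
    while start < stop:
        count = min(128, stop - start)
        dst[o] = count - 1
        o += 1
        for k in range(count):
            dst[o + k] = src[start + k]
        o += count
        start += count
    return o


@ngjit
def _packbits_encode_kernel(src, dst):
    # Writes PackBits output for src into the preallocated dst buffer and
    # returns the number of bytes written.
    n = src.shape[0]
    i = 0
    o = 0
    lit_start = 0  # first byte of the pending literal segment
    while i < n:
        # Length of the run of identical bytes starting at i, capped at 128.
        run = 1
        while i + run < n and run < 128 and src[i + run] == src[i]:
            run += 1
        if run >= 3:  # assumed threshold: shorter runs stay in the literal
            o = _flush_literals(src, lit_start, i, dst, o)
            dst[o] = 257 - run  # -(run - 1) as an unsigned byte, 129..254
            dst[o + 1] = src[i]
            o += 2
            i += run
            lit_start = i
        else:
            i += run
    return _flush_literals(src, lit_start, n, dst, o)


def packbits_compress(data: bytes) -> bytes:
    # Bytes in, bytes out; the kernel only ever sees uint8 arrays.
    src = np.frombuffer(data, dtype=np.uint8)
    dst = np.empty(2 * src.shape[0] + 1, dtype=np.uint8)  # safe overestimate
    n = _packbits_encode_kernel(src, dst)
    return dst[:n].tobytes()
```

Keeping the bytes/ndarray conversion in the wrapper means the kernel works only on fixed-width integer arrays, which is the kind of input numba's nopython mode handles well. It also leaves the public bytes-in / bytes-out signature untouched.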

Performance

Encode time per call on 1 MB strips, single thread, warm JIT cache:

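| Workload            | Before  | After   | Speedup |
|---------------------|---------|---------|---------|
| all-same            | 29.8 ms | 0.56 ms | 53x     |
| alternating         | 49.0 ms | 1.69 ms | 29x     |
| random              | 49.3 ms | 1.53 ms | 32x     |
| mixed runs + random | 40.3 ms | 1.04 ms | 39x     |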
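To reproduce numbers of this shape, a small harness could build the four workloads and time one call at a time. The PR does not say how the inputs were generated or how timings were aggregated, so the workload definitions (0x00/0xFF alternation, mixed runs and random chunks of 3 to 200 bytes) and the best-of-N timing below are assumptions. `make_workloads` and `time_per_call` are hypothetical names.

```python
import random
import time
import zlib


def make_workloads(size=1 << 20, seed=0):
    """Build four byte strings shaped like the table's workloads.

    These are assumed reconstructions: the PR does not say exactly how
    the alternating or mixed inputs were generated.
    """
    rng = random.Random(seed)
    mixed = bytearray()
    while len(mixed) < size:
        k = rng.randint(3, 200)
        if rng.random() < 0.5:
            mixed += bytes([rng.getrandbits(8)]) * k  # a run of k identical bytes
        else:
            mixed += rng.getrandbits(8 * k).to_bytes(k, "little")  # k random bytes
    return {
        "all-same": bytes(size),
        "alternating": (b"\x00\xff" * (size // 2 + 1))[:size],
        "random": rng.getrandbits(8 * size).to_bytes(size, "little"),
        "mixed runs + random": bytes(mixed[:size]),
    }


def time_per_call(encode, data, repeat=20):
    """Best-of-`repeat` wall time in milliseconds for one encode(data) call."""
    encode(data)  # warm-up, so one-off JIT compilation is not timed
    best = float("inf")
    for _ in range(repeat):
        t0 = time.perf_counter()
        encode(data)
        best = min(best, time.perf_counter() - t0)
    return best * 1e3


if __name__ == "__main__":
    # zlib.compress is only a placeholder; pass packbits_compress instead.
    for name, data in make_workloads().items():
        print(f"{name:>20}: {time_per_call(zlib.compress, data):.2f} ms")
```

The warm-up call matters for the "warm JIT cache" condition in the table. Without it, the first timed call would include numba's compilation cost.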

Backend coverage

PackBits is a CPU codec only; no GPU or dask path. The existing _compression_tag dispatcher in _writer.py is unchanged.

Test plan

  • Existing TestPackBits round-trip cases still pass.
  • New test_packbits_jit_2049.py covers regime boundaries: length 0, 1, 2, 128, 129; alternating patterns; runs at start / end; mixed; 4 random seeds; large all-zero buffer; direct kernel checks.
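Round-trip checks like these need a decoder to compare against. Below is a minimal pure-Python reference decoder written from the TIFF 6.0 PackBits description. It is not the project's decoder from #2048, and `packbits_decode` / `assert_round_trip` are hypothetical helper names.

```python
def packbits_decode(buf: bytes) -> bytes:
    """Reference PackBits decoder (TIFF 6.0, Section 9), pure Python."""
    out = bytearray()
    i = 0
    while i < len(buf):
        header = buf[i]
        i += 1
        if header < 128:
            # Literal packet: copy the next header + 1 bytes.
            count = header + 1
            out += buf[i:i + count]
            i += count
        elif header > 128:
            # Replicate packet: repeat the next byte 257 - header times.
            out += buf[i:i + 1] * (257 - header)
            i += 1
        # header == 128 (-128 as a signed byte) is a no-op.
    return bytes(out)


def assert_round_trip(encode, data: bytes) -> None:
    """Hypothetical helper: encode, decode with the reference, compare."""
    decoded = packbits_decode(encode(data))
    assert decoded == data, f"round trip failed for {len(data)}-byte input"
```

The length-129 boundary is interesting because a literal-only encoding must split it into two packets: header 127 with 128 bytes, then header 0 with 1 byte.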

Refs #2048 (decode-side companion).

@github-actions (bot) added the performance label (PR touches performance-sensitive code) on May 18, 2026.

PackBits encode was the last writer-side codec helper still running in
pure Python. The kernel matches the shape of the existing lzw_compress
path: an @ngjit function operates on preallocated uint8 buffers, and a
thin bytes-in / bytes-out wrapper takes care of the conversion. Output
sizing uses the TIFF spec bound (2 * src_len + 1).

On 1 MB strips the JIT version runs at 0.6-1.7 ms/call versus 30-49
ms/call for the previous Python loop, a 29-53x speedup.

Self-review follow-up to #2081.

The original kernel docstring described the worst case as "one literal
header per input byte", which is not actually achievable: the encoder
packs up to 128 bytes under a single literal header, so the tight upper
bound is src_len + ceil(src_len / 128) + 1. The 2 * src_len + 1
allocation in the wrapper is still a safe overestimate; the comments
now say so explicitly.

Adds three tests: a golden literal-encoding check on the raw kernel,
a parametrized buffer-cap invariant over the regime-boundary inputs,
and a random sweep that re-checks the cap on 0..4096-byte streams.
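The invariant those tests pin down can be sketched as a small harness that accepts any bytes-to-bytes encoder. `literal_only_packbits` is a hypothetical stand-in for a golden literal encoding: valid PackBits that never compresses. It reaches the tight bound minus the +1 slack. All names here are illustrative, not taken from the test module.

```python
import random


def literal_only_packbits(data: bytes) -> bytes:
    # Every byte goes into a literal packet of at most 128 bytes,
    # each preceded by a header of (count - 1).
    out = bytearray()
    for start in range(0, len(data), 128):
        chunk = data[start:start + 128]
        out.append(len(chunk) - 1)
        out += chunk
    return bytes(out)


def tight_bound(n: int) -> int:
    # src_len + ceil(src_len / 128) + 1, the bound quoted in the follow-up commit.
    return n + -(-n // 128) + 1


def sweep_cap(encode, max_len=4096, seed=0):
    # Re-check both caps on one random stream of every length 0..max_len.
    # Returns the number of streams checked; raises AssertionError on a breach.
    rng = random.Random(seed)
    for n in range(max_len + 1):
        data = rng.getrandbits(8 * n).to_bytes(n, "little") if n else b""
        out = encode(data)
        assert len(out) <= tight_bound(n), (n, len(out))
        assert len(out) <= 2 * n + 1, (n, len(out))
    return max_len + 1
```

Checking against the tight bound, not just the 2 * src_len + 1 allocation, is what catches an encoder that is still memory-safe but emits more headers than it should.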

@brendancol merged commit 1aa310b into main on May 19, 2026 (4 of 5 checks passed).

Labels: performance (PR touches performance-sensitive code)


Development: successfully merging this pull request may close geotiff: jit packbits_compress
