Reason or Problem
unpack_bits in xrspatial/geotiff/_compression.py (lines 815-880) decodes sub-byte pixel data when a file declares BitsPerSample in {1, 2, 4, 12}. The bps=1 branch uses np.unpackbits (vectorised, fast). The bps=2, bps=4, and bps=12 branches use Python for loops over packed bytes, which is 100-200x slower than numpy-vectorised equivalents at realistic tile sizes.
Microbench on a single thread:
| bps |
pixel_count |
current unpack_bits |
vectorised |
speedup |
| 2 |
4,000,000 |
336 ms |
(a few ms) |
~150x |
| 4 |
2,000,000 |
162 ms |
1.5 ms |
~108x |
| 12 |
2,000,000 |
220 ms |
(a few ms) |
~70x |
Sub-byte BPS files are uncommon in geospatial workflows but real: bps=4 is used by some indexed/classified rasters and bps=12 by certain scientific instruments (e.g. older satellite radiometers, some medical/microscopy formats). When such a file is read the slow path is the dominant cost.
Proposal
Replace the per-byte Python loops with numpy slicing/shifting:
elif bps == 2:
# 4 pixels per byte, MSB-first.
n_bytes = (pixel_count + 3) // 4
raw = data[:n_bytes]
out = np.empty(n_bytes * 4, dtype=np.uint8)
out[0::4] = (raw >> 6) & 0x03
out[1::4] = (raw >> 4) & 0x03
out[2::4] = (raw >> 2) & 0x03
out[3::4] = raw & 0x03
return out[:pixel_count]
elif bps == 4:
# 2 pixels per byte, high nibble first.
n_bytes = (pixel_count + 1) // 2
raw = data[:n_bytes]
out = np.empty(n_bytes * 2, dtype=np.uint8)
out[0::2] = (raw >> 4) & 0x0F
out[1::2] = raw & 0x0F
return out[:pixel_count]
elif bps == 12:
# 2 pixels per 3 bytes, MSB-first.
n_pairs = pixel_count // 2
remainder = pixel_count % 2
n_bytes = n_pairs * 3 + (2 if remainder else 0)
raw = data[:n_bytes].astype(np.uint16)
out = np.empty(pixel_count, dtype=np.uint16)
b0 = raw[0:n_pairs * 3:3]
b1 = raw[1:n_pairs * 3:3]
b2 = raw[2:n_pairs * 3:3]
out[0:n_pairs * 2:2] = (b0 << 4) | (b1 >> 4)
out[1:n_pairs * 2:2] = ((b1 & 0x0F) << 8) | b2
if remainder:
out[pixel_count - 1] = ((int(data[n_pairs * 3]) << 4)
| (int(data[n_pairs * 3 + 1]) >> 4))
return out
The vectorised versions match the existing reference implementation bit-for-bit (verified against random uint8 inputs of length 1000 across all three bps values).
Acceptance criteria
unpack_bits(data, bps, pixel_count) for bps in {2, 4, 12} returns the same byte/short sequence as the current implementation across the full pixel_count range.
- Microbench: bps=4 on 2M pixels reduces from ~160 ms to <5 ms.
- Existing sub-byte tests in
xrspatial/geotiff/tests/test_features.py (the _make_sub_byte_tiff helpers around lines 1840-1880) continue to pass.
Context
Found via deep-sweep performance audit on 2026-05-12. Cat 5 (Numba anti-pattern): nested Python loops over packed bytes that should be vectorised.
Reason or Problem
unpack_bitsinxrspatial/geotiff/_compression.py(lines 815-880) decodes sub-byte pixel data when a file declaresBitsPerSamplein{1, 2, 4, 12}. The bps=1 branch usesnp.unpackbits(vectorised, fast). The bps=2, bps=4, and bps=12 branches use Pythonforloops over packed bytes, which is 100-200x slower than numpy-vectorised equivalents at realistic tile sizes.Microbench on a single thread:
unpack_bitsSub-byte BPS files are uncommon in geospatial workflows but real:
bps=4is used by some indexed/classified rasters andbps=12by certain scientific instruments (e.g. older satellite radiometers, some medical/microscopy formats). When such a file is read the slow path is the dominant cost.Proposal
Replace the per-byte Python loops with numpy slicing/shifting:
The vectorised versions match the existing reference implementation bit-for-bit (verified against random uint8 inputs of length 1000 across all three bps values).
Acceptance criteria
unpack_bits(data, bps, pixel_count)forbpsin{2, 4, 12}returns the same byte/short sequence as the current implementation across the fullpixel_countrange.xrspatial/geotiff/tests/test_features.py(the_make_sub_byte_tiffhelpers around lines 1840-1880) continue to pass.Context
Found via deep-sweep performance audit on 2026-05-12. Cat 5 (Numba anti-pattern): nested Python loops over packed bytes that should be vectorised.