Skip to content

geotiff: pre-decode bomb cap for JPEG codec (#1792)#1793

Merged
brendancol merged 4 commits into
mainfrom
deep-sweep-security-geotiff-2026-05-13-s2
May 13, 2026
Merged

geotiff: pre-decode bomb cap for JPEG codec (#1792)#1793
brendancol merged 4 commits into
mainfrom
deep-sweep-security-geotiff-2026-05-13-s2

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Summary

Why

A crafted TIFF can declare a small tile (e.g. 256x256 RGB, ~196 KiB expected) while shipping a JPEG payload whose SOF marker declares a much larger image. Pillow's default MAX_IMAGE_PIXELS fires only at ~178M pixels (~500 MB RGB), so without a tile-aware pre-check a single tile can allocate hundreds of MB before the downstream chunk.size != expected reshape check runs. np.asarray(img).tobytes() materialises the full Pillow buffer.

Test plan

  • pytest xrspatial/geotiff/tests/test_decompression_caps.py (29 passed, 4 new)
  • pytest xrspatial/geotiff/tests/test_jpeg.py (19 passed, no regression)
  • pytest xrspatial/geotiff/tests/test_security.py (46 passed)
  • New JPEG cases: SOF-rewritten bomb raises with Likely a decompression bomb; legitimate 32x32 RGB JPEG round-trips; default kwargs keep the cap disabled; a JPEG without a parseable SOF defers to Pillow's own error.

Notes

jpeg_decompress accepted width/height/samples but never consulted
them before letting Pillow allocate the decoded buffer. A crafted
TIFF could declare a small tile (256x256, ~196 KiB expected) while
shipping a JPEG payload whose SOF marker declared a much larger
image, allocating up to ~500 MB per tile before Pillow's own
DecompressionBombError fires at 178M pixels.

Mirror the JP2K SIZ pre-check (#1625) and LERC blob-info pre-check:
parse the JPEG SOF marker without decoding pixels, and reject when
declared output exceeds expected * 1.05 + 1 bytes. Default kwargs
(width=0, height=0) disable the cap so direct callers and round-trip
tests keep working.

Tests cover (1) a SOF-rewritten JPEG raising on the bomb cap, (2) a
legitimate JPEG decoding normally, (3) the no-cap fallback for direct
callers, and (4) a JPEG without a parseable SOF deferring to Pillow.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 13, 2026
@brendancol brendancol requested a review from Copilot May 13, 2026 14:13
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a tile-aware, pre-decode decompression-bomb cap for the JPEG GeoTIFF codec by parsing JPEG SOF dimensions before invoking Pillow, aligning JPEG behavior with the existing pre-check patterns used for other codecs.

Changes:

  • Add _read_jpeg_sof() + _check_jpeg_bomb() and invoke the check from jpeg_decompress() when width/height/samples are provided.
  • Add new JPEG decompression-cap tests (including SOF-forged “bomb” payloads) gated on Pillow availability.
  • Update the sweep-security state tracker entry for issue #1792.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
xrspatial/geotiff/_compression.py Adds JPEG SOF parsing and enforces a per-tile pre-decode size cap in jpeg_decompress().
xrspatial/geotiff/tests/test_decompression_caps.py Adds targeted tests for JPEG pre-decode bomb protection and backward-compat behavior.
.claude/sweep-security-state.csv Records the security re-audit state for issue #1792.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1087 to +1093
if marker in _JPEG_SOF_CODES:
if i + 9 >= n:
return None
height = (data[i + 5] << 8) | data[i + 6]
width = (data[i + 7] << 8) | data[i + 8]
components = data[i + 9]
return height, width, components
Comment thread xrspatial/geotiff/_compression.py Outdated
Comment on lines +1130 to +1132
# case; cap at the caller-declared sample count regardless. Most
# 4-component JPEGs in TIFFs ship CMYK, but the bomb check just
# needs an upper bound on bytes per pixel.
Comment thread xrspatial/geotiff/_compression.py Outdated
Comment on lines +1133 to +1141
expected_pixels = expected_width * expected_height * expected_samples
declared_pixels = declared_w * declared_h * max(declared_components, 1)
cap = _max_output_with_margin(expected_pixels)
if declared_pixels > cap:
raise ValueError(
f"jpeg decode would exceed expected size: declared output is "
f"{declared_pixels} bytes ({declared_w}x{declared_h}x"
f"{declared_components}), cap is {cap} (expected "
f"{expected_pixels}). Likely a decompression bomb."
declared_h=8000, declared_w=8000,
)
from xrspatial.geotiff._compression import jpeg_decompress
with pytest.raises(ValueError, match="exceed"):
#1792)

Addresses Copilot review on PR #1793.

1. ``_read_jpeg_sof`` now parses the SOF segment length and refuses
   the segment when ``seg_len < 8`` or the segment runs past EOF.
   Pre-fix, a truncated SOF could push the fixed-offset H/W/components
   reads past the segment boundary into later bytes, producing
   misleading dimensions.

2. Renamed ``expected_pixels``/``declared_pixels`` to
   ``expected_bytes``/``declared_bytes`` since they're compared
   against ``_max_output_with_margin()`` (byte cap) and reported as
   bytes in the error message. Updated the surrounding comment to
   describe the actual computation (caller-declared expected vs SOF-
   declared output) instead of the inaccurate "regardless of components"
   wording.

3. ``test_jpeg_bomb_raises`` now matches the full diagnostic
   ``"jpeg decode would exceed.*Likely a decompression bomb"`` so a
   regression that swaps in a different error path (numeric overflow,
   Pillow's own DecompressionBombError) fails the test instead of
   silently passing.

4. Added two tests for the new SOF length-validation branch:
   truncated segment lengths and below-minimum lengths both return
   None.
@brendancol brendancol merged commit a90f77f into main May 13, 2026
1 of 11 checks passed
brendancol added a commit that referenced this pull request May 13, 2026
Resolves conflict in xrspatial/geotiff/__init__.py: keeps the
`_read_vrt_dask` dispatch hook from the PR branch. All other
geotiff changes from main (#1791, #1793, #1801, #1802, #1803, #1804,
#1805, #1806) were already integrated into the working tree by the
prior 7329dd9 commit; this merge just records the parent so git
recognises the reconciliation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

geotiff: jpeg_decompress lacks pre-decode size cap (bomb defense gap)

2 participants