geotiff: pre-decode bomb cap for JPEG codec (#1792)#1793
Merged
Conversation
… bomb pre-check gap)
jpeg_decompress accepted width/height/samples but never consulted them before letting Pillow allocate the decoded buffer. A crafted TIFF could declare a small tile (256x256, ~196 KiB expected) while shipping a JPEG payload whose SOF marker declared a much larger image, allocating up to ~500 MB per tile before Pillow's own DecompressionBombError fires at 178M pixels. Mirror the JP2K SIZ pre-check (#1625) and LERC blob-info pre-check: parse the JPEG SOF marker without decoding pixels, and reject when declared output exceeds expected * 1.05 + 1 bytes. Default kwargs (width=0, height=0) disable the cap so direct callers and round-trip tests keep working. Tests cover (1) a SOF-rewritten JPEG raising on the bomb cap, (2) a legitimate JPEG decoding normally, (3) the no-cap fallback for direct callers, and (4) a JPEG without a parseable SOF deferring to Pillow.
Contributor
There was a problem hiding this comment.
Pull request overview
This PR adds a tile-aware, pre-decode decompression-bomb cap for the JPEG GeoTIFF codec by parsing JPEG SOF dimensions before invoking Pillow, aligning JPEG behavior with the existing pre-check patterns used for other codecs.
Changes:
- Add
_read_jpeg_sof()+_check_jpeg_bomb()and invoke the check fromjpeg_decompress()whenwidth/height/samplesare provided. - Add new JPEG decompression-cap tests (including SOF-forged “bomb” payloads) gated on Pillow availability.
- Update the sweep-security state tracker entry for issue #1792.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
xrspatial/geotiff/_compression.py |
Adds JPEG SOF parsing and enforces a per-tile pre-decode size cap in jpeg_decompress(). |
xrspatial/geotiff/tests/test_decompression_caps.py |
Adds targeted tests for JPEG pre-decode bomb protection and backward-compat behavior. |
.claude/sweep-security-state.csv |
Records the security re-audit state for issue #1792. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+1087
to
+1093
| if marker in _JPEG_SOF_CODES: | ||
| if i + 9 >= n: | ||
| return None | ||
| height = (data[i + 5] << 8) | data[i + 6] | ||
| width = (data[i + 7] << 8) | data[i + 8] | ||
| components = data[i + 9] | ||
| return height, width, components |
Comment on lines
+1130
to
+1132
| # case; cap at the caller-declared sample count regardless. Most | ||
| # 4-component JPEGs in TIFFs ship CMYK, but the bomb check just | ||
| # needs an upper bound on bytes per pixel. |
Comment on lines
+1133
to
+1141
| expected_pixels = expected_width * expected_height * expected_samples | ||
| declared_pixels = declared_w * declared_h * max(declared_components, 1) | ||
| cap = _max_output_with_margin(expected_pixels) | ||
| if declared_pixels > cap: | ||
| raise ValueError( | ||
| f"jpeg decode would exceed expected size: declared output is " | ||
| f"{declared_pixels} bytes ({declared_w}x{declared_h}x" | ||
| f"{declared_components}), cap is {cap} (expected " | ||
| f"{expected_pixels}). Likely a decompression bomb." |
| declared_h=8000, declared_w=8000, | ||
| ) | ||
| from xrspatial.geotiff._compression import jpeg_decompress | ||
| with pytest.raises(ValueError, match="exceed"): |
#1792) Addresses Copilot review on PR #1793. 1. ``_read_jpeg_sof`` now parses the SOF segment length and refuses the segment when ``seg_len < 8`` or the segment runs past EOF. Pre-fix, a truncated SOF could push the fixed-offset H/W/components reads past the segment boundary into later bytes, producing misleading dimensions. 2. Renamed ``expected_pixels``/``declared_pixels`` to ``expected_bytes``/``declared_bytes`` since they're compared against ``_max_output_with_margin()`` (byte cap) and reported as bytes in the error message. Updated the surrounding comment to describe the actual computation (caller-declared expected vs SOF- declared output) instead of the inaccurate "regardless of components" wording. 3. ``test_jpeg_bomb_raises`` now matches the full diagnostic ``"jpeg decode would exceed.*Likely a decompression bomb"`` so a regression that swaps in a different error path (numeric overflow, Pillow's own DecompressionBombError) fails the test instead of silently passing. 4. Added two tests for the new SOF length-validation branch: truncated segment lengths and below-minimum lengths both return None.
brendancol
added a commit
that referenced
this pull request
May 13, 2026
Resolves conflict in xrspatial/geotiff/__init__.py: keeps the `_read_vrt_dask` dispatch hook from the PR branch. All other geotiff changes from main (#1791, #1793, #1801, #1802, #1803, #1804, #1805, #1806) were already integrated into the working tree by the prior 7329dd9 commit; this merge just records the parent so git recognises the reconciliation.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
jpeg_decompressnow parses the JPEG SOF marker before invoking Pillow and refuses blobs whose declared output exceeds the sameexpected * 1.05 + 1cap used by every other codec wrapper.width/height/samples(the reader does) get the cap; direct callers and round-trip tests passing the defaults fall through to Pillow's ownDecompressionBombErroras before.getLercBlobInfopre-check.Why
A crafted TIFF can declare a small tile (e.g. 256x256 RGB, ~196 KiB expected) while shipping a JPEG payload whose SOF marker declares a much larger image. Pillow's default
MAX_IMAGE_PIXELSfires only at ~178M pixels (~500 MB RGB), so without a tile-aware pre-check a single tile can allocate hundreds of MB before the downstreamchunk.size != expectedreshape check runs.np.asarray(img).tobytes()materialises the full Pillow buffer.Test plan
pytest xrspatial/geotiff/tests/test_decompression_caps.py(29 passed, 4 new)pytest xrspatial/geotiff/tests/test_jpeg.py(19 passed, no regression)pytest xrspatial/geotiff/tests/test_security.py(46 passed)Likely a decompression bomb; legitimate 32x32 RGB JPEG round-trips; default kwargs keep the cap disabled; a JPEG without a parseable SOF defers to Pillow's own error.Notes
width/height/sampleskwargs.