Summary
xrspatial.geotiff._compression.jpeg_decompress (lines 1042-1066) accepts width, height, and samples keyword arguments but never consults them to pre-validate the JPEG stream's declared SOF dimensions before letting Pillow allocate the decoded pixel buffer. This is asymmetric with the JPEG 2000 wrapper (which pre-parses the SIZ marker) and the LERC wrapper (which inspects getLercBlobInfo) — both reject blobs whose declared output exceeds the per-tile expected_size * 1.05 + 1 cap before any decode allocation runs.
Impact
A crafted TIFF can declare a small JPEG tile (e.g. 256x256 RGB, ~196 KB expected) but ship a JPEG payload whose SOF marker declares a much larger image. Image.open(BytesIO(data)) is lazy, but np.asarray(img).tobytes() triggers full decode. Pillow's default MAX_IMAGE_PIXELS is ~89M pixels: above that it only emits a DecompressionBombWarning, and it raises DecompressionBombError only above twice that limit (~178M). An attacker-declared image of up to ~178M pixels (~500 MB as RGB) therefore passes Pillow's own guard and is fully allocated. The downstream chunk.size != expected reshape check in _decode_strip_or_tile only runs after the full buffer is materialised.
Severity: MEDIUM. Pillow's DecompressionBombError is a partial guard; without a tile-aware pre-check, a single malicious tile can still allocate up to ~500 MB per call (and more per IFD when many tiles are decoded sequentially).
Repro sketch
A JPEG stream with SOF declaring 13000x13000x3 (~507 MB) embedded as a tile in a TIFF whose TileWidth=256, TileLength=256, SamplesPerPixel=3, Compression=7 lets jpeg_decompress allocate ~500 MB even though the caller only asked for ~196 KB.
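The numbers in the repro can be checked with a few lines of arithmetic. This is only a sketch: the variable names are illustrative, and the Pillow constant is written out by hand (its default value) rather than imported.

```python
# Tile the TIFF tags describe: what the caller expects to get back.
TILE_W, TILE_H, SAMPLES = 256, 256, 3
expected_size = TILE_W * TILE_H * SAMPLES          # 196_608 bytes (~196 KB)

# Margin used by the other codec wrappers, per this issue.
cap = expected_size * 1.05 + 1

# What the crafted SOF marker declares.
declared_pixels = 13000 * 13000                    # 169_000_000 pixels
declared_bytes = declared_pixels * 3               # 507_000_000 bytes (~507 MB)

# Pillow's default limit, hard-coded here instead of importing PIL.
PILLOW_MAX_IMAGE_PIXELS = 89_478_485
pillow_warns = declared_pixels > PILLOW_MAX_IMAGE_PIXELS
pillow_errors = declared_pixels > 2 * PILLOW_MAX_IMAGE_PIXELS

print(f"expected={expected_size} cap={cap:.1f} declared={declared_bytes}")
print(f"over tile cap: {declared_bytes > cap}")
print(f"Pillow warns: {pillow_warns}, Pillow errors: {pillow_errors}")
```

The declared image sits between Pillow's warning and error thresholds, so Pillow decodes it, while a tile-aware cap would reject it by a factor of more than two thousand.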
Fix
Parse the JPEG SOF0/SOF2 marker to read declared width/height/samples without decoding pixels, then reject the blob when the declared decoded byte count exceeds expected_size * 1.05 + 1 (the same margin used by the other codec wrappers). The caller already passes the expected tile dimensions via width / height / samples; the helper just needs to use them.
The check should be additive: when called without dimensions (the round-trip / direct-caller path) it should fall back to Pillow's own guard.
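A minimal sketch of what such a pre-decode check could look like follows. The names read_jpeg_sof and check_jpeg_dimensions are hypothetical, not existing xrspatial functions. Two choices here are assumptions rather than requirements from this issue: the sketch accepts every SOFn frame marker (not only SOF0/SOF2), and it raises ValueError when no SOF is found instead of deferring to Pillow.

```python
import struct

# Frame headers are 0xC0..0xCF except DHT (0xC4), JPG (0xC8) and DAC (0xCC).
_SOF_MARKERS = frozenset(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}
# Markers that carry no length field: RST0..RST7, TEM, SOI.
_STANDALONE = frozenset(range(0xD0, 0xD8)) | {0x01, 0xD8}


def read_jpeg_sof(data):
    """Return (width, height, samples, precision) from the first SOF
    segment, or None if none is found before scan data. Hypothetical."""
    n = len(data)
    if n < 4 or data[:2] != b"\xff\xd8":
        return None
    pos = 2
    while pos + 1 < n:
        if data[pos] != 0xFF:
            return None                      # lost marker sync
        marker = data[pos + 1]
        if marker == 0xFF:                   # fill byte before a marker
            pos += 1
            continue
        pos += 2
        if marker in _STANDALONE:
            continue
        if marker in (0xD9, 0xDA):           # EOI or SOS: no frame header
            return None
        if pos + 2 > n:
            return None
        (seg_len,) = struct.unpack_from(">H", data, pos)
        if seg_len < 2 or pos + seg_len > n:
            return None
        if marker in _SOF_MARKERS:
            if seg_len < 8:
                return None
            precision, height, width, samples = struct.unpack_from(
                ">BHHB", data, pos + 2)
            return width, height, samples, precision
        pos += seg_len                       # skip this segment
    return None


def check_jpeg_dimensions(data, width=0, height=0, samples=1):
    """Raise ValueError if the SOF declares far more than the tile expects."""
    if width <= 0 or height <= 0:
        return                               # no dimensions: rely on Pillow
    sof = read_jpeg_sof(data)
    if sof is None:
        raise ValueError("JPEG stream has no readable SOF marker")
    d_width, d_height, d_samples, precision = sof
    declared = d_width * d_height * d_samples * ((precision + 7) // 8)
    expected_size = width * height * max(samples, 1)
    if declared > expected_size * 1.05 + 1:
        raise ValueError(
            f"JPEG decompression bomb: SOF declares {declared} bytes, "
            f"tile expects {expected_size}")
```

The scan touches only marker headers, so the cost is proportional to the number of segments before the frame header and no pixel buffer is allocated.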
Acceptance criteria
- A new pre-decode helper that scans data for the SOF marker and rejects oversize blobs.
- jpeg_decompress calls it when width > 0 and height > 0.
- Tests cover: (1) a legitimate 256x256 JPEG decodes normally, (2) a JPEG declaring 13000x13000 against expected 256x256 raises ValueError mentioning the decompression bomb, (3) a malformed JPEG with no SOF marker still surfaces a clear error.
- No regression in the JPEG-tile round-trip tests in tests/test_jpeg.py.
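For test cases (2) and (3), one option is to build header-only JPEG streams by hand, so the test never has to encode a 13000x13000 image. The helper below is a hypothetical fixture, not part of the existing test suite. The streams it produces contain no scan data and are not decodable, so they only suit tests where the pre-decode check is expected to reject the blob. Case (1) still needs a real encoded tile.

```python
import struct


def make_header_only_jpeg(width, height, samples=3, with_sof=True):
    """Hypothetical fixture: SOI, optional SOF0, EOI. No scan data."""
    out = bytearray(b"\xff\xd8")                      # SOI
    if with_sof:
        # SOF0 payload: precision, height, width, component count,
        # then 3 bytes per component (id, sampling factors, quant table).
        payload = struct.pack(">BHHB", 8, height, width, samples)
        payload += b"".join(bytes([i + 1, 0x11, 0]) for i in range(samples))
        out += b"\xff\xc0" + struct.pack(">H", len(payload) + 2) + payload
    out += b"\xff\xd9"                                # EOI
    return bytes(out)


# Case (2): SOF declares 13000x13000x3 against an expected 256x256 tile.
bomb = make_header_only_jpeg(13000, 13000)

# Case (3): a stream with no SOF marker at all.
no_sof = make_header_only_jpeg(0, 0, with_sof=False)
```

Both blobs are a few dozen bytes long, which keeps the tests fast and makes the declared dimensions easy to read from the fixture call.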
Notes
This is the third codec in this series. Issues #1533 (deflate/zstd/lz4/packbits) and #1625 (LERC/JPEG2000) covered the others. LZW already caps via the caller-supplied buffer.