2 changes: 1 addition & 1 deletion .claude/sweep-security-state.csv
@@ -18,7 +18,7 @@ fire,2026-04-25,,,,,"Clean. Despite the module's size hint, fire.py is purely pe
flood,2026-05-03,1437,MEDIUM,3,,Re-audit 2026-05-03. MEDIUM Cat 3 fixed in PR #1438 (travel_time and flood_depth_vegetation now validate mannings_n DataArray values are finite and strictly positive via _validate_mannings_n_dataarray helper). No remaining unfixed findings. Other categories clean: every allocation is same-shape as input; no flat index math; NaN propagation explicit in every backend; tan_slope clamped by _TAN_MIN; no CUDA kernels; no file I/O; every public API calls _validate_raster on DataArray inputs.
focal,2026-04-27,1284,HIGH,1,,"HIGH (fixed PR #1286): apply(), focal_stats(), and hotspots() accepted unbounded user-supplied kernels via custom_kernel(), which only checks shape parity. The kernel-size guard from #1241 (_check_kernel_memory) only ran inside circle_kernel/annulus_kernel, so a (50001, 50001) custom kernel on a 10x10 raster allocated ~10 GB on the kernel itself plus a much larger padded raster before any work -- same shape as the bilateral DoS in #1236. Fixed by adding _check_kernel_vs_raster_memory in focal.py and wiring it into apply(), focal_stats(), and hotspots() after custom_kernel() validation. All 134 focal tests + 19 bilateral tests pass. No other findings: 10 CUDA kernels all have proper bounds + stencil guards; _validate_raster called on every public entry point; hotspots already raises ZeroDivisionError on constant-value rasters; _focal_variety_cuda uses a fixed-size local buffer (silent truncation but bounded); _focal_std_cuda/_focal_var_cuda clamp the catastrophic-cancellation case via if var < 0.0: var = 0.0; no file I/O."
geodesic,2026-04-27,1283,HIGH,1,,"HIGH (fixed PR #1285): slope(method='geodesic') and aspect(method='geodesic') stack a (3, H, W) float64 array (data, lat, lon) before dispatch with no memory check. A large lat/lon-tagged raster passed to either function would OOM. Fixed by adding _check_geodesic_memory(rows, cols) in xrspatial/geodesic.py (mirrors morphology._check_kernel_memory): budgets 56 bytes/cell (24 stacked float64 + 4 float32 output + 24 padded copy + slack) and raises MemoryError when > 50% of available RAM; called from slope.py and aspect.py inside the geodesic branch before dispatch. No other findings: 6 CUDA kernels all have bounds guards (e.g. _run_gpu_geodesic_aspect at geodesic.py:395), custom 16x16 thread blocks avoid register spill, no shared memory, _validate_raster runs upstream in slope/aspect, all backends cast to float32, slope_mag < 1e-7 flat threshold prevents arctan2 NaN propagation, curvature correction uses hardcoded WGS84 R."
geotiff,2026-05-12,1737,HIGH,1,,"Re-audit pass 16 2026-05-12 (deep-sweep p3). NEW HIGH (Cat 1, fixed PR pending against #1737): VRT _resample_nearest allocated (dr.y_size, dr.x_size) before the clip was taken. A crafted <SimpleSource><DstRect xSize/ySize> can declare values orders of magnitude larger than the VRT's rasterXSize/rasterYSize; the output buffer was bounded by _check_dimensions but the resample intermediate was not. Tracing: a 10x10 source with DstRect 50000x50000 on a 100x100 VRT extent allocated ~2.5 GB inside np.repeat. Fixed by checking dr.x_size * dr.y_size against max_pixels in _vrt.py:read_vrt() before _resample_nearest runs. Mirrors the _check_dimensions pattern from _reader.py. Six new tests in test_vrt_dstrect_resample_cap_1737.py cover the huge-X, huge-Y, legitimate, max_pixels override, at-cap, and negative cases. All 201 existing VRT tests still pass. Other categories verified clean (no new findings): Cat 1 (allocations): _check_dimensions covers the public window / HTTP / dask paths; MAX_TILE_BYTES_DEFAULT (256 MiB) caps per-tile / per-strip compressed bytes locally and over HTTP (PR #1668); LERC blob-header pre-check, JP2K SIZ pre-check, deflate/zstd/lz4/packbits caps still in place; Cat 2 (overflow): _check_dimensions catches before alloc; int64 in CUDA tile offsets; Cat 3 (NaN logic): NaN-aware paths via #1597, #1630; Cat 4 (GPU bounds): all CUDA kernels (_byte_swap_lanes_kernel, _lzw_decode_tiles_kernel, _inflate_tiles_kernel, _predictor_decode_kernel_u8/u16/u32/u64, _fp_predictor_decode_kernel, _assemble_tiles_kernel, _extract_tiles_kernel, _predictor_encode_kernel_u8/u16/u32/u64, _fp_predictor_encode_kernel) have bounds guards; shared memory sizes fixed; Cat 5 (path): _MmapCache realpath, VRT path containment #1671 with XRSPATIAL_VRT_ALLOWED_ROOTS opt-in; tempfile.mkstemp + os.replace for writer; SSRF defenses for _HTTPSource via #1664; Cat 6 (dtype): _validate_dtype_cast in __init__.py; _NATIVE_ORDER byte-swap. XML payloads gated via _safe_xml.safe_fromstring with DOCTYPE rejection (#1579). _compression.py uses tempfile.mkstemp without dir= so the JP2K temp file lands in the system tmpdir (TMPDIR / TEMP), which is the documented safe default. No new findings beyond #1737."
geotiff,2026-05-13,1792,MEDIUM,1,,"Re-audit pass 17 2026-05-13 (deep-sweep s2). NEW MEDIUM (Cat 1): jpeg_decompress (_compression.py:1042-1066) hands attacker-controlled JPEG bytes to Pillow without consulting the declared tile width/height/samples; a tile-size mismatch lets a small JPEG payload allocate up to Pillow's MAX_IMAGE_PIXELS*2 (~178M pixels, ~500 MB RGB) before the downstream chunk.size != expected check fires. Asymmetric with the JP2K SIZ pre-check and LERC blob-info pre-check. Pillow's default DecompressionBombError is a partial guard so severity is MEDIUM. Other categories verified clean: Cat 2-6 same coverage as pass 16 audit; JPEG2000 / LERC / deflate / zstd / lz4 / packbits / LZW caps still in place; VRT _resample_nearest DstRect cap (#1737) merged; VRT path containment + DOCTYPE rejection in _safe_xml; CUDA kernels have bounds guards; mmap cache uses realpath; SSRF defenses on _HTTPSource."
glcm,2026-04-24,1257,HIGH,1,,"HIGH (fixed #1257): glcm_texture() validated window_size only as >= 3 and distance only as >= 1, with no upper bound on either. _glcm_numba_kernel iterates range(r-half, r+half+1) for every pixel, so window_size=1_000_001 on a 10x10 raster ran ~10^14 loop iterations with all neighbors failing the interior bounds check (CPU DoS). On the dask backends depth = window_size // 2 + distance drove map_overlap padding, so a huge window also caused oversize per-chunk allocations (memory DoS). Fixed by adding max_val caps in the public entrypoint: window_size <= max(3, min(rows, cols)) and distance <= max(1, window_size // 2). One cap covers every backend because cupy and dask+cupy call through to the CPU kernel after cupy.asnumpy. No other HIGH findings: levels is already capped at 256 so the per-pixel np.zeros((levels, levels)) matrix in the kernel is bounded to 512 KB. No CUDA kernels. No file I/O. Quantization clips to [0, levels-1] before the kernel and NaN maps to -1 which the kernel filters with i_val >= 0. Entropy log(p) and correlation p / (std_i * std_j) are both guarded. All four backends use _validate_raster and cast to float64 before quantizing. MEDIUM (unfixed, Cat 1): the per-pixel np.zeros((levels, levels)) allocation inside the hot loop is a perf issue (levels=256 -> 512 KB alloc+free per pixel) but not a security issue because levels is bounded. Could be hoisted out of the loop or replaced with an in-place clear, but that is an efficiency concern, not security."
gpu_rtx,2026-04-29,1308,HIGH,1,,"HIGH (fixed #1308 / PR #1310): hillshade_rtx (gpu_rtx/hillshade.py:184) and viewshed_gpu (gpu_rtx/viewshed.py:269) allocated cupy device buffers sized by raster shape with no memory check. create_triangulation (mesh_utils.py:23-24) adds verts (12 B/px) + triangles (24 B/px) = 36 B/px; hillshade_rtx adds d_rays(32) + d_hits(16) + d_aux(12) + d_output(4) = 64 B/px (100 B/px total); viewshed_gpu adds d_rays(32) + d_hits(16) + d_visgrid(4) + d_vsrays(32) = 84 B/px (120 B/px total). A 30000x30000 raster asked for 90-108 GB of VRAM before cupy surfaced an opaque allocator error. Fixed by adding gpu_rtx/_memory.py with _available_gpu_memory_bytes() and _check_gpu_memory(func_name, h, w) helpers (cost_distance #1262 / sky_view_factor #1299 pattern, 120 B/px budget covers worst case, raises MemoryError when required > 50% of free VRAM, skips silently when memGetInfo() unavailable). Wired into both entry points after the cupy.ndarray type check and before create_triangulation. 9 new tests in test_gpu_rtx_memory.py (5 helper-unit + 4 end-to-end gated on has_rtx). All 81 existing hillshade/viewshed tests still pass. Cat 4 clean: all CUDA kernels (hillshade.py:25/62/106, viewshed.py:32/74/116, mesh_utils.py:50) have bounds guards; no shared memory, no syncthreads needed. MEDIUM not fixed (Cat 6): hillshade_rtx and viewshed_gpu do not call _validate_raster directly but parent hillshade() (hillshade.py:252) and viewshed() (viewshed.py:1707) already validate, so input validation runs before the gpu_rtx entry point - defense-in-depth, not exploitable. MEDIUM not fixed (Cat 2): mesh_utils.py:64-68 cast mesh_map_index to int32 in the triangle index buffer; overflows at H*W > 2.1B vertices (~46341x46341+) but the new memory guard rejects rasters that large first - documentation/clarity item rather than exploitable. MEDIUM not fixed (Cat 3): mesh_utils.py:19 scale = maxDim / maxH divides by zero on an all-zero raster, propagating inf/NaN into mesh vertex z-coords; separate follow-up. LOW not fixed (Cat 5): mesh_utils.write() opens user-supplied path without canonicalization but its only call site (mesh_utils.py:38-39) sits behind if False: in create_triangulation, not reachable in production."
hillshade,2026-04-27,,,,,"Clean. Cat 1: only allocation is the output np.empty(data.shape) at line 32 (cupy at line 165) and a _pad_array with hardcoded depth=1 (line 62) -- bounded by caller, no user-controlled amplifier. Azimuth/altitude are scalars and don't drive size. Cat 2: numba kernel uses range(1, rows-1) with simple (y, x) indexing; numba range loops promote to int64. Cat 3: math.sqrt(1.0 + xx_plus_yy) is always >= 1.0 (no neg sqrt, no div-by-zero); NaN elevation propagates correctly through dz_dx/dz_dy -> shaded -> output (the shaded < 0.0 / shaded > 1.0 clamps don't fire on NaN). Azimuth validated to [0, 360], altitude to [0, 90]. Cat 4: _gpu_calc_numba (line 107) guards both grid bounds and 3x3 stencil reads via i > 0 and i < shape[0]-1 and j > 0 and j < shape[1]-1; no shared memory. Cat 5: no file I/O. Cat 6: hillshade() calls _validate_raster (line 252) and _validate_scalar for both azimuth (253) and angle_altitude (254); all four backend paths cast to float32; tests parametrize int32/int64/float32/float64."
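Several of the rows above (geodesic, gpu_rtx, and the focal/morphology kernel caps) describe the same defensive pattern: compute a per-cell byte budget up front and raise MemoryError before dispatch when the projected allocation would exceed half of the currently available memory. A minimal sketch of that pattern follows, assuming psutil is available; the _check_memory helper below is hypothetical and illustrative only, while the real helpers named in the rows (_check_geodesic_memory, _check_gpu_memory, morphology._check_kernel_memory) live in their own modules and differ in budget and memory source:

import psutil


def _check_memory(func_name: str, rows: int, cols: int,
                  bytes_per_cell: int = 56) -> None:
    # Hypothetical sketch of the ">50% of available RAM" guard cited in
    # the sweep rows above; 56 B/cell is the budget the geodesic row quotes.
    required = rows * cols * bytes_per_cell
    available = psutil.virtual_memory().available
    if required > available // 2:
        raise MemoryError(
            f"{func_name}: a {rows} x {cols} raster needs ~{required} bytes "
            f"({bytes_per_cell} B/cell), more than half of the {available} "
            f"bytes currently available"
        )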
132 changes: 132 additions & 0 deletions xrspatial/geotiff/_compression.py
@@ -1039,6 +1039,125 @@ def _splice_jpeg_tables(tile_data: bytes,
return tile_data[:2] + tables_body + tile_data[2:]


# JPEG Start-Of-Frame marker codes: 0xFFC0..0xFFCF *except* DHT (0xC4),
# JPG (0xC8), and DAC (0xCC). The payload format is the same across every
# SOF variant: 2-byte segment length, 1-byte sample precision, 2-byte
# image height (big-endian), 2-byte image width (big-endian), 1-byte
# component count.
_JPEG_SOF_CODES = frozenset(
{0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}
)
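# Worked example (illustrative bytes, not taken from any particular
# file): in the sequence
#   FF C0 | 00 0B | 08 | 01 00 | 02 00 | 01 | 01 11 00
# the marker is SOF0 (0xC0), the declared segment length is 0x000B = 11
# bytes (it covers everything after the marker, including the length
# field itself), the precision is 8, the height is 0x0100 = 256, the
# width is 0x0200 = 512, and there is one component followed by its
# three component-spec bytes.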


def _read_jpeg_sof(data: bytes) -> tuple[int, int, int] | None:
"""Return ``(height, width, components)`` from the first JPEG SOF marker.

Scans *data* for the first Start-Of-Frame marker without decoding any
pixel data. Returns ``None`` when the buffer is too short, when the
SOI marker is missing, when a length-prefixed segment runs off the
end, or when no SOF is found before EOI/end-of-buffer. The caller
treats a ``None`` return as "could not determine declared size" and
falls back to Pillow's own guard.

Used by :func:`jpeg_decompress` to enforce a pre-decode bomb cap
analogous to the JP2K SIZ pre-check and the LERC ``getLercBlobInfo``
pre-check.
"""
n = len(data)
if n < 4 or data[0] != 0xFF or data[1] != 0xD8:
return None
i = 2
while i + 3 < n:
if data[i] != 0xFF:
return None
# Skip JPEG fill bytes (0xFF padding).
marker = data[i + 1]
while marker == 0xFF and i + 2 < n:
i += 1
marker = data[i + 1]
# Standalone markers without payload: SOI, EOI, RSTm, TEM.
# Encountering EOI before SOF means the stream is malformed for
# this purpose; caller falls back to Pillow.
if marker == 0xD9: # EOI
return None
if marker in (0xD8,) or 0xD0 <= marker <= 0xD7 or marker == 0x01:
i += 2
continue
if marker in _JPEG_SOF_CODES:
# SOF is a length-prefixed segment; validate the declared
# segment length before reading the fixed fields so a
# truncated/malformed segment header is rejected as "unknown
# size" rather than silently reading past the segment into
# later bytes. The fields we need are at offsets 5..9 in the
# segment, which sits inside the declared seg_len bytes.
if i + 3 >= n:
return None
seg_len = (data[i + 2] << 8) | data[i + 3]
# SOF requires at least: 2 (length) + 1 (precision) + 2 (h)
# + 2 (w) + 1 (components) = 8 bytes; the segment must also
# fit inside the buffer.
if seg_len < 8 or i + 2 + seg_len > n:
return None
height = (data[i + 5] << 8) | data[i + 6]
width = (data[i + 7] << 8) | data[i + 8]
components = data[i + 9]
return height, width, components
# Length-prefixed segment: payload length includes the two length
# bytes themselves but not the marker.
if i + 3 >= n:
return None
seg_len = (data[i + 2] << 8) | data[i + 3]
if seg_len < 2:
return None
i += 2 + seg_len
return None


def _check_jpeg_bomb(data: bytes, expected_width: int, expected_height: int,
expected_samples: int) -> None:
"""Reject JPEG blobs whose SOF dimensions exceed the expected tile size.

The caller passes the TIFF-declared tile width/height/samples; we
parse the JPEG SOF marker to discover the JPEG's own declared
dimensions and refuse to decode when the projected output exceeds
the same ``expected * 1.05 + 1`` margin used by every other codec
wrapper. Skipping when any expected value is non-positive matches
the convention from the other codecs: callers that don't supply a
size fall back to Pillow's built-in ``DecompressionBombError``
guard.

A return of ``None`` from :func:`_read_jpeg_sof` is treated as
"could not determine declared size"; in that case we defer to
Pillow rather than raising, so legitimate streams with unusual
framing still decode.
"""
if expected_width <= 0 or expected_height <= 0 or expected_samples <= 0:
return
info = _read_jpeg_sof(data)
if info is None:
return
declared_h, declared_w, declared_components = info
# JPEG components ship as 8-bit samples in every TIFF JPEG we
# support, so ``width * height * components`` equals the post-decode
# byte count exactly. JPEG-12 (12-bit precision) would round up to
# 2 bytes per sample, but tifffile/Pillow doesn't surface those on
# the read path so we treat samples as bytes here. The expected
# side uses what the *caller* declared (TIFF tile size and
# samples-per-pixel), so a JPEG claiming extra components is
# rejected as a bomb.
expected_bytes = expected_width * expected_height * expected_samples
declared_bytes = declared_w * declared_h * max(declared_components, 1)
cap = _max_output_with_margin(expected_bytes)
if declared_bytes > cap:
raise ValueError(
f"jpeg decode would exceed expected size: declared output is "
f"{declared_bytes} bytes ({declared_w}x{declared_h}x"
f"{declared_components}), cap is {cap} (expected "
f"{expected_bytes}). Likely a decompression bomb."
)


def jpeg_decompress(data: bytes, width: int = 0, height: int = 0,
samples: int = 1, jpeg_tables: bytes | None = None) -> bytes:
"""Decompress JPEG tile/strip data. Requires Pillow.
@@ -1048,6 +1167,13 @@ def jpeg_decompress(data: bytes, width: int = 0, height: int = 0,
data : bytes
Raw JPEG bytes from one TIFF strip or tile. May be a fragment
when ``jpeg_tables`` is supplied (GDAL tiled JPEG).
width, height, samples : int, optional
TIFF-declared tile dimensions. When all three are positive the
wrapper inspects the JPEG SOF marker before decoding and raises
``ValueError`` when the JPEG's declared output exceeds
``width * height * samples * 1.05 + 1`` bytes
(decompression-bomb guard). Defaults of 0/0/1 disable the cap
for direct callers and round-trip tests.
jpeg_tables : bytes, optional
Contents of TIFF tag 347 (JPEGTables). If supplied, the shared
DQT/DHT segments are spliced into ``data`` before decoding so
@@ -1060,6 +1186,12 @@ def jpeg_decompress(data: bytes, width: int = 0, height: int = 0,
import io
if jpeg_tables:
data = _splice_jpeg_tables(data, jpeg_tables)
# Pre-decode bomb cap (issue #1792). Pillow's built-in
# DecompressionBombError fires at ~178M pixels (~500 MB RGB) which
# is well above a typical tile's expected size; without this
# per-tile pre-check a single malicious tile can allocate hundreds
# of MB before the downstream chunk.size != expected check fires.
_check_jpeg_bomb(data, width, height, samples)
img = Image.open(io.BytesIO(data))
# libjpeg already converts YCbCr->RGB during decode, so rely on the
# mode Pillow returns. Calling .convert() unnecessarily would copy.
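As a hedged usage sketch (not part of the PR's own tests), the following assumes _read_jpeg_sof and _check_jpeg_bomb are importable from xrspatial.geotiff._compression and that _max_output_with_margin applies the expected * 1.05 + 1 margin described in the docstrings above:

import struct

from xrspatial.geotiff._compression import _check_jpeg_bomb, _read_jpeg_sof

# Minimal JPEG prefix whose SOF0 declares 10000 x 10000 x 3. Only SOI
# plus the SOF0 segment is needed: _read_jpeg_sof never reads
# entropy-coded data.
sof = struct.pack(">BBHBHHB", 0xFF, 0xC0, 8 + 3 * 3, 8, 10000, 10000, 3)
sof += b"\x01\x11\x00\x02\x11\x00\x03\x11\x00"  # three component specs
bomb = b"\xff\xd8" + sof

assert _read_jpeg_sof(bomb) == (10000, 10000, 3)

# Against a 256 x 256 x 3 tile the declared output is 300,000,000
# bytes, far above the roughly 206,000-byte cap, so the guard raises
# before Pillow ever sees the payload.
try:
    _check_jpeg_bomb(bomb, 256, 256, 3)
except ValueError as exc:
    print(exc)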