xarray-contrib · brendancol · May 19, 2026
diff --git a/.claude/sweep-security-state.csv b/.claude/sweep-security-state.csv
@@ -18,7 +18,7 @@ fire,2026-04-25,,,,,"Clean. Despite the module's size hint, fire.py is purely pe
 flood,2026-05-03,1437,MEDIUM,3,,Re-audit 2026-05-03. MEDIUM Cat 3 fixed in PR #1438 (travel_time and flood_depth_vegetation now validate mannings_n DataArray values are finite and strictly positive via _validate_mannings_n_dataarray helper). No remaining unfixed findings. Other categories clean: every allocation is same-shape as input; no flat index math; NaN propagation explicit in every backend; tan_slope clamped by _TAN_MIN; no CUDA kernels; no file I/O; every public API calls _validate_raster on DataArray inputs.
 focal,2026-04-27,1284,HIGH,1,,"HIGH (fixed PR #1286): apply(), focal_stats(), and hotspots() accepted unbounded user-supplied kernels via custom_kernel(), which only checks shape parity. The kernel-size guard from #1241 (_check_kernel_memory) only ran inside circle_kernel/annulus_kernel, so a (50001, 50001) custom kernel on a 10x10 raster allocated ~10 GB on the kernel itself plus a much larger padded raster before any work -- same shape as the bilateral DoS in #1236. Fixed by adding _check_kernel_vs_raster_memory in focal.py and wiring it into apply(), focal_stats(), and hotspots() after custom_kernel() validation. All 134 focal tests + 19 bilateral tests pass. No other findings: 10 CUDA kernels all have proper bounds + stencil guards; _validate_raster called on every public entry point; hotspots already raises ZeroDivisionError on constant-value rasters; _focal_variety_cuda uses a fixed-size local buffer (silent truncation but bounded); _focal_std_cuda/_focal_var_cuda clamp the catastrophic-cancellation case via if var < 0.0: var = 0.0; no file I/O."
 geodesic,2026-04-27,1283,HIGH,1,,"HIGH (fixed PR #1285): slope(method='geodesic') and aspect(method='geodesic') stack a (3, H, W) float64 array (data, lat, lon) before dispatch with no memory check. A large lat/lon-tagged raster passed to either function would OOM. Fixed by adding _check_geodesic_memory(rows, cols) in xrspatial/geodesic.py (mirrors morphology._check_kernel_memory): budgets 56 bytes/cell (24 stacked float64 + 4 float32 output + 24 padded copy + slack) and raises MemoryError when > 50% of available RAM; called from slope.py and aspect.py inside the geodesic branch before dispatch. No other findings: 6 CUDA kernels all have bounds guards (e.g. _run_gpu_geodesic_aspect at geodesic.py:395), custom 16x16 thread blocks avoid register spill, no shared memory, _validate_raster runs upstream in slope/aspect, all backends cast to float32, slope_mag < 1e-7 flat threshold prevents arctan2 NaN propagation, curvature correction uses hardcoded WGS84 R."
-geotiff,2026-05-13,1792,MEDIUM,1,,"Re-audit pass 17 2026-05-13 (deep-sweep s2). NEW MEDIUM (Cat 1): jpeg_decompress (_compression.py:1042-1066) hands attacker-controlled JPEG bytes to Pillow without consulting the declared tile width/height/samples; a tile-size mismatch lets a small JPEG payload allocate up to Pillow's MAX_IMAGE_PIXELS*2 (~178M pixels, ~500 MB RGB) before the downstream chunk.size != expected check fires. Asymmetric with the JP2K SIZ pre-check and LERC blob-info pre-check. Pillow's default DecompressionBombError is a partial guard so severity is MEDIUM. Other categories verified clean: Cat 2-6 same coverage as pass 16 audit; JPEG2000 / LERC / deflate / zstd / lz4 / packbits / LZW caps still in place; VRT _resample_nearest DstRect cap (#1737) merged; VRT path containment + DOCTYPE rejection in _safe_xml; CUDA kernels have bounds guards; mmap cache uses realpath; SSRF defenses on _HTTPSource."
+geotiff,2026-05-18,,MEDIUM,1,,"Re-audit pass 18 2026-05-18 (deep-sweep p1). MEDIUM Cat 1 fixed in deep-sweep-security-geotiff-2026-05-18-p1: read_geotiff_gpu eager path (_backends/gpu.py) now applies the same _max_tile_bytes_from_env() per-tile cap that _read_tiles and _fetch_decode_cog_http_tiles enforce. The CPU and GPU readers now agree on the per-tile budget; a malformed local TIFF with TileByteCounts pointing into a large file region is rejected before GPU decode rather than relying on _check_gpu_memory's aggregate-sum guard. Test: tests/test_gpu_tile_byte_cap_2026_05_18.py. Other categories verified clean: JPEG bomb cap (#1792), HTTP read_all byte budget (#2057), VRT XML cap, DOCTYPE rejection, path containment, SSRF, validate_tile_layout, dimension caps, IFD entry caps, MAX_IFDS, MAX_PIXEL_ARRAY_COUNT, GPU bounds guards, atomic writes, realpath canonicalization, dtype validation."
 glcm,2026-04-24,1257,HIGH,1,,"HIGH (fixed #1257): glcm_texture() validated window_size only as >= 3 and distance only as >= 1, with no upper bound on either. _glcm_numba_kernel iterates range(r-half, r+half+1) for every pixel, so window_size=1_000_001 on a 10x10 raster ran ~10^14 loop iterations with all neighbors failing the interior bounds check (CPU DoS). On the dask backends depth = window_size // 2 + distance drove map_overlap padding, so a huge window also caused oversize per-chunk allocations (memory DoS). Fixed by adding max_val caps in the public entrypoint: window_size <= max(3, min(rows, cols)) and distance <= max(1, window_size // 2). One cap covers every backend because cupy and dask+cupy call through to the CPU kernel after cupy.asnumpy. No other HIGH findings: levels is already capped at 256 so the per-pixel np.zeros((levels, levels)) matrix in the kernel is bounded to 512 KB. No CUDA kernels. No file I/O. Quantization clips to [0, levels-1] before the kernel and NaN maps to -1 which the kernel filters with i_val >= 0. Entropy log(p) and correlation p / (std_i * std_j) are both guarded. All four backends use _validate_raster and cast to float64 before quantizing. MEDIUM (unfixed, Cat 1): the per-pixel np.zeros((levels, levels)) allocation inside the hot loop is a perf issue (levels=256 -> 512 KB alloc+free per pixel) but not a security issue because levels is bounded. Could be hoisted out of the loop or replaced with an in-place clear, but that is an efficiency concern, not security."
 gpu_rtx,2026-04-29,1308,HIGH,1,,"HIGH (fixed #1308 / PR #1310): hillshade_rtx (gpu_rtx/hillshade.py:184) and viewshed_gpu (gpu_rtx/viewshed.py:269) allocated cupy device buffers sized by raster shape with no memory check. create_triangulation (mesh_utils.py:23-24) adds verts (12 B/px) + triangles (24 B/px) = 36 B/px; hillshade_rtx adds d_rays(32) + d_hits(16) + d_aux(12) + d_output(4) = 64 B/px (100 B/px total); viewshed_gpu adds d_rays(32) + d_hits(16) + d_visgrid(4) + d_vsrays(32) = 84 B/px (120 B/px total). A 30000x30000 raster asked for 90-108 GB of VRAM before cupy surfaced an opaque allocator error. Fixed by adding gpu_rtx/_memory.py with _available_gpu_memory_bytes() and _check_gpu_memory(func_name, h, w) helpers (cost_distance #1262 / sky_view_factor #1299 pattern, 120 B/px budget covers worst case, raises MemoryError when required > 50% of free VRAM, skips silently when memGetInfo() unavailable). Wired into both entry points after the cupy.ndarray type check and before create_triangulation. 9 new tests in test_gpu_rtx_memory.py (5 helper-unit + 4 end-to-end gated on has_rtx). All 81 existing hillshade/viewshed tests still pass. Cat 4 clean: all CUDA kernels (hillshade.py:25/62/106, viewshed.py:32/74/116, mesh_utils.py:50) have bounds guards; no shared memory, no syncthreads needed. MEDIUM not fixed (Cat 6): hillshade_rtx and viewshed_gpu do not call _validate_raster directly but parent hillshade() (hillshade.py:252) and viewshed() (viewshed.py:1707) already validate, so input validation runs before the gpu_rtx entry point - defense-in-depth, not exploitable. MEDIUM not fixed (Cat 2): mesh_utils.py:64-68 cast mesh_map_index to int32 in the triangle index buffer; overflows at H*W > 2.1B vertices (~46341x46341+) but the new memory guard rejects rasters that large first - documentation/clarity item rather than exploitable. MEDIUM not fixed (Cat 3): mesh_utils.py:19 scale = maxDim / maxH divides by zero on an all-zero raster, propagating inf/NaN into mesh vertex z-coords; separate follow-up. LOW not fixed (Cat 5): mesh_utils.write() opens user-supplied path without canonicalization but its only call site (mesh_utils.py:38-39) sits behind if False: in create_triangulation, not reachable in production."
 hillshade,2026-04-27,,,,,"Clean. Cat 1: only allocation is the output np.empty(data.shape) at line 32 (cupy at line 165) and a _pad_array with hardcoded depth=1 (line 62) -- bounded by caller, no user-controlled amplifier. Azimuth/altitude are scalars and don't drive size. Cat 2: numba kernel uses range(1, rows-1) with simple (y, x) indexing; numba range loops promote to int64. Cat 3: math.sqrt(1.0 + xx_plus_yy) is always >= 1.0 (no neg sqrt, no div-by-zero); NaN elevation propagates correctly through dz_dx/dz_dy -> shaded -> output (the shaded < 0.0 / shaded > 1.0 clamps don't fire on NaN). Azimuth validated to [0, 360], altitude to [0, 90]. Cat 4: _gpu_calc_numba (line 107) guards both grid bounds and 3x3 stencil reads via i > 0 and i < shape[0]-1 and j > 0 and j < shape[1]-1; no shared memory. Cat 5: no file I/O. Cat 6: hillshade() calls _validate_raster (line 252) and _validate_scalar for both azimuth (253) and angle_altitude (254); all four backend paths cast to float32; tests parametrize int32/int64/float32/float64."

diff --git a/xrspatial/geotiff/_backends/gpu.py b/xrspatial/geotiff/_backends/gpu.py
@@ -254,7 +254,7 @@ def read_geotiff_gpu(source: str, *,
 
     from .._reader import (
         _FileSource, _check_dimensions, MAX_PIXELS_DEFAULT, _coerce_path,
-        _resolve_masked_fill,
+        _max_tile_bytes_from_env, _resolve_masked_fill,
     )
     from .._compression import COMPRESSION_LERC
     from .._header import (
@@ -488,6 +488,33 @@ def read_geotiff_gpu(source: str, *,
         # read OOB otherwise. See issue #1219.
         validate_tile_layout(ifd)
 
+        # Per-tile compressed-byte cap, matching the CPU paths
+        # ``_read_tiles`` and ``_fetch_decode_cog_http_tiles`` apply
+        # via the same env var (issue #1664). ``validate_tile_layout``
+        # bounds the offsets array length but not the byte_counts
+        # entries; a crafted ``TileByteCounts`` value can still ask
+        # the GPU pipeline to fetch and decompress a multi-hundred-MB
+        # tile that the CPU paths would already refuse. The
+        # ``_check_gpu_memory`` guard in the downstream kvikio /
+        # nvCOMP paths runs against ``sum(byte_counts)`` so it only
+        # catches the extreme aggregate case; this loop closes the
+        # per-tile asymmetry between the CPU and GPU readers. Sparse
+        # tiles (``byte_count == 0``) pass under any positive cap by
+        # design -- they carry no compressed bytes to decode and the
+        # CPU mirror at ``_reader.py`` does the same.
+        max_tile_bytes = _max_tile_bytes_from_env()
+        for tile_idx, bc in enumerate(byte_counts):
+            if bc > max_tile_bytes:
+                raise ValueError(
+                    f"TIFF tile {tile_idx} declares "
+                    f"TileByteCount={bc:,} bytes, which exceeds the "
+                    f"per-tile safety cap of {max_tile_bytes:,} bytes. "
+                    f"The file is malformed or attempting "
+                    f"denial-of-service. Override via "
+                    f"XRSPATIAL_COG_MAX_TILE_BYTES if this file is "
+                    f"legitimate."
+                )
+
     finally:
         src.close()
 
@@ -935,12 +962,56 @@ def _read_geotiff_gpu_chunked(source, *, dtype, chunks, overview_level,
     """
     import cupy
 
-    from .._reader import _FileSource, _coerce_path
+    from .._reader import (
+        _FileSource, _coerce_path, _max_tile_bytes_from_env,
+    )
     from .._header import parse_header, parse_all_ifds, select_overview_ifd
     from .._geotags import extract_geo_info_with_overview_inheritance
 
     src_path = _coerce_path(source)
 
+    # Per-tile compressed-byte cap, mirroring the eager GPU path and
+    # the CPU readers (issue #1664 + the GPU eager fix in this PR).
+    # The chunked dask + GPU path either qualifies for the GDS fast
+    # path (handled in ``_read_geotiff_gpu_chunked_gds`` which runs
+    # the same cap on its own metadata parse) or falls through to
+    # ``read_geotiff_dask`` whose per-chunk ``read_to_array`` calls
+    # apply the cap inside the CPU reader. The check here closes the
+    # window between "qualification probe parses the IFDs" and "the
+    # dispatch decides which path to take" so a forged tile is
+    # rejected at graph-build time rather than at first ``.compute()``.
+    # Sparse tiles (``byte_count == 0``) pass under any positive cap
+    # by design.
+    if isinstance(src_path, str) and not src_path.startswith(
+            ('http://', 'https://')):
+        try:
+            _cap_fs = _FileSource(src_path)
+            try:
+                _cap_raw = _cap_fs.read_all()
+            finally:
+                _cap_fs.close()
+            _cap_header = parse_header(_cap_raw)
+            _cap_ifds = parse_all_ifds(_cap_raw, _cap_header)
+            _cap_ifd = select_overview_ifd(_cap_ifds, overview_level)
+            _cap_byte_counts = _cap_ifd.tile_byte_counts
+        except Exception:
+            # If metadata parse fails here, the downstream path will
+            # surface a clear error; do not double-report.
+            _cap_byte_counts = None
+        if _cap_byte_counts is not None:
+            _cap = _max_tile_bytes_from_env()
+            for _tile_idx, _bc in enumerate(_cap_byte_counts):
+                if _bc > _cap:
+                    raise ValueError(
+                        f"TIFF tile {_tile_idx} declares "
+                        f"TileByteCount={_bc:,} bytes, which exceeds "
+                        f"the per-tile safety cap of {_cap:,} bytes. "
+                        f"The file is malformed or attempting "
+                        f"denial-of-service. Override via "
+                        f"XRSPATIAL_COG_MAX_TILE_BYTES if this file "
+                        f"is legitimate."
+                    )
+
     # Try the disk->GPU path. Parse metadata once; if the file does not
     # qualify, fall through to the CPU-decode path. Any unexpected
     # exception during the qualification probe also falls through so we
@@ -1026,7 +1097,8 @@ def _read_geotiff_gpu_chunked_gds(source, ifd, geo_info, header, *,
     import dask.array as da_mod
 
     from .._reader import (
-        _check_dimensions, MAX_PIXELS_DEFAULT, _resolve_masked_fill,
+        _check_dimensions, MAX_PIXELS_DEFAULT,
+        _max_tile_bytes_from_env, _resolve_masked_fill,
     )
     from .._compression import COMPRESSION_LERC
     from .._header import validate_tile_layout
@@ -1053,6 +1125,27 @@ def _read_geotiff_gpu_chunked_gds(source, ifd, geo_info, header, *,
     _check_dimensions(tw, th, samples, max_pixels)
     validate_tile_layout(ifd)
 
+    # Per-tile compressed-byte cap, mirroring the eager GPU path's loop
+    # (issue #1664 + the original eager fix above). The chunked GDS
+    # graph fans tile reads out across dask tasks, so a forged
+    # ``TileByteCount`` would otherwise slip past every task's GDS
+    # request and the downstream ``_check_gpu_memory`` guard, which
+    # only catches the aggregate sum. Running the check here means the
+    # dask graph never builds for a hostile file. Sparse tiles
+    # (``byte_count == 0``) pass under any positive cap by design.
+    max_tile_bytes = _max_tile_bytes_from_env()
+    for tile_idx, bc in enumerate(byte_counts):
+        if bc > max_tile_bytes:
+            raise ValueError(
+                f"TIFF tile {tile_idx} declares "
+                f"TileByteCount={bc:,} bytes, which exceeds the "
+                f"per-tile safety cap of {max_tile_bytes:,} bytes. "
+                f"The file is malformed or attempting "
+                f"denial-of-service. Override via "
+                f"XRSPATIAL_COG_MAX_TILE_BYTES if this file is "
+                f"legitimate."
+            )
+
     # Window restricts the visible region; offsets are computed relative
     # to the windowed origin so chunks line up with the user's request.
     if window is not None:
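
Review note: the eager, chunked, and chunked-GDS paths in `gpu.py` each carry the same cap loop. The sketch below shows that check in isolation so its behaviour is easy to reason about. It is not code from this PR: `check_tile_byte_counts` and `max_tile_bytes_from_env` are hypothetical stand-ins, and the default cap is a placeholder, because the real default used by `_max_tile_bytes_from_env()` in `_reader.py` is not visible in this diff.

```python
import os

# Assumption: placeholder default. The real default lives in _reader.py
# and is not shown in this diff.
ASSUMED_DEFAULT_CAP = 256 * 1024 * 1024


def max_tile_bytes_from_env(default=ASSUMED_DEFAULT_CAP):
    """Hypothetical stand-in for _max_tile_bytes_from_env()."""
    raw = os.environ.get("XRSPATIAL_COG_MAX_TILE_BYTES")
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        # Unparseable override: fall back to the default in this sketch.
        return default
    return value if value > 0 else default


def check_tile_byte_counts(byte_counts, cap):
    """Raise ValueError for the first tile whose declared size exceeds cap.

    Sparse tiles (byte count 0) pass under any positive cap.
    """
    for tile_idx, bc in enumerate(byte_counts):
        if bc > cap:
            raise ValueError(
                f"TIFF tile {tile_idx} declares TileByteCount={bc:,} bytes, "
                f"which exceeds the per-tile safety cap of {cap:,} bytes."
            )
```

With a cap of 100, `check_tile_byte_counts([0, 10, 100], 100)` returns without raising, while `[0, 101]` raises on tile 1. Whether the three loops should be folded into one shared helper like this is a design question for the author; the sketch only illustrates the check.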

diff --git a/xrspatial/geotiff/tests/_tiff_surgery.py b/xrspatial/geotiff/tests/_tiff_surgery.py
@@ -0,0 +1,75 @@
+"""In-place TIFF byte-surgery helpers shared by security-cap tests.
+
+The local strip / tile byte-cap tests and the GPU per-tile byte-cap
+test both need to forge a TIFF whose declared ``TileByteCounts`` (tag
+325) or ``StripByteCounts`` (tag 279) entries exceed the production
+cap. They each parse the leading IFD and rewrite every matching tag's
+value array in place. Keeping two near-identical copies of that
+surgery in two test files invited drift, so the helpers now live here.
+
+Not part of the public API; used only by the test suite.
+"""
+from __future__ import annotations
+
+import struct
+
+
+def patch_byte_counts(data: bytearray, tag: int, value: int) -> None:
+    """Rewrite every entry for *tag* in the first IFD of *data*.
+
+    Parameters
+    ----------
+    data : bytearray
+        Mutable TIFF file bytes (entire file). Mutated in place.
+    tag : int
+        ``325`` for ``TileByteCounts`` or ``279`` for ``StripByteCounts``.
+        Other tags work mechanically but the helper exists for those two.
+    value : int
+        New value to stamp into every byte-count entry. For ``SHORT``
+        (type 3) entries the value is clipped to ``0xFFFF`` because the
+        on-disk slot is 16-bit; tests that need a multi-MB value must
+        ensure the source file was written with a ``LONG`` (type 4) tag.
+
+    Raises
+    ------
+    AssertionError
+        When ``tag`` is not present in the first IFD.
+    """
+    from xrspatial.geotiff._header import parse_header
+
+    header = parse_header(bytes(data))
+    bo = header.byte_order
+    ifd_offset = header.first_ifd_offset
+    num_entries = struct.unpack_from(f"{bo}H", data, ifd_offset)[0]
+    entry_offset = ifd_offset + 2
+
+    for i in range(num_entries):
+        eo = entry_offset + i * 12
+        cur_tag = struct.unpack_from(f"{bo}H", data, eo)[0]
+        if cur_tag != tag:
+            continue
+        type_id = struct.unpack_from(f"{bo}H", data, eo + 2)[0]
+        count = struct.unpack_from(f"{bo}I", data, eo + 4)[0]
+        if type_id == 4:  # LONG
+            total = count * 4
+            if total <= 4:
+                for k in range(count):
+                    struct.pack_into(f"{bo}I", data, eo + 8 + k * 4, value)
+            else:
+                ptr = struct.unpack_from(f"{bo}I", data, eo + 8)[0]
+                for k in range(count):
+                    struct.pack_into(f"{bo}I", data, ptr + k * 4, value)
+        elif type_id == 3:  # SHORT
+            clipped = min(value, 0xFFFF)
+            total = count * 2
+            if total <= 4:
+                for k in range(count):
+                    struct.pack_into(
+                        f"{bo}H", data, eo + 8 + k * 2, clipped)
+            else:
+                ptr = struct.unpack_from(f"{bo}I", data, eo + 8)[0]
+                for k in range(count):
+                    struct.pack_into(
+                        f"{bo}H", data, ptr + k * 2, clipped)
+        return
+    raise AssertionError(f"tag {tag} not found in IFD")
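
Review note: for readers unfamiliar with the 12-byte IFD entry layout that `patch_byte_counts` walks (tag, type, count, then a 4-byte value-or-offset slot), here is a minimal self-contained sketch of the inline `LONG` case. It is an illustration only: `build_minimal_ifd` is a hypothetical helper, the buffer is a bare header plus one IFD entry rather than a readable TIFF, and little-endian byte order is assumed.

```python
import struct


def build_minimal_ifd(tag, type_id, count, value):
    """Hypothetical helper: little-endian classic TIFF header + one-entry IFD."""
    # Header: byte-order mark "II", magic 42, first IFD at offset 8.
    header = struct.pack("<2sHI", b"II", 42, 8)
    # IFD entry: tag (H), type (H), count (I), value-or-offset (I) = 12 bytes.
    entry = struct.pack("<HHII", tag, type_id, count, value)
    # IFD: entry count (H), entries, next-IFD offset (I, 0 = none).
    ifd = struct.pack("<H", 1) + entry + struct.pack("<I", 0)
    return bytearray(header + ifd)


# One TileByteCounts (tag 325) entry of type LONG (4) with an honest value.
data = build_minimal_ifd(tag=325, type_id=4, count=1, value=512)

# The entry starts after the 8-byte header and the 2-byte entry count.
entry_offset = 8 + 2
tag, type_id, count = struct.unpack_from("<HHI", data, entry_offset)

# count * 4 <= 4, so the value sits inline in the entry's last 4 bytes
# instead of behind a pointer. Stamp a forged byte count over it.
struct.pack_into("<I", data, entry_offset + 8, 0xFFFFFFFF)
forged = struct.unpack_from("<I", data, entry_offset + 8)[0]
```

When `count * 4` exceeds 4, the same slot holds an offset to the value array instead, which is the `else` branch in the helper above.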