diff --git a/.claude/sweep-security-state.csv b/.claude/sweep-security-state.csv
index b497552f..0f1ae38d 100644
--- a/.claude/sweep-security-state.csv
+++ b/.claude/sweep-security-state.csv
@@ -30,11 +30,13 @@ normalize,2026-04-27,,,,,"Clean. Both rescale and standardize handle the constan
 pathfinding,2026-04-22,,MEDIUM,1;6,,"No CRITICAL/HIGH findings. Cat 1 already well-guarded: _check_memory(h, w) runs before every numpy/cupy _a_star_search allocation and covers the ~65 bytes/pixel footprint (parent_ys/xs int64, g_cost f64, visited i8, three heap arrays h_keys/h_rows/h_cols sized h*w). Auto-radius falls back when the full grid exceeds 50% of RAM, and HPA* kicks in for long paths. The dask path uses a sparse dict/set and an on-demand chunk cache, so no full-grid numpy materialisation. No CUDA kernels (the cupy backend transfers to CPU). No file-path I/O from user input (only /proc/meminfo is read). MEDIUM (unfixed, Cat 1): multi_stop_search does not cap len(waypoints); _optimize_waypoint_order builds an O(N^2) dist matrix and runs N^2 A* calls, and _nearest_neighbor_2opt is O(N^3), so pathological waypoint lists can cause extreme CPU consumption (DoS). MEDIUM (unfixed, Cat 6): a_star_search and multi_stop_search do not call _validate_raster(surface) -- only ndim==2 is checked, dtype is not; non-numeric dtypes would fail inside numba with confusing errors rather than a clean TypeError. int64 overflow in height*width (line 214 max_heap) is not reachable given the memory guard (a ~46340x46340 grid would already raise MemoryError long before h*w approaches 2**63)."
 perlin,2026-04-22,1232,HIGH,6,,"HIGH (fixed #1232): perlin() accepted integer-dtyped DataArrays via _validate_raster, but all four backends write float noise into the input buffer in place, then normalize by ptp. With integer storage the float values truncate to 0, ptp=0, and the div-by-zero produced NaN/Inf that cast back to INT_MIN on every pixel. Fixed by adding an np.issubdtype(agg.dtype, np.floating) check in perlin() that raises ValueError. MEDIUM (unfixed follow-up): _perlin_numpy/_perlin_cupy/_perlin_dask_numpy/_perlin_dask_cupy all divide by ptp (max - min) with no zero guard, so degenerate inputs like freq=(0,0) still emit NaN through the normalization step. GPU kernels have bounds guards, shared memory is fixed-size 512 int32 (not user-influenced), and cuda.syncthreads() is present after the cooperative load. No file I/O."
 polygon_clip,2026-04-27,,,,,"Clean. Module is a raster mask-and-clip wrapper -- not a Sutherland-Hodgman polygon-vs-polygon clipper. It resolves a shapely geometry into polygon pairs, optionally crops to bbox, delegates mask construction to xrspatial.rasterize (which has its own memory guards), and applies via xarray.where. No manual line-segment intersection, no recursive clip amplification, no float division on user vertices. Cat 1: list(geometry) materializes the user iterable, but the dominant memory cost is the rasterize-built mask, which is already bounded by guarded raster size. Cat 2: no integer math. Cat 3: NaN bounds from degenerate geometry are caught by the does-not-overlap ValueError (line 93, _crop_to_bbox); shapely raises GEOSException on malformed input. Cat 4 N/A: no CUDA kernels. Cat 5: dynamic geopandas/shapely.ops imports are import-name strings, not user paths. Cat 6: _validate_raster called with default numeric=True; an integer raster + np.nan nodata silently coerces, but that is a UX nit, not a security issue. Vertex amplification attack surface lives in shapely, not here."
+polygonize,2026-04-28,,MEDIUM,1;6,,"Cat 1 MEDIUM: no MemoryError guard like other modules; _calculate_regions allocations (regions uint32, visited uint8, region_lookup that doubles up to 2*uint32_max = ~32GB worst case) are all O(N) for input size N, and a runtime check at line 328 raises RuntimeError when the region count hits the uint32 max. The working set is inherent to the algorithm, with no caller-controlled amplification. Cat 2: flat indices ij=i+j*nx are computed in numba int64, so no overflow is possible for realistic dimensions. Region IDs are uint32 with an explicit max-region check (line 328). Cat 3: NaN handling correct: the numpy backend masks NaN (lines 555-560), the cupy backend masks NaN (lines 605-611), point_in_ring divisions are guarded by a sign test, and _perpendicular_distance has a len_sq==0 guard (line 937). Cat 4: no custom CUDA kernels; uses cupyx.scipy.ndimage.label. Cat 5: no file I/O; _to_geojson returns a dict. Cat 6 MEDIUM: polygonize() does not call _validate_raster(), only its own ndim check (line 1623). The numeric-dtype check is missing, but generated_jit _is_close handles int/float separately, and dtype confusion produces clean errors, not silent wrong results. Not fixed (MEDIUM only)."
 proximity,2026-04-22,,,,,"Clean. Public APIs (proximity/allocation/direction) all call _validate_raster. GPU kernel _proximity_cuda_kernel has a bounds guard at lines 359-360. The dask KDTree path has explicit memory guards (lines 897-903 result array, 1297-1312 unbounded-distance fallback, 681-682 cache budget). Index math uses np.int64 for pan_near_x/pan_near_y, target_counts, y_offsets/x_offsets -- no int32 overflow risk. Target detection filters NaN via np.isfinite (lines 533, 657). _calc_direction guards x1==x2 & y1==y2 before arctan2. No file I/O. LOW (not flagged): line 1235 pad_y/pad_x omit abs() while line 437 uses it -- a minor inconsistency, not exploitable."
 rasterize,2026-04-21,1223,HIGH,1;2,,HIGH: unbounded out/written allocation in _run_numpy/_run_cupy driven by user-supplied width/height/resolution (no cap). MEDIUM (unfixed): _build_row_csr_numba total=row_ptr[height] is int32 and can wrap for very tall rasters with many long edges.
 reproject,2026-04-17,,MEDIUM,1;3,,
 resample,2026-04-28,1295,HIGH,1,,"HIGH (fixed #1295): resample() did not bound output dimensions derived from user-supplied scale_factor / target_resolution. _output_shape returns max(1, round(in_h * scale_y)), max(1, round(in_w * scale_x)) and was passed straight through to the eager numpy / cupy backends, where _run_numpy and _run_cupy / the _AGG_FUNCS numba kernels and _nan_aware_interp_np allocated np.empty / cupy.empty / map_coordinates buffers of that size with no memory check. scale_factor=1e9 on a 4x4 raster requested ~190 EB; target_resolution=1e-9 on a meter-scale raster did the same. Fixed by adding _available_memory_bytes() / _available_gpu_memory_bytes() helpers and _check_resample_memory(out_h, out_w) / _check_resample_gpu_memory(out_h, out_w) guards (12 B/cell budget covering the float64 working buffer + float32 output + map_coordinates temporary), wired into resample() before backend dispatch. Eager numpy and cupy paths run the guard; dask paths skip it because per-chunk allocations are bounded by chunk size. Mirrors the kde / line_density (#1287), focal (#1284), geodesic (#1283), cost_distance (#1262), and diffuse (#1267) patterns. No other findings: _validate_raster called at line 698, scale_y > 0 / scale_x > 0 enforced, AGGREGATE_METHODS rejects scale > 1.0, the identity fast path bypasses dispatch entirely, all numba kernels guard count > 0 before division, no CUDA kernels (cupy paths use cupy ufuncs + cupyx.scipy.ndimage), no file I/O, and all backends cast to float64 before computation and float32 on output."
 sieve,2026-04-28,1296,HIGH,1,,"HIGH (fixed #1296): sieve() on the numpy and cupy backends had no memory guard. _label_connected allocates parent (int32, 4B/px), rank (int32, 4B/px, reused as root_to_id), region_map_flat (int32, 4B/px), plus a float64 result copy (8B/px) -- ~20 B/pixel of working memory before any check. The dask paths (_sieve_dask line 343 and _sieve_dask_cupy line 366) already raised MemoryError via _available_memory_bytes() at a 28 B/pixel budget, but the public sieve() API at line 489 dispatched np.ndarray inputs straight into _sieve_numpy with no guard, and _sieve_cupy at line 308 transferred to host via data.get() then called _sieve_numpy, inheriting the gap. A 50000x50000 numpy raster requested ~50 GB silently. Fixed by extracting _check_memory(rows, cols) and _check_gpu_memory(rows, cols) helpers (mirrors the cost_distance #1262 / mahalanobis #1288 / multispectral #1291 / kde #1287 pattern) at a 28 B/pixel host budget plus a 16 B/pixel GPU round-trip budget, both at a 50%-of-available-memory threshold. _check_memory is wired into _sieve_numpy at the top, before the float64 copy. _check_gpu_memory is wired into _sieve_cupy before data.get(); it also calls _check_memory so the host budget still applies. Consolidated the _available_memory_bytes definition (was duplicated). All 47 tests pass, including 2 new memory-guard tests for the numpy backend (_sieve_numpy direct call + public sieve() API). No other findings: Cat 2 -- the int32 indexing in _label_connected is acknowledged in its docstring as a <2.1B-pixel limit, and the new memory guard rejects rasters that large before the int32 issue can trigger, so this is a documentation/clarity follow-up rather than an exploitable bug. Cat 3 NaN handled via valid mask; Cat 4 no CUDA kernels; Cat 5 only /proc/meminfo read; Cat 6 _validate_raster called at line 478."
+slope,2026-04-28,,,,,"Clean. slope() validates input via _validate_raster (line 383) and _validate_boundary (line 389). Cat 1: the planar _cpu/_run_cupy paths allocate output matching the input shape; geodesic paths build a (3,H,W) float64 stacked array but are gated by _check_geodesic_memory(rows, cols) at line 410 (already fixed under the geodesic audit, PR #1285). Cat 2: no int32 flat-index math; all loops are 2D with range(). Cat 3: NaN propagates through arctan in the planar kernels; geodesic delegates to _local_frame_project_and_fit, which has explicit NaN guards and a degenerate-det check. Cat 4: _run_gpu (line 146) uses a combined bounds+stencil guard 'i-di>=0 and i+di<h and j-dj>=0 and j+dj<w'
 ~2B elements in the numpy path. MEDIUM (unfixed): hypsometric_integral() skips _validate_raster on zones/values; _regions_numpy has no memory guard (numpy-only path, bounded by caller-allocated input). MEDIUM (unfixed): _stats_numpy return_type='xarray.DataArray' allocates np.full((n_stats, values.size)) with no guard."
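Note on the unfixed perlin MEDIUM above (division by ptp with no zero guard): the follow-up fix is small. Below is a minimal sketch of a zero-guarded normalization, assuming the four backends reduce to the common (noise - min) / (max - min) form; the helper name _safe_normalize and its placement are illustrative, not the module's actual API.

    import numpy as np

    def _safe_normalize(noise: np.ndarray) -> np.ndarray:
        # Rescale to [0, 1]. Degenerate fields -- constant or all-NaN, e.g.
        # the freq=(0,0) case flagged in the sweep -- return zeros instead
        # of letting the zero span produce NaN/Inf.
        lo = np.nanmin(noise)
        span = np.nanmax(noise) - lo
        if not np.isfinite(span) or span == 0.0:
            return np.zeros_like(noise)
        return (noise - lo) / span

The same guard would slot into each backend just before the existing division, leaving the non-degenerate path byte-for-byte identical.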
diff --git a/xrspatial/terrain_metrics.py b/xrspatial/terrain_metrics.py
index 6bcec50e..74caf605 100644
--- a/xrspatial/terrain_metrics.py
+++ b/xrspatial/terrain_metrics.py
@@ -403,6 +403,46 @@ def roughness(agg: xr.DataArray,
 # TPI at arbitrary radius (helpers for landforms)
 # ---------------------------------------------------------------------------
 
+def _available_memory_bytes():
+    """Best-effort estimate of available memory in bytes."""
+    # Try /proc/meminfo (Linux)
+    try:
+        with open('/proc/meminfo', 'r') as f:
+            for line in f:
+                if line.startswith('MemAvailable:'):
+                    return int(line.split()[1]) * 1024
+    except (OSError, ValueError, IndexError):
+        pass
+    # Try psutil
+    try:
+        import psutil
+        return psutil.virtual_memory().available
+    except (ImportError, AttributeError):
+        pass
+    # Fallback: 2 GB
+    return 2 * 1024 ** 3
+
+
+def _check_kernel_memory(radius, param_name='radius'):
+    """Raise MemoryError if a circular kernel of *radius* won't fit in RAM.
+
+    ``_circular_kernel`` allocates a ``(2*radius+1, 2*radius+1)`` float64
+    array. ``np.ogrid`` plus the boolean comparison adds intermediates
+    of similar size, so budget ~16 bytes per cell.
+    """
+    side = 2 * int(radius) + 1
+    cells = side * side
+    required = cells * 16
+    available = _available_memory_bytes()
+    if required > 0.5 * available:
+        raise MemoryError(
+            f"{param_name}={radius} implies a {side}x{side} kernel that "
+            f"needs ~{required / 1e9:.1f} GB, but only "
+            f"{available / 1e9:.1f} GB is available. "
+            f"Use a smaller {param_name}."
+        )
+
+
 def _circular_kernel(radius):
     """Circular boolean kernel with center excluded."""
     y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
@@ -534,6 +574,12 @@ def landforms(agg: xr.DataArray,
     10  Mountain top / high ridge
     ==  =================================
 
+    Raises
+    ------
+    MemoryError
+        If ``inner_radius`` or ``outer_radius`` would require a kernel
+        larger than half of the available memory.
+
     References
     ----------
     Weiss, A. (2001). Topographic Position and Landforms Analysis.
@@ -550,6 +596,11 @@ def landforms(agg: xr.DataArray,
             f"outer_radius ({outer_radius}) must be greater than "
             f"inner_radius ({inner_radius})")
 
+    # Guard against unbounded kernel allocations. Both radii flow into
+    # ``_circular_kernel`` which allocates a (2r+1)^2 float64 array.
+    _check_kernel_memory(inner_radius, param_name='inner_radius')
+    _check_kernel_memory(outer_radius, param_name='outer_radius')
+
     # 1. TPI at two scales
     tpi_s = _compute_tpi_at_radius(agg, inner_radius)
     tpi_l = _compute_tpi_at_radius(agg, outer_radius)
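Before the tests, a back-of-envelope check of why the radius used there must trip the new guard. This is plain Python mirroring (not importing) _check_kernel_memory's arithmetic:

    radius = 200_000
    side = 2 * radius + 1                 # 400_001
    required = side * side * 16           # 16 B/cell: float64 kernel + ogrid/boolean temps
    print(f"~{required / 1e12:.2f} TB")   # ~2.56 TB -- far beyond half of any host's RAM

Even on the 2 GB fallback path the guard fires from roughly radius 4096 upward, since (2*4096+1)^2 * 16 B just exceeds the 1 GiB half-budget, so the tests below cannot pass accidentally on a machine where _available_memory_bytes() falls through to the default.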
diff --git a/xrspatial/tests/test_terrain_metrics.py b/xrspatial/tests/test_terrain_metrics.py
index ee5cc623..0764e49d 100644
--- a/xrspatial/tests/test_terrain_metrics.py
+++ b/xrspatial/tests/test_terrain_metrics.py
@@ -431,6 +431,23 @@ def test_landforms_invalid_outer_radius():
         landforms(agg, inner_radius=5, outer_radius=3)
 
 
+def test_landforms_outer_radius_memory_guard():
+    """Issue #1302: huge outer_radius must raise MemoryError before
+    allocating the circular kernel."""
+    data = np.ones((10, 10), dtype=np.float64)
+    agg = create_test_raster(data)
+    with pytest.raises(MemoryError, match="outer_radius=200000"):
+        landforms(agg, inner_radius=3, outer_radius=200000)
+
+
+def test_landforms_inner_radius_memory_guard():
+    """Issue #1302: huge inner_radius must also raise MemoryError."""
+    data = np.ones((10, 10), dtype=np.float64)
+    agg = create_test_raster(data)
+    with pytest.raises(MemoryError, match="inner_radius=200000"):
+        landforms(agg, inner_radius=200000, outer_radius=300000)
+
+
 def test_landforms_output_shape_and_attrs():
     data = np.random.default_rng(42).random((30, 40)) * 100
     agg = create_test_raster(data)
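One sweep finding above still worth a follow-up PR is the pathfinding Cat 1 MEDIUM: multi_stop_search never caps len(waypoints), so the O(N^2) A* matrix and O(N^3) 2-opt pass are a CPU DoS vector. A hedged sketch of the obvious mitigation follows; the MAX_WAYPOINTS constant, its value, and the helper name are illustrative, not the module's actual API.

    # Hypothetical cap; 64 keeps the N^2 A* calls and O(N^3) 2-opt tractable.
    MAX_WAYPOINTS = 64

    def _validate_waypoints(waypoints):
        # Materialize once so generators can't be re-consumed downstream.
        waypoints = list(waypoints)
        if len(waypoints) > MAX_WAYPOINTS:
            raise ValueError(
                f"multi_stop_search supports at most {MAX_WAYPOINTS} waypoints, "
                f"got {len(waypoints)}: waypoint ordering runs O(N^2) A* "
                f"searches plus an O(N^3) 2-opt pass, so unbounded lists "
                f"allow extreme CPU consumption.")
        return waypoints

Wired in at the top of multi_stop_search, this would close the finding the same way the memory guards above close the allocation-based ones: reject pathological input before any expensive work starts.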