diff --git a/.claude/sweep-security-state.csv b/.claude/sweep-security-state.csv
index a1fccf78..def4fb18 100644
--- a/.claude/sweep-security-state.csv
+++ b/.claude/sweep-security-state.csv
@@ -24,6 +24,7 @@ hillshade,2026-04-27,,,,,"Clean. Cat 1: only allocation is the output np.empty(d
 hydro,2026-04-17,,MEDIUM,1;3;6,,
 kde,2026-04-27,1287,HIGH,1,,"HIGH (fixed #1287): kde() and line_density() accepted user-controlled width/height with no upper bound. The eager numpy and cupy backends allocated np.zeros((height, width), dtype=float64) (or cupy.zeros) up front (kde.py: _run_kde_numpy line 308, _run_kde_cupy line 314, line_density inline at line 706). width=1_000_000, height=1_000_000 requested ~8 TB of float64 (or VRAM on the GPU path) before any check ran. Fixed by adding local _available_memory_bytes() helper (mirrors convolution/morphology/bump pattern) and _check_grid_memory(rows, cols) that raises MemoryError when rows*cols*8 exceeds 50% of available RAM. Wired into kde() (skipped for dask paths since _run_kde_dask_numpy/_run_kde_dask_cupy build per-tile via da.from_delayed and are bounded by chunk size) and line_density() (single numpy backend, always guarded). Error message names width/height so the caller knows which knob to turn. No other HIGH findings: Cat 2 (no int32 flat-index math, numba range loops are int64), Cat 3 (bandwidth <= 0 rejected, Silverman fallback returns 1.0 when sigma==0, NaN coords clamp to empty range via min/max), Cat 4 (_kde_cuda has 'if r >= rows or c >= cols: return' bounds guard at line 254, no shared memory, each thread writes own pixel), Cat 5 (no file I/O), Cat 6 (template only used for shape/coords, output dtype forced to float64). MEDIUM (unfixed, Cat 6): _validate_template only checks DataArray + ndim; does not call _validate_raster, but template dtype does not affect compute correctness here."
 mahalanobis,2026-04-27,1288,HIGH,1,,"HIGH (fixed #1288): mahalanobis() had no memory guard. Both _compute_stats_numpy/_compute_stats_cupy and _mahalanobis_pixel_numpy/_mahalanobis_pixel_cupy materialise float64 buffers of shape (n_bands, H*W) -- the np.stack at line 45/80, the reshape+transpose at line 184 (which forces a contiguous BLAS copy), the centered diff, and the diff @ inv_cov result are all live at peak. A 100kx100k 5-band raster projected to ~400 GB of host memory just for the stack. Fixed by adding _available_memory_bytes()/_available_gpu_memory_bytes() (mirroring cost_distance.py:261-292) plus _check_memory/_check_gpu_memory at 32 bytes/cell/band budget, and wiring them into the public mahalanobis() entry point before any np.stack runs. Eager paths (numpy, cupy) are guarded; dask paths skip the check because chunks are bounded by user-supplied chunksize. MEDIUM (unfixed, Cat 6): mahalanobis() does not call _validate_raster on each band -- validate_arrays only enforces matching shape and array-type, so boolean / non-numeric DataArrays silently coerce. Deferred to a separate PR per the security-sweep one-fix-per-PR policy. No other HIGH findings: Cat 2 (no int32 indexing, numpy default int64), Cat 3 (singular covariance raises a clean ValueError, dist_sq is clamped to 0 before sqrt to absorb numerical noise, NaN mask propagates correctly), Cat 4 (no CUDA kernels), Cat 5 (no file I/O beyond /proc/meminfo)."
+mcda,2026-04-29,1311,HIGH,3,,"Cat 3: NaN/Inf weights silently pass _validate_weights (combine.py:35-39) and the owa order_weights check (combine.py:154-158) because abs(NaN-1.0) > 0.01 is False; the result is an all-NaN raster with no error. The same shape of bug exists in ahp_weights (weights.py:94), where the val <= 0 check lets NaN slip past (NaN <= 0 is also False). Fixed in #1311 with explicit np.isfinite checks. MEDIUM Cat 1 noted: sensitivity._monte_carlo eagerly computes the full dask Dataset; combine.owa stacks all criteria via xr.concat without a size guard. MEDIUM Cat 3 noted: sensitivity n_samples=0 divides by zero; wpm permits a zero base / negative weight without a bounds check. No CUDA kernels (Cat 4 N/A); no file I/O (Cat 5 N/A); no int32 index math (Cat 2 N/A)."
 morphology,2026-04-24,1256,HIGH,1,,"HIGH (fixed #1256): morph_erode/morph_dilate/morph_opening/morph_closing/morph_gradient/morph_white_tophat/morph_black_tophat accepted a user-supplied kernel with only shape/dtype/odd-size validation. Kernel dimensions drove np.pad/cp.pad on every backend and map_overlap depth on dask paths; a 99999x99999 kernel on a 1000x1000 raster would try to allocate ~80 GB of padded float64 memory with no warning. Fixed by adding local _available_memory_bytes() helper and _check_kernel_memory(rows, cols, ky, kx) that raises MemoryError before allocation when padded size exceeds 50% of available RAM; wired into _dispatch() so every public API entry point is guarded across all four backends. Mirrors bilateral #1236, convolution #1241, bump #1231. No other HIGH findings: Cat 2 (loop indices are Python ints, numba promotes to int64), Cat 3 (NaN propagation explicit via v!=v in both numpy and CUDA paths, tests verify), Cat 4 (GPU kernels _erode_gpu/_dilate_gpu bound-check the i/j neighbour offsets against rows/cols before reading)."
 sky_view_factor,2026-04-28,1299,HIGH,1,,"Unbounded numpy/cupy allocation; fixed via _check_memory and _check_gpu_memory guards (16 B/pixel, 50% threshold). Dask paths skip the guard."
+slope,2026-04-28,,,,,"Clean. slope() validates input via _validate_raster (line 383) and _validate_boundary (line 389). Cat 1: planar _cpu/_run_cupy allocate output matching input shape; geodesic paths build (3,H,W) float64 stacked array but are gated by _check_geodesic_memory(rows, cols) at line 410 (already fixed under geodesic audit, PR #1285). Cat 2: no int32 flat-index math; all loops 2D with range(). Cat 3: NaN propagates through arctan in planar kernels; geodesic delegates to _local_frame_project_and_fit which has explicit NaN guards and degenerate det check. Cat 4: _run_gpu (line 146) uses combined bounds+stencil guard 'i-di>=0 and i+di<rows and j-dj>=0 and j+dj<cols' before reading neighbours."
 terrain,,,,,,"MEDIUM (unfixed, Cat 1): the eager paths allocate full-raster float64 scratch buffers (weight in ridged mode, w_noise in worley mode, plus a tmp buffer on GPU); for a 30000x30000 caller-allocated template (~7 GB float64) the marginal scratch footprint is ~14-28 GB without a memory check. The dask + worley path (line 197 _terrain_dask_numpy, line 466 _terrain_dask_cupy) calls dask.persist(raw_worley) which materialises the entire dask array in worker memory before computing min/max for the worley_norm_range pre-pass, defeating chunked processing when worley_blend>0. Cat 2 N/A: no flat-index math in this module (perlin/worley kernels live elsewhere). Cat 3 MEDIUM (unfixed): octaves is bounded >= 1 (line 594) but lacunarity and persistence are not validated; lacunarity**i with user-supplied octaves could overflow to inf for large octaves -- exploitable only via explicitly large octaves, which remain caller-controlled. Division by w_ptp guarded at line 110 (numpy) and line 364 (gpu). Division by warp_norm at line 67 / 184 / 289 / 451 is safe because persistence**0=1 always contributes >= 1 to the sum and warp_octaves >= 1 in practice (no validation but caller-controlled). Cat 4 N/A: no CUDA kernels owned by this module; perlin/worley GPU kernels are imported from sibling modules. Cat 5 N/A: no file I/O. Cat 6: _validate_raster runs on agg; noise_mode validated against {'fbm','ridged'}; numeric scalars (seed, zfactor, lacunarity, persistence, warp_strength, warp_octaves, worley_blend, worley_seed) accepted without _validate_scalar but mostly cosmetic since arithmetic naturally rejects bad types. MEDIUM not fixed per task instructions."
 viewshed,2026-04-22,1229,HIGH,1,,"HIGH (fixed #1229): _viewshed_cpu allocated ~500 bytes/pixel of working memory (event_list 3*H*W*7*8 bytes + status_values/status_struct/idle + visibility_grid + lexsort temporary) with no guard. A 20000x20000 raster tried to allocate ~200 GB. Fixed by adding peak-memory guard mirroring the _viewshed_dask pattern (_available_memory_bytes() check, raises MemoryError with max_distance= hint). No other HIGH findings: dask path already guarded, _validate_raster is called, distance-sweep uses dtype=float64, _calc_dist_n_grad guards zero distance."
 visibility,2026-04-28,,,,,"Clean. line_of_sight (line 190) and cumulative_viewshed (line 259) call _validate_raster; visibility_frequency delegates. Cat 1: cumulative_viewshed allocates int32 accumulator (4 B/px) but delegates per-observer to viewshed() which has 500 B/px memory guard at viewshed.py:1523-1531; viewshed will fail first on oversize rasters. _bresenham_line (line 35) and _los_kernel (lines 112-143) bounded by transect length (<=W+H+1). Cat 2: int64 throughout, no int32 overflow path. Cat 3: divisions in _los_kernel guarded (D==0 in _fresnel_radius_1 line 87, distance[i]==0 continue line 133, total_dist>0 check line 123); NaN elevation at observer cell would taint los_height but is a correctness not DoS concern. Cat 4: no CUDA kernels. Cat 5: no file I/O. Cat 6: elevations cast to float64 in _extract_transect line 79."
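
The kde, mahalanobis, morphology, and viewshed entries above all describe the same guard pattern: an _available_memory_bytes() helper plus a pre-allocation check that raises MemoryError once the projected buffer crosses 50% of free RAM. Below is a minimal sketch of that pattern, assuming the Linux /proc/meminfo source mentioned in the mahalanobis entry; the helper names mirror the CSV notes, but the exact signatures in the repository may differ.

```python
def _available_memory_bytes() -> int:
    """Best-effort free-RAM probe; returns 0 when /proc/meminfo is unavailable."""
    try:
        with open("/proc/meminfo") as f:
            for line in f:
                if line.startswith("MemAvailable:"):
                    return int(line.split()[1]) * 1024  # reported in kB
    except OSError:
        pass
    return 0


def _check_grid_memory(rows: int, cols: int, bytes_per_cell: int = 8) -> None:
    """Raise MemoryError before allocating a float64 grid that would exceed
    50% of available RAM, naming width/height so the caller knows which knob
    to turn (per the kde entry's error-message convention)."""
    needed = rows * cols * bytes_per_cell
    available = _available_memory_bytes()
    if available and needed > available // 2:
        raise MemoryError(
            f"height={rows}, width={cols} would allocate ~{needed / 2**30:.1f} GiB, "
            f"over 50% of available memory ({available / 2**30:.1f} GiB); "
            f"reduce width/height or use a chunked (dask) backend"
        )
```

In the audited modules this check runs in the eager numpy/cupy entry points before the first large allocation; the dask backends skip it because their per-tile buffers are bounded by the chunk size.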
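
The mcda entry and the combine.py diff below hinge on one floating-point quirk: every comparison against NaN evaluates False, so a NaN weight sails through an abs(total - 1.0) > 0.01 tolerance test and only surfaces later as an all-NaN raster. A standalone illustration (not part of the diff):

```python
import numpy as np

weights = {"slope": 0.4, "distance": 0.35, "ndvi": float("nan")}

total = sum(weights.values())      # nan
print(abs(total - 1.0) > 0.01)     # False -> the sum-to-1 check does not fire
print(float("nan") <= 0)           # False -> the ahp_weights positivity check
                                   #          does not fire either

# Downstream, the bad weight poisons every pixel of the combined raster.
criterion = np.ones((2, 2))
print(weights["ndvi"] * criterion)  # [[nan nan], [nan nan]]

# An explicit finiteness check (the #1311 fix) is the only reliable gate.
print(all(np.isfinite(v) for v in weights.values()))  # False
```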
diff --git a/xrspatial/mcda/combine.py b/xrspatial/mcda/combine.py
index 52fef03a..538a471f 100644
--- a/xrspatial/mcda/combine.py
+++ b/xrspatial/mcda/combine.py
@@ -19,6 +19,30 @@ def _validate_criteria(criteria: xr.Dataset) -> None:
         raise ValueError("criteria Dataset has no variables")
 
 
+def _check_finite_weights(weights: dict[str, float], label: str) -> None:
+    """Reject weight dicts that contain NaN or infinite values.
+
+    Without this guard, a single NaN weight slips past the sum-to-1 check
+    (``abs(NaN - 1.0) > 0.01`` is False) and propagates through every
+    pixel, returning an all-NaN raster with no error.
+    """
+    bad = []
+    for name, value in weights.items():
+        try:
+            v = float(value)
+        except (TypeError, ValueError):
+            raise ValueError(
+                f"{label}[{name!r}] must be a real number, got {value!r}"
+            )
+        if not np.isfinite(v):
+            bad.append((name, v))
+    if bad:
+        details = ", ".join(f"{n!r}={v}" for n, v in bad)
+        raise ValueError(
+            f"{label} must be finite; got non-finite value(s): {details}"
+        )
+
+
 def _validate_weights(
     weights: dict[str, float], criteria: xr.Dataset
 ) -> None:
@@ -32,6 +56,7 @@ def _validate_weights(
         raise ValueError(
             f"Weights contain keys not in criteria Dataset: {extra}"
         )
+    _check_finite_weights(weights, "weights")
     total = sum(weights.values())
     if abs(total - 1.0) > 0.01:
         raise ValueError(
@@ -151,6 +176,24 @@ def owa(
             f"order_weights length ({len(order_weights)}) must match "
             f"number of criteria ({n})"
         )
+    # Reject NaN/Inf entries before the sum check, since
+    # ``abs(NaN - 1.0) > 0.01`` is False and would let the bad value through.
+    bad_ow = []
+    for i, value in enumerate(order_weights):
+        try:
+            v = float(value)
+        except (TypeError, ValueError):
+            raise ValueError(
+                f"order_weights[{i}] must be a real number, got {value!r}"
+            )
+        if not np.isfinite(v):
+            bad_ow.append((i, v))
+    if bad_ow:
+        details = ", ".join(f"[{i}]={v}" for i, v in bad_ow)
+        raise ValueError(
+            f"order_weights must be finite; got non-finite value(s): "
+            f"{details}"
+        )
     ow_sum = sum(order_weights)
     if abs(ow_sum - 1.0) > 0.01:
         raise ValueError(
diff --git a/xrspatial/mcda/weights.py b/xrspatial/mcda/weights.py
index c79dc359..7d8a1540 100644
--- a/xrspatial/mcda/weights.py
+++ b/xrspatial/mcda/weights.py
@@ -91,14 +91,26 @@ def ahp_weights(
                 f"Self-comparison ({a!r}, {b!r}) is not allowed; "
                 f"diagonal entries are always 1"
             )
-        if val <= 0:
+        try:
+            v = float(val)
+        except (TypeError, ValueError):
+            raise ValueError(
+                f"Comparison value must be a real number, got {val!r} "
+                f"for ({a!r}, {b!r})"
+            )
+        if not np.isfinite(v):
+            raise ValueError(
+                f"Comparison value must be finite, got {val} "
+                f"for ({a!r}, {b!r})"
+            )
+        if v <= 0:
             raise ValueError(
                 f"Comparison value must be positive, got {val} "
                 f"for ({a!r}, {b!r})"
             )
         i, j = idx[a], idx[b]
-        matrix[i, j] = val
-        matrix[j, i] = 1.0 / val
+        matrix[i, j] = v
+        matrix[j, i] = 1.0 / v
 
     # Principal eigenvector
     eigenvalues, eigenvectors = np.linalg.eig(matrix)
diff --git a/xrspatial/tests/test_mcda.py b/xrspatial/tests/test_mcda.py
index ad8bf656..686047c9 100644
--- a/xrspatial/tests/test_mcda.py
+++ b/xrspatial/tests/test_mcda.py
@@ -1710,3 +1710,121 @@ def test_site_suitability_workflow(self):
         valid = suitability.values[~water.values & np.isfinite(suitability.values)]
         assert valid.min() >= 0.0
         assert valid.max() <= 1.0
+
+
+# ===========================================================================
+# NaN/Inf weight rejection (#1311)
+# ===========================================================================
+
+class TestNonFiniteWeights1311:
+    """Reject NaN and Inf weights before they propagate to all-NaN output.
+
+    Without these guards, ``abs(NaN - 1.0) > 0.01`` is False, the
+    sum-to-1 check passes, and the bad value reaches every pixel.
+    """
+
+    def test_wlc_rejects_nan_weight(self, criteria_dataset):
+        bad = {"slope": 0.4, "distance": 0.35, "ndvi": float("nan")}
+        with pytest.raises(ValueError, match="finite"):
+            wlc(criteria_dataset, bad)
+
+    def test_wlc_rejects_inf_weight(self, criteria_dataset):
+        bad = {"slope": 0.4, "distance": 0.35, "ndvi": float("inf")}
+        with pytest.raises(ValueError, match="finite"):
+            wlc(criteria_dataset, bad)
+
+    def test_wlc_rejects_negative_inf_weight(self, criteria_dataset):
+        bad = {"slope": 0.4, "distance": 0.35, "ndvi": float("-inf")}
+        with pytest.raises(ValueError, match="finite"):
+            wlc(criteria_dataset, bad)
+
+    def test_wlc_error_names_offending_criterion(self, criteria_dataset):
+        bad = {"slope": 0.4, "distance": 0.35, "ndvi": float("nan")}
+        with pytest.raises(ValueError, match="ndvi"):
+            wlc(criteria_dataset, bad)
+
+    def test_wpm_rejects_nan_weight(self, criteria_dataset):
+        bad = {"slope": 0.4, "distance": 0.35, "ndvi": float("nan")}
+        with pytest.raises(ValueError, match="finite"):
+            wpm(criteria_dataset, bad)
+
+    def test_wpm_rejects_inf_weight(self, criteria_dataset):
+        bad = {"slope": 0.4, "distance": 0.35, "ndvi": float("inf")}
+        with pytest.raises(ValueError, match="finite"):
+            wpm(criteria_dataset, bad)
+
+    def test_owa_rejects_nan_criterion_weight(self, criteria_dataset):
+        bad = {"slope": 0.4, "distance": 0.35, "ndvi": float("nan")}
+        with pytest.raises(ValueError, match="finite"):
+            owa(criteria_dataset, bad, [0.5, 0.3, 0.2])
+
+    def test_owa_rejects_nan_order_weight(self, criteria_dataset, weights_3):
+        bad_order = [0.5, float("nan"), 0.5]
+        with pytest.raises(ValueError, match="finite"):
+            owa(criteria_dataset, weights_3, bad_order)
+
+    def test_owa_rejects_inf_order_weight(self, criteria_dataset, weights_3):
+        bad_order = [0.5, float("inf"), 0.5]
+        with pytest.raises(ValueError, match="finite"):
+            owa(criteria_dataset, weights_3, bad_order)
+
+    def test_owa_order_weight_error_names_position(
+        self, criteria_dataset, weights_3,
+    ):
+        bad_order = [0.5, float("nan"), 0.5]
+        with pytest.raises(ValueError, match=r"\[1\]"):
+            owa(criteria_dataset, weights_3, bad_order)
+
+    def test_ahp_rejects_nan_comparison(self):
+        with pytest.raises(ValueError, match="finite"):
+            ahp_weights(
+                ["a", "b", "c"],
+                {("a", "b"): float("nan"), ("a", "c"): 2.0, ("b", "c"): 3.0},
+            )
+
+    def test_ahp_rejects_inf_comparison(self):
+        with pytest.raises(ValueError, match="finite"):
+            ahp_weights(
+                ["a", "b", "c"],
+                {("a", "b"): float("inf"), ("a", "c"): 2.0, ("b", "c"): 3.0},
+            )
+
+    def test_ahp_rejects_negative_inf_comparison(self):
+        # -inf is non-finite; the finite check fires before <= 0.
+        with pytest.raises(ValueError, match="finite"):
+            ahp_weights(
+                ["a", "b", "c"],
+                {("a", "b"): float("-inf"), ("a", "c"): 2.0,
+                 ("b", "c"): 3.0},
+            )
+
+    def test_ahp_still_rejects_zero(self):
+        # Pre-existing positivity guard must still fire.
+        with pytest.raises(ValueError, match="positive"):
+            ahp_weights(
+                ["a", "b", "c"],
+                {("a", "b"): 0.0, ("a", "c"): 2.0, ("b", "c"): 3.0},
+            )
+
+    def test_ahp_still_rejects_negative(self):
+        with pytest.raises(ValueError, match="positive"):
+            ahp_weights(
+                ["a", "b", "c"],
+                {("a", "b"): -1.0, ("a", "c"): 2.0, ("b", "c"): 3.0},
+            )
+
+    def test_wlc_valid_weights_still_work(self, criteria_dataset, weights_3):
+        # Sanity: the new check does not break the happy path.
+        result = wlc(criteria_dataset, weights_3)
+        assert np.all(np.isfinite(result.values))
+
+    def test_owa_valid_weights_still_work(self, criteria_dataset, weights_3):
+        result = owa(criteria_dataset, weights_3, [0.5, 0.3, 0.2])
+        assert np.all(np.isfinite(result.values))
+
+    def test_ahp_valid_comparisons_still_work(self):
+        weights, _ = ahp_weights(
+            ["a", "b", "c"],
+            {("a", "b"): 2.0, ("a", "c"): 4.0, ("b", "c"): 2.0},
+        )
+        assert all(np.isfinite(v) for v in weights.values())
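
For context on how the guarded entry points behave after this change, here is a hedged usage sketch mirroring the tests above. The toy Dataset stands in for the criteria_dataset fixture, the import path is inferred from the file layout in the diff, and the asserted behaviour is exactly what the new tests encode (a ValueError naming the offending criterion for NaN weights, unchanged happy path); none of this is lifted verbatim from the library's documentation.

```python
import numpy as np
import xarray as xr

from xrspatial.mcda.combine import wlc  # import path assumed from the diff layout

# Stand-in for the criteria_dataset fixture: three normalised criteria layers.
rng = np.random.default_rng(42)
criteria = xr.Dataset(
    {
        name: xr.DataArray(rng.random((4, 4)), dims=("y", "x"))
        for name in ("slope", "distance", "ndvi")
    }
)

good = {"slope": 0.4, "distance": 0.35, "ndvi": 0.25}
bad = {"slope": 0.4, "distance": 0.35, "ndvi": float("nan")}

result = wlc(criteria, good)                 # happy path: finite output
assert np.all(np.isfinite(result.values))

try:
    wlc(criteria, bad)                       # now fails fast instead of
except ValueError as err:                    # returning an all-NaN raster
    assert "finite" in str(err) and "ndvi" in str(err)
```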