From 3316db1d41ffa7e2adf46f66faa015fea5460da4 Mon Sep 17 00:00:00 2001
From: Brendan Collins
Date: Thu, 23 Apr 2026 13:15:56 -0700
Subject: [PATCH] Add security sweep state for classify module

---
 .claude/sweep-security-state.json | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/.claude/sweep-security-state.json b/.claude/sweep-security-state.json
index 9a184549..c7d70c08 100644
--- a/.claude/sweep-security-state.json
+++ b/.claude/sweep-security-state.json
@@ -89,6 +89,14 @@
       "severity_max": null,
       "categories_found": [],
       "notes": "Clean. Cat 1: memory guard at lines 311-326 uses _available_memory_bytes() and raises MemoryError when total_estimate (array_bytes * (n_sources + 3)) exceeds 0.8 * avail BEFORE computing any cost surface. Trivial n_sources==0/1 paths only allocate arrays matching input size. Cat 2: np.prod(raster.shape) returns int64, no overflow. Cat 3: divisions by target_weight (lines 373, 380) are guarded by total==0 break (364) and target_weight>0 check (379); fric_weight strips NaN via np.where(np.isfinite & >0). Cat 4: no CUDA kernels. Cat 5: no file I/O. Cat 6: _validate_raster called on both raster and friction (lines 275-277)."
+    },
+    "classify": {
+      "last_inspected": "2026-04-23",
+      "issue": null,
+      "followup_issues": [1244, 1246],
+      "severity_max": "MEDIUM",
+      "categories_found": [1, 3],
+      "notes": "No CRITICAL/HIGH findings. All 10 public APIs call _validate_raster; natural_breaks/quantile/maximum_breaks also call _validate_scalar(k). Both CUDA kernels (_run_gpu_binary at line 72, _run_gpu_bin at line 241) have bounds guards. Dask paths use seeded sampling via _generate_sample_indices rather than materializing full arrays. MEDIUM (fixed #1244, Cat 3): equal_interval raised ZeroDivisionError when max==min and ValueError on all-NaN input (np.arange in _run_equal_interval with width=0 or NaN). Fixed by collapsing the degenerate case to a single bin that maps finite pixels to class 0. MEDIUM (fixed #1246, Cat 1): _compute_natural_break_bins passed the full raster to _run_jenks when num_sample=None on numpy/cupy backends; _run_numpy_jenks_matrices then allocated two (n_data+1, n_classes+1) float64 matrices (~9.6 GB per matrix pair for a 10kx10k k=5 call). Fixed by adding an _available_memory_bytes() guard that raises MemoryError when the matrices would exceed 50% of available memory. LOW (unfixed): _cpu_bin has `elif val > bins[mid - 1]` which would index bins[-1] at mid=0 via numba wraparound, but the outer val <= bins[0] / val <= bins[nbins-1] guards mean mid=0 cannot reach that branch in valid execution."
     }
   }
 }
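
Three sketches follow for the guards and the branch analysis referenced in the sweep notes above; none of this code is taken from the patched module. First, both memory guards follow the same pre-allocation pattern. A minimal sketch of that pattern, assuming _available_memory_bytes() wraps psutil's available-memory figure (the notes name the helper but do not show its body, and the guard_* function names here are hypothetical):

import psutil

def _available_memory_bytes():
    # Assumed implementation; the sweep notes reference this helper by
    # name only. psutil's "available" figure is one plausible source.
    return psutil.virtual_memory().available

def guard_cost_surface(array_bytes, n_sources):
    # Guard from the first module's Cat 1 note: the working set is
    # estimated as array_bytes * (n_sources + 3) and refused at 80%
    # of available memory, before any cost surface is computed.
    total_estimate = array_bytes * (n_sources + 3)
    if total_estimate > 0.8 * _available_memory_bytes():
        raise MemoryError(f"cost surface needs ~{total_estimate / 1e9:.1f} GB")

def guard_jenks_matrices(n_data, n_classes):
    # Guard from the #1246 fix: _run_numpy_jenks_matrices allocates two
    # (n_data + 1, n_classes + 1) float64 matrices. For a 10k x 10k
    # raster with k=5 that is 2 * (1e8 + 1) * 6 * 8 bytes ~= 9.6 GB.
    needed = 2 * (n_data + 1) * (n_classes + 1) * 8
    if needed > 0.5 * _available_memory_bytes():
        raise MemoryError(f"jenks matrices need ~{needed / 1e9:.1f} GB")

Checking before allocation turns the failure into a clean MemoryError rather than an OOM kill partway through the dynamic-programming pass.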
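
Second, the #1244 degenerate-case behavior end to end; equal_interval_sketch is a hypothetical stand-in for _run_equal_interval, whose actual signature is not shown here:

import numpy as np

def equal_interval_sketch(data, k):
    finite = np.isfinite(data)
    if not finite.any():
        # All-NaN input: previously a ValueError; now every pixel stays NaN.
        return np.full(data.shape, np.nan)
    lo, hi = np.nanmin(data), np.nanmax(data)
    if hi == lo:
        # max == min: a zero-width interval previously caused
        # ZeroDivisionError (or fed width=0/NaN into np.arange). Collapse
        # to a single bin: finite pixels map to class 0, NaN stays NaN.
        return np.where(finite, 0.0, np.nan)
    width = (hi - lo) / k
    cuts = np.arange(lo + width, hi, width)   # k - 1 interior cut points
    out = np.digitize(data, cuts).astype("float64")
    out[~finite] = np.nan
    return out

equal_interval_sketch(np.array([3.0, 3.0, np.nan]), k=5)  # -> [0., 0., nan]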
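
Third, the reasoning behind the LOW finding as a runnable sketch; cpu_bin_sketch only approximates the shape of _cpu_bin described in the notes (the real function is a numba kernel and its search bounds may differ):

def cpu_bin_sketch(val, bins):
    nbins = len(bins)
    # The outer guards the notes refer to.
    if val <= bins[0]:
        return 0
    if val > bins[nbins - 1]:
        return -1  # outside all bins
    lo, hi = 0, nbins - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if val > bins[mid]:
            lo = mid + 1
        elif val > bins[mid - 1]:
            # At mid == 0 this would read bins[-1] (Python and numba both
            # wrap negative indices to the end of the array), but the
            # val <= bins[0] guard above guarantees val > bins[0], so a
            # mid == 0 iteration always takes the branch above instead.
            return mid
        else:
            hi = mid - 1
    return lo  # unreachable for well-formed bins

The guard ordering, not the index arithmetic, is what keeps the branch safe, which is why the sweep logged it LOW and left it unfixed.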