diff --git a/.claude/sweep-security-state.csv b/.claude/sweep-security-state.csv
index 88aee018..2a6a4bc5 100644
--- a/.claude/sweep-security-state.csv
+++ b/.claude/sweep-security-state.csv
@@ -22,6 +22,7 @@ geotiff,2026-04-17,1215,HIGH,1;4,1219;1220,Follow-up #1220: GPU predictor=2 kern
 glcm,2026-04-24,1257,HIGH,1,,"HIGH (fixed #1257): glcm_texture() validated window_size only as >= 3 and distance only as >= 1, with no upper bound on either. _glcm_numba_kernel iterates range(r-half, r+half+1) for every pixel, so window_size=1_000_001 on a 10x10 raster ran ~10^14 loop iterations with all neighbors failing the interior bounds check (CPU DoS). On the dask backends depth = window_size // 2 + distance drove map_overlap padding, so a huge window also caused oversize per-chunk allocations (memory DoS). Fixed by adding max_val caps in the public entrypoint: window_size <= max(3, min(rows, cols)) and distance <= max(1, window_size // 2). One cap covers every backend because cupy and dask+cupy call through to the CPU kernel after cupy.asnumpy. No other HIGH findings: levels is already capped at 256 so the per-pixel np.zeros((levels, levels)) matrix in the kernel is bounded to 512 KB. No CUDA kernels. No file I/O. Quantization clips to [0, levels-1] before the kernel and NaN maps to -1 which the kernel filters with i_val >= 0. Entropy log(p) and correlation p / (std_i * std_j) are both guarded. All four backends use _validate_raster and cast to float64 before quantizing. MEDIUM (unfixed, Cat 1): the per-pixel np.zeros((levels, levels)) allocation inside the hot loop is a perf issue (levels=256 -> 512 KB alloc+free per pixel) but not a security issue because levels is bounded. Could be hoisted out of the loop or replaced with an in-place clear, but that is an efficiency concern, not security."
 hillshade,2026-04-27,,,,,"Clean.
 Cat 1: only allocation is the output np.empty(data.shape) at line 32 (cupy at line 165) and a _pad_array with hardcoded depth=1 (line 62) -- bounded by caller, no user-controlled amplifier. Azimuth/altitude are scalars and don't drive size. Cat 2: numba kernel uses range(1, rows-1) with simple (y, x) indexing; numba range loops promote to int64. Cat 3: math.sqrt(1.0 + xx_plus_yy) is always >= 1.0 (no neg sqrt, no div-by-zero); NaN elevation propagates correctly through dz_dx/dz_dy -> shaded -> output (the shaded < 0.0 / shaded > 1.0 clamps don't fire on NaN). Azimuth validated to [0, 360], altitude to [0, 90]. Cat 4: _gpu_calc_numba (line 107) guards both grid bounds and 3x3 stencil reads via i > 0 and i < shape[0]-1 and j > 0 and j < shape[1]-1; no shared memory. Cat 5: no file I/O. Cat 6: hillshade() calls _validate_raster (line 252) and _validate_scalar for both azimuth (253) and angle_altitude (254); all four backend paths cast to float32; tests parametrize int32/int64/float32/float64."
 hydro,2026-04-17,,MEDIUM,1;3;6,,
+kde,2026-04-27,1287,HIGH,1,,"HIGH (fixed #1287): kde() and line_density() accepted user-controlled width/height with no upper bound. The eager numpy and cupy backends allocated np.zeros((height, width), dtype=float64) (or cupy.zeros) up front (kde.py: _run_kde_numpy line 308, _run_kde_cupy line 314, line_density inline at line 706). width=1_000_000, height=1_000_000 requested ~8 TB of float64 (or VRAM on the GPU path) before any check ran. Fixed by adding local _available_memory_bytes() helper (mirrors convolution/morphology/bump pattern) and _check_grid_memory(rows, cols) that raises MemoryError when rows*cols*8 exceeds 50% of available RAM. Wired into kde() (skipped for dask paths since _run_kde_dask_numpy/_run_kde_dask_cupy build per-tile via da.from_delayed and are bounded by chunk size) and line_density() (single numpy backend, always guarded). Error message names width/height so the caller knows which knob to turn. No other HIGH findings: Cat 2 (no int32 flat-index math, numba range loops are int64), Cat 3 (bandwidth <= 0 rejected, Silverman fallback returns 1.0 when sigma==0, NaN coords clamp to empty range via min/max), Cat 4 (_kde_cuda has 'if r >= rows or c >= cols: return' bounds guard at line 254, no shared memory, each thread writes own pixel), Cat 5 (no file I/O), Cat 6 (template only used for shape/coords, output dtype forced to float64). MEDIUM (unfixed, Cat 6): _validate_template only checks DataArray + ndim; does not call _validate_raster, but template dtype does not affect compute correctness here."
+mahalanobis,2026-04-27,1288,HIGH,1,,"HIGH (fixed #1288): mahalanobis() had no memory guard. Both _compute_stats_numpy/_compute_stats_cupy and _mahalanobis_pixel_numpy/_mahalanobis_pixel_cupy materialise float64 buffers of shape (n_bands, H*W) -- the np.stack at line 45/80, the reshape+transpose at line 184 (which forces a contiguous BLAS copy), the centered diff, and the diff @ inv_cov result are all live at peak. A 100kx100k 5-band raster projected to ~400 GB of host memory just for the stack. Fixed by adding _available_memory_bytes()/_available_gpu_memory_bytes() (mirroring cost_distance.py:261-292) plus _check_memory/_check_gpu_memory at 32 bytes/cell/band budget, and wiring them into the public mahalanobis() entry point before any np.stack runs. Eager paths (numpy, cupy) are guarded; dask paths skip the check because chunks are bounded by user-supplied chunksize. MEDIUM (unfixed, Cat 6): mahalanobis() does not call _validate_raster on each band -- validate_arrays only enforces matching shape and array-type, so boolean / non-numeric DataArrays silently coerce. Deferred to a separate PR per the security-sweep one-fix-per-PR policy. No other HIGH findings: Cat 2 (no int32 indexing, numpy default int64), Cat 3 (singular covariance raises a clean ValueError, dist_sq is clamped to 0 before sqrt to absorb numerical noise, NaN mask propagates correctly), Cat 4 (no CUDA kernels), Cat 5 (no file I/O beyond /proc/meminfo)."
 morphology,2026-04-24,1256,HIGH,1,,"HIGH (fixed #1256): morph_erode/morph_dilate/morph_opening/morph_closing/morph_gradient/morph_white_tophat/morph_black_tophat accepted a user-supplied kernel with only shape/dtype/odd-size validation. Kernel dimensions drove np.pad/cp.pad on every backend and map_overlap depth on dask paths; a 99999x99999 kernel on a 1000x1000 raster would try to allocate ~80 GB of padded float64 memory with no warning. Fixed by adding local _available_memory_bytes() helper and _check_kernel_memory(rows, cols, ky, kx) that raises MemoryError before allocation when padded size exceeds 50% of available RAM; wired into _dispatch() so every public API entry point is guarded across all four backends. Mirrors bilateral #1236, convolution #1241, bump #1231. No other HIGH findings: Cat 2 (loop indices are Python ints, numba promotes to int64), Cat 3 (NaN propagation explicit via v!=v in both numpy and CUDA paths, tests verify), Cat 4 (GPU kernels _erode_gpu/_dilate_gpu have if i ~2B elements in the numpy path. MEDIUM (unfixed): hypsometric_integral() skips _validate_raster on zones/values; _regions_numpy has no memory guard (numpy-only path, bounded by caller-allocated input). MEDIUM (unfixed): _stats_numpy return_type='xarray.DataArray' allocates np.full((n_stats, values.size)) with no guard."
-kde,2026-04-27,1287,HIGH,1,,"HIGH (fixed #1287): kde() and line_density() accepted user-controlled width/height with no upper bound. The eager numpy and cupy backends allocated np.zeros((height, width), dtype=float64) (or cupy.zeros) up front (kde.py: _run_kde_numpy line 308, _run_kde_cupy line 314, line_density inline at line 706). width=1_000_000, height=1_000_000 requested ~8 TB of float64 (or VRAM on the GPU path) before any check ran. Fixed by adding local _available_memory_bytes() helper (mirrors convolution/morphology/bump pattern) and _check_grid_memory(rows, cols) that raises MemoryError when rows*cols*8 exceeds 50% of available RAM. Wired into kde() (skipped for dask paths since _run_kde_dask_numpy/_run_kde_dask_cupy build per-tile via da.from_delayed and are bounded by chunk size) and line_density() (single numpy backend, always guarded). Error message names width/height so the caller knows which knob to turn. No other HIGH findings: Cat 2 (no int32 flat-index math, numba range loops are int64), Cat 3 (bandwidth <= 0 rejected, Silverman fallback returns 1.0 when sigma==0, NaN coords clamp to empty range via min/max), Cat 4 (_kde_cuda has 'if r >= rows or c >= cols: return' bounds guard at line 254, no shared memory, each thread writes own pixel), Cat 5 (no file I/O), Cat 6 (template only used for shape/coords, output dtype forced to float64). MEDIUM (unfixed, Cat 6): _validate_template only checks DataArray + ndim; does not call _validate_raster, but template dtype does not affect compute correctness here."
diff --git a/xrspatial/sieve.py b/xrspatial/sieve.py
index ff35a226..0f34a0a1 100644
--- a/xrspatial/sieve.py
+++ b/xrspatial/sieve.py
@@ -223,6 +223,93 @@ def _collect(a, b):
     return adjacency
 
 
+# ---------------------------------------------------------------------------
+# Memory guards
+# ---------------------------------------------------------------------------
+
+# Peak working set for the union-find pass:
+#   result copy                8 bytes (float64)
+#   parent                     4 bytes (int32)
+#   rank / root_to_id          4 bytes (int32, reused)
+#   region_map_flat            4 bytes (int32)
+#   slack for region_val,
+#     region_size, valid mask  8 bytes
+# Total ~28 bytes/pixel. Matches the budget the dask paths already use.
+_BYTES_PER_PIXEL = 28
+
+
+def _available_memory_bytes():
+    """Best-effort estimate of available memory in bytes."""
+    try:
+        with open("/proc/meminfo", "r") as f:
+            for line in f:
+                if line.startswith("MemAvailable:"):
+                    return int(line.split()[1]) * 1024
+    except (OSError, ValueError, IndexError):
+        pass
+    try:
+        import psutil
+
+        return psutil.virtual_memory().available
+    except (ImportError, AttributeError):
+        pass
+    return 2 * 1024**3
+
+
+def _available_gpu_memory_bytes():
+    """Best-effort estimate of free GPU memory in bytes.
+
+    Returns 0 when CuPy / CUDA is unavailable or the query fails -- callers
+    treat that as "no GPU info, skip the guard".
+    """
+    try:
+        import cupy as _cp
+
+        free, _total = _cp.cuda.runtime.memGetInfo()
+        return int(free)
+    except Exception:
+        return 0
+
+
+def _check_memory(rows, cols):
+    """Raise MemoryError if the union-find pass would exceed 50% of RAM."""
+    required = int(rows) * int(cols) * _BYTES_PER_PIXEL
+    available = _available_memory_bytes()
+    if required > 0.5 * available:
+        raise MemoryError(
+            f"sieve() on a {rows}x{cols} raster needs "
+            f"~{required / 1e9:.1f} GB of working memory but only "
+            f"~{available / 1e9:.1f} GB is available. "
+            f"Connected-component labeling is a global operation that "
+            f"cannot be chunked. Consider downsampling or tiling the "
+            f"input manually."
+        )
+
+
+def _check_gpu_memory(rows, cols):
+    """Raise MemoryError when the CuPy round-trip would not fit.
+
+    The CuPy backend transfers to host and runs the CPU sieve, so the
+    host budget still applies; we also check free GPU RAM so a user
+    with little VRAM gets a clear error before ``data.get()`` runs.
+    Skips silently when the GPU memory query fails.
+    """
+    _check_memory(rows, cols)
+    available = _available_gpu_memory_bytes()
+    if available <= 0:
+        return
+    # Round-trip needs the float64 input on device plus a float64 result.
+    required = int(rows) * int(cols) * 16
+    if required > 0.5 * available:
+        raise MemoryError(
+            f"sieve() on a {rows}x{cols} cupy raster needs "
+            f"~{required / 1e9:.1f} GB of GPU memory for the round-trip "
+            f"but only ~{available / 1e9:.1f} GB is free on the active "
+            f"device. Use a dask+cupy DataArray for out-of-core "
+            f"processing or downsample the input."
+        )
+
+
 # ---------------------------------------------------------------------------
 # numpy backend
 # ---------------------------------------------------------------------------
@@ -237,6 +324,7 @@ def _sieve_numpy(data, threshold, neighborhood, skip_values):
     so that earlier merges can grow a neighbor above threshold for later
     ones within the same pass.
     """
+    _check_memory(data.shape[0], data.shape[1])
     result = data.astype(np.float64, copy=True)
     is_float = np.issubdtype(data.dtype, np.floating)
     valid = ~np.isnan(result) if is_float else np.ones(result.shape, dtype=bool)
@@ -309,6 +397,7 @@ def _sieve_cupy(data, threshold, neighborhood, skip_values):
     """CuPy backend: transfer to CPU, sieve, transfer back."""
     import cupy as cp
 
+    _check_gpu_memory(data.shape[0], data.shape[1])
     np_result = _sieve_numpy(
         data.get(), threshold, neighborhood, skip_values
     )
@@ -320,24 +409,6 @@
 # ---------------------------------------------------------------------------
 
 
-def _available_memory_bytes():
-    """Best-effort estimate of available memory in bytes."""
-    try:
-        with open("/proc/meminfo", "r") as f:
-            for line in f:
-                if line.startswith("MemAvailable:"):
-                    return int(line.split()[1]) * 1024
-    except (OSError, ValueError, IndexError):
-        pass
-    try:
-        import psutil
-
-        return psutil.virtual_memory().available
-    except (ImportError, AttributeError):
-        pass
-    return 2 * 1024**3
-
-
 def _sieve_dask(data, threshold, neighborhood, skip_values):
     """Dask+numpy backend: compute to numpy, sieve, wrap back."""
     avail = _available_memory_bytes()
diff --git a/xrspatial/tests/test_sieve.py b/xrspatial/tests/test_sieve.py
index 1d56a1cc..efb69da9 100644
--- a/xrspatial/tests/test_sieve.py
+++ b/xrspatial/tests/test_sieve.py
@@ -396,6 +396,36 @@ def test_sieve_dask_memory_guard():
         _sieve_dask(huge, 10, 4, None)
 
 
+def test_sieve_numpy_memory_guard():
+    """Numpy backend raises MemoryError before allocating CCL buffers."""
+    from unittest.mock import patch
+
+    from xrspatial.sieve import _sieve_numpy
+
+    # Use a tiny array but mock available memory to 1 byte so the
+    # guard fires regardless of host RAM.
+    arr = np.zeros((4, 4), dtype=np.float64)
+    with patch("xrspatial.sieve._available_memory_bytes", return_value=1):
+        with pytest.raises(MemoryError, match="working memory"):
+            _sieve_numpy(arr, 10, 4, None)
+
+
+def test_sieve_numpy_memory_guard_via_public_api():
+    """The public sieve() entry point trips the numpy guard."""
+    from unittest.mock import patch
+
+    from xrspatial.sieve import sieve as _sieve
+
+    raster = xr.DataArray(
+        np.zeros((4, 4), dtype=np.float64), dims=["y", "x"]
+    )
+
+    # Mock available memory to 1 byte so even a 4x4 raster trips it.
+    with patch("xrspatial.sieve._available_memory_bytes", return_value=1):
+        with pytest.raises(MemoryError, match="working memory"):
+            _sieve(raster, threshold=2)
+
+
 # ---------------------------------------------------------------------------
 # Numpy / dask consistency
 # ---------------------------------------------------------------------------