After classification, raster outputs often have salt-and-pepper noise: tiny clumps of 1-3 pixels that don't represent real features. GDAL provides gdal_sieve.py for this, but there's no equivalent in xarray-spatial.
Scope
Given a categorical/classified raster and a minimum pixel count threshold, replace each clump smaller than the threshold with the value of its largest neighboring clump.
- Connectivity: 4-connected or 8-connected (user's choice).
- Build on regions(), which already does connected component labeling. The sieve would label regions, measure their size, and merge the small ones.
- Selective sieving: option to specify which values to sieve and which to skip (e.g., nodata, specific classes that should never be merged).
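The label, measure, and merge steps listed above can be sketched for the numpy case roughly as follows. This is only an illustration, not a proposed implementation: the name sieve_sketch and its parameters (threshold, connectivity, skip_values) are hypothetical, scipy.ndimage.label stands in for regions(), and the input is assumed to be a 2D integer array without NaN. It is also single-pass, so a small clump may be merged into a neighbor that is itself below the threshold.

```python
import numpy as np
from scipy import ndimage


def sieve_sketch(data, threshold, connectivity=4, skip_values=()):
    """Hypothetical single-pass sieve for a 2D integer array (no NaN)."""
    # Rank 1 gives 4-connectivity, rank 2 gives 8-connectivity.
    rank = 1 if connectivity == 4 else 2
    structure = ndimage.generate_binary_structure(2, rank)

    # Label clumps of equal value; offset so labels are unique raster-wide.
    labels = np.zeros(data.shape, dtype=np.int64)
    n_labels = 0
    for value in np.unique(data):
        lab, n = ndimage.label(data == value, structure=structure)
        labels[lab > 0] = lab[lab > 0] + n_labels
        n_labels += n

    # Pixel count per label, and the class value each label carries.
    sizes = np.bincount(labels.ravel(), minlength=n_labels + 1)
    region_value = np.zeros(n_labels + 1, dtype=data.dtype)
    region_value[labels.ravel()] = data.ravel()

    out = data.copy()
    for lab_id in range(1, n_labels + 1):
        if sizes[lab_id] >= threshold:
            continue  # large enough, keep as-is
        if region_value[lab_id] in skip_values:
            continue  # selective sieving: never merge these classes
        mask = labels == lab_id
        # Pixels touching the clump, found by dilating it by one step.
        ring = ndimage.binary_dilation(mask, structure=structure) & ~mask
        neighbor_ids = np.unique(labels[ring])
        neighbor_ids = neighbor_ids[neighbor_ids > 0]
        if neighbor_ids.size == 0:
            continue  # clump fills the whole raster
        largest = neighbor_ids[np.argmax(sizes[neighbor_ids])]
        out[mask] = region_value[largest]
    return out


classified = np.array([
    [1, 1, 1, 1],
    [1, 2, 1, 1],
    [1, 1, 3, 3],
    [1, 1, 3, 3],
])
cleaned = sieve_sketch(classified, threshold=2)
kept = sieve_sketch(classified, threshold=2, skip_values=(2,))
```

Here the lone 2 is absorbed by the surrounding clump of 1s, while the 2x2 clump of 3s survives. Passing skip_values=(2,) leaves the raster unchanged.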
Why this matters
This pairs naturally with classification functions (natural_breaks(), reclassify(), etc.) and with polygonize() to clean up results before vectorization. Right now users either pull in GDAL for a one-line sieve call or write their own loop, neither of which fits well into a pure xarray-spatial workflow.
Backend support
All four backends: numpy, cupy, dask+numpy, dask+cupy. The dask path needs care at chunk boundaries since clumps can span chunks. This likely requires either a multi-pass approach or halo/overlap handling to stitch cross-boundary regions correctly.
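To make the chunk-boundary problem concrete, here is a small sketch that uses plain numpy slices in place of dask chunks. It labels two "chunks" independently, then stitches labels across the seam with a union-find pass. The helper label_by_value, the two-chunk split, and the stitching strategy are assumptions for illustration only; a real dask path would have to handle arbitrary chunk grids and might be built quite differently.

```python
import numpy as np
from scipy import ndimage


def label_by_value(block):
    """4-connected labels for clumps of equal value; labels start at 1."""
    labels = np.zeros(block.shape, dtype=np.int64)
    count = 0
    for value in np.unique(block):
        lab, n = ndimage.label(block == value)
        labels[lab > 0] = lab[lab > 0] + count
        count += n
    return labels, count


data = np.array([
    [1, 1, 1, 1],
    [1, 2, 2, 1],
    [1, 1, 1, 1],
])

# Pretend the raster is stored as two chunks split between columns 1 and 2.
left, right = data[:, :2], data[:, 2:]
left_labels, n_left = label_by_value(left)
right_labels, n_right = label_by_value(right)
right_labels = right_labels + n_left  # keep labels unique across chunks
labels = np.hstack([left_labels, right_labels])

# Union-find over labels: parent[i] == i means i is a root.
parent = np.arange(n_left + n_right + 1)


def find(i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]  # path halving
        i = parent[i]
    return i


# Second pass: join labels that face each other across the seam
# and carry the same class value.
for row in range(data.shape[0]):
    if left[row, -1] == right[row, 0]:
        a, b = find(left_labels[row, -1]), find(right_labels[row, 0])
        if a != b:
            parent[max(a, b)] = min(a, b)

merged = np.array([find(i) for i in range(parent.size)])[labels]

sizes_before = np.bincount(labels.ravel())  # per-chunk view
sizes_after = np.bincount(merged.ravel())   # after stitching
```

Before stitching, the two-pixel clump of 2s is counted as two separate one-pixel clumps, so a threshold of 2 would wrongly sieve both halves. After stitching, it is measured as a single clump of two pixels.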