Add kernel density estimation (KDE) #1170

Merged
brendancol merged 7 commits into master from issue-1143 on Apr 6, 2026

@brendancol
Contributor

The repo has rasterize() for burning discrete values onto a grid but nothing for continuous density surfaces. This adds that.

Closes #1143

Summary

  • kde() turns point coordinates into a density raster. Gaussian, Epanechnikov, and quartic kernels. Automatic bandwidth (Silverman's rule) or manual. Optional per-point weights. All four backends: numpy, cupy, dask+numpy, dask+cupy.
  • line_density() does the same for line segments. numpy only for now.
  • 35 tests: correctness, edge cases, kernel types, bandwidth, weights, cross-backend parity (dask, cupy).
  • User guide notebook (examples/user_guide/49_KDE.ipynb) with earthquake cluster and road network examples.
  • Docs: docs/source/reference/kde.rst added, README feature matrix updated.
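The behaviour listed above can be illustrated with a minimal NumPy-only sketch. This is not the code from the PR: the function names (`silverman_bandwidth`, `kernel_value`, `kde_grid`), their parameters, and the choice of the multivariate form of Silverman's rule with an averaged standard deviation are all assumptions made for illustration. The kernels are the standard 2-D normalized forms, and the result is left unnormalized by the sum of weights.

```python
import numpy as np


def silverman_bandwidth(x, y):
    """Rule-of-thumb bandwidth for 2-D points (assumed variant of Silverman's rule)."""
    n, d = x.size, 2
    sigma = 0.5 * (np.std(x, ddof=1) + np.std(y, ddof=1))
    return sigma * (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))


def kernel_value(u2, kernel):
    """Evaluate a radial 2-D kernel at squared scaled distance u2 = (r / h)**2."""
    if kernel == "gaussian":
        return np.exp(-0.5 * u2) / (2.0 * np.pi)
    inside = u2 <= 1.0  # compact kernels are zero outside the unit disk
    if kernel == "epanechnikov":
        return np.where(inside, (2.0 / np.pi) * (1.0 - u2), 0.0)
    if kernel == "quartic":
        return np.where(inside, (3.0 / np.pi) * (1.0 - u2) ** 2, 0.0)
    raise ValueError(f"unknown kernel: {kernel}")


def kde_grid(x, y, xs, ys, bandwidth=None, kernel="gaussian", weights=None):
    """Sum one kernel per point onto the grid defined by xs (columns) and ys (rows)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=np.float64)
    h = silverman_bandwidth(x, y) if bandwidth is None else float(bandwidth)
    gx, gy = np.meshgrid(xs, ys)  # both have shape (len(ys), len(xs))
    density = np.zeros(gx.shape, dtype=np.float64)
    for px, py, pw in zip(x, y, w):
        u2 = ((gx - px) ** 2 + (gy - py) ** 2) / h**2
        density += pw * kernel_value(u2, kernel)
    return density / h**2  # integrates to sum(weights) over the plane
```

With these normalizations, a single unweighted point contributes a surface that integrates to roughly 1 over a sufficiently large grid, whichever kernel is chosen.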

Test plan

  • pytest xrspatial/tests/test_kde.py -- 35/35 passing
  • Notebook executes via jupyter nbconvert --execute
  • Verify docs build with make html

_extract_transect was calling .compute() on the full dask array just to
read a handful of transect cells. Now uses vindex fancy indexing so only
the relevant chunks are materialized.
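As a rough illustration of the vindex approach described above, the sketch below selects a few cells from a chunked dask array without computing the whole array first. The raster contents and the transect indices are made up for the example; `_extract_transect` itself is not reproduced here.

```python
import numpy as np
import dask.array as da

# Hypothetical raster split into four 3x3 chunks.
data = np.arange(36, dtype=np.float64).reshape(6, 6)
raster = da.from_array(data, chunks=(3, 3))

# Row/column indices of the transect cells (an illustrative diagonal).
rows = np.array([0, 1, 2])
cols = np.array([0, 1, 2])

# vindex does point-wise selection and stays lazy until compute() is called.
transect_lazy = raster.vindex[rows, cols]
transect = transect_lazy.compute()
```

Here `transect` matches `data[rows, cols]`, but the selection is expressed on the dask graph rather than on a fully materialized array.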

cumulative_viewshed was allocating a full-size np.zeros count array and
calling .values on each viewshed result, forcing materialization every
iteration. Now accumulates lazily with da.zeros and dask array addition
when the input is dask-backed.
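The lazy accumulation pattern can be sketched as follows. The helper name `accumulate_visibility` is hypothetical, and the sketch assumes visible cells are marked with values greater than zero, which may not match how viewshed results are actually encoded.

```python
import numpy as np
import dask.array as da


def accumulate_visibility(viewsheds, shape, chunks):
    """Count, per cell, how many viewshed layers mark it visible, without computing."""
    count = da.zeros(shape, dtype=np.int32, chunks=chunks)
    for vs in viewsheds:
        # Assumption: a value > 0 means "visible" in each layer.
        count = count + (vs > 0).astype(np.int32)
    return count  # still a lazy dask array


shape, chunks = (4, 4), (2, 2)
layer_a = da.from_array(np.eye(4), chunks=chunks)
layer_b = da.from_array(np.ones(shape), chunks=chunks)
total = accumulate_visibility([layer_a, layer_b], shape, chunks)
result = total.compute()  # materialization happens once, at the end
```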
The dask Tier B memory guard underestimated peak usage at 280 bytes/pixel.
Actual peak during lexsort reaches ~360 bytes/pixel (sorted + unsorted
event_list coexist) plus 8 bytes/pixel for the computed raster. Updated
estimate to 368 bytes/pixel to prevent borderline OOM.
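The arithmetic behind the updated estimate can be written out as a small sketch. The per-pixel figures come from the description above; the function names and the `safety_fraction` parameter are illustrative assumptions, not the guard's actual interface.

```python
SORT_PEAK_BYTES_PER_PIXEL = 360  # sorted + unsorted event_list coexisting during lexsort
RASTER_BYTES_PER_PIXEL = 8       # the computed raster
PEAK_BYTES_PER_PIXEL = SORT_PEAK_BYTES_PER_PIXEL + RASTER_BYTES_PER_PIXEL  # 368
OLD_BYTES_PER_PIXEL = 280        # previous underestimate


def estimated_peak_bytes(height, width, bytes_per_pixel=PEAK_BYTES_PER_PIXEL):
    """Estimated peak memory for a raster of the given shape."""
    return height * width * bytes_per_pixel


def fits_in_memory(height, width, available_bytes, safety_fraction=1.0):
    """Hypothetical guard: True if the estimate fits in the allowed share of memory."""
    return estimated_peak_bytes(height, width) <= available_bytes * safety_fraction
```

For a 1000 x 1000 raster the old figure gives 280 MB and the new one 368 MB, so a 300 MB budget that used to pass the guard is now rejected.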

Also use astype(copy=False) to skip the float64 copy when data is already
float64.
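The effect of `copy=False` can be checked with plain NumPy. The array names below are illustrative only.

```python
import numpy as np

already_f64 = np.arange(6, dtype=np.float64)
as_f32 = np.arange(6, dtype=np.float32)

# No conversion needed: the input buffer is reused instead of copied.
same = already_f64.astype(np.float64, copy=False)

# A dtype change still allocates a new array, even with copy=False.
converted = as_f32.astype(np.float64, copy=False)

# The default (copy=True) always allocates, even when the dtype matches.
default_copy = already_f64.astype(np.float64)
```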
Implements kde() and line_density() for point-to-raster and
line-to-raster density surfaces. Supports Gaussian, Epanechnikov,
and quartic kernels with automatic bandwidth selection via
Silverman's rule. kde() runs on all four backends (numpy, cupy,
dask+numpy, dask+cupy); line_density() is numpy only for now.

Adds 35 tests covering correctness, edge cases, kernel types,
bandwidth selection, weights, and cross-backend parity
(dask+numpy, cupy). Removes the hard cutoff from the GPU Gaussian
kernel to avoid a box-vs-circle mismatch with the CPU result.

Creates docs/source/reference/kde.rst with autosummary entries
for kde() and line_density(). Adds both functions to __init__.py
and the docs toctree.

The user guide notebook covers Gaussian/Epanechnikov/quartic kernels,
bandwidth effects, weighted KDE, and line density with synthetic
earthquake and road network data.
@github-actions (bot) added the performance label (PR touches performance-sensitive code) on Apr 6, 2026
@brendancol merged commit 46ee269 into master on Apr 6, 2026
11 checks passed
@brendancol deleted the issue-1143 branch on May 4, 2026 13:05
Labels

performance PR touches performance-sensitive code

Development

Successfully merging this pull request may close these issues.

Add kernel density estimation (KDE) for point-to-raster conversion