A Python implementation of the Spatial Point Pattern Test (SPPT) for aggregated count data. It uses bootstrap resampling to compare spatial distributions between variables and calculates S-Index metrics to quantify spatial pattern overlap.

Based on the original R package `sppt.aggregated.data` by Martin A. Andresen. This Python port faithfully reimplements the statistical methods, algorithms, and outputs of the R version.

A detailed discussion of the spatial point pattern test is available in an open-access journal article:

Andresen, M.A. (2016). An area-based nonparametric spatial point pattern test: the test, its applications, and the future. Methodological Innovations, 9, Article 12. DOI: 10.1177/2059799116630659
- Bootstrap resampling with sparse-matrix acceleration (`scipy.sparse` + `numpy`)
- S-Index and Robust S-Index for quantifying spatial pattern overlap
- Bivariate comparison (base vs. test variable) with directional change detection
- Percentage or count mode: compare spatial distributions or absolute values
- Fixed base option: bootstrap only the test variable when the base is known
- Automatic choropleth maps via `matplotlib` + `geopandas`
- Multiple export formats: Shapefile, GeoPackage, CSV, TXT, Pickle
- Google Colab compatible: works out of the box in cloud notebooks
```bash
pip install sppt
```

For development:

```bash
git clone https://github.com/yunusserhat/sppt-python.git
cd sppt-python
pip install -e ".[dev]"
```

```python
import geopandas as gpd
from sppt import sppt

# Load spatial data
data = gpd.read_file("your_data.shp")

# Compare two variables across spatial units
result = sppt(
    data=data,
    group_col="DAUID",                       # spatial unit identifier
    count_col=["Crime_2020", "Crime_2021"],  # [base, test]
    B=200,                                   # bootstrap samples
    check_overlap=True,                      # compute S-Index
    seed=42,                                 # reproducibility
)

# Access results
print(result.s_index)         # e.g. 0.7380
print(result.robust_s_index)  # e.g. 0.7289
print(result.data.head())     # DataFrame with CI bounds + overlap columns
```

- Expand aggregated counts to individual events (uncount)
- Build a sparse one-hot matrix (n × G) for group membership
- Draw B multinomial bootstrap samples
- Aggregate via matrix multiply: `group_counts = onehot.T @ W`
- Convert to percentages (optional) and extract quantile-based confidence intervals
- Compare intervals between variables to detect significant spatial changes
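
The steps above can be sketched in plain Python. This is a minimal illustration, not the package's code: the feature list says the package uses `scipy.sparse` and `numpy` for the resampling and aggregation, whereas this sketch resamples events with `random.choices` and re-aggregates with a `Counter`. The function name `bootstrap_group_intervals` and the simple quantile rule are assumptions made for the example, so interval bounds may differ slightly from what `sppt()` reports.

```python
import random
from collections import Counter


def bootstrap_group_intervals(counts, B=200, conf_level=0.95, seed=None):
    """Percentile bootstrap intervals for each group's share of events (in %).

    Hypothetical helper for illustration only; `counts` maps group id -> count.
    """
    rng = random.Random(seed)
    groups = list(counts)

    # Expand aggregated counts to one record per event ("uncount").
    events = [g for g in groups for _ in range(counts[g])]
    n = len(events)
    if n == 0:
        raise ValueError("at least one event is required")

    samples = {g: [] for g in groups}
    for _ in range(B):
        # Resample n events with replacement, then re-aggregate by group.
        tally = Counter(rng.choices(events, k=n))
        for g in groups:
            samples[g].append(100.0 * tally[g] / n)

    # Take the lower and upper quantiles of the sorted bootstrap shares.
    alpha = 1.0 - conf_level
    lo_idx = int((alpha / 2) * (B - 1))
    hi_idx = int((1 - alpha / 2) * (B - 1))
    intervals = {}
    for g in groups:
        ordered = sorted(samples[g])
        intervals[g] = (ordered[lo_idx], ordered[hi_idx])
    return intervals


toy_counts = {"A": 50, "B": 30, "C": 20, "D": 0}
for group, (lower, upper) in bootstrap_group_intervals(toy_counts, seed=42).items():
    print(group, round(lower, 1), round(upper, 1))
```

A group with zero events never appears in a resample, so its interval collapses to `(0.0, 0.0)`.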
| S-Index | Meaning |
|---|---|
| 1.0 | Perfect overlap — no spatial pattern change |
| 0.5 | Half the areas show significant change |
| 0.0 | Complete spatial difference |
The Robust S-Index excludes spatial units where all variables are zero.
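
As a rough sketch of the arithmetic, the S-Index can be read as the share of spatial units whose confidence intervals overlap, and the Robust S-Index as the same share computed after dropping units where every variable is zero. This reading is consistent with the sample output printed later in this README (752 overlapping units out of 1019 gives 0.7380). The helper names `s_index` and `robust_s_index` are hypothetical and are not the package's API.

```python
def s_index(overlap_flags):
    """Share of spatial units whose intervals overlap (1 = overlap, 0 = not)."""
    return sum(overlap_flags) / len(overlap_flags)


def robust_s_index(overlap_flags, base_counts, test_counts):
    """Same share, after dropping units where both variables are zero."""
    kept = [
        flag
        for flag, b, t in zip(overlap_flags, base_counts, test_counts)
        if b != 0 or t != 0
    ]
    return sum(kept) / len(kept)


# Toy data for five spatial units; the second unit has zero counts in both variables.
overlap = [1, 1, 0, 1, 0]
base = [4, 0, 7, 2, 9]
test = [5, 0, 1, 3, 2]

print(s_index(overlap))                     # 3 of 5 units overlap -> 0.6
print(robust_s_index(overlap, base, test))  # 2 of 4 non-zero units overlap -> 0.5
```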
| Value | Meaning |
|---|---|
| -1 | Base > Test (decline) |
| 0 | No significant difference |
| +1 | Test > Base (increase) |
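
The directional values in the table above follow from comparing two confidence intervals. The sketch below assumes each interval is a `(lower, upper)` pair and treats intervals that merely touch as overlapping; both the function name `direction` and that tie-breaking choice are assumptions for illustration, not confirmed package behavior.

```python
def direction(base_ci, test_ci):
    """Return -1, 0, or +1 for one spatial unit, given (lower, upper) intervals."""
    base_lo, base_hi = base_ci
    test_lo, test_hi = test_ci
    if base_lo > test_hi:
        return -1  # base lies entirely above test: decline
    if test_lo > base_hi:
        return 1   # test lies entirely above base: increase
    return 0       # intervals overlap: no significant difference


print(direction((4.0, 6.0), (1.0, 3.0)))  # -1
print(direction((4.0, 6.0), (5.0, 8.0)))  # 0
print(direction((1.0, 2.0), (2.5, 4.0)))  # 1
```

Under the same assumptions, a base value that is known rather than bootstrapped could be passed as a degenerate interval `(x, x)`.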
| Parameter | Default | Description |
|---|---|---|
| `data` | - | GeoDataFrame or DataFrame with count data |
| `group_col` | `"group"` | Column identifying spatial units |
| `count_col` | - | Column name(s) with counts. Pass `["base", "test"]` for bivariate |
| `B` | `200` | Number of bootstrap samples |
| `seed` | `None` | Random seed for reproducibility |
| `conf_level` | `0.95` | Confidence level for intervals |
| `check_overlap` | `False` | Compute overlap + S-Index statistics |
| `fix_base` | `False` | Skip bootstrapping the base (first) variable |
| `use_percentages` | `True` | Compare spatial distributions (%) vs. raw counts |
| `create_maps` | `True` | Generate choropleth map for bivariate case |
| `export_maps` | `False` | Save map to disk |
| `export_dir` | `None` | Directory for map export |
| `map_dpi` | `300` | Resolution for exported maps |
| `export_results` | `False` | Save results to disk |
| `export_format` | `"shp"` | Format: `"shp"`, `"gpkg"`, `"csv"`, `"txt"`, `"pickle"` |
| `export_results_dir` | `None` | Directory for results export |
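
To see what `use_percentages` changes, consider a toy case in which every unit's count doubles between the base and test variables. The sketch below compares only the point values, with no bootstrapping, and the helper `to_percentages` is hypothetical: each unit's share of the total is identical across the two variables, while the raw counts all differ.

```python
def to_percentages(counts):
    """Convert raw counts per unit into each unit's share of the total (in %)."""
    total = sum(counts.values())
    return {unit: 100 * value / total for unit, value in counts.items()}


base = {"A": 10, "B": 30, "C": 60}
test = {"A": 20, "B": 60, "C": 120}  # every unit doubled

print(to_percentages(base) == to_percentages(test))  # True: same spatial distribution
print(base == test)                                  # False: absolute counts differ
```

Percentage mode therefore asks whether the spatial distribution changed, and count mode asks whether the absolute values changed.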
```python
import geopandas as gpd
from sppt import sppt

data = gpd.read_file("Vancouver_DAs_Crime_2021.shp")
data = data.to_crs(epsg=26910)

result = sppt(
    data=data,
    group_col="DAUID",
    count_col=["TFV", "TOV"],  # Total Family Violence vs Total Other Violence
    B=200,
    check_overlap=True,
    create_maps=True,
    seed=171717,
)
```

Output:

```
========================================
Spatial Pattern Overlap Statistics
Using: Percentages (spatial distribution)
========================================
S-Index: 0.7380
Robust S-Index: 0.7289
----------------------------------------
Total observations: 1019
Observations with overlap: 752
Observations with non-zero counts: 985
========================================
```
Fixed base (bootstrap only the test variable):

```python
result = sppt(
    data=data,
    group_col="DAUID",
    count_col=["Census_Official", "Survey_Estimate"],
    B=200,
    fix_base=True,  # don't bootstrap the census data
    check_overlap=True,
    seed=42,
)
```

Count mode (compare absolute counts):

```python
result = sppt(
    data=data,
    group_col="DAUID",
    count_col=["Crime_2020", "Crime_2021"],
    B=200,
    use_percentages=False,  # compare absolute counts
    check_overlap=True,
    seed=42,
)
```

Export results and maps:

```python
result = sppt(
    data=data,
    group_col="DAUID",
    count_col=["TFV", "TOV"],
    B=500,
    check_overlap=True,
    export_results=True,
    export_format="gpkg",  # GeoPackage
    export_results_dir="output/",
    export_maps=True,
    export_dir="output/maps/",
    map_dpi=600,  # publication quality
    seed=171717,
)
```

| Notebook | Description |
|---|---|
| Quickstart | Basic usage with Vancouver crime data |
| Advanced Examples | All modes, export, publication maps |
The package includes the Vancouver Dissemination Areas Crime 2021 dataset (1,019 polygons) for testing:

```python
from sppt import load_sample_data

data = load_sample_data()
print(data.columns)
# ['DAUID', 'DGUID', 'LANDAREA', 'PRUID', 'BNEC', 'BNER',
#  'MISCHIEF', 'TFV', 'THEFT', 'TOB', 'TOV', 'geometry']
```

After running `sppt()`, your data gains these columns:
| Column | Description |
|---|---|
| `{var}_L` | Lower bound of confidence interval |
| `{var}_U` | Upper bound of confidence interval |
| `intervals_overlap` | 1 if CIs overlap, 0 otherwise |
| `SIndex_Bivariate` | -1 (base > test), 0 (overlap), 1 (test > base) |
If you use this package in your research, please cite both the Python package and the original R implementation:

```bibtex
@software{bicakci2026sppt,
  author = {Bıçakçı, Yunus Serhat},
  title  = {sppt: Spatial Point Pattern Test for Aggregated Data (Python)},
  year   = {2026},
  url    = {https://github.com/yunusserhat/sppt-python},
  doi    = {10.5281/zenodo.18813433},
  note   = {Python implementation based on the R package by Martin A. Andresen}
}

@software{andresen2025sppt,
  author = {Andresen, Martin A.},
  title  = {sppt.aggregated.data: Spatial Point Pattern Test for Aggregated Data (R)},
  year   = {2025},
  url    = {https://github.com/martin-a-andresen/sppt.aggregated.data}
}

@article{andresen2016area,
  author  = {Andresen, Martin A.},
  title   = {An area-based nonparametric spatial point pattern test: the test, its applications, and the future},
  journal = {Methodological Innovations},
  volume  = {9},
  pages   = {Article 12},
  year    = {2016},
  doi     = {10.1177/2059799116630659}
}
```

This package is a faithful Python reimplementation of the R package `sppt.aggregated.data` created by Martin A. Andresen. The statistical methodology, bootstrap algorithm, S-Index calculations, and output structure are directly based on his original work.