A comprehensive geospatial analysis pipeline for modeling electricity supply networks, projecting future energy demand, and identifying optimal locations for renewable energy infrastructure across 189+ countries.
- Overview
- Features
- Project Structure
- Installation
- Quick Start
- Configuration
- Data Requirements
- Scripts Reference
- Workflow Guide
- Output Formats
- High-Performance Computing (HPC)
- Troubleshooting
- Citation
## Overview

This project performs country-level analysis of electricity supply and demand networks by:
- Integrating global datasets — Power plant locations (Global Energy Monitor), electricity statistics (Ember), population distributions (JRC GHS-POP), and grid infrastructure (GridFinder)
- Projecting future scenarios — 2030 and 2050 energy demand based on IEA World Energy Outlook and UN population projections
- Modeling supply networks — Network graph analysis to match power generation facilities with population demand centers
- Identifying underserved areas — Siting analysis for remote settlements requiring new infrastructure
- Assessing climate impacts — CMIP6-based projections of solar PV output, wind power density, and hydropower runoff changes
## Features

- Multi-scale analysis: From global aggregation to individual settlement resolution (~9km grid cells)
- Multiple scenarios: Configurable supply factors (60%–100%) and target years (2030, 2050)
- Energy type differentiation: Solar, Wind, Hydro, Other Renewables, Nuclear, Fossil
- Parallel processing: Automatic CPU detection with HPC cluster support (SLURM)
- Maritime support: Includes offshore facilities using EEZ boundaries
- Climate projections: CMIP6 ensemble-mean projections for solar, wind, and hydro with uncertainty quantification
- Publication-ready outputs: GeoPackage and Parquet formats for GIS visualization and analysis
## Project Structure

```text
├── config.py # Central configuration parameters
│
├── # ═══ Data Preparation Scripts ═══
├── p1_a_ember_gem_2024.py # Harmonize Ember + Global Energy Monitor data
├── p1_b_ember_2024_30_50.py # Project 2030/2050 energy scenarios
├── p1_c_prep_landcover.py # Download ESA CCI Land Cover 2022 from CDS
├── p1_d_viable_solar.py # CMIP6 solar projections + viability filter
├── p1_e_viable_wind.py # CMIP6 wind projections + viability filter
├── p1_f_utils_hydro.py # Shared utilities for hydro processing
├── p1_f_viable_hydro.py # ERA5-Land/CMIP6 runoff + RiverATLAS
│
├── # ═══ Core Analysis Scripts ═══
├── process_country_supply.py # Main supply-demand network analysis
├── process_country_siting.py # Remote settlement siting analysis
├── generate_hpc_scripts.py # Generate country list and HPC scripts
│
├── # ═══ Results Processing ═══
├── combine_one_results.py # Combine single country to GeoPackage + clip TIFs
├── combine_global_results.py # Combine all countries to global GeoPackage
├── p1_y_results_data_etl.py # Exposure analysis ETL pipeline
│
├── # ═══ Figure Generation ═══
├── p1_z_fig12.py # Figures 1-2: Global energy exposure
├── p1_z_fig34.py # Figures 3-4: Exposure by type/year
├── p1_z_fig56.py # Figures 5-6: Detailed exposure
├── p1_z_fig7.py # Figure 7: Sensitivity analysis
├── p1_z_fig8.py # Figure 8: Hazard-specific breakdown
│
├── # ═══ HPC Execution Scripts ═══
├── submit_all_parallel.sh # Submit all supply analysis jobs
├── submit_all_parallel_siting.sh # Submit all siting analysis jobs
├── submit_one.sh # Submit single supply script
├── submit_one_siting.sh # Submit single siting script
├── submit_workflow.sh # Submit results combination job
├── parallel_scripts/ # 40 supply analysis SLURM scripts
├── parallel_scripts_siting/ # 24 siting analysis SLURM scripts
│
├── # ═══ Data Directories ═══
├── bigdata_gadm/ # GADM administrative boundaries
├── bigdata_eez/ # Marine Regions EEZ boundaries
├── bigdata_gridfinder/ # GridFinder electrical grid data
├── bigdata_settlements_jrc/ # JRC GHS-POP population raster
├── bigdata_landcover/ # ESA CCI Land Cover 2022
├── bigdata_solar_pvout/ # Global Solar Atlas baseline
├── bigdata_wind_atlas/ # Global Wind Atlas baseline
├── bigdata_solar_wind_ms/ # Microsoft renewable energy sites
├── bigdata_landcover_cds/ # ESA CCI Land Cover 2022 (downloads/extracted/outputs)
├── bigdata_solar_cmip6/ # CMIP6 solar projections + outputs
│ └── outputs/ # Solar TIFs + viable centroids
├── bigdata_wind_cmip6/ # CMIP6 wind projections + outputs
│ └── outputs/ # Wind TIFs + viable centroids
├── bigdata_hydro_cmip6/ # CMIP6 runoff projections + outputs
│ └── outputs/ # Hydro TIFs + river projections + viable centroids
├── bigdata_hydro_era5_land/ # ERA5-Land runoff data
├── bigdata_hydro_atlas/ # HydroATLAS river datasets
├── data_energy_ember/ # Ember electricity statistics
├── data_energy_projections_iea/ # IEA World Energy Outlook data
├── data_pop_un/ # UN population projections
├── data_country_class_wb/ # World Bank country classifications
│
├── # ═══ Output Directories ═══
├── outputs_per_country/ # Country-level Parquet + GeoPackage outputs
│ └── parquet/{scenario}/ # Parquet files per scenario
├── outputs_global/ # Combined global GeoPackage outputs
├── outputs_processed_data/ # Processed analysis results
└── outputs_processed_fig/ # Generated figures
```

---
## Installation
### Prerequisites
- Python 3.11+
- Conda or Mamba package manager
- ~50GB disk space for datasets
- 16GB+ RAM (32GB+ recommended for large countries)
### Environment Setup
```bash
# Clone repository
git clone <repository-url>
cd p1_test
# Create conda environment
conda env create -f environment.yml
conda activate p1_etl
# Verify installation
python -c "import geopandas; import networkx; print('Ready!')"
```

Key dependencies (full list in `environment.yml`):

- `geopandas`: Geospatial data handling
- `networkx`: Graph-based network analysis
- `rasterio`: Raster data processing
- `scikit-learn`: K-means clustering
- `scipy`: Minimum spanning tree algorithms
- `pandas`, `numpy`: Data manipulation
- `pyarrow`: Parquet I/O
## Quick Start

### Single Country

```bash
# Activate environment
conda activate p1_etl

# Run supply analysis for Kenya
python process_country_supply.py KEN

# Run siting analysis (after supply completes)
python process_country_siting.py KEN

# Combine results to GeoPackage for visualization
python combine_one_results.py KEN
```

### Multiple Countries

```bash
# Process multiple countries sequentially
python process_country_supply.py USA CHN IND

# Combine all completed countries to global dataset
python combine_global_results.py --input-dir outputs_per_country
```

### HPC Cluster (SLURM)

```bash
# Generate parallel SLURM scripts
python generate_hpc_scripts.py --create-parallel

# Fix line endings (if prepared on Windows)
sed -i 's/\r$//' submit_all_parallel.sh parallel_scripts/*.sh
chmod +x submit_all_parallel.sh parallel_scripts/*.sh

# Submit all 40 parallel jobs (single scenario: 100%)
./submit_all_parallel.sh

# OR: Submit with ALL scenarios (100%, 90%, 80%, 70%, 60%)
./submit_all_parallel.sh --run-all-scenarios

# OR: Submit with a specific supply factor (e.g., 90% only)
./submit_all_parallel.sh --supply-factor 0.9

# Monitor progress
squeue -u $USER
tail -f outputs_per_country/logs/parallel_*.out
```

### Submitting Individual Scripts

Use `submit_one.sh` and `submit_one_siting.sh` to submit individual parallel scripts. Each script contains one or more countries grouped by computational tier.

```bash
# See which countries a script processes (e.g., script 01)
grep "Processing" parallel_scripts/submit_parallel_01.sh
# Output: Processing 1 countries in this batch: CHN

# Submit a specific supply script by number (single scenario: 100%)
./submit_one.sh 01 # Submit script 01 (CHN)
./submit_one.sh 5 # Leading zero optional

# Submit with all 5 scenarios (100%, 90%, 80%, 70%, 60%)
./submit_one.sh 01 --run-all-scenarios

# Submit with a specific supply factor (e.g., 90% only)
./submit_one.sh 01 --supply-factor 0.9

# Same for siting analysis
./submit_one_siting.sh 03
./submit_one_siting.sh 03 --run-all-scenarios
./submit_one_siting.sh 03 --supply-factor 0.9

# Check which script contains a specific country
grep -l "USA" parallel_scripts/*.sh
# Output: parallel_scripts/submit_parallel_02.sh
```

Script-to-country mapping (Tiers 1-2):
| Script | Countries | Tier | Notes |
|---|---|---|---|
| 01 | CHN | T1 | 168h Long, ouce-cn64 (450GB dedicated) |
| 02 | USA | T2 | 168h Long |
| 03 | IND | T2 | 168h Long |
| 04 | BRA | T2 | 168h Long |
| 05 | DEU | T2 | 168h Long |
| 06+ | Multiple | T3-T5 | Medium/Small countries |
## Configuration

All configurable parameters are centralized in `config.py`.

### Regenerating Outputs After Changes

When you modify configuration parameters, some outputs must be regenerated:
| Change | Scripts to Re-run |
|---|---|
| `POP_AGGREGATION_FACTOR` | `p1_d_viable_solar.py`, `p1_e_viable_wind.py`, `p1_f_viable_hydro.py`, then all country supply/siting |
| `SOLAR_PVOUT_THRESHOLD` | `p1_d_viable_solar.py --process-only` |
| `WIND_WPD_THRESHOLD` | `p1_e_viable_wind.py --process-only` |
| `HYDRO_RUNOFF_THRESHOLD_MM` | `p1_f_viable_hydro.py --process-only` |
| `LANDCOVER_VALID_*` | Respective viable script with `--process-only` |
| Network settings | Country supply analysis only (`process_country_supply.py`) |
| Siting settings | Country siting analysis only (`process_country_siting.py`) |
Typical regeneration workflow:

```bash
# After modifying viability thresholds:
python p1_d_viable_solar.py --process-only
python p1_e_viable_wind.py --process-only
python p1_f_viable_hydro.py --process-only

# Then re-run country analysis and combine:
python process_country_supply.py KEN
python combine_one_results.py KEN
```

### General Settings

| Parameter | Default | Description |
|---|---|---|
| `ANALYSIS_YEAR` | 2030 | Target year: 2024, 2030, or 2050 |
| `SUPPLY_FACTOR` | 1.0 | Sensitivity multiplier (0.6–1.0) |
| `COMMON_CRS` | EPSG:4326 | Coordinate reference system |
| `DEMAND_TYPES` | Solar, Wind, Hydro, Other Renewables, Nuclear, Fossil | Energy categories |
### Spatial Resolution

| Parameter | Default | Description |
|---|---|---|
| `POP_AGGREGATION_FACTOR` | 10 | Aggregation factor for population grid |
| `TARGET_RESOLUTION_ARCSEC` | 300 | Final resolution (~9km at equator) |

Note: After changing `POP_AGGREGATION_FACTOR`, regenerate resource outputs:

```bash
python p1_d_viable_solar.py --process-only
python p1_e_viable_wind.py --process-only
python p1_f_viable_hydro.py --process-only
```
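As a worked check of the resolution figures above, the sketch below computes the final cell size. It assumes GHS-POP is read at its 30 arcsec product (so 30 × `POP_AGGREGATION_FACTOR` = 300) and a spherical Earth of radius 6371 km; `NATIVE_RESOLUTION_ARCSEC` and `cell_size_km` are illustrative names, not part of `config.py`.

```python
import math

# Assumption: GHS-POP input at 30 arcsec; POP_AGGREGATION_FACTOR as in config.py
NATIVE_RESOLUTION_ARCSEC = 30  # hypothetical name
POP_AGGREGATION_FACTOR = 10
EARTH_RADIUS_KM = 6371.0  # mean radius, spherical approximation


def cell_size_km(arcsec: float, lat_deg: float = 0.0) -> tuple[float, float]:
    """Return (east-west, north-south) size in km of a square cell of `arcsec`."""
    deg = arcsec / 3600.0
    km_per_deg = 2 * math.pi * EARTH_RADIUS_KM / 360.0  # ~111.19 km per degree
    ns = deg * km_per_deg
    ew = ns * math.cos(math.radians(lat_deg))  # meridians converge toward the poles
    return ew, ns


target = NATIVE_RESOLUTION_ARCSEC * POP_AGGREGATION_FACTOR
for lat in (0, 45, 60):
    ew, ns = cell_size_km(target, lat)
    print(f"lat {lat:>2}: {ew:.2f} km x {ns:.2f} km")
```

The east-west width shrinks with the cosine of latitude, so the "~9km" figure holds at the equator while a 300 arcsec cell at 60° latitude is roughly half as wide.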
### Viability Thresholds

| Parameter | Default | Description |
|---|---|---|
| `SOLAR_PVOUT_THRESHOLD` | 3.0 | Min PVOUT (kWh/kWp/day) for viable solar |
| `WIND_WPD_THRESHOLD` | 150 | Min WPD at 100m (W/m²) for viable wind |
| `HYDRO_RUNOFF_THRESHOLD_MM` | 100 | Min runoff (mm/year) for viable hydro |
| `HYDRO_RIVER_BUFFER_M` | 5000 | Buffer distance (m) around rivers for hydro siting |
### Land Cover Classes (ESA CCI)

| Parameter | Classes | Description |
|---|---|---|
| `LANDCOVER_VALID_SOLAR` | 10, 20, 30, 40, 130, 150, 200 | Cropland, grassland, sparse vegetation, bare areas |
| `LANDCOVER_VALID_WIND` | 10, 20, 30, 40, 130, 150, 200 | Same as solar (open terrain) |
| `LANDCOVER_VALID_HYDRO` | 160, 170, 180, 210 | Flooded areas, water bodies |
### Network Settings

| Parameter | Default | Description |
|---|---|---|
| `GRID_STITCH_DISTANCE_KM` | 30 | Threshold for stitching grid segments |
| `NODE_SNAP_TOLERANCE_M` | 100 | Snap tolerance for grid nodes |
| `MAX_CONNECTION_DISTANCE_M` | 50,000 | Max facility-to-grid distance |
| `FACILITY_SEARCH_RADIUS_KM` | 300 | Max facility search radius |
### Siting Settings

| Parameter | Default | Description |
|---|---|---|
| `CLUSTER_RADIUS_KM` | 50 | K-means clustering radius |
| `GRID_DISTANCE_THRESHOLD_KM` | 50 | Remote vs near-grid classification |
| `DROP_PERCENTAGE` | 0.01 | Filter bottom X% settlements by demand |
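For orientation, the defaults in the tables above would look roughly like this as Python assignments. This is an illustrative reconstruction from the tables, not a copy of the project's `config.py`; the `validate()` helper is hypothetical.

```python
"""Illustrative config.py excerpt reconstructed from the parameter tables (sketch)."""

# General settings
ANALYSIS_YEAR = 2030  # 2024, 2030, or 2050
SUPPLY_FACTOR = 1.0  # sensitivity multiplier, 0.6-1.0
COMMON_CRS = "EPSG:4326"
DEMAND_TYPES = ["Solar", "Wind", "Hydro", "Other Renewables", "Nuclear", "Fossil"]

# Spatial resolution
POP_AGGREGATION_FACTOR = 10
TARGET_RESOLUTION_ARCSEC = 300  # ~9 km at the equator

# Viability thresholds
SOLAR_PVOUT_THRESHOLD = 3.0  # kWh/kWp/day
WIND_WPD_THRESHOLD = 150  # W/m^2 at 100 m
HYDRO_RUNOFF_THRESHOLD_MM = 100  # mm/year
HYDRO_RIVER_BUFFER_M = 5000

# ESA CCI land cover classes
LANDCOVER_VALID_SOLAR = [10, 20, 30, 40, 130, 150, 200]
LANDCOVER_VALID_WIND = [10, 20, 30, 40, 130, 150, 200]
LANDCOVER_VALID_HYDRO = [160, 170, 180, 210]

# Network settings
GRID_STITCH_DISTANCE_KM = 30
NODE_SNAP_TOLERANCE_M = 100
MAX_CONNECTION_DISTANCE_M = 50_000
FACILITY_SEARCH_RADIUS_KM = 300

# Siting settings
CLUSTER_RADIUS_KM = 50
GRID_DISTANCE_THRESHOLD_KM = 50
DROP_PERCENTAGE = 0.01


def validate() -> None:
    """Hypothetical sanity checks on the values above."""
    if ANALYSIS_YEAR not in (2024, 2030, 2050):
        raise ValueError(f"unsupported ANALYSIS_YEAR: {ANALYSIS_YEAR}")
    if not 0.6 <= SUPPLY_FACTOR <= 1.0:
        raise ValueError(f"SUPPLY_FACTOR out of range: {SUPPLY_FACTOR}")


validate()
```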
## Data Requirements

### Geospatial Datasets

| Dataset | Path | Source | Description |
|---|---|---|---|
| GADM Boundaries | `bigdata_gadm/gadm_410-levels.gpkg` | GADM v4.1 | Country land boundaries |
| EEZ Boundaries | `bigdata_eez/eez_v12.gpkg` | Marine Regions v12 | Exclusive Economic Zone (maritime) boundaries |
| GridFinder | `bigdata_gridfinder/grid.gpkg` | GridFinder | Global grid infrastructure |
| JRC Population | `bigdata_settlements_jrc/GHS_POP_E2025_*.tif` | JRC GHSL | Population distribution |
| Solar Baseline | `bigdata_solar_pvout/PVOUT.tif` | Global Solar Atlas | PVOUT baseline |
| Wind Baseline | `bigdata_wind_atlas/gasp_*.tif` | Global Wind Atlas | Wind power density |
| HydroATLAS | `bigdata_hydro_atlas/RiverATLAS_Data_v10.gdb` | HydroATLAS | River reach attributes |
| MS Solar Sites | `bigdata_solar_wind_ms/solar_all_2024q2_v1.gpkg` | Microsoft Planetary Computer | Existing solar installations |
| MS Wind Sites | `bigdata_solar_wind_ms/wind_all_2024q2_v1.gpkg` | Microsoft Planetary Computer | Existing wind installations |
| ESA Land Cover | `bigdata_landcover_cds/outputs/landcover_2022_300arcsec.tif` | Copernicus CDS | ESA CCI Land Cover 2022 (upscaled, GHS-POP aligned) |
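A quick pre-flight check can save a failed HPC job. The sketch below is hypothetical (`missing_inputs` is not part of the repository): it globs the documented paths relative to the project root and reports any pattern that matches nothing. The `*` wildcards are taken directly from the table.

```python
from pathlib import Path

# Paths from the Geospatial Datasets table; "*" wildcards are kept as glob patterns
REQUIRED_INPUTS = [
    "bigdata_gadm/gadm_410-levels.gpkg",
    "bigdata_eez/eez_v12.gpkg",
    "bigdata_gridfinder/grid.gpkg",
    "bigdata_settlements_jrc/GHS_POP_E2025_*.tif",
    "bigdata_solar_pvout/PVOUT.tif",
    "bigdata_wind_atlas/gasp_*.tif",
    "bigdata_hydro_atlas/RiverATLAS_Data_v10.gdb",  # a .gdb is a directory
    "bigdata_solar_wind_ms/solar_all_2024q2_v1.gpkg",
    "bigdata_solar_wind_ms/wind_all_2024q2_v1.gpkg",
    "bigdata_landcover_cds/outputs/landcover_2022_300arcsec.tif",
]


def missing_inputs(root: Path, patterns: list[str] = REQUIRED_INPUTS) -> list[str]:
    """Return every pattern that matches no file or directory under `root`."""
    return [p for p in patterns if not any(root.glob(p))]


if __name__ == "__main__":
    for pattern in missing_inputs(Path(".")):
        print(f"missing: {pattern}")
```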
### Energy and Population Data

| Dataset | Path | Source |
|---|---|---|
| Ember Data | `data_energy_ember/yearly_full_release_*.csv` | Ember |
| IEA Projections | `data_energy_projections_iea/WEO*.csv` | IEA WEO 2024 |
| UN Population | `data_pop_un/` | UN WPP 2024 |
| World Bank | `data_country_class_wb/` | World Bank |
### Generated Resource Files

These files are generated by the data preparation scripts and used by the combine scripts.

Solar (`bigdata_solar_cmip6/outputs/`):

| File | Description |
|---|---|
| `PVOUT_2030_300arcsec.tif` | Projected PVOUT for 2030 (raw, no viability filter) |
| `PVOUT_2050_300arcsec.tif` | Projected PVOUT for 2050 (raw, no viability filter) |
| `PVOUT_baseline_300arcsec.tif` | Baseline PVOUT (Global Solar Atlas) |
| `PVOUT_UNCERTAINTY_2030_300arcsec.tif` | Ensemble range uncertainty for 2030 |
| `PVOUT_UNCERTAINTY_2050_300arcsec.tif` | Ensemble range uncertainty for 2050 |
| `SOLAR_VIABLE_CENTROIDS_2030.tif` | Viable solar cells raster for 2030 |
| `SOLAR_VIABLE_CENTROIDS_2050.tif` | Viable solar cells raster for 2050 |
| `SOLAR_VIABLE_CENTROIDS_2030.parquet` | Viable solar centroids for 2030 (`is_viable=True` only) |
| `SOLAR_VIABLE_CENTROIDS_2050.parquet` | Viable solar centroids for 2050 (`is_viable=True` only) |
Wind (`bigdata_wind_cmip6/outputs/`):

| File | Description |
|---|---|
| `WPD100_2030_300arcsec.tif` | Projected WPD at 100m for 2030 (raw, no viability filter) |
| `WPD100_2050_300arcsec.tif` | Projected WPD at 100m for 2050 (raw, no viability filter) |
| `WPD100_baseline_300arcsec.tif` | Baseline WPD (Global Wind Atlas) |
| `WPD100_UNCERTAINTY_2030_300arcsec.tif` | Ensemble range uncertainty for 2030 |
| `WPD100_UNCERTAINTY_2050_300arcsec.tif` | Ensemble range uncertainty for 2050 |
| `WIND_VIABLE_CENTROIDS_2030.tif` | Viable wind cells raster for 2030 |
| `WIND_VIABLE_CENTROIDS_2050.tif` | Viable wind cells raster for 2050 |
| `WIND_VIABLE_CENTROIDS_2030.parquet` | Viable wind centroids for 2030 (`is_viable=True` only) |
| `WIND_VIABLE_CENTROIDS_2050.parquet` | Viable wind centroids for 2050 (`is_viable=True` only) |
Hydro (`bigdata_hydro_cmip6/outputs/`):

| File | Description |
|---|---|
| `HYDRO_RUNOFF_2030_300arcsec.tif` | Projected runoff for 2030 (mm/year) |
| `HYDRO_RUNOFF_2050_300arcsec.tif` | Projected runoff for 2050 (mm/year) |
| `HYDRO_RUNOFF_baseline_300arcsec.tif` | Baseline runoff (ERA5-Land) |
| `HYDRO_ATLAS_DELTA_2030_300arcsec.tif` | Climate delta for rivers for 2030 |
| `HYDRO_ATLAS_DELTA_2050_300arcsec.tif` | Climate delta for rivers for 2050 |
| `river_proximity_mask_5km.tif` | Boolean river proximity mask |
| `HYDRO_VIABLE_CENTROIDS_2030.tif` | Viable hydro cells raster for 2030 |
| `HYDRO_VIABLE_CENTROIDS_2050.tif` | Viable hydro cells raster for 2050 |
| `HYDRO_VIABLE_CENTROIDS_2030.parquet` | Viable hydro centroids for 2030 (`is_viable=True` only) |
| `HYDRO_VIABLE_CENTROIDS_2050.parquet` | Viable hydro centroids for 2050 (`is_viable=True` only) |
| `RiverATLAS_projected_2030.parquet` | River reaches with projected discharge for 2030 |
| `RiverATLAS_projected_2050.parquet` | River reaches with projected discharge for 2050 |
## Scripts Reference

### p1_a_ember_gem_2024.py

Harmonizes Ember country-level statistics with Global Energy Monitor facility data.

Features:
- Integrates country totals with facility locations
- Spatially clusters facilities within 300 arcsec (~9km) grid cells
- Validates coordinates against GADM + EEZ boundaries
- Filters out offshore facilities outside EEZ boundaries
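The cell-based clustering can be pictured as snapping each facility to the global 300 arcsec grid and merging same-type facilities that share a cell. This sketch assumes that reading of "clusters within grid cells"; the dict keys (`lon`, `lat`, `type`, `capacity_mw`) and function names are hypothetical, and only `num_merged` mirrors a documented output column.

```python
import math
from collections import defaultdict

CELL_DEG = 300 / 3600  # 300 arcsec = 1/12 degree


def cell_index(lon: float, lat: float, cell_deg: float = CELL_DEG) -> tuple[int, int]:
    """(row, col) of the global cell containing the point, counted from (-180, 90)."""
    return math.floor((90.0 - lat) / cell_deg), math.floor((lon + 180.0) / cell_deg)


def cluster_facilities(facilities):
    """Merge same-type facilities sharing a cell: summed capacity, mean position."""
    groups = defaultdict(list)
    for f in facilities:
        groups[(cell_index(f["lon"], f["lat"]), f["type"])].append(f)
    merged = []
    for (_, ftype), members in groups.items():
        merged.append({
            "type": ftype,
            "capacity_mw": sum(m["capacity_mw"] for m in members),
            "num_merged": len(members),
            "lon": sum(m["lon"] for m in members) / len(members),
            "lat": sum(m["lat"] for m in members) / len(members),
        })
    return merged
```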
Output: data_facilities_gem/p1_a_ember_2024_30.xlsx
### p1_b_ember_2024_30_50.py

Projects 2030 and 2050 electricity generation scenarios.

Features:
- Incorporates UN population growth factors
- Processes Nationally Determined Contributions (NDCs)
- Applies IEA growth rates for fossil/nuclear
- Disaggregates broad renewable targets
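The list above names two kinds of scaling. As a hypothetical illustration only (the actual rules, including how NDC targets are disaggregated, live in `p1_b_ember_2024_30_50.py`), compound annual growth and population-ratio scaling look like this:

```python
def project_with_growth(base_value: float, annual_rate: float,
                        base_year: int, target_year: int) -> float:
    """Compound growth: base * (1 + rate) ** (target_year - base_year)."""
    if target_year < base_year:
        raise ValueError("target_year must not precede base_year")
    return base_value * (1.0 + annual_rate) ** (target_year - base_year)


def scale_by_population(value: float, pop_base: float, pop_target: float) -> float:
    """Scale a value by the ratio of projected to base-year population."""
    if pop_base <= 0:
        raise ValueError("pop_base must be positive")
    return value * pop_target / pop_base


# Illustrative numbers: 100 TWh growing 2 %/yr from 2024 to 2030
print(round(project_with_growth(100.0, 0.02, 2024, 2030), 2))  # 112.62
```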
Output: data_facilities_gem/p1_b_ember_2024_30_50.xlsx
### p1_c_prep_landcover.py

Downloads ESA CCI Land Cover 2022 from the Copernicus Climate Data Store.
Features:
- Downloads global land cover at ~300m resolution (10 arcsec native)
- Converts NetCDF to GeoTIFF format
- Upscales to 300 arcsec with GHS-POP grid alignment (mode resampling)
- Used for viability filtering in solar/wind/hydro scripts
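Mode resampling keeps the most frequent class in each block, which suits categorical data where averaging class codes would be meaningless. Going from 10 to 300 arcsec means blocks of 30 × 30 native cells. A minimal NumPy sketch, assuming class 0 is nodata (as in the ESA CCI product) and breaking ties toward the smaller class code; `block_mode` is a hypothetical name:

```python
import numpy as np


def block_mode(classes: np.ndarray, factor: int, nodata: int = 0) -> np.ndarray:
    """Most frequent non-nodata class in each factor x factor block."""
    rows, cols = classes.shape
    if rows % factor or cols % factor:
        raise ValueError("raster shape must be divisible by factor")
    # (rows, cols) -> (block_row, block_col, factor * factor)
    blocks = (
        classes.reshape(rows // factor, factor, cols // factor, factor)
        .swapaxes(1, 2)
        .reshape(rows // factor, cols // factor, factor * factor)
    )
    out = np.full(blocks.shape[:2], nodata, dtype=classes.dtype)
    for i in range(blocks.shape[0]):
        for j in range(blocks.shape[1]):
            values = blocks[i, j]
            values = values[values != nodata]
            if values.size:
                out[i, j] = np.bincount(values).argmax()  # argmax picks the smallest tie
    return out
```

For full-size rasters, `rasterio.warp.reproject` with `Resampling.mode` does the same job and handles grid alignment through the destination transform; the loop here is only for clarity.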
Outputs:
- `bigdata_landcover_cds/extracted/C3S-LC-L4-LCCS-Map-300m-P1Y-2022-v2.1.1.nc` (raw NetCDF)
- `bigdata_landcover_cds/outputs/landcover_2022_10arcsec.tif` (native resolution)
- `bigdata_landcover_cds/outputs/landcover_2022_300arcsec.tif` (upscaled, GHS-POP aligned)
### p1_d_viable_solar.py / p1_e_viable_wind.py

Generate CMIP6-based climate projections for solar and wind resources with viability filtering.
Viability Filter Logic: A cell (300 arcsec) is considered viable if:
- MS site present — Microsoft renewable energy dataset shows existing installation, OR
- Land cover valid AND resource >= threshold — ESA CCI land cover is suitable AND resource value meets minimum threshold
Thresholds (configurable in `config.py`):
- Solar: `SOLAR_PVOUT_THRESHOLD = 3.0` kWh/kWp/day
- Wind: `WIND_WPD_THRESHOLD = 150` W/m²
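Cell by cell, the rule above is a boolean expression over three layers. A sketch with NumPy using the solar defaults; the function name is hypothetical, and the intermediate names mirror the `is_ms_viable`, `is_lc_valid` and `meets_threshold` columns of the output schema:

```python
import numpy as np

# Solar defaults from config.py
SOLAR_PVOUT_THRESHOLD = 3.0  # kWh/kWp/day
LANDCOVER_VALID_SOLAR = [10, 20, 30, 40, 130, 150, 200]


def viability_mask(resource, landcover, ms_present, valid_classes, threshold):
    """is_viable = is_ms_viable | (is_lc_valid & meets_threshold), per cell."""
    resource = np.asarray(resource, dtype=float)
    is_lc_valid = np.isin(landcover, valid_classes)
    meets_threshold = resource >= threshold  # NaN (no data) compares False
    is_ms_viable = np.asarray(ms_present, dtype=bool)
    return is_ms_viable | (is_lc_valid & meets_threshold)


pvout = np.array([3.5, 2.0, 4.0, np.nan])
landcover = np.array([10, 10, 190, 130])  # 190 (urban) is not a valid class
ms_sites = np.array([False, True, False, False])
mask = viability_mask(pvout, landcover, ms_sites, LANDCOVER_VALID_SOLAR, SOLAR_PVOUT_THRESHOLD)
print(mask.tolist())  # [True, True, False, False]
```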
Data Sources:
- Microsoft Viable Sites: `bigdata_solar_wind_ms/solar_all_2024q2_v1.gpkg` (polygons), `wind_all_2024q2_v1.gpkg` (points)
- ESA CCI Land Cover: Classes 10-40 (cropland), 130 (grassland), 150 (sparse vegetation), 200 (bare areas)
- CMIP6 Models: CESM2, EC-Earth3-Veg-LR, MPI-ESM1-2-LR (ensemble mean + IQR uncertainty)
Method:
- Download CMIP6 ensemble data for historical + SSP245
- Calculate delta: Δ = Future_period / Historical_period
- Apply to baseline: Future = Baseline × Δ
- Apply viability filter: MS_present OR (landcover_valid AND resource >= threshold)
- Compute uncertainty (interquartile range)
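The delta method in the Method list fits in a few lines. This sketch assumes the ratio is computed per model and then averaged across the ensemble; it returns both the interquartile range (named in the Method list) and the max-min spread (named in the output tables). Names and numbers are illustrative.

```python
import numpy as np


def delta_projection(baseline, hist_by_model, future_by_model):
    """Delta method on per-model fields stacked along axis 0.

    Per model: delta = future / historical. Projection = baseline * mean(delta).
    Returns (projected, mean_delta, iqr, spread) per cell.
    """
    baseline = np.asarray(baseline, dtype=float)
    hist = np.asarray(hist_by_model, dtype=float)
    fut = np.asarray(future_by_model, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        deltas = np.where(hist > 0, fut / hist, np.nan)  # undefined where hist <= 0
    mean_delta = np.nanmean(deltas, axis=0)
    projected = baseline * mean_delta
    per_model = baseline * deltas  # each model's projection of the baseline
    q75, q25 = np.nanpercentile(per_model, [75, 25], axis=0)
    spread = np.nanmax(per_model, axis=0) - np.nanmin(per_model, axis=0)
    return projected, mean_delta, q75 - q25, spread


# Three models, two cells (illustrative numbers)
baseline = [4.0, 5.0]
hist = [[2.0, 1.0], [2.0, 1.0], [2.0, 1.0]]
future = [[2.2, 1.0], [2.0, 1.0], [1.8, 1.0]]
projected, mean_delta, iqr, spread = delta_projection(baseline, hist, future)
```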
Usage:

```bash
# Download only
python p1_d_viable_solar.py --download-only

# Process only (assumes downloads exist)
python p1_d_viable_solar.py --process-only

# Full pipeline
python p1_d_viable_solar.py
```

Solar Outputs (`bigdata_solar_cmip6/outputs/`):
- GeoTIFF rasters (raw resource, no viability filter):
  - `PVOUT_{2030,2050}_300arcsec.tif`: Projected PVOUT (climate delta applied)
  - `PVOUT_baseline_300arcsec.tif`: Baseline PVOUT (Global Solar Atlas)
  - `PVOUT_UNCERTAINTY_{2030,2050}_300arcsec.tif`: Ensemble range uncertainty
- GeoTIFF rasters (viability-filtered):
  - `SOLAR_VIABLE_CENTROIDS_{2030,2050}.tif`: Viable cells only (0 = not viable)
- Parquet centroids (raw resource, no viability filter):
  - `PVOUT_{2030,2050}_300arcsec.parquet`: All cells with resource value > 0
- Parquet centroids (viability-filtered, matches TIF):
  - `SOLAR_VIABLE_CENTROIDS_{2030,2050}.parquet`: Only viable cells
Viable Centroids Parquet Schema:

| Column | Type | Description |
|---|---|---|
| `geometry` | Point | Pixel center coordinate (WGS84) |
| `source` | string | Resource type ("solar", "wind", "hydro") |
| `value_{year}` | float | Projected resource value for target year |
| `value_baseline` | float | Baseline resource value |
| `delta` | float | Climate change ratio (projected / baseline) |
| `uncertainty` | float | Ensemble range (max - min) |
| `is_ms_viable` | bool | True if MS renewable site present |
| `is_lc_valid` | bool | True if land cover class is valid |
| `meets_threshold` | bool | True if resource ≥ threshold |
| `is_viable` | bool | Always True (the file contains only viable cells) |
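Each row of a viable-centroids Parquet corresponds to one raster cell, with `geometry` at the pixel center. A minimal sketch of that conversion for a north-up global grid, assuming a top-left origin at (-180, 90) and square 300 arcsec cells; `pixel_centroids` is a hypothetical name. With rasterio, `rasterio.transform.xy(transform, rows, cols, offset="center")` gives the same coordinates.

```python
import numpy as np

RES_DEG = 300 / 3600  # 300 arcsec


def pixel_centroids(mask, values, west=-180.0, north=90.0, res_deg=RES_DEG):
    """Return lon, lat and value arrays for every cell where `mask` is True."""
    rows, cols = np.nonzero(np.asarray(mask, dtype=bool))
    lon = west + (cols + 0.5) * res_deg  # +0.5 moves from the cell corner to its center
    lat = north - (rows + 0.5) * res_deg  # rows count downward from the north edge
    return lon, lat, np.asarray(values)[rows, cols]
```

The resulting arrays can be turned into point geometries with `geopandas.points_from_xy(lon, lat)` before writing Parquet.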
Wind Outputs (`bigdata_wind_cmip6/outputs/`):
- GeoTIFF rasters (raw resource, no viability filter):
  - `WPD100_{2030,2050}_300arcsec.tif`: Projected WPD at 100m
  - `WPD100_baseline_300arcsec.tif`: Baseline WPD (Global Wind Atlas)
  - `WPD100_UNCERTAINTY_{2030,2050}_300arcsec.tif`: Ensemble range uncertainty
- GeoTIFF rasters (viability-filtered):
  - `WIND_VIABLE_CENTROIDS_{2030,2050}.tif`: Viable cells only (0 = not viable)
- Parquet centroids (raw resource, no viability filter):
  - `WPD100_{2030,2050}_300arcsec.parquet`: All cells with resource value > 0
- Parquet centroids (viability-filtered, matches TIF):
  - `WIND_VIABLE_CENTROIDS_{2030,2050}.parquet`: Only viable cells (same schema as solar)
### p1_f_viable_hydro.py

Unified hydro processing: ERA5-Land/CMIP6 runoff projections + RiverATLAS river reach projections.

Data Sources:
- Runoff Baseline: ERA5-Land monthly runoff (reanalysis, 0.1° resolution)
- Climate Projections: CMIP6 `total_runoff` (SSP2-4.5 scenario)
- River Network: HydroATLAS RiverATLAS river reach dataset

Processing Parts:

- Part 1: ERA5-Land + CMIP6 Runoff
  - Download ERA5-Land runoff baseline (1995-2014)
  - Download CMIP6 `total_runoff` for historical + SSP245
  - Compute delta: Δ = CMIP6_future / CMIP6_historical
  - Apply delta to ERA5-Land baseline
  - Regrid to 300 arcsec (aligned with GHS-POP)
  - Apply hydro filter (water/wetland OR river proximity)
- Part 2: RiverATLAS Projections
  - Load RiverATLAS river reaches with discharge (`dis_m3_pyr`)
  - Generate 5km river proximity mask
  - Extract delta values at river reach centroids
  - Apply delta to baseline discharge: Future_Q = Baseline_Q × Δ
- Part 3: Viable Hydro Centroids
  - Combine runoff-based centroids (from Part 1 rasters)
  - Combine river-based centroids (from Part 2 reaches)
  - Apply runoff threshold filter (configurable in `config.py`)
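The filters in Parts 1 and 3 combine into a per-cell test, and Part 2's discharge update is a single multiplication. This sketch assumes the runoff threshold applies to every candidate cell, whether it qualified by land cover or by river proximity; both function names are hypothetical.

```python
import numpy as np

# Defaults from config.py
LANDCOVER_VALID_HYDRO = [160, 170, 180, 210]  # flooded vegetation, water bodies
HYDRO_RUNOFF_THRESHOLD_MM = 100  # mm/year


def hydro_viable(runoff_mm, landcover, near_river, threshold=HYDRO_RUNOFF_THRESHOLD_MM):
    """(water/wetland class OR within river buffer) AND runoff >= threshold."""
    wet = np.isin(landcover, LANDCOVER_VALID_HYDRO)
    candidate = wet | np.asarray(near_river, dtype=bool)
    return candidate & (np.asarray(runoff_mm, dtype=float) >= threshold)


def project_discharge(baseline_q, delta):
    """Future_Q = Baseline_Q * delta for each river reach."""
    return np.asarray(baseline_q, dtype=float) * np.asarray(delta, dtype=float)
```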
CMIP6 Models:
- CESM2, EC-Earth3-Veg-LR, MPI-ESM1-2-LR (ensemble mean)

Usage:

```bash
# Download all data
python p1_f_viable_hydro.py --download-only

# Process only (assumes downloads exist)
python p1_f_viable_hydro.py --process-only

# Generate river proximity mask only
python p1_f_viable_hydro.py --mask-only

# Full pipeline
python p1_f_viable_hydro.py
```

Hydro Outputs (`bigdata_hydro_cmip6/outputs/`):
- GeoTIFF rasters:
  - `HYDRO_RUNOFF_baseline_300arcsec.tif`: ERA5-Land baseline runoff
  - `HYDRO_RUNOFF_{2030,2050}_300arcsec.tif`: Projected runoff
  - `HYDRO_ATLAS_DELTA_{2030,2050}_300arcsec.tif`: Climate delta for river reaches
  - `river_proximity_mask_5km.tif`: Boolean mask for river proximity
  - `HYDRO_VIABLE_CENTROIDS_{2030,2050}.tif`: Viable cells only raster (for ArcGIS)
- Parquet outputs:
  - `RiverATLAS_projected_{2030,2050}.parquet`: River reaches with projected discharge
  - `HYDRO_VIABLE_CENTROIDS_{2030,2050}.parquet`: Viable hydro centroids (runoff + river-based)
### process_country_supply.py

Main supply-demand network analysis pipeline.

```bash
# Basic usage (single scenario: 100%)
python process_country_supply.py <ISO3>

# All supply scenarios (100%, 90%, 80%, 70%, 60%)
python process_country_supply.py <ISO3> --run-all-scenarios

# Single specific supply factor (e.g., 90% only)
python process_country_supply.py <ISO3> --supply-factor 0.9

# Multiple countries
python process_country_supply.py USA CHN IND

# Custom scenario
python process_country_supply.py KEN --scenario 2050_supply_100%

# Test mode (outputs GeoPackage)
python process_country_supply.py KEN --test
```

Pipeline Steps:
- Load boundaries — GADM (land) + EEZ (maritime)
- Process facilities — Filter, cluster, validate locations
- Build grid network — Load GridFinder, create NetworkX graph
- Allocate demand — Distribute national demand to population centroids
- Network analysis — Calculate shortest paths, match supply to demand
- Output generation — Parquet files per layer
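The allocation and matching steps can be illustrated with a small, self-contained sketch. It is not the project's algorithm: it splits national demand by population share, runs Dijkstra from each facility over a toy grid graph, and fills the closest centroid-facility pairs first until demand or capacity runs out. The `Partially Filled` / `Not Filled` labels mirror the statuses the siting step reads; `Filled` and the use of `FACILITY_SEARCH_RADIUS_KM` as a network-distance cutoff are assumptions. The real script builds its graph with NetworkX, which offers `multi_source_dijkstra` for the same shortest-path work.

```python
import heapq
from collections import defaultdict

FACILITY_SEARCH_RADIUS_KM = 300.0  # default from config.py, used here as a cutoff


def build_graph(edges):
    """Undirected adjacency list from (u, v, length_km) tuples."""
    graph = defaultdict(list)
    for u, v, km in edges:
        graph[u].append((v, km))
        graph[v].append((u, km))
    return graph


def dijkstra(graph, source):
    """Shortest network distance (km) from `source` to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, km in graph.get(node, ()):
            nd = d + km
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def allocate_demand(national_mwh, population):
    """Split national demand across centroids in proportion to population."""
    total = sum(population.values())
    return {c: national_mwh * p / total for c, p in population.items()}


def match_supply(graph, demand, supply, max_km=FACILITY_SEARCH_RADIUS_KM):
    """Greedy nearest-first matching; returns a status label per centroid."""
    pairs = []
    for fac in supply:
        dist = dijkstra(graph, fac)
        pairs += [(dist[c], c, fac) for c in demand if c in dist and dist[c] <= max_km]
    pairs.sort()  # closest centroid-facility pairs are served first
    need, left = dict(demand), dict(supply)
    for _, c, fac in pairs:
        amount = min(need[c], left[fac])
        need[c] -= amount
        left[fac] -= amount
    status = {}
    for c, mwh in demand.items():
        if need[c] <= 1e-9:
            status[c] = "Filled"
        elif need[c] < mwh:
            status[c] = "Partially Filled"
        else:
            status[c] = "Not Filled"
    return status
```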
Outputs:
- `outputs_per_country/parquet/{scenario}/centroids_{ISO3}.parquet`
- `outputs_per_country/parquet/{scenario}/facilities_{ISO3}.parquet`
- `outputs_per_country/parquet/{scenario}/grid_lines_{ISO3}.parquet`
- `outputs_per_country/parquet/{scenario}/polylines_{ISO3}.parquet`
### process_country_siting.py

Siting analysis for underserved remote settlements.

⚠️ Prerequisite: Must run AFTER `process_country_supply.py` completes.

```bash
# Single scenario (100%)
python process_country_siting.py KEN

# All supply scenarios (100%, 90%, 80%, 70%, 60%)
python process_country_siting.py KEN --run-all-scenarios

# Single specific supply factor (e.g., 90% only)
python process_country_siting.py KEN --supply-factor 0.9
```

Pipeline Steps:
- Filter settlements — Select "Partially Filled" or "Not Filled" status
- Geographic clustering — DBSCAN with 50km threshold for isolated regions
- Capacity-driven K-means — Cluster by remaining facility capacity
- Grid distance analysis — Classify remote (>50km) vs near-grid
- Network design — Minimum spanning tree for remote clusters
- Boundary clipping — Ensure networks stay within country bounds
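Two of these steps are easy to show in isolation: classifying a cluster as remote against `GRID_DISTANCE_THRESHOLD_KM`, and connecting remote clusters with a minimum spanning tree. The project lists SciPy for MST (`scipy.sparse.csgraph.minimum_spanning_tree`); the sketch below swaps in a plain Prim's algorithm over great-circle distances so it runs without dependencies. Function names are hypothetical, and points are (lon, lat) tuples.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def is_remote(cluster, grid_points, threshold_km=50.0):
    """Remote if the nearest grid vertex is farther than threshold_km."""
    return min(haversine_km(cluster, g) for g in grid_points) > threshold_km


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete graph of points; returns (i, j, km) edges."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known connection of each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                d = haversine_km(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges
```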
Outputs:
- `siting_clusters_{ISO3}.parquet`: Cluster centers with assignments
- `siting_networks_{ISO3}.parquet`: Network geometries
- `siting_summary_{ISO3}.xlsx`: Summary statistics
### generate_hpc_scripts.py

Generate country list and SLURM batch scripts for HPC cluster execution.

```bash
# Generate country list only
python generate_hpc_scripts.py

# Generate 40 parallel supply analysis scripts
python generate_hpc_scripts.py --create-parallel

# Generate 25 parallel siting analysis scripts
python generate_hpc_scripts.py --create-parallel-siting
```

Features:
- Reads country list from energy demand data (`p1_b_ember_2024_30_50.xlsx`)
- Validates countries against GADM boundaries (excludes HKG, MAC, XKX)
- Groups countries into computational tiers (T1-T5) based on size/complexity
- Generates optimized SLURM scripts with appropriate resource allocation
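The batching logic can be sketched as: chunk each tier's country list into fixed-size batches and render one SLURM script per batch. Resource values come from the tier table in the HPC section; the Tier 3 batch size (one country per script) and the `"$@"` pass-through of scenario flags are assumptions, as is every name in the code. The `#SBATCH` options used are standard SLURM directives.

```python
# Hypothetical sketch of tiered SLURM script generation.
# tier: (countries per script, CPUs, memory, wall time, partition)
TIERS = {
    1: (1, 40, "450G", "168:00:00", "Long"),
    2: (1, 40, "95G", "168:00:00", "Long"),
    3: (1, 40, "95G", "48:00:00", "Medium"),  # batch size assumed
    4: (2, 40, "95G", "12:00:00", "Short"),
    5: (11, 40, "25G", "12:00:00", "Short"),
}
DEDICATED_NODE = {1: "ouce-cn64"}  # Tier 1 (CHN) runs on a dedicated node


def batches(countries, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [countries[i:i + size] for i in range(0, len(countries), size)]


def render_script(job_name, countries, tier):
    """Return the text of one SLURM batch script for `countries`."""
    _, cpus, mem, walltime, partition = TIERS[tier]
    names = " ".join(countries)
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={mem}",
        f"#SBATCH --time={walltime}",
    ]
    if tier in DEDICATED_NODE:
        lines.append(f"#SBATCH --nodelist={DEDICATED_NODE[tier]}")
    lines += [
        f'echo "Processing {len(countries)} countries in this batch: {names}"',
        # Forward wrapper flags such as --run-all-scenarios (assumed behaviour)
        f'python process_country_supply.py {names} "$@"',
    ]
    return "\n".join(lines) + "\n"


def generate(tiered):
    """Map {tier: [ISO3, ...]} to {filename: script text}, numbered 01, 02, ..."""
    scripts = {}
    n = 0
    for tier in sorted(tiered):
        for batch in batches(tiered[tier], TIERS[tier][0]):
            n += 1
            scripts[f"submit_parallel_{n:02d}.sh"] = render_script(f"supply_{n:02d}", batch, tier)
    return scripts
```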
Generated Scripts:

| Script | Description |
|---|---|
| `submit_all_parallel.sh` | Submit all 40 supply analysis jobs |
| `submit_one.sh` | Submit individual supply script by number |
| `submit_all_parallel_siting.sh` | Submit all 25 siting analysis jobs |
| `submit_one_siting.sh` | Submit individual siting script by number |
| `submit_workflow.sh` | Combine results after all jobs complete |
| `parallel_scripts/*.sh` | 40 individual supply SLURM scripts |
| `parallel_scripts_siting/*.sh` | 25 individual siting SLURM scripts |
Scenario Flags (all wrapper scripts support these):

| Flag | Description |
|---|---|
| (none) | Run single scenario (100% supply factor) |
| `--run-all-scenarios` | Run all 5 scenarios (100%, 90%, 80%, 70%, 60%) |
| `--supply-factor 0.9` | Run single specific supply factor (e.g., 90%) |
### combine_one_results.py

Convert country Parquet files to GeoPackage for visualization.

```bash
# Core supply layers (4); siting layers are added automatically if siting outputs exist
python combine_one_results.py KEN

# Custom scenario
python combine_one_results.py KEN --scenario 2050_supply_100%
```

Output: `outputs_per_country/{scenario}_{ISO3}.gpkg`

Layers included:
- Core supply analysis: `centroids`, `facilities`, `grid_lines`, `polylines`
- Siting analysis (if available): `siting_clusters`, `siting_networks`
- Viable centroids (CMIP6-based, if available): `SOLAR_VIABLE_CENTROIDS_{year}`, `WIND_VIABLE_CENTROIDS_{year}`, `HYDRO_VIABLE_CENTROIDS_{year}`
CMIP6 Climate TIF Layers (auto-clipped if global TIFs exist):

The combine script automatically clips 12 CMIP6 TIF layers to the country extent for each target year:

| Layer | Description | Source |
|---|---|---|
| `PVOUT_{year}` | Projected solar PVOUT (kWh/kWp/day) | `p1_d_viable_solar.py` |
| `PVOUT_{year}_uncertainty` | IQR uncertainty from CMIP6 ensemble | `p1_d_viable_solar.py` |
| `PVOUT_baseline` | Baseline PVOUT from Global Solar Atlas | `p1_d_viable_solar.py` |
| `SOLAR_VIABLE_CENTROIDS_{year}` | Viable solar cells raster | `p1_d_viable_solar.py` |
| `WPD100_{year}` | Projected wind power density (W/m²) | `p1_e_viable_wind.py` |
| `WPD100_{year}_uncertainty` | IQR uncertainty from CMIP6 ensemble | `p1_e_viable_wind.py` |
| `WPD100_baseline` | Baseline WPD from Global Wind Atlas | `p1_e_viable_wind.py` |
| `WIND_VIABLE_CENTROIDS_{year}` | Viable wind cells raster | `p1_e_viable_wind.py` |
| `HYDRO_RUNOFF_{year}` | Projected runoff (mm/year) | `p1_f_viable_hydro.py` |
| `HYDRO_RUNOFF_baseline` | Baseline runoff from ERA5-Land | `p1_f_viable_hydro.py` |
| `HYDRO_ATLAS_DELTA_{year}` | Climate delta for river reaches | `p1_f_viable_hydro.py` |
| `HYDRO_VIABLE_CENTROIDS_{year}` | Viable hydro cells raster | `p1_f_viable_hydro.py` |
Note: GPKG raster layers are visible in QGIS but may not display in ArcGIS. ArcGIS users can use the global TIF files in the `bigdata_*/outputs/` directories directly.
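Clipping a global raster to a country starts with finding the pixel window that covers the country's bounding box. A sketch of that arithmetic for the 300 arcsec global grid, assuming a top-left origin at (-180, 90); `country_window` is a hypothetical name. In rasterio, `rasterio.windows.from_bounds` computes the equivalent window from a transform, and `rasterio.mask.mask` clips to the exact polygon.

```python
import math

RES_DEG = 300 / 3600  # 300 arcsec


def country_window(bbox, west=-180.0, north=90.0, res_deg=RES_DEG):
    """Row/column slices of a north-up global grid that cover `bbox`.

    bbox = (min_lon, min_lat, max_lon, max_lat). The window is expanded
    outward to whole cells so no part of the country is cut off.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    col_start = math.floor((min_lon - west) / res_deg)
    col_stop = math.ceil((max_lon - west) / res_deg)
    row_start = math.floor((north - max_lat) / res_deg)  # rows grow southward
    row_stop = math.ceil((north - min_lat) / res_deg)
    return slice(row_start, row_stop), slice(col_start, col_stop)
```

With a NumPy array holding the global raster, `rows, cols = country_window(bbox)` followed by `global_array[rows, cols]` yields the clipped block.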
### combine_global_results.py

Merge all country outputs into a global GeoPackage.

```bash
# Auto-detect scenarios
python combine_global_results.py --input-dir outputs_per_country

# Specific scenario
python combine_global_results.py --scenario 2030_supply_100%

# Subset of countries
python combine_global_results.py --countries USA CHN IND
```

Output: `outputs_global/{scenario}_global.gpkg`
### p1_y_results_data_etl.py

Generate exposure analysis dataset across scenarios.
Dimensions:
- Years: 2030, 2050
- Supply factors: 100%, 90%, 80%, 70%, 60%
- Buffer distances: 0km, 10km, 20km, 30km, 40km
- Energy types: All 6 categories
Output: outputs_processed_data/exposure_analysis.parquet
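The four dimensions multiply out to 2 × 5 × 5 × 6 = 300 combinations. A sketch that enumerates them, assuming scenario directories follow the `{year}_supply_{percent}%` pattern seen in `2030_supply_100%`; the function names are hypothetical.

```python
from itertools import product

YEARS = [2030, 2050]
SUPPLY_FACTORS = [1.0, 0.9, 0.8, 0.7, 0.6]
BUFFERS_KM = [0, 10, 20, 30, 40]
ENERGY_TYPES = ["Solar", "Wind", "Hydro", "Other Renewables", "Nuclear", "Fossil"]


def scenario_name(year: int, supply_factor: float) -> str:
    """Scenario directory name, e.g. 2030_supply_100% (pattern assumed)."""
    return f"{year}_supply_{round(supply_factor * 100)}%"


def analysis_grid():
    """Yield one record per (scenario, buffer, energy type) combination."""
    for year, sf, buf, etype in product(YEARS, SUPPLY_FACTORS, BUFFERS_KM, ENERGY_TYPES):
        yield {"scenario": scenario_name(year, sf), "buffer_km": buf, "energy_type": etype}
```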
### Figure Generation

| Script | Output | Description |
|---|---|---|
| `p1_z_fig12.py` | Figures 1-2 | Global energy exposure stacked bars |
| `p1_z_fig34.py` | Figures 3-4 | Exposure by type and year |
| `p1_z_fig56.py` | Figures 5-6 | Detailed exposure analysis |
| `p1_z_fig7.py` | Figure 7 | Sensitivity heatmaps (3×6 grid) |
| `p1_z_fig8.py` | Figure 8 | Hazard-specific breakdown |

Output directory: `outputs_processed_fig/`
## Workflow Guide

```text
┌─────────────────────────────────────────────────────────────────┐
│  STEP 1: Supply Analysis                                        │
│  ───────────────────────                                        │
│  process_country_supply.py                                      │
│                                                                 │
│  Outputs:                                                       │
│  └── 2030_supply_100%/                                          │
│      ├── centroids_{ISO3}.parquet                               │
│      ├── facilities_{ISO3}.parquet                              │
│      ├── grid_lines_{ISO3}.parquet                              │
│      └── polylines_{ISO3}.parquet                               │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│  STEP 2: Siting Analysis (Optional)                             │
│  ──────────────────────────────────                             │
│  process_country_siting.py                                      │
│                                                                 │
│  Outputs (same directory):                                      │
│  └── 2030_supply_100%/                                          │
│      ├── siting_clusters_{ISO3}.parquet  ← NEW                  │
│      ├── siting_networks_{ISO3}.parquet  ← NEW                  │
│      └── siting_summary_{ISO3}.xlsx      ← NEW                  │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│  STEP 3: ADD_V2 Integration (Optional)                          │
│  ─────────────────────────────────────                          │
│  Re-run process_country_supply.py (auto-detects siting)         │
│                                                                 │
│  Outputs:                                                       │
│  └── 2030_supply_100%_add_v2/                                   │
│      ├── centroids_{ISO3}_add_v2.parquet                        │
│      ├── facilities_{ISO3}_add_v2.parquet  ← Includes synthetic │
│      ├── grid_lines_{ISO3}_add_v2.parquet  ← Includes networks  │
│      └── polylines_{ISO3}_add_v2.parquet   ← Updated routes     │
└─────────────────────────────────────────────────────────────────┘
```

```bash
# Step 1: Supply analysis
python process_country_supply.py KEN
# → Creates: outputs_per_country/parquet/2030_supply_100%/facilities_KEN.parquet

# Step 2: Siting analysis
python process_country_siting.py KEN
# → Creates: outputs_per_country/parquet/2030_supply_100%/siting_clusters_KEN.parquet

# Step 3: Integrated analysis (optional)
python process_country_supply.py KEN
# → Detects siting outputs, creates _add_v2 files

# Combine to GeoPackage
python combine_one_results.py KEN
# → Creates: outputs_per_country/2030_supply_100%_KEN_add_v2.gpkg
```

## Output Formats

### Parquet

Efficient columnar storage for analysis pipelines.
Location: `outputs_per_country/parquet/{scenario}/`

Layers:

| File | Geometry | Key Attributes |
|---|---|---|
| `centroids_{ISO3}.parquet` | Point | population, demand_mwh, supply_status, matched_facility |
| `facilities_{ISO3}.parquet` | Point | capacity_mw, generation_mwh, facility_type, num_merged |
| `grid_lines_{ISO3}.parquet` | LineString | distance_km, line_type, line_id |
| `polylines_{ISO3}.parquet` | LineString | centroid_id, facility_id, network_distance_km |
### GeoPackage

Multi-layer spatial database for GIS software (QGIS, ArcGIS).

- Per-country: `outputs_per_country/{scenario}_{ISO3}.gpkg`
- Global: `outputs_global/{scenario}_global.gpkg`
## High-Performance Computing (HPC)

### Computational Tiers

| Tier | Countries | CPUs | Memory | Time | Partition | Node | Examples |
|---|---|---|---|---|---|---|---|
| 1 | CHN | 40 | 450GB | 168h | Long | ouce-cn64 | China (dedicated node) |
| 2 | 4 large | 40 | 95GB | 168h | Long | - | USA, IND, BRA, DEU |
| 3 | 11 medium-large | 40 | 95GB | 48h | Medium | - | CAN, MEX, RUS, AUS, ARG, etc. |
| 4 | ~19 medium | 40 | 95GB | 12h | Short | - | TUR, NGA, COL, PAK, etc. (2/script) |
| 5 | ~150 small | 40 | 25GB | 12h | Short | - | All others (11/script) |
```bash
# ═══════════════════════════════════════════════════════════════
# PREPARATION
# ═══════════════════════════════════════════════════════════════
# Generate parallel scripts
python generate_hpc_scripts.py --create-parallel # 40 supply scripts
python generate_hpc_scripts.py --create-parallel-siting # 25 siting scripts

# Fix line endings (if prepared on Windows)
sed -i 's/\r$//' submit_all_parallel.sh submit_one.sh parallel_scripts/*.sh
sed -i 's/\r$//' submit_all_parallel_siting.sh submit_one_siting.sh parallel_scripts_siting/*.sh
chmod +x submit_*.sh parallel_scripts/*.sh parallel_scripts_siting/*.sh

# ═══════════════════════════════════════════════════════════════
# STEP 1: SUPPLY ANALYSIS (~8-12 hours for single scenario)
# ═══════════════════════════════════════════════════════════════
# Single scenario (100% only - faster)
./submit_all_parallel.sh

# OR: All 5 scenarios (100%, 90%, 80%, 70%, 60%) - takes ~5x longer
./submit_all_parallel.sh --run-all-scenarios

# OR: Single specific scenario (e.g., 90% only)
./submit_all_parallel.sh --supply-factor 0.9

# Monitor
squeue -u $USER
tail -f outputs_per_country/logs/parallel_*.out

# Verify completion (~189 countries)
find outputs_per_country/parquet -name "facilities_*.parquet" | wc -l

# ═══════════════════════════════════════════════════════════════
# STEP 2: SITING ANALYSIS (~4-6 hours for single scenario)
# ═══════════════════════════════════════════════════════════════
# Single scenario
./submit_all_parallel_siting.sh

# OR: All 5 scenarios
./submit_all_parallel_siting.sh --run-all-scenarios

# OR: Single specific scenario (e.g., 90% only)
./submit_all_parallel_siting.sh --supply-factor 0.9

# Monitor
tail -f outputs_per_country/logs/siting_*.out

# Verify completion
find outputs_per_country/parquet -name "siting_clusters_*.parquet" | wc -l

# ═══════════════════════════════════════════════════════════════
# STEP 3: ADD_V2 INTEGRATION (Optional, ~8-12 hours)
# ═══════════════════════════════════════════════════════════════
./submit_all_parallel.sh # Re-run supply to merge siting

# Verify _add_v2 files
find outputs_per_country/parquet -name "*_add_v2.parquet" | wc -l

# ═══════════════════════════════════════════════════════════════
# STEP 4: COMBINE RESULTS (~1-2 hours)
# ═══════════════════════════════════════════════════════════════
sbatch submit_workflow.sh

# Verify outputs
ls -lh outputs_global/*_global.gpkg
```

| Phase | Single Scenario | All 5 Scenarios | Output |
|---|---|---|---|
| Supply Analysis | 8-12 hours | 40-60 hours | ~189 country parquets |
| Siting Analysis | 4-6 hours | 20-30 hours | ~150 siting parquets |
| ADD_V2 Integration | 8-12 hours | 40-60 hours | ~150 integrated parquets |
| Results Combination | 1-2 hours | 1-2 hours | Global GeoPackages |
| Total (full) | 21-32 hours | 101-152 hours | |
| Total (no ADD_V2) | 13-20 hours | 61-92 hours |
Note: Running
--run-all-scenariosprocesses 5 supply factors (100%, 90%, 80%, 70%, 60%) sequentially per country, taking ~5x longer than single scenario.
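The totals in the timing table are the sum of the per-phase ranges, with the per-country phases multiplied by the number of scenarios. The sketch below reproduces that arithmetic so the budget can be recomputed for other scenario counts; treating the per-country phases as scaling linearly while the combination step stays fixed is an assumption read off the table.

```python
# Per-phase wall-clock ranges (hours) for a single scenario, from the table above.
PHASES = {
    "supply": (8, 12),
    "siting": (4, 6),
    "add_v2": (8, 12),
    "combine": (1, 2),
}
# ASSUMPTION: per-country phases scale linearly with the number of scenarios,
# while combination does not (the table lists 1-2 hours in both columns).
PER_SCENARIO_PHASES = {"supply", "siting", "add_v2"}


def total_hours(n_scenarios=1, include_add_v2=True):
    """Return a (low, high) estimate in hours, running the phases one after another."""
    low = high = 0
    for phase, (lo, hi) in PHASES.items():
        if phase == "add_v2" and not include_add_v2:
            continue
        factor = n_scenarios if phase in PER_SCENARIO_PHASES else 1
        low += lo * factor
        high += hi * factor
    return low, high


if __name__ == "__main__":
    print(total_hours())                          # (21, 32)
    print(total_hours(5))                         # (101, 152)
    print(total_hours(include_add_v2=False))      # (13, 20)
    print(total_hours(5, include_add_v2=False))   # (61, 92)
```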
Resubmit an individual script by its number (for example, after a failed job):

```bash
# Submit specific supply script (single scenario)
./submit_one.sh 06
# Submit specific supply script (all 5 scenarios)
./submit_one.sh 06 --run-all-scenarios
# Submit specific siting script (single scenario)
./submit_one_siting.sh 03
# Submit specific siting script (all 5 scenarios)
./submit_one_siting.sh 03 --run-all-scenarios
```

| Issue | Solution |
|---|---|
| `'\r': command not found` | Run `sed -i 's/\r$//' *.sh` on Linux |
| `Permission denied` | Run `chmod +x *.sh` |
| Memory errors | Check peak usage with `sacct -j <JOB_ID> --format=MaxRSS` |
| Missing country outputs | Check `countries_list.txt` and job logs |
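For the "Missing country outputs" case, it helps to list exactly which countries are absent rather than only counting files. This is a hypothetical helper, not part of the pipeline; it assumes `countries_list.txt` holds one ISO3 code per line and that a country is complete once `facilities_{ISO3}.parquet` exists, the same file the workflow's `find ... | wc -l` check counts.

```python
from pathlib import Path

# Hypothetical completeness check.
# ASSUMPTIONS: countries_list.txt holds one ISO3 code per line; a country is
# complete once facilities_{ISO3}.parquet exists in the scenario directory.
PREFIX = "facilities_"


def read_country_list(path):
    """Return upper-cased ISO3 codes, ignoring blank lines."""
    return [
        line.strip().upper()
        for line in Path(path).read_text().splitlines()
        if line.strip()
    ]


def missing_countries(expected, scenario_dir):
    """Sorted ISO3 codes from `expected` with no facilities_{ISO3}.parquet in scenario_dir."""
    done = {
        p.stem[len(PREFIX):]
        for p in Path(scenario_dir).glob(f"{PREFIX}*.parquet")
    }
    return sorted(set(expected) - done)


# Usage (paths as documented above):
# missing_countries(read_country_list("countries_list.txt"),
#                   "outputs_per_country/parquet/2030_supply_100%")
```

The missing codes can then be traced to their job script and resubmitted with `./submit_one.sh`.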
```bash
# Verify siting outputs exist
ls outputs_per_country/parquet/2030_supply_100%/siting_summary_*.xlsx
# Check exact filename (case-sensitive)
# Must be: siting_summary_{ISO3}.xlsx
```

```python
# Verify columns in Parquet
import geopandas as gpd

gdf = gpd.read_parquet("grid_lines_KEN.parquet")
print(gdf.columns)
print(gdf["line_type"].unique())
# Expected: ['grid_infrastructure', 'siting_networks', 'component_stitch']
```

```bash
# Check parallelization in logs
grep "Using parallel processing" outputs_per_country/logs/parallel_*.out
# Verify CPU allocation
grep "MAX_WORKERS" outputs_per_country/logs/parallel_*.out
```

| Log | Location | Content |
|---|---|---|
| Supply jobs | `outputs_per_country/logs/parallel_*.out` | Processing output |
| Siting jobs | `outputs_per_country/logs/siting_*.out` | Siting output |
| Combination | `outputs_per_country/logs/workflow_*.out` | Merge output |
| Errors | `*.err` files | Error messages |
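The `grep` checks above can be folded into one pass that flags supply logs without the parallel-processing marker and `.err` files that are not empty. This is a hypothetical sketch: it assumes the `.err` files sit in the same `outputs_per_country/logs/` directory as the `.out` files, and it only searches for the marker string already used with `grep`. A non-empty `.err` file is a prompt to look, not proof of failure, since warnings are also written to stderr.

```python
from pathlib import Path

# Hypothetical log scan mirroring the grep commands above.
# ASSUMPTION: *.err files live in the same directory as the *.out files.
LOG_DIR = Path("outputs_per_country/logs")
PARALLEL_MARKER = "Using parallel processing"


def summarize_logs(log_dir=LOG_DIR):
    """Supply logs lacking the parallel marker, and .err files with any content."""
    log_dir = Path(log_dir)
    no_parallel = [
        p.name
        for p in sorted(log_dir.glob("parallel_*.out"))
        if PARALLEL_MARKER not in p.read_text(errors="replace")
    ]
    nonempty_err = [
        p.name for p in sorted(log_dir.glob("*.err")) if p.stat().st_size > 0
    ]
    return {"no_parallel_marker": no_parallel, "nonempty_err": nonempty_err}
```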
Centroids layer columns:

| Column | Type | Description |
|---|---|---|
| `geometry` | Point | Population centroid location |
| `Population_centroid` | int | Population at centroid |
| `Total_Demand_{year}_centroid` | float | Energy demand (MWh) |
| `supply_status` | str | "Filled", "Partially Filled", "Not Filled" |
| `matched_facility_id` | str | Assigned facility ID |
| `network_distance_km` | float | Network distance to matched facility (km) |
| `GID_0` | str | ISO3 country code |
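A common first summary of the centroids layer is the share of population (or demand) in each `supply_status` class. The sketch below uses the column names from the table; the function is hypothetical, and it assumes the rows are plain dicts (for example, records converted from `centroids_{ISO3}.parquet`).

```python
# Hypothetical coverage summary over centroid rows.
# ASSUMPTION: each row is a dict keyed by the column names documented above.
STATUSES = ("Filled", "Partially Filled", "Not Filled")


def share_by_status(rows, column):
    """Fraction of `column` (population or demand) falling in each supply_status."""
    totals = {status: 0.0 for status in STATUSES}
    for row in rows:
        status = row["supply_status"]
        if status not in totals:
            raise ValueError(f"unexpected supply_status: {status!r}")
        totals[status] += row[column]
    grand_total = sum(totals.values())
    if grand_total == 0:
        return totals
    return {status: value / grand_total for status, value in totals.items()}


# Usage: share_by_status(rows, "Population_centroid")
#        share_by_status(rows, f"Total_Demand_{2030}_centroid")
```

Rejecting unknown labels up front catches spelling or case mismatches in `supply_status` before they silently drop out of the totals.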
Facilities layer columns:

| Column | Type | Description |
|---|---|---|
| `geometry` | Point | Facility location |
| `capacity_mw` | float | Generation capacity (MW) |
| `generation_mwh` | float | Annual generation (MWh) |
| `facility_type` | str | Solar, Wind, Hydro, etc. |
| `num_merged_units` | int | Number of clustered facilities |
| `remaining_capacity_mwh` | float | Unmatched capacity (MWh) |
| `GID_0` | str | ISO3 country code |
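Because `capacity_mw` and annual `generation_mwh` are both present, the implied capacity factor per facility type (annual energy divided by nameplate capacity times 8,760 hours) is a quick sanity check on the facilities layer; values outside 0 to 1 would suggest a unit mismatch. The helper names are hypothetical, and the rows are assumed to be plain dicts with the documented columns.

```python
HOURS_PER_YEAR = 8760  # non-leap year


def capacity_factor(generation_mwh, capacity_mw):
    """Annual energy divided by the energy produced at full nameplate output all year."""
    if capacity_mw <= 0:
        raise ValueError("capacity_mw must be positive")
    return generation_mwh / (capacity_mw * HOURS_PER_YEAR)


def capacity_factor_by_type(rows):
    """Aggregate capacity_mw and generation_mwh per facility_type, then compute each CF."""
    capacity, generation = {}, {}
    for row in rows:
        ftype = row["facility_type"]
        capacity[ftype] = capacity.get(ftype, 0.0) + row["capacity_mw"]
        generation[ftype] = generation.get(ftype, 0.0) + row["generation_mwh"]
    return {ftype: capacity_factor(generation[ftype], capacity[ftype]) for ftype in capacity}
```

Aggregating before dividing weights each type's capacity factor by facility size, rather than averaging per-facility ratios.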
Grid lines layer columns:

| Column | Type | Description |
|---|---|---|
| `geometry` | LineString | Grid line geometry |
| `distance_km` | float | Line segment length (km) |
| `line_type` | str | "grid_infrastructure", "siting_networks", "component_stitch" |
| `line_id` | str | Unique line identifier |
| `GID_0` | str | ISO3 country code |
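Two properties in this schema are easy to check: `line_type` takes one of three documented values, and `line_id` is meant to be unique. The sketch below totals network length per type and reports repeated IDs; the function is hypothetical, and it assumes rows are plain dicts with the documented columns.

```python
LINE_TYPES = ("grid_infrastructure", "siting_networks", "component_stitch")


def summarize_grid_lines(rows):
    """Total distance_km per line_type, plus sorted line_id values seen more than once."""
    length_km = {line_type: 0.0 for line_type in LINE_TYPES}
    seen, duplicates = set(), set()
    for row in rows:
        line_type = row["line_type"]
        if line_type not in length_km:
            raise ValueError(f"unexpected line_type: {line_type!r}")
        length_km[line_type] += row["distance_km"]
        line_id = row["line_id"]
        if line_id in seen:
            duplicates.add(line_id)
        seen.add(line_id)
    return length_km, sorted(duplicates)
```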
| Country | Time | Memory |
|---|---|---|
| Small (TLS) | <5 min | <4GB |
| Medium (KEN) | 10-15 min | <8GB |
| Large (KOR) | 20-30 min | <16GB |
| Very Large (USA) | 1-2 hours | <32GB |

| Country | Time | Memory |
|---|---|---|
| Large (CHN, USA) | 15-30 min | <100GB |
| Medium | 5-15 min | <50GB |
| Small | <5 min | <20GB |
If you use this code or data in your research, please cite:

```bibtex
@software{electricity_supply_analysis,
  title  = {Global Electricity Supply-Demand Analysis Framework},
  author = {[Author Names]},
  year   = {2025},
  url    = {[Repository URL]}
}
```

This project is licensed under the MIT License; see the LICENSE file for details.
- Data Sources: GADM, Marine Regions, GridFinder, JRC GHSL, Global Solar Atlas, Global Wind Atlas, Ember, IEA, UN DESA
- Computing: [HPC Cluster Name] for computational resources
- Funding: [Funding Sources]