Fix three accuracy bugs in zonal stats dask backend (#1090) #1091

Merged
brendancol merged 2 commits into master from issue-1090 on Mar 30, 2026

@brendancol
Contributor

Summary

Fixes #1090. Three bugs in zonal.stats() where the dask backend diverged from numpy:

  • All-NaN zones returned 0 instead of NaN. np.nansum converts all-NaN slices to 0. Added _nanreduce_preserve_allnan wrapper that restores NaN for zones where every block had no valid data.
  • Dask std/var used a numerically unstable one-pass formula. Replaced (Σx² - (Σx)²/n) / n with the Chan-Golub-LeVeque parallel merge algorithm. Block-level computation now produces M2 (sum of squared deviations from block mean) instead of raw sum-of-squares, so precision holds even when values are near 1e8.
  • _calc_stats compared zone_values != None when nodata_values was None, triggering a numpy FutureWarning. Now skips the comparison entirely.
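The all-NaN behaviour from the first bullet can be reproduced with plain numpy. The sketch below is only an illustration: `nanreduce_preserve_allnan` is a hypothetical stand-in written for this example, and the real `_nanreduce_preserve_allnan` in the PR may have a different signature and operate on different inputs. It assumes each block contributes one partial value per zone, with NaN meaning "no valid data in this block".

```python
import warnings

import numpy as np


def nanreduce_preserve_allnan(blocks, func=np.nansum):
    """Hypothetical helper: reduce per-block partials, keeping NaN for
    zones that had no valid data in any block."""
    stacked = np.stack(blocks)                 # shape (n_blocks, n_zones)
    all_nan = np.isnan(stacked).all(axis=0)    # zones empty in every block
    with warnings.catch_warnings():
        # np.nanmax / np.nanmin emit RuntimeWarning on all-NaN slices
        warnings.simplefilter("ignore", RuntimeWarning)
        reduced = func(stacked, axis=0)
    # Restore NaN where the reduction had nothing valid to work with
    return np.where(all_nan, np.nan, reduced)


# Per-zone partial sums from two blocks; zone 1 is NaN in both
block_a = np.array([1.0, np.nan, 3.0])
block_b = np.array([2.0, np.nan, np.nan])

plain = np.nansum(np.stack([block_a, block_b]), axis=0)  # zone 1 becomes 0
fixed = nanreduce_preserve_allnan([block_a, block_b])    # zone 1 stays NaN
```

Zones with at least one valid block are unaffected; only the zone that is NaN in every block differs between `plain` and `fixed`.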

Test plan

  • test_stats_all_nan_zone updated: dask now expects NaN (not 0), passes on all 4 backends
  • test_stats_nodata_wipes_zone updated: same fix
  • test_stats_variance_numerical_stability_1090: values near 1e8 with spread of 1, verifies dask matches numpy within 1e-6
  • test_stats_nodata_none_no_warning_1090: confirms no FutureWarning with default nodata_values=None
  • Full test_zonal.py suite: 119 passed
  • test_dataset_support.py: 18 passed
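For context on why the stability test uses values near 1e8 with a spread of 1, here is a small standalone sketch. It is not the test from this PR; the array size and seed are arbitrary assumptions. It evaluates the old one-pass formula `(Σx² - (Σx)²/n) / n` on that kind of data and compares it with `np.var`.

```python
import numpy as np

rng = np.random.default_rng(0)      # arbitrary seed, chosen for this example
x = 1e8 + rng.random(10_000)        # values near 1e8, spread of 1

n = x.size
# Old one-pass formula: both terms are around 1e20, so their difference
# loses essentially all significant digits in float64.
naive_var = (np.sum(x**2) - np.sum(x) ** 2 / n) / n
# Reference: numpy subtracts the mean before squaring.
ref_var = np.var(x)

print("naive:", naive_var, "reference:", ref_var)
```

The two large terms cancel almost completely, so the naive result is dominated by rounding error and can even come out negative, while the reference stays close to the variance of a uniform spread of width 1.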

1. Dask sum/count/min/max now return NaN (not 0) for zones with all-NaN
   values, matching the numpy backend. Uses _nanreduce_preserve_allnan
   wrapper around np.nansum/nanmax/nanmin.

2. Dask std/var replaced the naive one-pass formula with the
   Chan-Golub-LeVeque parallel merge algorithm, which avoids catastrophic
   cancellation when the mean is large relative to the variance.

3. _calc_stats and crosstab helpers now skip the nodata_values != comparison
   when nodata_values is None, avoiding numpy FutureWarning.

- Block-level sum_squares now computes M2 (sum of squared deviations
  from block mean) instead of raw sum(x²), avoiding float64 precision
  loss for large values.
- Updated test_stats_all_nan_zone and test_stats_nodata_wipes_zone to
  expect NaN from dask (no longer 0).
- Added test_stats_variance_numerical_stability_1090: values near 1e8
  with spread of 1, verifying dask matches numpy to 1e-6.
- Added test_stats_nodata_none_no_warning_1090: confirms no
  FutureWarning when nodata_values=None.
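
The M2-based merge described above can be sketched as follows. This is a minimal illustration of the Chan-Golub-LeVeque pairwise update, not the code from this PR: `block_stats` and `merge` are hypothetical names, and the sketch works on a flat 1-D array split into blocks instead of per-zone dask chunks.

```python
from functools import reduce

import numpy as np


def block_stats(x):
    """Per-block (count, mean, M2), where M2 is the sum of squared
    deviations from the block mean. NaNs are ignored."""
    x = x[~np.isnan(x)]
    n = x.size
    if n == 0:
        return 0, 0.0, 0.0
    mean = x.mean()
    return n, mean, np.sum((x - mean) ** 2)


def merge(a, b):
    """Combine two (count, mean, M2) triples with the parallel update."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return 0, 0.0, 0.0
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    # Only the small difference between block means is squared,
    # never the raw values themselves.
    m2 = m2_a + m2_b + delta**2 * n_a * n_b / n
    return n, mean, m2


rng = np.random.default_rng(0)          # arbitrary seed for this example
x = 1e8 + rng.random(10_000)            # values near 1e8, spread of 1
blocks = np.array_split(x, 8)           # stand-in for dask chunks

n, mean, m2 = reduce(merge, (block_stats(b) for b in blocks), (0, 0.0, 0.0))
merged_var = m2 / n                     # population variance (ddof=0)
merged_std = np.sqrt(merged_var)
```

Because every block subtracts its own mean before squaring, the intermediate quantities stay on the scale of the spread and not the scale of the values, which is why the merged result tracks `np.var(x)` closely on this data.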

@github-actions (bot) added the "performance" label (PR touches performance-sensitive code) on Mar 30, 2026
@brendancol merged commit 65b354f into master on Mar 30, 2026
11 checks passed
@brendancol deleted the issue-1090 branch on May 4, 2026 13:06