Fix NaN handling in focal_stats CUDA kernels (#1092)#1093

Merged
brendancol merged 2 commits into master from issue-1092 on Mar 30, 2026

Conversation

@brendancol
Contributor

Summary

Fixes #1092. The focal_stats CUDA kernels (_focal_mean_cuda, _focal_sum_cuda, _focal_std_cuda, _focal_var_cuda, _focal_range_cuda, _focal_min_cuda, _focal_max_cuda) propagated NaN through arithmetic instead of skipping it. The numpy path uses np.nanmean/np.nansum/np.nanstd, etc., which skip NaN, so the same data gave different results on GPU vs CPU.

The fix adds `if v != v: continue` NaN checks to each CUDA kernel, matching the numpy nan-safe behavior. The non-focal mean() function already had these checks in its _mean_gpu kernel.
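The pattern can be illustrated on the CPU (a standalone sketch, not the actual numba.cuda kernel; the function name and 3x3 window are assumptions for illustration):

```python
import numpy as np

def focal_mean_nan_safe(data, out):
    # CPU sketch of the NaN-skip pattern the CUDA kernels use.
    # Each output cell is the mean of its 3x3 neighborhood,
    # with NaN neighbors skipped rather than propagated.
    rows, cols = data.shape
    for y in range(rows):
        for x in range(cols):
            total = 0.0
            count = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols:
                        v = data[ny, nx]
                        if v != v:   # only NaN compares unequal to itself
                            continue  # skip it, matching np.nanmean
                        total += v
                        count += 1
            out[y, x] = total / count if count else np.nan
```

Without the `if v != v: continue` line, a single NaN neighbor makes `total` (and therefore the output cell) NaN, which is the GPU-vs-CPU divergence the issue reports.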

For min/max there was also a subtler bug: if the first neighbor encountered was NaN, the kernel set m = NaN and found = True, after which every subsequent v < m / v > m comparison returned False (comparisons against NaN are always False), so NaN got stuck as the result.
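The sticky-NaN behavior is easy to reproduce in plain Python (a standalone illustration; `buggy_min` and `fixed_min` are hypothetical names mirroring the before/after kernel logic):

```python
import math

def buggy_min(values):
    # Pre-fix logic: if the first value is NaN, it seeds m,
    # and every later `v < m` comparison is False, so NaN sticks.
    m, found = math.nan, False
    for v in values:
        if not found:
            m, found = v, True
        elif v < m:          # always False once m is NaN
            m = v
    return m

def fixed_min(values):
    # Post-fix logic: skip NaN before it can seed m.
    m, found = math.nan, False
    for v in values:
        if v != v:           # NaN check added by this PR
            continue
        if not found or v < m:
            m, found = v, True
    return m
```

With input `[nan, 2.0, 1.0]`, `buggy_min` returns NaN while `fixed_min` returns 1.0.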

Test plan

  • test_focal_stats_nan_handling_1092: tests all 7 stats (mean, sum, min, max, std, var, range) with NaN in input, across all 4 backends (numpy, cupy, dask+numpy, dask+cupy)
  • test_focal_stats_all_nan_window_1092: all-NaN window produces NaN for mean/min/max, 0 for sum (matching numpy nansum)
  • Full test_focal.py suite: 122 passed
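The all-NaN-window semantics the tests assert can be checked directly against NumPy's nan-aggregations (a standalone snippet, not part of the test suite):

```python
import warnings
import numpy as np

window = np.full(9, np.nan)  # an all-NaN 3x3 focal window, flattened

with warnings.catch_warnings():
    # nanmean/nanmin emit "All-NaN slice" RuntimeWarnings; silence them here
    warnings.simplefilter("ignore", RuntimeWarning)
    s = np.nansum(window)    # sum over an empty set of values -> 0.0
    m = np.nanmean(window)   # mean over an empty set -> nan
    lo = np.nanmin(window)   # min over an empty set -> nan
```

This is why the sum kernel returns 0 for all-NaN windows while mean/min/max return NaN.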

All focal_stats CUDA kernels (_focal_mean_cuda, _focal_sum_cuda,
_focal_std_cuda, _focal_var_cuda, _focal_range_cuda, _focal_min_cuda,
_focal_max_cuda) now skip NaN neighbors with `if v != v: continue`,
matching the numpy path which uses np.nanmean/nansum/nanstd/etc.

Previously, NaN propagated through arithmetic, giving different
results on GPU vs CPU when input contained NaN.
- test_focal_stats_nan_handling_1092: verifies all 7 stats (mean, sum,
  min, max, std, var, range) skip NaN neighbors across all 4 backends.
- test_focal_stats_all_nan_window_1092: all-NaN window gives NaN for
  mean/min/max and 0 for sum (matching numpy nansum behavior).
- Fixed sum kernel to return 0 (not NaN) for all-NaN windows, matching
  numpy nansum semantics.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Mar 30, 2026
@brendancol brendancol merged commit 1437c61 into master Mar 30, 2026
11 checks passed
@brendancol brendancol deleted the issue-1092 branch May 4, 2026 13:06

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

focal_stats CUDA kernels propagate NaN instead of skipping it

1 participant