Fix normalize dask paths: replace boolean indexing with lazy reductions#1125

Merged
brendancol merged 4 commits into master from issue-1124
Mar 31, 2026

@brendancol (Contributor) commented Mar 31, 2026

Summary

  • Replace data[finite_mask] (boolean fancy indexing that materializes dask arrays) with da.where(finite_mask, data, nan) + da.nanmin()/da.nanmax()/da.nanmean()/da.nanstd()
  • Guard against division by zero in rescale with safe_range; da.where evaluates both branches, so an unguarded division would produce inf/nan even where the result is discarded
  • All four dask paths fixed: rescale (dask+numpy, dask+cupy) and standardize (dask+numpy, dask+cupy)
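
As a rough illustration of the first two bullets, the sketch below shows one way the rescale path could look for the dask+numpy case. The function name `rescale_lazy`, its parameters, and the choice to map constant input to `new_min` and non-finite input to NaN are assumptions made for this example; they are not taken from the xrspatial source.

```python
import numpy as np
import dask.array as da


def rescale_lazy(data, new_min=0.0, new_max=1.0):
    """Hypothetical helper: min-max rescale that keeps the dask graph lazy."""
    finite_mask = da.isfinite(data)

    # Mask non-finite values to NaN instead of selecting with data[finite_mask].
    masked = da.where(finite_mask, data, np.nan)

    # NaN-aware reductions run per chunk and return lazy 0-d dask arrays.
    lo = da.nanmin(masked)
    hi = da.nanmax(masked)
    value_range = hi - lo

    # da.where computes both branches, so divide by a range that is never zero.
    safe_range = da.where(value_range == 0, 1.0, value_range)
    scaled = (data - lo) / safe_range * (new_max - new_min) + new_min

    # Assumption: constant input maps to new_min, non-finite input becomes NaN.
    out = da.where(value_range == 0, new_min, scaled)
    return da.where(finite_mask, out, np.nan)


if __name__ == "__main__":
    x = np.array([[1.0, 2.0, np.nan], [3.0, np.inf, 5.0]])
    result = rescale_lazy(da.from_array(x, chunks=(1, 3)))
    print(type(result).__name__)  # still a dask Array until .compute()
    print(result.compute())
```

Nothing is computed until `.compute()` is called, so the minimum and maximum are reduced chunk by chunk rather than from a single materialized selection.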

Context

Found during the performance sweep (#1124). Boolean fancy indexing on dask arrays forces full materialization into a single chunk. The lazy reduction functions (da.nanmin, da.nanmax, da.nanmean, da.nanstd) reduce each chunk separately and never materialize the full array.
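
The same pattern applies to the standardize path. The following is a minimal sketch, assuming a hypothetical helper named `standardize_lazy` and a population standard deviation (the `da.nanstd` default, `ddof=0`); handling of a zero standard deviation is deliberately left out because the PR does not describe it.

```python
import numpy as np
import dask.array as da


def standardize_lazy(data):
    """Hypothetical helper: z-score built from NaN-aware lazy reductions."""
    finite_mask = da.isfinite(data)

    # Non-finite values become NaN so the nan* reductions skip them.
    masked = da.where(finite_mask, data, np.nan)

    # Each reduction is computed per chunk and then combined by dask.
    mean = da.nanmean(masked)
    std = da.nanstd(masked)

    # Note: a constant input gives std == 0; this sketch does not guard it.
    return (masked - mean) / std


if __name__ == "__main__":
    x = np.array([[1.0, 2.0, 3.0], [4.0, np.nan, 5.0]])
    z = standardize_lazy(da.from_array(x, chunks=(1, 3)))
    print(type(z).__name__)  # the result stays lazy until .compute()
    print(z.compute())
```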

Test plan

  • All 29 existing normalize tests pass (verified)

Parallel subagent triage + ralph-loop workflow for auditing all
xrspatial modules for performance bottlenecks, OOM risk under
30TB dask workloads, and backend-specific anti-patterns.
7 tasks covering command scaffold, module scoring, parallel subagent
dispatch, report merging, ralph-loop generation, and smoke tests.
…#1124)

Replace `data[finite_mask]` (boolean fancy indexing that materializes
dask arrays) with `da.where(finite_mask, data, nan)` + `da.nanmin()`/
`da.nanmax()`/`da.nanmean()`/`da.nanstd()` for lazy per-chunk
reductions.

Guard division by zero in rescale with safe_range to prevent inf/nan
in lazy evaluation (da.where evaluates both branches).
@github-actions (bot) added the performance label (PR touches performance-sensitive code) Mar 31, 2026
@brendancol merged commit 9edd073 into master Mar 31, 2026
11 checks passed
@brendancol deleted the issue-1124 branch May 4, 2026 13:05