Memory-safe rechunk, preview chunk budget, plot improvements #1075

Merged
brendancol merged 8 commits into master from dask-memory-safe-rechunk-and-plot
Mar 27, 2026

Conversation

@brendancol
Contributor

@brendancol brendancol commented Mar 27, 2026

Summary

  • rechunk_no_shuffle accepts xr.Dataset and re-opens unmodified zarr stores with larger chunks instead of adding a rechunk-merge graph layer. For file-backed data the task graph is much smaller.
  • preview() derives a per-task memory budget from the active dask cluster (worker_memory * 0.7 / nthreads) and re-opens zarr sources when chunks exceed it, avoiding OOM on multi-threaded workers.
  • DataArray.xrs.plot() auto-computes dask arrays, sets equal aspect ratio, and creates a figure with reasonable defaults. New Dataset.xrs.plot() renders 2D variables as a subplot grid with GeoTIFF colormap support.
  • Removes fused_overlap/multi_overlap from the Dataset accessor (the underlying functions only accept DataArrays, so these would always raise TypeError).
  • Adds tests for zarr re-open and Dataset rechunk fallback path.
  • Updates rechunk and reprojection example notebooks.
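
The per-task budget described in the preview() bullet can be sketched in plain Python. This is a minimal illustration, assuming only the formula quoted above (worker_memory * 0.7 / nthreads). The names `per_task_budget`, `chunk_nbytes`, and `exceeds_budget` are hypothetical and not part of this PR's API, and worker memory and thread count are passed in as plain numbers rather than read from a dask cluster.

```python
from math import prod


def per_task_budget(worker_memory_bytes: int, nthreads: int,
                    safety_fraction: float = 0.7) -> int:
    """Bytes one task may use if every thread on a worker runs a task at once.

    Hypothetical helper: mirrors worker_memory * 0.7 / nthreads from the summary.
    """
    if nthreads < 1:
        raise ValueError("nthreads must be at least 1")
    return int(worker_memory_bytes * safety_fraction / nthreads)


def chunk_nbytes(chunk_shape: tuple, itemsize: int) -> int:
    """Size in bytes of a single chunk with the given shape and dtype itemsize."""
    return prod(chunk_shape) * itemsize


def exceeds_budget(chunk_shape: tuple, itemsize: int, budget_bytes: int) -> bool:
    """True when one chunk alone would not fit in the per-task budget."""
    return chunk_nbytes(chunk_shape, itemsize) > budget_bytes


# Example: a 16 GiB worker with 4 threads gives each task roughly 2.8 GiB.
budget = per_task_budget(16 * 2**30, nthreads=4)

# A float64 chunk of 8192 x 8192 is 512 MiB and fits in that budget.
small_ok = not exceeds_budget((8192, 8192), 8, budget)

# A float64 chunk of 32768 x 32768 is 8 GiB and does not fit.
large_too_big = exceeds_budget((32768, 32768), 8, budget)
```

Dividing by the thread count is what makes the budget safe on multi-threaded workers: each thread can hold a chunk in memory at the same time, so the limit has to apply per task rather than per worker.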

Test plan

  • All 13 rechunk tests pass (including 3 new: zarr re-open, zarr sel skip, Dataset fallback)
  • Run distributed reprojection notebook against a local zarr store to verify rechunk + preview end-to-end
  • Spot-check Dataset.xrs.plot() with a multi-variable Dataset

Add column/rasterize_kw params, fix accessor namespace to .xrs,
clarify nodata semantics, specify float64 output dtype, add
list-of-pairs zones support, note dask chunk alignment strategy.
rechunk_no_shuffle now accepts Datasets and re-opens unmodified zarr
stores with larger chunks instead of layering a rechunk-merge graph.
preview() derives a per-task memory budget from the active dask cluster
and re-opens zarr sources when chunks exceed it.

Accessor gains Dataset.xrs.plot() for subplot grids and DataArray plot
auto-computes dask arrays, sets equal aspect, and avoids kwargs mutation.

Removes fused_overlap/multi_overlap from the Dataset accessor (the
underlying functions only accept DataArrays).
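
The "avoids kwargs mutation" point can be illustrated with a small sketch. It assumes the fix amounts to copying the caller's keyword arguments before filling in defaults; the function name `with_plot_defaults` and the default values are hypothetical and are not taken from the accessor code.

```python
def with_plot_defaults(kwargs: dict) -> dict:
    """Return a new dict with plot defaults filled in.

    Hypothetical helper: the caller's dict is copied first, so setdefault()
    never writes into an object the caller still holds.
    """
    merged = dict(kwargs)  # shallow copy; the original stays untouched
    merged.setdefault("cmap", "viridis")
    merged.setdefault("aspect", "equal")
    return merged


# The caller's dict is unchanged after the call, so it can be reused safely.
user_kwargs = {"cmap": "terrain"}
resolved = with_plot_defaults(user_kwargs)
```

Without the copy, a dict reused across several plot calls would silently accumulate defaults from the first call.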
@github-actions (bot) added the performance label (PR touches performance-sensitive code) on Mar 27, 2026
Keep our fixes: kwargs mutation fix in DataArray.xrs.plot(),
removal of fused_overlap/multi_overlap from Dataset accessor.
The zarr tests called xr.open_zarr / to_zarr, which fail on CI
environments without zarr (e.g. macOS 3.14). The failure triggered
fail-fast cancellation of all other matrix jobs.
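
One common way to keep optional-dependency tests from failing a CI job is to skip them when the package is missing. The sketch below uses only the standard library (`importlib.util.find_spec` plus `unittest.skipUnless`) so that it stays self-contained. It is not the guard used in this PR; the class and test names are hypothetical. In a pytest suite, `pytest.importorskip("zarr")` serves the same purpose.

```python
import importlib.util
import unittest

# True only when the optional zarr package can be imported in this environment.
HAS_ZARR = importlib.util.find_spec("zarr") is not None


@unittest.skipUnless(HAS_ZARR, "zarr is not installed")
class ZarrRoundTripTests(unittest.TestCase):
    """Hypothetical test class; skipped as a whole when zarr is absent."""

    def test_zarr_is_importable(self):
        import zarr  # safe here: the class-level guard already checked for it

        self.assertTrue(hasattr(zarr, "__version__"))


# Run the suite programmatically; a skipped test still counts as a successful run.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ZarrRoundTripTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

A skip is reported as a skip instead of an error, so a matrix job without zarr no longer triggers fail-fast cancellation of the other jobs.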
Drop _is_unmodified_zarr, _reopen_preview_chunks, and
_preview_chunk_budget. Dataset rechunk now uses ds.chunk()
instead of re-opening the zarr store under the hood.
@brendancol brendancol merged commit 4ccbddb into master Mar 27, 2026
11 checks passed
@brendancol brendancol deleted the dask-memory-safe-rechunk-and-plot branch May 4, 2026 13:06

Labels

performance PR touches performance-sensitive code


1 participant