Cache Dask arrays created from `NetCDFDataProxy`s to speed up loading files with multiple variables #6252
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@ Coverage Diff @@
## main #6252 +/- ##
==========================================
+ Coverage 89.85% 89.88% +0.02%
==========================================
Files 88 88
Lines 23401 23430 +29
Branches 4357 4361 +4
==========================================
+ Hits 21028 21059 +31
+ Misses 1646 1644 -2
Partials 727 727
⏱️ Performance Benchmark Report: 953e8f9
The benchmarks showing changes aren't really the ones I'd expect.
I added a benchmark in bfbd625 that should show the improvement.
💯 It will be great to see this come together!
⏱️ Performance Benchmark Report: ad1e4f1
ESadek-MO left a comment
Thanks @bouweandela, looks really good to me for the most part!
Only one suggestion, and I'm happy for you to oppose it. Other than that, I'm happy for this to be merged.
ESadek-MO left a comment
Happy with this, thank you!
* upstream/main: (98 commits)
  [pre-commit.ci] pre-commit autoupdate (SciTools#6335)
  SPEC 0: drop py310 and support py313 (SciTools#6195)
  Better benchmarking Python version handling (SciTools#6329)
  Move loading and combine code into their own submodules. (SciTools#6321)
  Bump scitools/workflows from 2025.02.1 to 2025.02.2 (SciTools#6327)
  replaced reference from build to python build (SciTools#6324)
  [pre-commit.ci] pre-commit autoupdate (SciTools#6315)
  Cache Dask arrays created from `NetCDFDataProxy`s to speed up loading files with multiple variables (SciTools#6252)
  Bump scitools/workflows from 2025.02.0 to 2025.02.1 (SciTools#6313)
  [pre-commit.ci] pre-commit autoupdate (SciTools#6310)
  Bump scitools/workflows from 2025.01.5 to 2025.02.0 (SciTools#6306)
  Updated environment lockfiles (SciTools#6301)
  Improve speed of loading small NetCDF files (SciTools#6229)
  [pre-commit.ci] pre-commit autoupdate (SciTools#6298)
  Use cube chunks for weights in aggregations with smart weights (SciTools#6288)
  Updated environment lockfiles (SciTools#6296)
  Bump scitools/workflows from 2025.01.4 to 2025.01.5 (SciTools#6300)
  Bump scitools/workflows from 2025.01.3 to 2025.01.4 (SciTools#6295)
  Lazy rectilinear interpolator (SciTools#6084)
  Revert "Fix broken link. (SciTools#6246)" (SciTools#6297)
  ...
… files with multiple variables (SciTools#6252)
* Cache Dask arrays to speed up loading files with multiple variables
* Add benchmark for files with many cubes
* Add whatsnew
* Add test
* Add license header
* Use a global to set the cache size
* Update whatsnew
🚀 Pull Request
Description
Another idea to speed up loading NetCDF files with many variables. This caches the last 100 Dask arrays created from `NetCDFDataProxy`s so that shared coordinates can be reused. Since copying a Dask array is much faster than creating a new one, this gives a speedup.
Consult Iris pull request check list
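The caching idea in the description can be illustrated with a small sketch. This is not the code merged in this PR: `LazyArrayCache`, `ProxyKey`, and `build_stand_in` are hypothetical names, and a plain list stands in for the Dask array so that the example runs with only the standard library. It assumes a bounded, least-recently-used cache keyed on a hashable description of the proxy, which hands out copies of the cached object.

```python
import copy
from collections import OrderedDict
from dataclasses import dataclass

# Hypothetical module-level setting; the description mentions caching the last 100 arrays.
MAX_CACHE_SIZE = 100


@dataclass(frozen=True)
class ProxyKey:
    """Hashable description of a data proxy (hypothetical stand-in)."""

    path: str
    variable_name: str
    shape: tuple
    dtype: str


class LazyArrayCache:
    """Keep the most recently used lazy arrays, evicting the oldest first."""

    def __init__(self, build, max_size=MAX_CACHE_SIZE):
        # `build` is the expensive factory. In a real setting it might wrap
        # something like dask.array.from_array (an assumption, not the PR's code).
        self._build = build
        self._max_size = max_size
        self._cache = OrderedDict()
        self.builds = 0  # how many times the expensive factory actually ran

    def get(self, key):
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            array = self._cache[key]
        else:
            array = self._build(key)
            self.builds += 1
            self._cache[key] = array
            if len(self._cache) > self._max_size:
                self._cache.popitem(last=False)  # drop the least recently used
        # Return a cheap copy so callers do not share one object.
        return copy.copy(array)


def build_stand_in(key):
    # Stand-in for the expensive step; a list plays the role of the lazy array.
    return [key.variable_name, key.shape]


cache = LazyArrayCache(build_stand_in)
latitude = ProxyKey("example.nc", "latitude", (180,), "float64")
first = cache.get(latitude)   # built by the factory
second = cache.get(latitude)  # served from the cache as a copy
```

In this sketch, many variables that share the same coordinate would trigger only one call to the expensive factory per coordinate, as long as the coordinate stays among the most recently used entries.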
Add any of the below labels to trigger actions on this PR: