Modernize and simplify iris.analysis._Groupby#5015
Modernize and simplify iris.analysis._Groupby#5015trexfeathers merged 7 commits intoSciTools:mainfrom
Conversation
baa7eed to
730a209
Compare
730a209 to
39a352b
Compare
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #5015 +/- ##
==========================================
+ Coverage 89.28% 89.29% +0.01%
==========================================
Files 88 88
Lines 22269 22228 -41
Branches 4870 4865 -5
==========================================
- Hits 19882 19849 -33
+ Misses 1641 1634 -7
+ Partials 746 745 -1
... and 1 file with indirect coverage changes Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report in Codecov by Sentry. |
trexfeathers
left a comment
There was a problem hiding this comment.
Great work @bouweandela, and sorry for the delay.
I'm all for anything maintains functionality while doing this:
❗ This deserves a What's New entry.
Responses to bullet points
- Replace iris.coords._GroupIterator by the standard library function itertools.groupby
👍 From debugging through the old and new version of group(), I can see that this works in an equivalent way.
- Fix the iris.analysis._Group.repr method, it seems currently broken because it uses an attribute that the class doesn't have (anymore). I suspect that this code is not covered by tests #5014
👍 A definite improvement! I've left some comments.
- Add type hints
👍 Many thanks for doing this 🙂 I have checked for consistency.
- Simplify the code by removing the special treatment of
groupby_slicesif it had a length of 1
👍 I assume you're referring to _slice_merge()? I agree with the decision - see the relevant comment.
- Simplify the code by removing the unused dictionary keys stored in
_Groupby._slices_by_keyand instead using a list called_Groupby._groupby_indices
👍
- Replace the use of OrderedDict by dict, since dicts are ordered since Python 3.7
❓ I think this point is redundant thanks to making the _Groupby._groupby_indices list?
- Simplfiy the code of
_Groupby.groupby returning an empty list if_Groupby._groupby_coordswas empty. Previously there was a special if statement to make it return None, but this behaviour was not used anywhere in the code.
👍 I think the upstream code protects against there being no coordinates, and the downstream code wouldn't be able to handle None if that was returned.
|
@bouweandela any comment on my comments? |
Co-authored-by: Martin Yeo <40734014+trexfeathers@users.noreply.github.com>
|
Thanks for reviewing @trexfeathers! I ran out of time for today, will continue tomorrow. |
|
@trexfeathers This is ready for another round of reviewing. Could you have a look, please? |
trexfeathers
left a comment
There was a problem hiding this comment.
Great, thanks @bouweandela 🍻
* upstream/main: (59 commits) Updated environment lockfiles (SciTools#5211) update ci locks location (SciTools#5228) Fixes to _discontiguity_in_bounds (attempt 2) (SciTools#4975) Finalises Lazy Data documentation (SciTools#5137) Modernize and simplify iris.analysis._Groupby (SciTools#5015) clarity on whatsnew entry contributors (SciTools#5240) Handle derived coordinates correctly in `concatenate` (SciTools#5096) Use real array for data of of small netCDF variables. (SciTools#5229) Bump scitools/workflows from 2023.04.1 to 2023.04.2 (SciTools#5236) fixing whatsnew entry remove results creation commit from blame configure codecov adding a whatsnew entry Replacing numpy legacy printing with array2string and remaking results for dependent tests Adding a whatsnew entry for 5224 (SciTools#5234) Cf cell method (SciTools#5224) Bump scitools/workflows from 2023.03.3 to 2023.04.1 (SciTools#5231) [pre-commit.ci] pre-commit autoupdate (SciTools#5230) Bump scitools/workflows from 2023.03.2 to 2023.03.3 (SciTools#5227) raise dask min pin (SciTools#5225) ...
* Modernize and simplify _Groupby * Rename variable to improve readability Co-authored-by: Martin Yeo <40734014+trexfeathers@users.noreply.github.com> * Add a whatsnew entry * Add a type hint to _add_shared_coord * Add a test for iris.analysis._Groupby.__repr__ --------- Co-authored-by: Martin Yeo <40734014+trexfeathers@users.noreply.github.com>
* upstream/main: (29 commits) review actions update .git-blame-ignore-revs adopt codespell Adopt sphinx design (SciTools#5127) Bump scitools/workflows from 2023.04.2 to 2023.04.3 (SciTools#5253) refresh manual pypi publish instructions (SciTools#5252) Updated environment lockfiles (SciTools#5250) removed bugfix section Make bm_runner location agnostic and include debugging. (SciTools#5247) Restore latest Whats New files. SciTools#5220 typo github.repository_owner. (SciTools#5248) Whats new updates for v3.5.0rc0. (SciTools#5246) libnetcdf <4.9 pin (SciTools#5242) update cf standard units (SciTools#5244) Updated environment lockfiles (SciTools#5211) update ci locks location (SciTools#5228) Fixes to _discontiguity_in_bounds (attempt 2) (SciTools#4975) Finalises Lazy Data documentation (SciTools#5137) Modernize and simplify iris.analysis._Groupby (SciTools#5015) clarity on whatsnew entry contributors (SciTools#5240) ...
* Basic functional lazy saving. * Simplify function signature which upsets Sphinx. * Non-lazy saves return nothing. * Now fixed to enable use with process/distributed scheduling. * Remove dask.utils.SerializableLock, which I think was a mistake. * Make DefferedSaveWrapper use _thread_safe_nc. * Fixes for non-lazy save. * Avoid saver error when no deferred writes. * Reorganise locking code, ready for shareable locks. * Remove optional usage of 'filelock' for lazy saves. * Document dask-specific locking; implement differently for threads or distributed schedulers. * Minor fix for unit-tests. * Pin libnetcdf to avoid problems -- see #5187. * Minor test fix. * Move DeferredSaveWrapper into _thread_safe_nc; replicate the NetCDFDataProxy fix; use one lock per Saver; add extra up-scaled test * Update lib/iris/fileformats/netcdf/saver.py Co-authored-by: Bouwe Andela <bouweandela@gmail.com> * Update lib/iris/fileformats/netcdf/_dask_locks.py Co-authored-by: Bouwe Andela <bouweandela@gmail.com> * Update lib/iris/fileformats/netcdf/saver.py Co-authored-by: Bouwe Andela <bouweandela@gmail.com> * Small rename + reformat. * Remove Saver lazy option; all lazy saves are delayed; factor out fillvalue checks and make them delayable. * Repurposed 'test__FillValueMaskCheckAndStoreTarget' to 'test__data_fillvalue_check', since old class is gone. * Disable (temporary) saver debug printouts. * Fix test problems; Saver automatically completes to preserve existing direct usage (which is public API). * Fix docstring error. * Fix spurious error in old saver test. * Fix Saver docstring. * More robust exit for NetCDFWriteProxy operation. * Fix doctests by making the Saver example functional. * Improve docstrings; unify terminology; simplify non-lazy save call. * Moved netcdf cell-method handling into nc_load_rules.helpers, and various tests into more specific test folders. * Fix lockfiles and Makefile process. * Add unit tests for routine _fillvalue_report(). * Remove debug-only code. * Added tests for what the save function does with the 'compute' keyword. * Fix mock-specific problems, small tidy. * Restructure hierarchy of tests.unit.fileformats.netcdf * Tidy test docstrings. * Correct test import. * Avoid incorrect checking of byte data, and a numpy deprecation warning. * Alter parameter names to make test reports clearer. * Test basic behaviour of _lazy_stream_data; make 'Saver._delayed_writes' private. * Add integration tests, and distributed dependency. * Docstring fixes. * Documentation section and whatsnew entry. * Various fixes to whatsnew, docstrings and docs. * Minor review changes, fix doctest. * Arrange tests + results to organise by package-name alone. * Review changes. * Review changes. * Enhance tests + debug. * Support scheduler type 'single-threaded'; allow retries on delayed-save test. * Improve test. * Adding a whatsnew entry for 5224 (#5234) * Adding a whatsnew entry explaining 5224 * Fixing link and format error * Replacing numpy legacy printing with array2string and remaking results for dependent tests * adding a whatsnew entry * configure codecov * remove results creation commit from blame * fixing whatsnew entry * Bump scitools/workflows from 2023.04.1 to 2023.04.2 (#5236) Bumps [scitools/workflows](https://github.com/scitools/workflows) from 2023.04.1 to 2023.04.2. - [Release notes](https://github.com/scitools/workflows/releases) - [Commits](SciTools/workflows@2023.04.1...2023.04.2) --- updated-dependencies: - dependency-name: scitools/workflows dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Use real array for data of of small netCDF variables. (#5229) * Small netCDF variable data is real. * Various test fixes. * More test fixing. * Fix printout in Mesh documentation. * Whatsnew + doctests fix. * Tweak whatsnew. * Handle derived coordinates correctly in `concatenate` (#5096) * First working prototype of concatenate that handels derived coordinates correctly * Added checks for derived coord metadata during concatenation * Added tests * Fixed defaults * Added what's new entry * Optimized test coverage * clarity on whatsnew entry contributors (#5240) * Modernize and simplify iris.analysis._Groupby (#5015) * Modernize and simplify _Groupby * Rename variable to improve readability Co-authored-by: Martin Yeo <40734014+trexfeathers@users.noreply.github.com> * Add a whatsnew entry * Add a type hint to _add_shared_coord * Add a test for iris.analysis._Groupby.__repr__ --------- Co-authored-by: Martin Yeo <40734014+trexfeathers@users.noreply.github.com> * Finalises Lazy Data documentation (#5137) * cube and io lazy data notes added * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Added comments within analysis, as well as palette and iterate, and what's new * fixed docstrings as requested in @trexfeathers review * reverted cube.py for time being * fixed flake8 issue * Lazy data second batch * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updated lastest what'snew * I almost hope this wasn't the fix, I'm such a moron * adressed review changes --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Bill Little <bill.james.little@gmail.com> * Fixes to _discontiguity_in_bounds (attempt 2) (#4975) * update ci locks location (#5228) * Updated environment lockfiles (#5211) Co-authored-by: Lockfile bot <noreply@github.com> * Increase retries. * Change debug to show which elements failed. * update cf standard units (#5244) * update cf standard units * added whatsnew entry * Correct pull number Co-authored-by: Martin Yeo <40734014+trexfeathers@users.noreply.github.com> --------- Co-authored-by: Martin Yeo <40734014+trexfeathers@users.noreply.github.com> * libnetcdf <4.9 pin (#5242) * Pin libnetcdf<4.9 and update lock files. * What's New entry. * libnetcdf not available on PyPI. * Fix for Pandas v2.0. * Fix for Pandas v2.0. * Avoid possible same-file crossover between tests. * Ensure all-different testfiles; load all vars lazy. * Revert changes to testing framework. * Remove repeated line from requirements/py*.yml (?merge error), and re-fix lockfiles. * Revert some more debug changes. * Reorganise test for better code clarity. * Use public 'Dataset.isopen()' instead of '._isopen'. * Create output files in unique temporary directories. * Tests for fileformats.netcdf._dask_locks. * Fix attribution names. * Fixed new py311 lockfile. * Fix typos spotted by codespell. * Add distributed test dep for python 3.11 * Fix lockfile for python 3.11 --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Bouwe Andela <bouweandela@gmail.com> Co-authored-by: Henry Wright <84939917+HGWright@users.noreply.github.com> Co-authored-by: Henry Wright <henrywright@sky.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Manuel Schlund <32543114+schlunma@users.noreply.github.com> Co-authored-by: Bill Little <bill.james.little@gmail.com> Co-authored-by: Bouwe Andela <b.andela@esciencecenter.nl> Co-authored-by: Martin Yeo <40734014+trexfeathers@users.noreply.github.com> Co-authored-by: Elias <110238618+ESadek-MO@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: stephenworsley <49274989+stephenworsley@users.noreply.github.com> Co-authored-by: scitools-ci[bot] <107775138+scitools-ci[bot]@users.noreply.github.com> Co-authored-by: Lockfile bot <noreply@github.com>

🚀 Pull Request
Description
Modernize and simplify
iris.analysis._Groupby. This started out from the idea discussed in #4956.Here is an overview of some of the improvements here:
iris.coords._GroupIteratorby the standard library function itertools.groupbyiris.analysis._Group.__repr__method, it seems currently broken because it uses an attribute that the class doesn't have (anymore). I suspect that this code is not covered by tests Consider using a code coverage service #5014groupby_slicesif it had a length of 1_Groupby._slices_by_keyand instead using a list called_Groupby._groupby_indices_Groupby.groupby returning an empty list if_Groupby._groupby_coordswas empty. Previously there was a special if statement to make it return None, but this behaviour was not used anywhere in the code.Consult Iris pull request check list