Modernize and simplify iris.analysis._Groupby by bouweandela · Pull Request #5015 · SciTools/iris

bouweandela · 2022-10-06T15:15:24Z

🚀 Pull Request

Description

Modernize and simplify iris.analysis._Groupby. This started out from the idea discussed in #4956.

Here is an overview of some of the improvements here:

Replace iris.coords._GroupIterator by the standard library function itertools.groupby
Fix the iris.analysis._Group.__repr__ method, it seems currently broken because it uses an attribute that the class doesn't have (anymore). I suspect that this code is not covered by tests Consider using a code coverage service #5014
Add type hints
Simplify the code by removing the special treatment of groupby_slices if it had a length of 1
Simplify the code by removing the unused dictionary keys stored in _Groupby._slices_by_key and instead using a list called _Groupby._groupby_indices
Simplfiy the code of _Groupby.group by returning an empty list if _Groupby._groupby_coords was empty. Previously there was a special if statement to make it return None, but this behaviour was not used anywhere in the code.

Consult Iris pull request check list

codecov · 2023-02-23T22:14:10Z

Codecov Report

Patch coverage: 95.00% and project coverage change: +0.01 🎉

Comparison is base (6b25c3a) 89.28% compared to head (a70aea2) 89.29%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #5015      +/-   ##
==========================================
+ Coverage   89.28%   89.29%   +0.01%     
==========================================
  Files          88       88              
  Lines       22269    22228      -41     
  Branches     4870     4865       -5     
==========================================
- Hits        19882    19849      -33     
+ Misses       1641     1634       -7     
+ Partials      746      745       -1

Impacted Files	Coverage Δ
lib/iris/analysis/__init__.py	`92.44% <94.91%> (+0.54%)`	⬆️
lib/iris/coords.py	`93.27% <100.00%> (+0.07%)`	⬆️

... and 1 file with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

trexfeathers

Great work @bouweandela, and sorry for the delay.

I'm all for anything maintains functionality while doing this:

❗ This deserves a What's New entry.

Responses to bullet points

Replace iris.coords._GroupIterator by the standard library function itertools.groupby

👍 From debugging through the old and new version of group(), I can see that this works in an equivalent way.

Fix the iris.analysis._Group.repr method, it seems currently broken because it uses an attribute that the class doesn't have (anymore). I suspect that this code is not covered by tests #5014

👍 A definite improvement! I've left some comments.

Add type hints

👍 Many thanks for doing this 🙂 I have checked for consistency.

Simplify the code by removing the special treatment of groupby_slices if it had a length of 1

👍 I assume you're referring to _slice_merge()? I agree with the decision - see the relevant comment.

Simplify the code by removing the unused dictionary keys stored in _Groupby._slices_by_key and instead using a list called _Groupby._groupby_indices

👍

Replace the use of OrderedDict by dict, since dicts are ordered since Python 3.7

❓ I think this point is redundant thanks to making the _Groupby._groupby_indices list?

Simplfiy the code of _Groupby.group by returning an empty list if _Groupby._groupby_coords was empty. Previously there was a special if statement to make it return None, but this behaviour was not used anywhere in the code.

👍 I think the upstream code protects against there being no coordinates, and the downstream code wouldn't be able to handle None if that was returned.

lib/iris/analysis/__init__.py

trexfeathers · 2023-03-31T09:10:21Z

@bouweandela any comment on my comments?

Co-authored-by: Martin Yeo <40734014+trexfeathers@users.noreply.github.com>

bouweandela · 2023-04-03T16:17:35Z

Thanks for reviewing @trexfeathers! I ran out of time for today, will continue tomorrow.

bouweandela · 2023-04-04T09:36:39Z

@trexfeathers This is ready for another round of reviewing. Could you have a look, please?

trexfeathers

Great, thanks @bouweandela 🍻

lib/iris/analysis/__init__.py

* upstream/main: (59 commits) Updated environment lockfiles (SciTools#5211) update ci locks location (SciTools#5228) Fixes to _discontiguity_in_bounds (attempt 2) (SciTools#4975) Finalises Lazy Data documentation (SciTools#5137) Modernize and simplify iris.analysis._Groupby (SciTools#5015) clarity on whatsnew entry contributors (SciTools#5240) Handle derived coordinates correctly in `concatenate` (SciTools#5096) Use real array for data of of small netCDF variables. (SciTools#5229) Bump scitools/workflows from 2023.04.1 to 2023.04.2 (SciTools#5236) fixing whatsnew entry remove results creation commit from blame configure codecov adding a whatsnew entry Replacing numpy legacy printing with array2string and remaking results for dependent tests Adding a whatsnew entry for 5224 (SciTools#5234) Cf cell method (SciTools#5224) Bump scitools/workflows from 2023.03.3 to 2023.04.1 (SciTools#5231) [pre-commit.ci] pre-commit autoupdate (SciTools#5230) Bump scitools/workflows from 2023.03.2 to 2023.03.3 (SciTools#5227) raise dask min pin (SciTools#5225) ...

* Modernize and simplify _Groupby * Rename variable to improve readability Co-authored-by: Martin Yeo <40734014+trexfeathers@users.noreply.github.com> * Add a whatsnew entry * Add a type hint to _add_shared_coord * Add a test for iris.analysis._Groupby.__repr__ --------- Co-authored-by: Martin Yeo <40734014+trexfeathers@users.noreply.github.com>

* upstream/main: (29 commits) review actions update .git-blame-ignore-revs adopt codespell Adopt sphinx design (SciTools#5127) Bump scitools/workflows from 2023.04.2 to 2023.04.3 (SciTools#5253) refresh manual pypi publish instructions (SciTools#5252) Updated environment lockfiles (SciTools#5250) removed bugfix section Make bm_runner location agnostic and include debugging. (SciTools#5247) Restore latest Whats New files. SciTools#5220 typo github.repository_owner. (SciTools#5248) Whats new updates for v3.5.0rc0. (SciTools#5246) libnetcdf <4.9 pin (SciTools#5242) update cf standard units (SciTools#5244) Updated environment lockfiles (SciTools#5211) update ci locks location (SciTools#5228) Fixes to _discontiguity_in_bounds (attempt 2) (SciTools#4975) Finalises Lazy Data documentation (SciTools#5137) Modernize and simplify iris.analysis._Groupby (SciTools#5015) clarity on whatsnew entry contributors (SciTools#5240) ...

@trexfeathers

* Basic functional lazy saving. * Simplify function signature which upsets Sphinx. * Non-lazy saves return nothing. * Now fixed to enable use with process/distributed scheduling. * Remove dask.utils.SerializableLock, which I think was a mistake. * Make DefferedSaveWrapper use _thread_safe_nc. * Fixes for non-lazy save. * Avoid saver error when no deferred writes. * Reorganise locking code, ready for shareable locks. * Remove optional usage of 'filelock' for lazy saves. * Document dask-specific locking; implement differently for threads or distributed schedulers. * Minor fix for unit-tests. * Pin libnetcdf to avoid problems -- see #5187. * Minor test fix. * Move DeferredSaveWrapper into _thread_safe_nc; replicate the NetCDFDataProxy fix; use one lock per Saver; add extra up-scaled test * Update lib/iris/fileformats/netcdf/saver.py Co-authored-by: Bouwe Andela <bouweandela@gmail.com> * Update lib/iris/fileformats/netcdf/_dask_locks.py Co-authored-by: Bouwe Andela <bouweandela@gmail.com> * Update lib/iris/fileformats/netcdf/saver.py Co-authored-by: Bouwe Andela <bouweandela@gmail.com> * Small rename + reformat. * Remove Saver lazy option; all lazy saves are delayed; factor out fillvalue checks and make them delayable. * Repurposed 'test__FillValueMaskCheckAndStoreTarget' to 'test__data_fillvalue_check', since old class is gone. * Disable (temporary) saver debug printouts. * Fix test problems; Saver automatically completes to preserve existing direct usage (which is public API). * Fix docstring error. * Fix spurious error in old saver test. * Fix Saver docstring. * More robust exit for NetCDFWriteProxy operation. * Fix doctests by making the Saver example functional. * Improve docstrings; unify terminology; simplify non-lazy save call. * Moved netcdf cell-method handling into nc_load_rules.helpers, and various tests into more specific test folders. * Fix lockfiles and Makefile process. * Add unit tests for routine _fillvalue_report(). * Remove debug-only code. * Added tests for what the save function does with the 'compute' keyword. * Fix mock-specific problems, small tidy. * Restructure hierarchy of tests.unit.fileformats.netcdf * Tidy test docstrings. * Correct test import. * Avoid incorrect checking of byte data, and a numpy deprecation warning. * Alter parameter names to make test reports clearer. * Test basic behaviour of _lazy_stream_data; make 'Saver._delayed_writes' private. * Add integration tests, and distributed dependency. * Docstring fixes. * Documentation section and whatsnew entry. * Various fixes to whatsnew, docstrings and docs. * Minor review changes, fix doctest. * Arrange tests + results to organise by package-name alone. * Review changes. * Review changes. * Enhance tests + debug. * Support scheduler type 'single-threaded'; allow retries on delayed-save test. * Improve test. * Adding a whatsnew entry for 5224 (#5234) * Adding a whatsnew entry explaining 5224 * Fixing link and format error * Replacing numpy legacy printing with array2string and remaking results for dependent tests * adding a whatsnew entry * configure codecov * remove results creation commit from blame * fixing whatsnew entry * Bump scitools/workflows from 2023.04.1 to 2023.04.2 (#5236) Bumps [scitools/workflows](https://github.com/scitools/workflows) from 2023.04.1 to 2023.04.2. - [Release notes](https://github.com/scitools/workflows/releases) - [Commits](SciTools/workflows@2023.04.1...2023.04.2) --- updated-dependencies: - dependency-name: scitools/workflows dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Use real array for data of of small netCDF variables. (#5229) * Small netCDF variable data is real. * Various test fixes. * More test fixing. * Fix printout in Mesh documentation. * Whatsnew + doctests fix. * Tweak whatsnew. * Handle derived coordinates correctly in `concatenate` (#5096) * First working prototype of concatenate that handels derived coordinates correctly * Added checks for derived coord metadata during concatenation * Added tests * Fixed defaults * Added what's new entry * Optimized test coverage * clarity on whatsnew entry contributors (#5240) * Modernize and simplify iris.analysis._Groupby (#5015) * Modernize and simplify _Groupby * Rename variable to improve readability Co-authored-by: Martin Yeo <40734014+trexfeathers@users.noreply.github.com> * Add a whatsnew entry * Add a type hint to _add_shared_coord * Add a test for iris.analysis._Groupby.__repr__ --------- Co-authored-by: Martin Yeo <40734014+trexfeathers@users.noreply.github.com> * Finalises Lazy Data documentation (#5137) * cube and io lazy data notes added * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Added comments within analysis, as well as palette and iterate, and what's new * fixed docstrings as requested in @trexfeathers review * reverted cube.py for time being * fixed flake8 issue * Lazy data second batch * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * updated lastest what'snew * I almost hope this wasn't the fix, I'm such a moron * adressed review changes --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Bill Little <bill.james.little@gmail.com> * Fixes to _discontiguity_in_bounds (attempt 2) (#4975) * update ci locks location (#5228) * Updated environment lockfiles (#5211) Co-authored-by: Lockfile bot <noreply@github.com> * Increase retries. * Change debug to show which elements failed. * update cf standard units (#5244) * update cf standard units * added whatsnew entry * Correct pull number Co-authored-by: Martin Yeo <40734014+trexfeathers@users.noreply.github.com> --------- Co-authored-by: Martin Yeo <40734014+trexfeathers@users.noreply.github.com> * libnetcdf <4.9 pin (#5242) * Pin libnetcdf<4.9 and update lock files. * What's New entry. * libnetcdf not available on PyPI. * Fix for Pandas v2.0. * Fix for Pandas v2.0. * Avoid possible same-file crossover between tests. * Ensure all-different testfiles; load all vars lazy. * Revert changes to testing framework. * Remove repeated line from requirements/py*.yml (?merge error), and re-fix lockfiles. * Revert some more debug changes. * Reorganise test for better code clarity. * Use public 'Dataset.isopen()' instead of '._isopen'. * Create output files in unique temporary directories. * Tests for fileformats.netcdf._dask_locks. * Fix attribution names. * Fixed new py311 lockfile. * Fix typos spotted by codespell. * Add distributed test dep for python 3.11 * Fix lockfile for python 3.11 --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: Bouwe Andela <bouweandela@gmail.com> Co-authored-by: Henry Wright <84939917+HGWright@users.noreply.github.com> Co-authored-by: Henry Wright <henrywright@sky.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Manuel Schlund <32543114+schlunma@users.noreply.github.com> Co-authored-by: Bill Little <bill.james.little@gmail.com> Co-authored-by: Bouwe Andela <b.andela@esciencecenter.nl> Co-authored-by: Martin Yeo <40734014+trexfeathers@users.noreply.github.com> Co-authored-by: Elias <110238618+ESadek-MO@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: stephenworsley <49274989+stephenworsley@users.noreply.github.com> Co-authored-by: scitools-ci[bot] <107775138+scitools-ci[bot]@users.noreply.github.com> Co-authored-by: Lockfile bot <noreply@github.com>

bouweandela force-pushed the update-groupby branch from baa7eed to 730a209 Compare October 7, 2022 08:24

Modernize and simplify _Groupby

39a352b

bouweandela force-pushed the update-groupby branch from 730a209 to 39a352b Compare October 7, 2022 08:52

bouweandela marked this pull request as ready for review October 7, 2022 09:12

stephenworsley assigned bjlittle Oct 19, 2022

rcomer added the Type: Tech Debt label Jan 4, 2023

trexfeathers added the Feature: ESMValTool label Feb 20, 2023

bjlittle removed their assignment Feb 22, 2023

trexfeathers assigned trexfeathers and unassigned trexfeathers Feb 22, 2023

trexfeathers self-requested a review February 22, 2023 11:27

Merge branch 'main' into update-groupby

ea8f4c6

bouweandela mentioned this pull request Feb 24, 2023

Add coverage testing #4765

Merged

trexfeathers requested changes Mar 1, 2023

View reviewed changes

bouweandela and others added 4 commits April 3, 2023 17:38

Rename variable to improve readability

78d9e3b

Co-authored-by: Martin Yeo <40734014+trexfeathers@users.noreply.github.com>

Merge branch 'main' of github.com:scitools/iris into update-groupby

92a5e79

Add a whatsnew entry

1abf0b6

Add a type hint to _add_shared_coord

1a57b41

Add a test for iris.analysis._Groupby.__repr__

a70aea2

trexfeathers approved these changes Apr 12, 2023

View reviewed changes

lib/iris/analysis/__init__.py Show resolved Hide resolved

trexfeathers merged commit 5585684 into SciTools:main Apr 12, 2023

bouweandela deleted the update-groupby branch April 12, 2023 12:50

bouweandela mentioned this pull request Jan 30, 2024

Potentially unused code in iris.coord #4956

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Modernize and simplify iris.analysis._Groupby#5015

Modernize and simplify iris.analysis._Groupby#5015
trexfeathers merged 7 commits intoSciTools:mainfrom
bouweandela:update-groupby

bouweandela commented Oct 6, 2022 •

edited

Loading

Uh oh!

codecov bot commented Feb 23, 2023 •

edited

Loading

Uh oh!

trexfeathers left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

trexfeathers commented Mar 31, 2023

Uh oh!

bouweandela commented Apr 3, 2023

Uh oh!

bouweandela commented Apr 4, 2023

Uh oh!

trexfeathers left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

bouweandela commented Oct 6, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🚀 Pull Request

Description

Uh oh!

codecov bot commented Feb 23, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

trexfeathers left a comment

Choose a reason for hiding this comment

Responses to bullet points

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

trexfeathers commented Mar 31, 2023

Uh oh!

bouweandela commented Apr 3, 2023

Uh oh!

bouweandela commented Apr 4, 2023

Uh oh!

trexfeathers left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

bouweandela commented Oct 6, 2022 •

edited

Loading

codecov bot commented Feb 23, 2023 •

edited

Loading