-
-
Notifications
You must be signed in to change notification settings - Fork 19.4k
ENH: add GroupBy.pipe method #17871
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: add GroupBy.pipe method #17871
Conversation
98949b5 to
9173deb
Compare
doc/source/groupby.rst
Outdated
| allow for a cleaner, more readable syntax. To read about ``.pipe`` in general terms, | ||
| see :ref:`here <basics.pipe>`. | ||
|
|
||
| For a concrete example on combining ``.groupby`` and ``.pipe`` , imagine have a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have -> having
doc/source/groupby.rst
Outdated
|
|
||
| .. ipython:: python | ||
| from numpy.random import choice, random |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't import like like, rather import numpy as np and write things out (e.g. np.random.choice
| (base_df.pipe(lambda x: x[x.A>3]) | ||
| .groupby(['Store', 'Product']) | ||
| .pipe(rapport_func) | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a bit abstract
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've redone it with the previous df. Alternatively, I could remove this example.
pandas/core/common.py
Outdated
| element of the tuple. | ||
| func : callable or tuple of (callable, string) | ||
| Function to apply to this GroupBy or, alternatively, a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can remove the GroupBy from here and make this more generic
pandas/core/common.py
Outdated
| kwargs[target] = obj | ||
| return func(*args, **kwargs) | ||
| else: | ||
| return func(obj, *args, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need a return at the end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
@topper-123 Thanks a lot for reviving this feature. |
a88a50a to
956483f
Compare
Codecov Report
@@ Coverage Diff @@
## master #17871 +/- ##
==========================================
+ Coverage 91.23% 91.24% +<.01%
==========================================
Files 163 163
Lines 50105 50110 +5
==========================================
+ Hits 45715 45723 +8
+ Misses 4390 4387 -3
Continue to review full report at Codecov.
|
doc/source/whatsnew/v0.21.0.txt
Outdated
| - :func:`Series.reindex`, :func:`DataFrame.reindex`, :func:`Index.get_indexer` now support list-like argument for ``tolerance``. (:issue:`17367`) | ||
| - ``GroupBy`` objects now have a ``pipe`` method, similar to the one on ``DataFrame`` and ``Series`` | ||
| that allow for functions that take a ``GroupBy`` to be composed in a clean, readable syntax. | ||
| See the :ref:`documentation <groupby.pipe>` for more. (:issue:`17871`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can move this to highlites is ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'll move this to highlights and add an example in the whatsnew (if I understand you correctly?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
|
small change |
e450478 to
4f3e569
Compare
|
Had some git issue, so squashed all the commits. I'm off to sleep, I'll look into eventual comments tomorrow. |
doc/source/groupby.rst
Outdated
| (df.groupby(['Store', 'Product']).pipe(rapport_func) | ||
| where ``rapport_func`` take an arbitrary GroupBy object and creates a rapport |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think 'report' is the correct English word? (also above in the code example)
(no native speaker, but in my mother tongue 'rapport' exists, but the english translation is 'report' :-))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's rapport in my language too. I'm changed that
doc/source/groupby.rst
Outdated
|
|
||
| .. versionadded:: 0.21.0 | ||
|
|
||
| Similar to the functionality provided by ``DataFrames`` and ``Series``, functions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataFrames -> DataFrame (or otherwise do not put it as a code object)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
jorisvandenbossche
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some minor comments
pandas/core/generic.py
Outdated
| Use ``.pipe`` when chaining together functions that expect | ||
| on Series or DataFrames. Instead of writing | ||
| Series, DataFrames or GroupBys. Instead of writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this change is needed? (as the docstring of GroupBy.pipe has a separate one?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "on" was gramatically incorrect, So I changed the line. I think adding "GroupBy objects" add clarity that GroupBy objrcts can be used with piping. I didnt intend for this to only refer to series/DataFrames, but is too easy to misunderstand?
pandas/core/groupby.py
Outdated
| Parameters | ||
| ---------- | ||
| func : callable or tuple of (callable, string) | ||
| Function to apply to this GroupBy or, alternatively, a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The start of "Function ..." does not need to be aligned with the "callable .. " above, but just have a single identation of 4 spaces (numpy docstring specifics ..)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would do "this GroupBy" -> "this GroupBy object"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, changing.
680473c to
adfb81c
Compare
|
The comments from @jorisvandenbossche have been addressed. Thanks for reviewing. |
adfb81c to
efeaf8c
Compare
doc/source/whatsnew/v0.21.0.txt
Outdated
| - The behavior of ``sum`` and ``prod`` on all-NaN Series/DataFrames is now consistent and no longer depends on whether `bottleneck <http://berkeleyanalytics.com/bottleneck>`__ is installed, see :ref:`here <whatsnew_0210.api_breaking.bottleneck>` | ||
| - Compatibility fixes for pypy, see :ref:`here <whatsnew_0210.pypy>`. | ||
| - ``GroupBy`` objects now have a ``pipe`` method, similar to the one on ``DataFrame`` and ``Series``, | ||
| that allows for functions that take a ``GroupBy`` to be composed in a clean, readable syntax. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a ref to the subsection
| def pipe(self, func, *args, **kwargs): | ||
| """ Apply a function with arguments to this GroupBy object | ||
| .. versionadded:: 0.21.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of mostly repeated doc strings in both places, you can template _pipe and use and Appender/Substitution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would propose to not do this here. The dosctrings are sufficiently different to just make it a lot harder to develop when trying to merge them in a single one with a lot of substituted parts
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok thats fine
| expected = pd.Series([-79.5160891089, -78.4839108911, None], | ||
| index=index) | ||
|
|
||
| assert_series_equal(expected, result) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a tests on SeriesGroupBy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok
pandas/core/groupby.py
Outdated
| -------- | ||
| pandas.Series.pipe | ||
| pandas.DataFrame.pipe | ||
| pandas.GroupBy.apply |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a short note on the difference here? (and also add a See also in apply to pipe, again with a short note on the difference)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried, but admittedly found it very hard to do in so few lines, see result. Do you have suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like: "Apply function to each group instead of to the full GroupBy object" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(IIUC) perhaps
``apply`` applies a function to each group. ``pipe`` applies a function to a ``GroupBy`` object.
would work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I've uploaded new ones.
I've built the documentation, and pipe doesn't show up in the API. Do I need to add something somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, In groupby.rst line 1178 I have a link ":ref:here <basics.pipe>".
This link doesn't show up in the docs, can anyone see why? It seems normal
|
Hmm, I can see the newest code changes on my github repository, but they don't show up in the PR. Any suggestions on how to proceed? for reference, the newest code her: topper-123@8614c32 EDIT: never mind, it works now. |
efeaf8c to
8614c32
Compare
8614c32 to
702c1af
Compare
|
I think you need to add in api.rst as well. if docs build ok, ping on green. |
702c1af to
608a0e4
Compare
|
thanks @ghl3 and @topper-123 for the revival |
|
Great work, @jreback and @topper-123 |
git diff upstream/master -u -- "*.py" | flake8 --diffThis PR adds a
.pipemethod to GroupBy objects like forDataFrame.pipeandSeries .pipe.This PR is basically #10466 written by @ghl3 with some very minor updates, because that PR somehow got stalled and subsequently was closed.
This PR says it's new in 0.21, but I'll change that if it's too late to add this.