Introduce Grouper objects internally #7561
78f6bda to f46f5d9
94aca25 to 6fba398
Upstream bug pandas-dev/pandas#12813 is fixed
This reverts commit 2a36e21a031b9e061b932682758551956f3f06d2.
6fba398 to e863045
@Illviljan, I could use some typing help here if you have the time :)
Illviljan left a comment
I'll look into it more later.
for more information, see https://pre-commit.ci
I'm not sure about the rest of the errors, @dcherian. Maybe IndexVariable needs to use the DataWithCoords mixin?
xarray/xarray/core/alignment.py Lines 581 to 588 in d4db166
xarray/xarray/core/alignment.py Line 31 in d4db166
Lines 376 to 377 in d4db166
Variables don't have coordinates, so that won't work. mypy is correct here: it's a bug, and we don't test for grouping by index variables. A commit reverting to the old [...] It's not clear to me why we allow this, actually. Seems like [...]
* main: (34 commits)
  Update whats-new.rst
  Fix binning by unsorted array (pydata#7762)
  Bump codecov/codecov-action from 3.1.1 to 3.1.2 (pydata#7760)
  Fix typing errors using mypy 1.2 (pydata#7752)
  [skip-ci] dev whats-new
  Add whats-new for v2023.04.0 (pydata#7757)
  remove the `black` hook (pydata#7756)
  reword the what's new entry for the `pandas` 2.0 dtype changes (pydata#7755)
  restructure the contributing guide (pydata#7681)
  Continue to use nanosecond-precision Timestamps in precision-sensitive areas (pydata#7731)
  minor doc updates to clarify extensions using accessors (pydata#7751)
  align: Avoid reindexing when join="exact" (pydata#7736)
  `pandas=2.0` support (pydata#7724)
  Clarify vectorized indexing documentation (pydata#7747)
  Avoid recasting a CFTimeIndex (pydata#7735)
  fix typo (pydata#7746)
  [pre-commit.ci] pre-commit autoupdate (pydata#7745)
  Bump pypa/gh-action-pypi-publish from 1.8.4 to 1.8.5 (pydata#7743)
  preserve boolean dtype in encoding (pydata#7720)
  [skip-ci] Add alignment benchmarks (pydata#7738)
  ...
* main:
  Bump codecov/codecov-action from 3.1.2 to 3.1.3 (pydata#7781)
  Fix whats-new
  [skip-ci] dev whats-new (pydata#7775)
  [skip-ci] Release 2023.04.2 (pydata#7774)
  Fix groupby_bins when labels are specified (pydata#7769)
  Docstrings examples for string methods (pydata#7669)
  Add dev whats-new
  Add benchmark against latest release on main. (pydata#7753)
I'd like to merge this soon. It's an internal refactor with no public API changes. I think we can expose the Grouper objects publicly in a new PR.
Illviljan left a comment
I added some TODOs I've been thinking about, no stoppers.
        return len(self)

    def __len__(self) -> int:
        return len(self.full_index)  # TODO: full_index not def, abstractmethod?
This will crash if .factorize hasn't been triggered before.
Yes, but it wouldn't make sense without it, and this class is internal-only. Well, it would make sense if the user told us the labels or bin edges, but again it's internal-only, so eh...
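The failure mode discussed above can be sketched in isolation. The class below is a hypothetical stand-in written for illustration, not xarray's internal Grouper: `SketchGrouper`, its `labels` field, and the explicit guard in `__len__` are all assumptions. It only shows how a length that depends on `full_index` is unavailable until `factorize` has run, and one possible way to make that failure explicit.

```python
from __future__ import annotations

from collections.abc import Hashable, Sequence
from dataclasses import dataclass, field


@dataclass
class SketchGrouper:
    """Hypothetical stand-in for an internal grouper (not xarray's class)."""

    labels: Sequence[Hashable]
    # Both attributes are only populated by factorize().
    full_index: list[Hashable] | None = field(default=None, init=False)
    codes: list[int] | None = field(default=None, init=False)

    def factorize(self) -> None:
        # Unique labels in sorted order, plus an integer code per element.
        self.full_index = sorted(set(self.labels))
        position = {label: i for i, label in enumerate(self.full_index)}
        self.codes = [position[label] for label in self.labels]

    def __len__(self) -> int:
        # Assumed guard: fail with a clear message instead of an
        # AttributeError or TypeError when factorize() was never called.
        if self.full_index is None:
            raise RuntimeError("factorize() must be called before len()")
        return len(self.full_index)
```

Calling `len()` on a fresh instance raises the `RuntimeError`; after `factorize()` it returns the number of unique groups. Whether such a guard is worth having for an internal-only class is exactly the judgment call made in the comments above.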
@dataclass
class BinGrouper(Grouper):
    bins: Any  # TODO: What is the typing?
I didn't have the time to figure out the typing on this one.
At the moment it should be int | Sequence I think, so either number of bins, or actual bin edges in some Iterable where the order matters.
I never really like to use Sequence because np.ndarrays are not Sequences... And I think it could be quite common to supply the bins as numpy arrays?
Yes, int or sequences or arrays.
Yes, but "nested sequences" won't work here (see https://numpy.org/doc/stable/reference/typing.html#numpy.typing.ArrayLike), but maybe that's a tiny detail.
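One possible shape for the annotation discussed in this thread is sketched below. It is an assumption for illustration, not what the PR settled on: the `Bins` alias and the `normalize_bins` helper are hypothetical names. The alias spells out "int, sequence, or array" for the type checker only, while the runtime check relies on plain iteration, so arrays are accepted without being `Sequence` instances and nested input is rejected.

```python
from __future__ import annotations

import numbers
from typing import TYPE_CHECKING, Sequence, Union

if TYPE_CHECKING:
    # Only evaluated by type checkers, so numpy is not needed at runtime.
    import numpy as np

    # Hypothetical alias: number of bins, or ordered 1-D bin edges.
    Bins = Union[int, Sequence[float], np.ndarray]


def normalize_bins(bins: Bins) -> int | tuple[float, ...]:
    """Return a bin count unchanged, or bin edges as a tuple of floats."""
    if isinstance(bins, numbers.Integral):
        if bins < 1:
            raise ValueError("number of bins must be at least 1")
        return int(bins)
    # Order is preserved on purpose, since the order of the edges matters.
    # float() raises TypeError for nested input such as [[0, 1], [2, 3]].
    edges = tuple(float(edge) for edge in bins)
    if len(edges) < 2:
        raise ValueError("at least two bin edges are required")
    return edges
```

For example, `normalize_bins(4)` returns `4` and `normalize_bins([0, 1, 2.5])` returns `(0.0, 1.0, 2.5)`, while a nested list raises `TypeError`.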
for more information, see https://pre-commit.ci
This reverts commit 917c77efb05bacffcf901e61eabb9defc9a429d7.
* main:
  Introduce Grouper objects internally (pydata#7561)
  [skip-ci] Add cftime groupby, resample benchmarks (pydata#7795)
  Fix groupby binary ops when grouped array is subset relative to other (pydata#7798)
  adjust the deprecation policy for python (pydata#7793)
  [pre-commit.ci] pre-commit autoupdate (pydata#7803)
  Allow the label run-upstream to run upstream CI (pydata#7787)
  Update asv links in contributing guide (pydata#7801)
  Implement DataArray.to_dask_dataframe() (pydata#7635)
  `ds.to_dict` with data as arrays, not lists (pydata#7739)
  Add lshift and rshift operators (pydata#7741)
  Use canonical name for set_horizonalalignment over alias set_ha (pydata#7786)
  Remove pandas<2 pin (pydata#7785)
  [pre-commit.ci] pre-commit autoupdate (pydata#7783)
        else:
            newgroup = group

    if newgroup.size == 0:
With xarray 2023.5.0 I now seem to get "UnboundLocalError: local variable 'newgroup' referenced before assignment" when using groupby with an IndexVariable object.
Can you open a new issue please?
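For readers unfamiliar with this class of error, here is a generic illustration of how an `UnboundLocalError` like the one reported can arise. It is a sketch under assumptions, not a diagnosis of the xarray code: the function names and the use of `list` and `tuple` as stand-ins for the different group types are hypothetical. The point is only that a branch which validates its input without binding the variable leaves it undefined for the code that follows.

```python
def resolve_group_buggy(group):
    """Hypothetical resolver with one branch that never binds newgroup."""
    if isinstance(group, list):
        newgroup = list(group)
    elif isinstance(group, tuple):
        # This branch inspects the input but forgets to assign newgroup.
        _ = len(group)
    else:
        newgroup = [group]

    # Reached on every path, including the one that skipped the assignment.
    if len(newgroup) == 0:
        raise ValueError("group must not be empty")
    return newgroup


def resolve_group_fixed(group):
    """Same logic, with every branch binding newgroup before it is read."""
    if isinstance(group, (list, tuple)):
        newgroup = list(group)
    else:
        newgroup = [group]

    if len(newgroup) == 0:
        raise ValueError("group must not be empty")
    return newgroup
```

Passing a tuple to `resolve_group_buggy` raises `UnboundLocalError`, while list input works, which is why such a bug can go unnoticed when one input type is untested.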
Builds on the refactoring in #7206