-
-
Notifications
You must be signed in to change notification settings - Fork 19.4k
BUG: list-like to_replace on Categorical.replace is ignored or crash #31734
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| @@ -0,0 +1,44 @@ | |||
| import pandas as pd | |||
| import pandas.testing as tm | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
How does this behave when In [5]: s
Out[5]:
0 0
1 1
2 2
dtype: category
Categories (3, int64): [0, 1, 2]
In [6]: s.replace({0: 1})
Out[6]:
0 1
1 1
2 2
dtype: int64 |
|
@dsaxton this is because in |
|
Thanks @JustinZhengBC ! Overall looks good |
| (1, 2, [2, 2, 3], True), | ||
| (1, 4, [4, 2, 3], True), | ||
| (4, 1, [1, 2, 3], True), | ||
| (5, 6, [1, 2, 3], True), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the case when to_replace and value are different dtypes need testing? e.g. with to_replace=1, value="4"
EDIT
actually, maybe that's outside the scope of the issue this PR is addressing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's doable, with limited dtype-checking as a full check would involve the category index which causes a crash on comparison on mixed dtypes
| ([1, 4], [5, 2], [5, 2, 3], False), | ||
| ], | ||
| ) | ||
| def test_replace(to_replace, value, expected, check_types): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add the issue number here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
pandas/_testing.py
Outdated
| check_categorical : bool, default True | ||
| Whether to compare internal Categorical exactly. | ||
| check_category_order : bool, default True | ||
| Whether to compare category order internal Categoricals |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a missing "of" here? Should it be
Whether to compare category order of internal Categoricals
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
pandas/core/arrays/categorical.py
Outdated
| def replace_list(self, to_replace, value, inplace: bool = False): | ||
| return self.replace(to_replace, value, inplace) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, this and the change below it were experiments in addressing the dict case and list-list case for categoricals, which turned out to be more complex than expected and are not intended to be in this pr
pandas/core/arrays/categorical.py
Outdated
| if is_list_like(to_replace): | ||
| if is_list_like(value): | ||
| if len(to_replace) != len(value): | ||
| raise ValueError("to_replace and value must be same length!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this ValueError be a bit more precise? As in, include something about the lengths of to_replace and value which were received
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, this and the change above it were experiments in addressing the dict case and list-list case for categoricals, which turned out to be more complex than expected and are not intended to be in this pr
|
Hello @JustinZhengBC! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-02-17 00:22:52 UTC |
| ([1, 2, "3"], "5", ["5", "5", 3], True, False), | ||
| ], | ||
| ) | ||
| # GH 31720 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the issue number usually goes inside the test:
def test_me(*args):
# GH 1234
Probably not worth changing on its own, but if you get any other comments requesting changes, perhaps this is worth doing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
doc/source/whatsnew/v1.1.0.rst
Outdated
| ^^^^^^^^^^^ | ||
|
|
||
| - | ||
| - Bug in :class:`Categorical` that would ignore or crash when calling :meth:`Series.replace` with a list-like ``to_replace`` (:issue:`31720`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue is tagged with v1.0.2 milestone, I think this should go in that whatsnew file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
jreback
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks pretty good. move the note to 1.0.2
10fd29e to
fd0d736
Compare
|
@jreback I've moved the whatsnew and addressed the other feedback |
|
also pls merge master |
fd0d736 to
7d3c1a3
Compare
jreback
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very minor comments, ping on green.
|
@jreback green |
|
thanks @JustinZhengBC very nice! |
…l.replace is ignored or crash
…is ignored or crash (#32062) Co-authored-by: Justin Zheng <justinzhengbc@gmail.com>
Series.replacefails on categoricals with list #31720black pandasgit diff upstream/master -u -- "*.py" | flake8 --diffCovers the case where
to_replaceis a list-like andvalueis a string. Other cases, like "to_replaceis dict andvalueis None", or "to_replaceandvalueare both lists" are handled earlier in generic.py