-
-
Notifications
You must be signed in to change notification settings - Fork 19.4k
Remove old warnings (plus some useless code) #18022
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I guess we never put these deprecations in #6581 |
pandas/core/categorical.py
Outdated
| ordered = values.ordered | ||
| if categories is None: | ||
| categories = values.categories | ||
| values = values.get_values() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how does this get hit? its prob pretty inefficient (as converting to an array then back), we already catch the dtype conversion above (which is why you can remove this code). maybe try to remove this part as well?
| # after 0.18/ in 2016 | ||
| if (is_integer_dtype(values) and | ||
| not is_integer_dtype(dtype.categories)): | ||
| warn("Values and categories have different dtypes. Did you " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do these get hit anywhere in the tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fine with removing them. pls add a note in the deprecations removal section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(done)
Codecov Report
@@ Coverage Diff @@
## master #18022 +/- ##
=========================================
Coverage ? 91.22%
=========================================
Files ? 163
Lines ? 50089
Branches ? 0
=========================================
Hits ? 45693
Misses ? 4396
Partials ? 0
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #18022 +/- ##
=========================================
Coverage ? 91.22%
=========================================
Files ? 163
Lines ? 50089
Branches ? 0
=========================================
Hits ? 45693
Misses ? 4396
Partials ? 0
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #18022 +/- ##
=========================================
Coverage ? 91.22%
=========================================
Files ? 163
Lines ? 50089
Branches ? 0
=========================================
Hits ? 45693
Misses ? 4396
Partials ? 0
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #18022 +/- ##
==========================================
- Coverage 91.43% 91.39% -0.05%
==========================================
Files 163 163
Lines 50091 50083 -8
==========================================
- Hits 45800 45771 -29
- Misses 4291 4312 +21
Continue to review full report at Codecov.
|
754ced2 to
b20fbe5
Compare
pandas/core/categorical.py
Outdated
| from pandas.core.algorithms import _get_data_algo, _hashtables | ||
| if is_categorical_dtype(values.dtype): | ||
| codes = (values.cat.codes if isinstance(values, ABCSeries) | ||
| else values.codes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There must be a cleaner way... but I couldn't find it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or the question could be: why does this get passed values that are already categorical and don't need a factorize but a recode ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meaning that the check could also be done before (which I think is how it was before?)
something like
if dtype.categories is None:
codes, categories = ....
elif is_categorical_dtype(values):
.... handle this case
else:
codes = _get_codes_for_values(values, dtype.categories)
I personally would find that easier to follow the logic (and this does not necessarily mean you don't need the values.cat.codes vs values.codes ..., so my comment went a bit sideways :-))
doc/source/whatsnew/v0.21.1.txt
Outdated
| ~~~~~~~~~~~~ | ||
|
|
||
| - | ||
| - Warnings for deprecated initialization style of ``Categorical`` (``Categorical(codes, categories)``) have been removed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move to the "Removal of prior version deprecations/changes" section (in 0.22.0)
| categories = values.categories | ||
| values = values.get_values() | ||
| if dtype.categories is None: | ||
| dtype = CategoricalDtype(values.categories, dtype.ordered) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some lines above, there is already a
elif is_categorical(values):
dtype = values.dtype._from_categorical_dtype(values.dtype,
categories, ordered)
which should catch the same ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes... but instead testing dtype.categories must be done here (that is, after all the branches of the above if...elif...else). Viceversa, the dtype must be defined before the fastpath test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, but help me understand in what case this line is needed and is doing something different as the already defined dtype
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For instance the case in which values is already a Categorical, categories is None and dtype is "category".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and in that case
elif is_categorical(values):
dtype = values.dtype._from_categorical_dtype(values.dtype,
categories, ordered)
will already have created an appropriate dtype ?
So I still don't understand why in such a case it would need to be re-determined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will already have created an appropriate dtype ?
No, precisely because categories is None. Not just in theory: you can actually try and verify that tests break.
It could be possible to first have a different set of checks which finds the right value for categories, but I doubt it would result in simpler code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assume this small example:
In [1]: values = pd.Categorical(['a', 'b', 'c', 'a'])
In [2]: values
Out[2]:
[a, b, c, a]
Categories (3, object): [a, b, c]
In [4]: categories = None
In [5]: dtype = 'category'
In [9]: ordered = None
In [10]: dtype = values.dtype._from_categorical_dtype(values.dtype, categories, ordered)
In [11]: dtype
Out[11]: CategoricalDtype(categories=['a', 'b', 'c'], ordered=False)
So if you pass categorical values, it will by definition have categories (can a Categorical have categories of None?), and those will be passed to the resulting dtype object returned from values.dtype._from_categorical_dtype.
So still wondering, in what case can you have categorical values, but where the resulting dtype from above still has dtype.categories is None?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So when removing that check, it is the case where a CategoricalDType instance is passed that differs from the dtype of the values:
dtype = CategoricalDtype(None, ordered=True)
values = Categorical(['a', 'b', 'd'])
Categorical(values, dtype=dtype)
I would find that more logical to handle in the if dtype is not None: path (to clearly see which takes precedence in such a case). But I assume that when fastpath=True we don't want to check that. Although I think that if you use that, dtype.categories will never be None.
pandas/core/categorical.py
Outdated
| from pandas.core.algorithms import _get_data_algo, _hashtables | ||
| if is_categorical_dtype(values.dtype): | ||
| codes = (values.cat.codes if isinstance(values, ABCSeries) | ||
| else values.codes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or the question could be: why does this get passed values that are already categorical and don't need a factorize but a recode ?
pandas/core/categorical.py
Outdated
| from pandas.core.algorithms import _get_data_algo, _hashtables | ||
| if is_categorical_dtype(values.dtype): | ||
| codes = (values.cat.codes if isinstance(values, ABCSeries) | ||
| else values.codes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meaning that the check could also be done before (which I think is how it was before?)
something like
if dtype.categories is None:
codes, categories = ....
elif is_categorical_dtype(values):
.... handle this case
else:
codes = _get_codes_for_values(values, dtype.categories)
I personally would find that easier to follow the logic (and this does not necessarily mean you don't need the values.cat.codes vs values.codes ..., so my comment went a bit sideways :-))
b20fbe5 to
6df2ea0
Compare
doc/source/whatsnew/v0.22.0.txt
Outdated
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
|
||
| - | ||
| - Warnings for deprecated initialization style of ``Categorical`` (``Categorical(codes, categories)``) have been removed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reference the issue (this PR is good, we also like to reference the original issue if you can find it).
The warnings for construction of a Categorical of the form (......) have been removed
| # sanitize input | ||
| if is_categorical_dtype(values): | ||
| if dtype.categories is None: | ||
| dtype = CategoricalDtype(values.categories, dtype.ordered) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what kind of construction actually hits this?
it is a bit inconsistent, e.g. if values is a Categorical then it determines the categories but NOT the ordered?
what if dtype differs from the dtype of values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is a bit inconsistent, e.g. if values is a Categorical then it determines the categories but NOT the ordered?
Yes, it does... above (if dtype is not None). That is: dtype here is not necessarily the dtype argument to the function.
what if dtype differs from the dtype of values?
Then it is given priority - rightly so, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, can you add a comment to that effect here.
it is a bit inconsistent, e.g. if values is a Categorical then it determines the categories but NOT the ordered?
is there a case which hits this where the passed dtype has a different ordered than values.ordered? (one is False, one is True). do/should we check this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we not simply using dtype = values at this point (values is already a dtype)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we not simply using dtype = values at this point (values is already a dtype)?
values can have a dtype...
is there a case which hits this where the passed dtype has a different ordered than values.ordered? (one is False, one is True). do/should we check this?
Sure, whenever you pass values with given categories, unordered, and dtype with the same categories (and possibly more), ordered. Anyway, I added a couple of comments.
| # we're inferring from values | ||
| dtype = CategoricalDtype(categories, dtype.ordered) | ||
|
|
||
| elif is_categorical_dtype(values): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this a repeat of the first if?? (this one actually looks good), wondering why it is hitting twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not a repeat, as here the actual 'codes' construction happens (before it was to check the categories)
pandas/core/categorical.py
Outdated
| # call us like that, so make some checks | ||
| # - the new one, where each value is also in the categories array | ||
| # (or np.nan) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are some comments on the lines above that can be cleaned up (mentioning the checks for the warnings)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(done)
6df2ea0 to
3d8f173
Compare
| # sanitize input | ||
| if is_categorical_dtype(values): | ||
| if dtype.categories is None: | ||
| dtype = CategoricalDtype(values.categories, dtype.ordered) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, can you add a comment to that effect here.
it is a bit inconsistent, e.g. if values is a Categorical then it determines the categories but NOT the ordered?
is there a case which hits this where the passed dtype has a different ordered than values.ordered? (one is False, one is True). do/should we check this?
| # sanitize input | ||
| if is_categorical_dtype(values): | ||
| if dtype.categories is None: | ||
| dtype = CategoricalDtype(values.categories, dtype.ordered) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are we not simply using dtype = values at this point (values is already a dtype)?
pandas/core/categorical.py
Outdated
| else: | ||
| elif not isinstance(values, (ABCIndexClass, ABCSeries)): | ||
|
|
||
| # on numpy < 1.6 datetimelike get inferred to all i8 by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can remove this commnet (about numpy 1.6); also I think we can eliminate this logic (here or antother PR) as we don't care about old numpy any longer
pandas/tests/test_categorical.py
Outdated
|
|
||
| # Catch old style constructor useage: two arrays, codes + categories | ||
| # We can only catch two cases: | ||
| # Catches - now disabled - for old style constructor useage: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't need the 'Catches - now disabled', this is not longer relevant to a reader of the code
3b335f7 to
79ea551
Compare
|
I tried to remove the call to |
doc/source/whatsnew/v0.22.0.txt
Outdated
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
|
||
| - | ||
| - The warnings for construction of a ``Categorical`` in the form ``Categorical(codes, categories)`` have been removed (:issue:`8074`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just tweak this a bit, saying that when codes is an integer dtype; say this is in favor of Categorical.from_codes, otherwise lgtm. ping on green.
|
as far as #18022 (comment) |
79ea551 to
27978f2
Compare
27978f2 to
56656e7
Compare
|
@jreback ping |
|
thanks @toobaz nice PR! |
git diff upstream/master -u -- "*.py" | flake8 --diffThis removes some warnings which @jreback suggested to split from #17934 , plus some other useless, probably obsolete, lines of code.