Remove old warnings (plus some useless code) #18022

toobaz · 2017-10-29T21:44:41Z

tests added / passed
passes `git diff upstream/master -u -- "*.py" | flake8 --diff`

This removes some warnings that @jreback suggested splitting out of #17934, plus some other useless, probably obsolete, lines of code.

jreback · 2017-10-29T22:53:33Z

I guess we never put these deprecations in #6581

jreback · 2017-10-29T22:55:22Z

pandas/core/categorical.py

-                ordered = values.ordered
-            if categories is None:
-                categories = values.categories
            values = values.get_values()


how does this get hit? it's probably pretty inefficient (converting to an array and then back); we already catch the dtype conversion above (which is why you can remove this code). maybe try to remove this part as well?

jreback · 2017-10-29T22:55:48Z

pandas/core/categorical.py

-            # after 0.18/ in 2016
-            if (is_integer_dtype(values) and
-                    not is_integer_dtype(dtype.categories)):
-                warn("Values and categories have different dtypes. Did you "


do these get hit anywhere in the tests?

fine with removing them. pls add a note in the deprecations removal section.

codecov · 2017-10-29T23:00:50Z

Codecov Report

❗ No coverage uploaded for pull request base (master@a355ed2).
The diff coverage is 100%.

@@            Coverage Diff            @@
##             master   #18022   +/-   ##
=========================================
  Coverage          ?   91.22%           
=========================================
  Files             ?      163           
  Lines             ?    50089           
  Branches          ?        0           
=========================================
  Hits              ?    45693           
  Misses            ?     4396           
  Partials          ?        0

Flag	Coverage Δ
#multiple	`89.03% <100%> (?)`
#single	`40.25% <50%> (?)`

Impacted Files	Coverage Δ
pandas/core/categorical.py	`95.74% <100%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a355ed2...754ced2.

codecov · 2017-10-29T23:01:08Z

Codecov Report

Merging #18022 into master will decrease coverage by 0.04%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #18022      +/-   ##
==========================================
- Coverage   91.43%   91.39%   -0.05%     
==========================================
  Files         163      163              
  Lines       50091    50083       -8     
==========================================
- Hits        45800    45771      -29     
- Misses       4291     4312      +21

Flag	Coverage Δ
#multiple	`89.19% <100%> (-0.03%)`	⬇️
#single	`40.36% <37.5%> (-0.06%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/categorical.py	`95.75% <100%> (-0.05%)`	⬇️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/plotting/_converter.py	`63.38% <0%> (-1.82%)`	⬇️
pandas/core/frame.py	`97.8% <0%> (-0.1%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 96a5274...56656e7.

toobaz · 2017-10-30T10:02:21Z

pandas/core/categorical.py

    from pandas.core.algorithms import _get_data_algo, _hashtables
+    if is_categorical_dtype(values.dtype):
+        codes = (values.cat.codes if isinstance(values, ABCSeries)
+                 else values.codes)


There must be a cleaner way... but I couldn't find it.

Or the question could be: why does this get passed values that are already categorical, which don't need a factorize but only a recode?

meaning that the check could also be done before (which I think is how it was before?)

something like

    if dtype.categories is None:
        codes, categories = ....
    elif is_categorical_dtype(values):
        .... handle this case
    else:
        codes = _get_codes_for_values(values, dtype.categories)

I personally would find the logic easier to follow that way (and this does not necessarily mean you don't need the values.cat.codes vs values.codes ..., so my comment went a bit sideways :-))
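The restructuring suggested above can be illustrated outside the constructor. The following is a minimal sketch, not the merged code: `extract_codes` and `recode_for_categories` are hypothetical helper names, and it assumes `values` is either a `Categorical` or a Series with categorical dtype. It shows why already-categorical input needs only a recode: its existing codes are mapped onto the target categories instead of densifying the values and factorizing them again.

```python
import numpy as np
import pandas as pd


def extract_codes(values):
    """Return the integer codes of a Categorical or of a categorical Series.

    Hypothetical helper: a Series exposes its codes through the ``.cat``
    accessor, whereas a Categorical has them as the ``.codes`` attribute.
    """
    if isinstance(values, pd.Series):
        return np.asarray(values.cat.codes)
    return np.asarray(values.codes)


def recode_for_categories(codes, old_categories, new_categories):
    """Re-express codes relative to ``old_categories`` as codes relative
    to ``new_categories`` (hypothetical helper).

    Values whose category is absent from ``new_categories``, and values
    that were already missing (code -1), both end up with code -1.
    """
    codes = np.asarray(codes)
    # Position of each old category inside the new categories (-1 if absent).
    indexer = pd.Index(new_categories).get_indexer(pd.Index(old_categories))
    # Append a trailing -1 so that a missing code (-1) looks up -1.
    lookup = np.append(indexer, -1)
    return lookup[codes]


values = pd.Series(['a', 'b', 'c', 'a'], dtype='category')
codes = extract_codes(values)
new_codes = recode_for_categories(codes, values.cat.categories, ['c', 'a'])
print(new_codes.tolist())  # [1, -1, 0, 1]
```

The work here is one `get_indexer` call over the categories plus one take over the codes, so it should be cheaper than the round trip through `get_values()` and a fresh factorize that was questioned earlier in the review.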

jorisvandenbossche · 2017-10-30T12:44:54Z

doc/source/whatsnew/v0.21.1.txt

 ~~~~~~~~~~~~

-
+- Warnings for deprecated initialization style of ``Categorical`` (``Categorical(codes, categories)``) have been removed.


Can you move it to the "Removal of prior version deprecations/changes" section (in 0.22.0)?

jorisvandenbossche · 2017-10-30T12:45:18Z

pandas/core/categorical.py

-                categories = values.categories
-            values = values.get_values()
+            if dtype.categories is None:
+                dtype = CategoricalDtype(values.categories, dtype.ordered)


some lines above, there is already a

    elif is_categorical(values):
        dtype = values.dtype._from_categorical_dtype(values.dtype, categories, ordered)

which should catch the same?

Yes... but the test on dtype.categories must be done here (that is, after all the branches of the above if...elif...else). Vice versa, the dtype must be defined before the fastpath test.

OK, but help me understand in which case this line is needed and does something different from the already defined dtype.

For instance the case in which values is already a Categorical, categories is None and dtype is "category".

and in that case

    elif is_categorical(values):
        dtype = values.dtype._from_categorical_dtype(values.dtype, categories, ordered)

will already have created an appropriate dtype?
So I still don't understand why in such a case it would need to be re-determined.

will already have created an appropriate dtype?

No, precisely because categories is None. Not just in theory: you can actually try and verify that tests break.
It could be possible to first have a different set of checks which finds the right value for categories, but I doubt it would result in simpler code.

Assume this small example:

    In [1]: values = pd.Categorical(['a', 'b', 'c', 'a'])

    In [2]: values
    Out[2]:
    [a, b, c, a]
    Categories (3, object): [a, b, c]

    In [4]: categories = None

    In [5]: dtype = 'category'

    In [9]: ordered = None

    In [10]: dtype = values.dtype._from_categorical_dtype(values.dtype, categories, ordered)

    In [11]: dtype
    Out[11]: CategoricalDtype(categories=['a', 'b', 'c'], ordered=False)

So if you pass categorical values, they will by definition have categories (can a Categorical have categories of None?), and those will be passed to the resulting dtype object returned from values.dtype._from_categorical_dtype.
So I am still wondering: in what case can you have categorical values, but where the resulting dtype from above still has dtype.categories is None?

So the case that breaks when removing that check is the one where a CategoricalDtype instance is passed that differs from the dtype of the values:

    dtype = CategoricalDtype(None, ordered=True)
    values = Categorical(['a', 'b', 'd'])
    Categorical(values, dtype=dtype)

I would find that more logical to handle in the if dtype is not None: path (to clearly see which takes precedence in such a case). But I assume that when fastpath=True we don't want to check that. Although I think that if you use that, dtype.categories will never be None.
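The case above can be reproduced with the public constructor alone. A minimal sketch, assuming a pandas version from 0.21 on (where `CategoricalDtype` is importable from `pandas.api.types`) that includes the `dtype.categories is None` check discussed here:

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# A dtype that fixes `ordered` but leaves the categories unspecified.
dtype = CategoricalDtype(None, ordered=True)
values = pd.Categorical(['a', 'b', 'd'])  # unordered, categories ['a', 'b', 'd']

result = pd.Categorical(values, dtype=dtype)

# The categories are filled in from `values`; `ordered` comes from `dtype`.
print(list(result.categories), result.ordered)  # ['a', 'b', 'd'] True
```

Without that check, `dtype.categories` would still be `None` at this point even though `values` is categorical, which is the situation the extra branch exists to handle.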

jorisvandenbossche · 2017-10-30T12:50:57Z

pandas/core/categorical.py

    from pandas.core.algorithms import _get_data_algo, _hashtables
+    if is_categorical_dtype(values.dtype):
+        codes = (values.cat.codes if isinstance(values, ABCSeries)
+                 else values.codes)


Or the question could be: why does this get passed values that are already categorical, which don't need a factorize but only a recode?

jorisvandenbossche · 2017-10-30T12:54:38Z

pandas/core/categorical.py

    from pandas.core.algorithms import _get_data_algo, _hashtables
+    if is_categorical_dtype(values.dtype):
+        codes = (values.cat.codes if isinstance(values, ABCSeries)
+                 else values.codes)


meaning that the check could also be done before (which I think is how it was before?)

something like

    if dtype.categories is None:
        codes, categories = ....
    elif is_categorical_dtype(values):
        .... handle this case
    else:
        codes = _get_codes_for_values(values, dtype.categories)

I personally would find the logic easier to follow that way (and this does not necessarily mean you don't need the values.cat.codes vs values.codes ..., so my comment went a bit sideways :-))

jreback · 2017-10-31T00:20:23Z

doc/source/whatsnew/v0.22.0.txt

 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-
+- Warnings for deprecated initialization style of ``Categorical`` (``Categorical(codes, categories)``) have been removed.


reference the issue (referencing this PR is good, but we also like to reference the original issue if you can find it).

The warnings for construction of a Categorical of the form (......) have been removed

jreback · 2017-10-31T00:22:40Z

pandas/core/categorical.py

        # sanitize input
        if is_categorical_dtype(values):
+            if dtype.categories is None:
+                dtype = CategoricalDtype(values.categories, dtype.ordered)


what kind of construction actually hits this?

it is a bit inconsistent, e.g. if values is a Categorical then it determines the categories but NOT the ordered?
what if dtype differs from the dtype of values?

it is a bit inconsistent, e.g. if values is a Categorical then it determines the categories but NOT the ordered?

Yes, it does... above (if dtype is not None). That is: dtype here is not necessarily the dtype argument to the function.

what if dtype differs from the dtype of values?

Then it is given priority - rightly so, I think.

ok, can you add a comment to that effect here.

it is a bit inconsistent, e.g. if values is a Categorical then it determines the categories but NOT the ordered?

is there a case which hits this where the passed dtype has a different ordered than values.ordered? (one is False, one is True). do/should we check this?

why are we not simply using dtype = values at this point (values is already a dtype)?

why are we not simply using dtype = values at this point (values is already a dtype)?

values can have a dtype...

is there a case which hits this where the passed dtype has a different ordered than values.ordered? (one is False, one is True). do/should we check this?

Sure, whenever you pass values with given categories, unordered, and dtype with the same categories (and possibly more), ordered. Anyway, I added a couple of comments.
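A sketch of the situation just described (values unordered, dtype with the same categories plus one more, ordered), assuming the priority rule stated above: an explicitly passed dtype takes precedence over the dtype of `values`.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# `values` has categories ['a', 'b'] and is unordered.
values = pd.Categorical(['a', 'b', 'a'], categories=['a', 'b'])
# The explicit dtype has the same categories plus one more, and is ordered.
dtype = CategoricalDtype(['a', 'b', 'c'], ordered=True)

result = pd.Categorical(values, dtype=dtype)

# The passed dtype wins: its categories and its `ordered` flag are used,
# and the existing codes are re-mapped onto the new categories.
print(list(result.categories), result.ordered, result.codes.tolist())
# ['a', 'b', 'c'] True [0, 1, 0]
```

Because the result is ordered, comparisons such as `result < 'b'` become valid, while they would raise on the unordered `values`.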

jreback · 2017-10-31T00:23:26Z

pandas/core/categorical.py

+            # we're inferring from values
+            dtype = CategoricalDtype(categories, dtype.ordered)
+
+        elif is_categorical_dtype(values):


is this a repeat of the first if?? (this one actually looks good), wondering why it is hitting twice.

it's not a repeat, as here the actual 'codes' construction happens (before it was to check the categories)

jorisvandenbossche · 2017-10-31T09:19:48Z

pandas/core/categorical.py

            #   call us like that, so make some checks
            # - the new one, where each value is also in the categories array
            #   (or np.nan)



there are some comments on the lines above that can be cleaned up (mentioning the checks for the warnings)

jreback · 2017-11-10T19:23:14Z

pandas/core/categorical.py

        # sanitize input
        if is_categorical_dtype(values):
+            if dtype.categories is None:
+                dtype = CategoricalDtype(values.categories, dtype.ordered)


ok, can you add a comment to that effect here.

it is a bit inconsistent, e.g. if values is a Categorical then it determines the categories but NOT the ordered?

is there a case which hits this where the passed dtype has a different ordered than values.ordered? (one is False, one is True). do/should we check this?

jreback · 2017-11-10T19:25:09Z

pandas/core/categorical.py

        # sanitize input
        if is_categorical_dtype(values):
+            if dtype.categories is None:
+                dtype = CategoricalDtype(values.categories, dtype.ordered)


why are we not simply using dtype = values at this point (values is already a dtype)?

jreback · 2017-11-10T19:26:03Z

pandas/core/categorical.py

-        else:
+        elif not isinstance(values, (ABCIndexClass, ABCSeries)):

            # on numpy < 1.6 datetimelike get inferred to all i8 by


you can remove this comment (about numpy 1.6); also I think we can eliminate this logic (here or in another PR) as we don't care about old numpy any longer

jreback · 2017-11-10T19:27:09Z

pandas/tests/test_categorical.py


-        # Catch old style constructor useage: two arrays, codes + categories
-        # We can only catch two cases:
+        # Catches - now disabled - for old style constructor useage:


you don't need the 'Catches - now disabled', this is no longer relevant to a reader of the code

toobaz · 2017-11-11T07:39:48Z

I tried to remove the call to maybe_infer_to_datetimelike, but some tests break. In another PR I can see if we can fix this inside _sanitize_array.
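One user-visible effect that the datetime-like inference presumably protects (an assumption; the comment does not say which tests broke) is that an object array of Python datetimes gets datetime categories rather than generic objects. A minimal check:

```python
from datetime import datetime

import numpy as np
import pandas as pd

# An object-dtype array of Python datetime objects, as could reach the
# constructor from user code.
values = np.array(
    [datetime(2017, 1, 1), datetime(2017, 1, 2), datetime(2017, 1, 1)],
    dtype=object,
)

cat = pd.Categorical(values)

# If datetime-like inference runs, the categories are datetime64 values
# (a DatetimeIndex) rather than a generic object Index.
print(type(cat.categories).__name__, cat.categories.dtype.kind)
```

Any replacement for the call (for example inside `_sanitize_array`, as suggested) would need to keep this behavior.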

jreback · 2017-11-11T13:19:36Z

doc/source/whatsnew/v0.22.0.txt

 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-
+- The warnings for construction of a ``Categorical`` in the form ``Categorical(codes, categories)`` have been removed (:issue:`8074`)


just tweak this a bit: say that this concerns the case where codes is an integer dtype, and that the warnings are removed in favor of Categorical.from_codes; otherwise lgtm. ping on green.
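With the warnings gone, the two spellings behave quite differently. A minimal sketch, assuming the behavior after this change, where the plain constructor no longer treats integer input as codes:

```python
import pandas as pd

codes = [0, 1, 0]
categories = ['a', 'b']

# Passing integers positionally treats them as *values*: 0 and 1 are not
# among the categories, so every entry becomes missing (code -1).
as_values = pd.Categorical(codes, categories=categories)

# The explicit constructor interprets the integers as codes.
as_codes = pd.Categorical.from_codes(codes, categories=categories)

print(as_values.codes.tolist())  # [-1, -1, -1]
print(list(as_codes))            # ['a', 'b', 'a']
```

This is the silent all-missing result that the removed warnings used to flag, which is why the whatsnew note points users to `Categorical.from_codes`.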

jreback · 2017-11-11T13:20:38Z

as far as #18022 (comment) goes, i.e. trying to remove some sanitizing code, I would take that as a follow-up

toobaz · 2017-11-12T13:06:33Z

@jreback ping

jreback · 2017-11-12T15:22:38Z

thanks @toobaz nice PR!

jreback reviewed Oct 29, 2017

jreback added the Categorical (Categorical Data Type) and Deprecate (Functionality to remove in pandas) labels Oct 29, 2017

toobaz force-pushed the remove_old_warnings branch from 754ced2 to b20fbe5 October 30, 2017 08:56

toobaz commented Oct 30, 2017

jorisvandenbossche reviewed Oct 30, 2017

toobaz force-pushed the remove_old_warnings branch from b20fbe5 to 6df2ea0 October 30, 2017 14:37

jreback requested changes Oct 31, 2017

jorisvandenbossche reviewed Oct 31, 2017

toobaz force-pushed the remove_old_warnings branch from 6df2ea0 to 3d8f173 November 10, 2017 18:46

jreback requested changes Nov 10, 2017

toobaz force-pushed the remove_old_warnings branch 2 times, most recently from 3b335f7 to 79ea551 November 11, 2017 00:42

jreback approved these changes Nov 11, 2017

jreback added this to the 0.22.0 milestone Nov 11, 2017

toobaz added 2 commits November 11, 2017 19:12

WARN: remove obsolete warnings (05efd95)

REF: remove useless code (dtype has info about ordering) (0f63cb2)

toobaz force-pushed the remove_old_warnings branch from 79ea551 to 27978f2 November 11, 2017 18:13

CLN: Explanatory comments (56656e7)

toobaz force-pushed the remove_old_warnings branch from 27978f2 to 56656e7 November 12, 2017 12:23

jreback merged commit aebe2a9 into pandas-dev:master Nov 12, 2017

toobaz deleted the remove_old_warnings branch November 12, 2017 16:02

toobaz mentioned this pull request Nov 12, 2017 in DEPR: Remove old-style warnings Categorical init #17485 (Closed)

No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017

Remove old warnings (plus some useless code) (pandas-dev#18022)

3eb4e41
