Contributor

@Dr-Irv Dr-Irv commented Feb 20, 2020

  • closes #32117 (convert_dtypes fails with int and str)
  • tests added / passed
    • added new cases for test_convert_dtypes
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry
    • placed in v1.0.2 whatsnew

@jorisvandenbossche jorisvandenbossche changed the title from "Fix for convert_dtypes with mix of int and string" to "BUG: Fix for convert_dtypes with mix of int and string" Feb 20, 2020
@jorisvandenbossche jorisvandenbossche added this to the 1.0.2 milestone Feb 20, 2020
Member

@jorisvandenbossche jorisvandenbossche left a comment

Thanks! Small question

},
),
(
[1, 2.0],
Member

It's a bit hard to interpret the tests / diff below, but does pd.Series([1, 2.0], dtype=object).convert_dtypes() still give Int64?

Contributor Author

Yes. Here's the interpretation of the tests for that case:
The code reads:

            (
                [1, 2.0],
                object,
                {
                    ((True,), (True, False), (True,), (True, False)): "Int64",
                    ((True,), (True, False), (False,), (True, False)): np.dtype(
                        "float"
                    ),
                    ((False,), (True, False), (True, False), (True, False)): np.dtype(
                        "object"
                    ),
                },
            ),

This means the following:

  1. Create a Series with [1, 2.0] as the entries, with dtype object
  2. Consider the 16 possible combinations of the 4 boolean arguments infer_objects, convert_string, convert_integer and convert_boolean (each dictionary key lists the allowed values of these arguments, in that order)
  3. If infer_objects==True and convert_integer==True, result should be Int64
  4. If infer_objects==True and convert_integer==False, result should be float
  5. If infer_objects==False, result is always object
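
As a rough illustration of how the three dictionary keys cover all 16 combinations, here is a minimal sketch using only the standard library. It is not the actual pandas test code: the expand helper and the ARG_NAMES tuple are hypothetical names introduced for this example, and the dtypes are written as plain strings instead of np.dtype objects to avoid a NumPy dependency.

```python
from itertools import product

# Argument order assumed to match the order of the tuples in each key.
ARG_NAMES = ("infer_objects", "convert_string", "convert_integer", "convert_boolean")

# Expected results for [1, 2.0] with dtype object, copied from the test case
# above (dtype names as plain strings; the real test uses np.dtype for two of them).
answers = {
    ((True,), (True, False), (True,), (True, False)): "Int64",
    ((True,), (True, False), (False,), (True, False)): "float",
    ((False,), (True, False), (True, False), (True, False)): "object",
}


def expand(answers):
    """Expand each compact key into the individual argument combinations it covers.

    Hypothetical helper for illustration only.
    """
    expected = {}
    for spec, dtype in answers.items():
        for combo in product(*spec):
            # Each combination should be covered by exactly one key.
            assert combo not in expected, f"duplicate combination {combo}"
            expected[combo] = dtype
    return expected


expected = expand(answers)

# Print the expected dtype for every combination of the four arguments.
for combo in product((True, False), repeat=4):
    print(dict(zip(ARG_NAMES, combo)), "->", expected[combo])
```

The first two keys each cover 4 combinations and the third covers 8, so every one of the 16 combinations gets exactly one expected dtype.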

Prior to this PR, the tests expected the following (p3 to p5 correspond to items 3 to 5 above):
p3) If convert_integer==True, result should be Int64 independent of value of infer_objects
p4) If infer_objects==True and convert_integer==False, result should be float (same)
p5) If infer_objects==False and convert_integer==False, result is object

I think the new version is the behavior we want: if you start with object dtype and skip the infer_objects step, the result remains object.

Member

OK, thanks

@jorisvandenbossche
Member

@Dr-Irv Thanks!

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Feb 21, 2020
simonjayhawkins pushed a commit that referenced this pull request Feb 21, 2020
…tring (#32153)

Co-authored-by: Irv Lustig <irv@princeton.com>
roberthdevries pushed a commit to roberthdevries/pandas that referenced this pull request Mar 2, 2020
@Dr-Irv Dr-Irv deleted the issue32117 branch February 13, 2023 20:50