Skip to content

Conversation

@jbrockmendel
Copy link
Member

Experimenting with #40489 I found that making Series inference behavior more like Index behavior broke a bunch of strings tests. This makes the strings code robust to that potential change by being more explicit about dtypes.

@simonjayhawkins simonjayhawkins added Refactor Internal refactoring of code Strings String extension data type and string data labels May 30, 2021
"""
result = self._data.array._str_startswith(pat, na=na)
return self._wrap_result(result, returns_string=False)
return self._wrap_result(result, returns_string=False, returns_bool=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we just change _wrap_result so we don't have 2 keywords? maybe rename to returns_dtype and pass a string?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thats tough bc the end-result dtype depends on self._is_string

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this again it really doesnt look worth the trouble to try to combine these args

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think either keywords should really be necessary here, the issue is that the arrays passed when expand is True are object arrays of lists/tuples

which method returns bool, where the dtype of the result is not bool?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which method returns bool, where the dtype of the result is not bool?

makes sense, ill give this a try

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g isnumeric we can get object-dtype result of bools and nans

Copy link
Member

@simonjayhawkins simonjayhawkins Jun 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the wrapped result would be object in this case? This is for a object dtype array? for StringArray/ArrowString array the return type should always be boolean. object dtype accessors do cast their results, except extract, so if the result is all bools and no nans, the result dtype should be np.bool

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the result dtype should be np.bool

the result to be wrapped dtype should be np.bool

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these conversions are handled in _str_map in ObjectStringArrayMixin


values = getattr(data, "values", data) # Series / Index
values = getattr(values, "categories", values) # categorical / normal
# TODO: avoid kludge for tests.extension.test_numpy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

umm cn you avoid this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can revert to use the getattr pattern, either way is an anti-pattern

@jbrockmendel
Copy link
Member Author

@simonjayhawkins was right, it was possible to remove the returns_bool keyword; green

@jreback jreback added this to the 1.3 milestone Jun 9, 2021
@jreback jreback merged commit 2c1e656 into pandas-dev:master Jun 9, 2021
@jbrockmendel jbrockmendel deleted the ref-string-casting branch June 9, 2021 00:34
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Refactor Internal refactoring of code Strings String extension data type and string data

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants