@phofl
Member

@phofl phofl commented Dec 16, 2021

@phofl phofl added Categorical Categorical Data Type IO CSV read_csv, to_csv labels Dec 16, 2021
@jbrockmendel
Member

I think the underlying problem is in Categorical.astype(object).
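A minimal sketch of the behavior under discussion, assuming a pandas version that already includes the fix from this PR (it is on the 1.4 milestone). The sample dates are made up for illustration. Casting a Categorical with datetime categories to object should yield Timestamp objects:

```python
import pandas as pd

# Hypothetical sample data: a Categorical whose categories are datetimes.
cat = pd.Categorical(pd.to_datetime(["2021-12-16", "2021-12-17", "2021-12-16"]))

# With the fix in place, the object-dtype result holds Timestamp objects.
as_object = cat.astype(object)
print([type(value).__name__ for value in as_object])
```

If the printed type names are anything other than Timestamp, the installed pandas most likely predates the fix.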

@phofl phofl marked this pull request as draft December 16, 2021 19:27
@phofl
Member Author

phofl commented Dec 16, 2021

Was not sure if this was intentional in the astype call. Will have a look again

try:
    new_cats = np.asarray(self.categories)
    if is_datetime64_dtype(self.categories):
        values = ensure_wrapped_if_datetimelike(np.asarray(self.categories))
Member

self.categories._values should be what you want here. That's probably also an improvement if is_datetime64tz_dtype(self.categories.dtype).
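To illustrate why `_values` matters here, the sketch below compares it with `np.asarray` on a datetime index. This is an illustration under assumptions, not the PR's code: `_values` is a private pandas attribute, and the sample index is hypothetical. For nanosecond-resolution data, NumPy's object cast produces plain integers, while the pandas array keeps Timestamp semantics:

```python
import numpy as np
import pandas as pd

# Hypothetical categories: a nanosecond-resolution datetime index.
categories = pd.DatetimeIndex(
    np.array(["2021-12-16", "2021-12-17"], dtype="datetime64[ns]")
)

# np.asarray drops the pandas wrapper and returns a raw datetime64 ndarray.
raw = np.asarray(categories)
raw_objects = raw.astype(object)

# _values (private API) returns the pandas DatetimeArray instead.
wrapped = categories._values
wrapped_objects = wrapped.astype(object)

print(type(raw).__name__, type(raw_objects[0]).__name__)
print(type(wrapped).__name__, type(wrapped_objects[0]).__name__)
```

The raw NumPy route loses the datetime information on the object cast, which is consistent with the problem described in this thread.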

Member Author

tz-aware datetimes are already handled correctly.

Member

I was thinking it might be a perf improvement for dt64tz, but either way it's fine to consider that out of scope for this PR.

@phofl
Member Author

phofl commented Dec 17, 2021

So I think I figured the issue out now.

Categorical.astype was incorrect for object dtype, as you suggested. But this does not directly help us in the to_csv case, because to_csv has to respect a date_format in to_native_types. We have to convert the values to a DatetimeArray before doing the actual conversion.
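The to_csv side can be checked with a small sketch like the following, assuming a pandas version that includes this fix. The column name and dates are hypothetical. A datetime column stored as a categorical should be written with the requested date_format:

```python
import pandas as pd

# Hypothetical sample data: a datetime column stored as a categorical.
stamps = pd.to_datetime(["2021-12-16", "2021-12-17", "2021-12-16"])
df = pd.DataFrame({"when": pd.Series(stamps).astype("category")})

# With no path given, to_csv returns the CSV text as a string.
csv_text = df.to_csv(index=False, date_format="%Y/%m/%d")
print(csv_text)
```

Each data row should show the date in the requested format, not an integer or the default ISO representation.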

@phofl phofl marked this pull request as ready for review December 17, 2021 14:34
@jreback jreback added this to the 1.4 milestone Dec 17, 2021
@jreback jreback merged commit 079289c into pandas-dev:master Dec 22, 2021
@jreback
Contributor

jreback commented Dec 22, 2021

thanks @phofl

@phofl phofl deleted the 40754 branch December 22, 2021 09:00

Development

Successfully merging this pull request may close these issues.

BUG: to_csv produces improper output for categorical datetimes
.astype('object') misbehaves on Categorical containing Timestamp

3 participants