Closed
Labels
API Design, Datetime (Datetime data dtype), Performance (Memory or execution speed performance)
Description
Say you have to parse some nicely ISO-formatted date strings: you can parse them very fast with to_datetime. But if you were 'overcautious' and provided format="%Y-%m-%d %H:%M:%S" for safety, parsing seems to be around 20 times slower.
Would it be possible to provide a fastpath for certain provided format strings (as already exists for %Y%m%d, I think)?
In [129]: s = pd.Series(pd.date_range('2000-01-01', periods=1000, freq='H'))
In [130]: s_as_dt_strings = s.apply(lambda x: x.strftime("%Y-%m-%dT%H:%M:%S.%f"))
In [131]: %timeit pd.to_datetime(s_as_dt_strings)
1000 loops, best of 3: 406 µs per loop
In [132]: %timeit pd.to_datetime(s_as_dt_strings, format="%Y-%m-%dT%H:%M:%S.%f")
100 loops, best of 3: 9.73 ms per loop
In [133]: s_as_dt_strings = s.apply(lambda x: x.strftime("%Y-%m-%d %H:%M:%S"))
In [134]: %timeit pd.to_datetime(s_as_dt_strings)
1000 loops, best of 3: 361 µs per loop
In [135]: %timeit pd.to_datetime(s_as_dt_strings, format="%Y-%m-%d %H:%M:%S")
100 loops, best of 3: 8.36 ms per loop
For non-standard formats, providing format does give a big improvement (roughly 10x here):
In [136]: s_as_dt_strings = s.apply(lambda x: x.strftime("%Y/%m/%d %H:%M:%S"))
In [137]: %timeit pd.to_datetime(s_as_dt_strings)
10 loops, best of 3: 92.2 ms per loop
In [138]: %timeit pd.to_datetime(s_as_dt_strings, format="%Y/%m/%d %H:%M:%S")
100 loops, best of 3: 9.08 ms per loop
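To make the suggestion concrete, here is a rough sketch of the kind of dispatch I have in mind, written with the standard library only. This is not how pandas is implemented: ISO_LIKE_FORMATS and parse_with_fastpath are hypothetical names made up for illustration, and datetime.fromisoformat just stands in for whatever fast ISO parser is used when no format is given.

```python
from datetime import datetime

# Hypothetical allow-list of format strings that describe ISO 8601 layouts.
# (Illustrative only; the real set would need to be decided by the library.)
ISO_LIKE_FORMATS = frozenset({
    "%Y-%m-%d",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%dT%H:%M:%S",
    "%Y-%m-%d %H:%M:%S.%f",
    "%Y-%m-%dT%H:%M:%S.%f",
})


def parse_with_fastpath(values, fmt):
    """Parse date strings, taking the ISO parser when fmt is ISO-like."""
    if fmt in ISO_LIKE_FORMATS:
        # Fast path: the format is a known ISO layout, so skip strptime.
        return [datetime.fromisoformat(v) for v in values]
    # Slow path: generic format-driven parsing for everything else.
    return [datetime.strptime(v, fmt) for v in values]


iso = parse_with_fastpath(["2000-01-01 00:00:00"], "%Y-%m-%d %H:%M:%S")
other = parse_with_fastpath(["2000/01/01 05:00:00"], "%Y/%m/%d %H:%M:%S")
```

One caveat with this sketch: the ISO parser can accept strings that the given format would reject (for example a date without a time part), so a real fast path would also need to decide how strictly the input has to match the format.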