
[C++] Support for fractional seconds in strptime() #20146

@asfimport


Currently, we can't parse "our own" string representation of a timestamp array with the timestamp parser strptime:

import datetime
import pyarrow as pa
import pyarrow.compute as pc

>>> pa.array([datetime.datetime(2022, 3, 5, 9)])
<pyarrow.lib.TimestampArray object at 0x7f00c1d53dc0>
[
  2022-03-05 09:00:00.000000
]

# trying to parse the above representation as string
>>> pc.strptime(["2022-03-05 09:00:00.000000"], format="%Y-%m-%d %H:%M:%S", unit="us")
...
ArrowInvalid: Failed to parse string: '2022-03-05 09:00:00.000000' as a scalar of type timestamp[us]

The failure is caused by the fractional-seconds part; without it, the same call works:

>>> pc.strptime(["2022-03-05 09:00:00"], format="%Y-%m-%d %H:%M:%S", unit="us")
<pyarrow.lib.TimestampArray object at 0x7f00c1d6f940>
[
  2022-03-05 09:00:00.000000
]

I think this fails because strptime only supports parsing seconds as an integer (https://man7.org/linux/man-pages/man3/strptime.3.html).
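The integer-seconds limitation can be reproduced without Arrow. As a minimal sketch, Python's own `datetime.strptime` fails in the same way when the format stops at `%S`. Python also offers a `%f` directive for microseconds, which is a Python extension and not part of POSIX `strptime`. This only illustrates the limitation; it is not how Arrow's C++ parser is implemented.

```python
import datetime

text = "2022-03-05 09:00:00.000000"

# With %S alone, the trailing ".000000" is left unparsed and parsing fails.
try:
    datetime.datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    integer_only_ok = True
except ValueError:
    integer_only_ok = False

# Python's %f directive (not part of POSIX strptime) consumes the fraction.
parsed = datetime.datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")
```

Here `integer_only_ok` ends up `False`, while the `%f` variant parses the full string.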

But this creates a strange situation where the timestamp parser cannot parse the representation we ourselves use for timestamps.

In addition, for CSV we have a custom ISO parser (used by default), so the same string with fractional seconds does parse correctly when reading a CSV file:

s = b"""a
2022-03-05 09:00:00.000000"""

import io
from pyarrow import csv

>>> csv.read_csv(io.BytesIO(s))
pyarrow.Table
a: timestamp[ns]
----
a: [[2022-03-05 09:00:00.000000000]]

I realize that you can use the generic "cast" to do this string parsing:

>>> pc.cast(["2022-03-05 09:00:00.000000"], pa.timestamp("us"))
<pyarrow.lib.TimestampArray object at 0x7f00c1d53d60>
[
  2022-03-05 09:00:00.000000
]

But this was not the first approach I thought of. I think it is quite typical to reach for strptime first, and it is confusing that it doesn't work; the error message is also not helpful.
cc @pitrou @rok
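To illustrate what fractional-second support would involve, here is a rough sketch in plain Python: parse the whole-second part with an integer-seconds format, then attach the fraction separately. The helper name `parse_with_fraction` is hypothetical, and the sketch assumes the only `.` in the input is the one before the fraction. It is not a proposal for the actual C++ implementation.

```python
import datetime

def parse_with_fraction(text, fmt="%Y-%m-%d %H:%M:%S"):
    """Hypothetical helper: parse integer seconds with `fmt`, then add the fraction."""
    # Split the input into the whole-second part and the digits after the dot.
    whole, _, fraction = text.partition(".")
    parsed = datetime.datetime.strptime(whole, fmt)
    if fraction:
        # Pad or truncate to 6 digits so the value is in microseconds.
        micros = int(fraction[:6].ljust(6, "0"))
        parsed = parsed.replace(microsecond=micros)
    return parsed

result = parse_with_fraction("2022-03-05 09:00:00.123")
```

Strings without a fractional part go through the same helper unchanged, since `partition` then returns an empty fraction.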

Reporter: Joris Van den Bossche / @jorisvandenbossche
Watchers: Rok Mihevc / @rok

Note: This issue was originally created as ARROW-15883. Please see the migration documentation for further details.
