
Support "standard" / alternate format arguments for to_timestamp #8915

@alamb


Is your feature request related to a problem or challenge?

After #8886 (thanks to @Omega359), DataFusion supports converting strings to timestamps using one or more format strings:

SELECT to_timestamp('2020-09-08T12:00:00+00:00', '2020-09-08 12/00/00+00:00', '%c', '%+', '%Y-%m-%d %H/%M/%s%#z')

This parses '2020-09-08T12:00:00+00:00' by trying several possible formats ('%c', '%+', '%Y-%m-%d %H/%M/%s%#z') in order and returning the result of the first one that succeeds.
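The fallback behavior can be sketched in plain Rust. `parse_with_formats` and `toy_parse` are hypothetical stand-ins (DataFusion itself delegates the actual parsing to chrono); the sketch only shows the control flow: try each format in order, return the first success, and report an error only when every format fails.

```rust
/// Hypothetical sketch: try each format in order and keep the first success.
/// `parse` stands in for a real parser (DataFusion delegates to chrono).
fn parse_with_formats<T, E>(
    input: &str,
    formats: &[&str],
    parse: impl Fn(&str, &str) -> Result<T, E>,
) -> Result<T, Vec<E>> {
    let mut errors = Vec::with_capacity(formats.len());
    for &fmt in formats {
        match parse(input, fmt) {
            Ok(value) => return Ok(value), // first successful format wins
            Err(e) => errors.push(e),      // remember why this format failed
        }
    }
    Err(errors) // every format failed
}

/// Toy parser for illustration only: understands just "%Y" as a bare year.
fn toy_parse(input: &str, fmt: &str) -> Result<i32, String> {
    match fmt {
        "%Y" => input.parse::<i32>().map_err(|e| format!("{fmt}: {e}")),
        _ => Err(format!("{fmt}: not supported by this toy parser")),
    }
}

fn main() {
    // "%c" fails, so the second format is tried and succeeds.
    println!("{:?}", parse_with_formats("2020", &["%c", "%Y"], toy_parse));
    // Neither format matches, so both errors are returned.
    println!("{:?}", parse_with_formats("abc", &["%c", "%Y"], toy_parse));
}
```

Collecting every per-format error, rather than only the last one, would let an error message explain why each format was rejected.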

However, as @comphead points out, the format syntax is specific to chrono, the underlying Rust library. Its semantics differ slightly from every existing to_timestamp: it uses neither Postgres format strings nor Spark format strings, but something DataFusion-specific based on Rust chrono format strings.
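To make the difference concrete, here is one layout ("2020-09-08 12:00:00") written in each dialect, assuming the commonly documented specifiers of chrono, PostgreSQL and Spark. The same letters can mean different things across dialects, which is one source of surprise for users coming from either system.

```rust
/// One layout, "2020-09-08 12:00:00", written in each format dialect.
/// Note the clashes: `MM` is the month in Postgres and Spark, while minutes
/// are `MI` (Postgres), `mm` (Spark) and `%M` (chrono).
const SAME_LAYOUT: [(&str, &str); 3] = [
    ("chrono (DataFusion today)", "%Y-%m-%d %H:%M:%S"),
    ("PostgreSQL to_timestamp", "YYYY-MM-DD HH24:MI:SS"),
    ("Spark datetime pattern", "yyyy-MM-dd HH:mm:ss"),
];

fn main() {
    // Print the dialect name, padded, followed by its pattern.
    for (dialect, pattern) in SAME_LAYOUT {
        println!("{dialect:<26} {pattern}");
    }
}
```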

Describe the solution you'd like

Ideally, users could choose which "dialect" of format specifiers to support via a configuration option, for example Postgres or Spark.
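One way such an option could be modeled is sketched below. `FormatDialect` and its accepted values are hypothetical (DataFusion has no such setting); defaulting to chrono would keep today's behavior, so existing queries would be unaffected.

```rust
use std::str::FromStr;

/// Hypothetical dialect setting; not an existing DataFusion option.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum FormatDialect {
    /// Current behavior: chrono strftime-style specifiers.
    #[default]
    Chrono,
    Postgres,
    Spark,
}

impl FromStr for FormatDialect {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Case-insensitive so "Postgres" and "postgres" are both accepted.
        match s.to_ascii_lowercase().as_str() {
            "chrono" => Ok(Self::Chrono),
            "postgres" => Ok(Self::Postgres),
            "spark" => Ok(Self::Spark),
            other => Err(format!("unknown format dialect: {other}")),
        }
    }
}

fn main() {
    // e.g. the string value of a hypothetical configuration entry
    let dialect: FormatDialect = "Postgres".parse().unwrap();
    println!("{dialect:?}");
    // Leaving the option unset falls back to chrono.
    println!("{:?}", FormatDialect::default());
}
```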

However, this is non-trivial given the scope of those two implementations.

Describe alternatives you've considered

Users can always use DataFusion's user-defined functions to get the semantics they want, for example a ScalarUDF that rewrites a Postgres-style format string into the equivalent chrono format string.

(There are likely all sorts of corner cases, though; see #8886 (comment).)
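A minimal sketch of the translation step such a ScalarUDF would need, assuming a hypothetical helper `pg_to_chrono` that maps only a handful of Postgres template patterns (YYYY, MM, DD, HH24, HH12/HH, MI, SS) to chrono specifiers; wiring it into an actual ScalarUDF is not shown. Patterns are matched case-sensitively and longest first, so HH24 wins over HH, and a literal `%` is escaped as `%%` because chrono treats `%` as the start of a specifier. Double-quoted literals, FM/TH modifiers, month and day names, and patterns such as DDD (day of year) are not handled, which is exactly where corner cases appear: DDD silently becomes `%dD` here.

```rust
/// Hypothetical helper: rewrite a small subset of Postgres template patterns
/// into chrono specifiers. Unknown characters are copied through literally.
fn pg_to_chrono(pg: &str) -> String {
    // Longest patterns first so that "HH24" is matched before "HH".
    const MAP: &[(&str, &str)] = &[
        ("YYYY", "%Y"), // 4-digit year
        ("HH24", "%H"), // hour 00-23
        ("HH12", "%I"), // hour 01-12
        ("HH", "%I"),   // Postgres HH is also the 12-hour clock
        ("MM", "%m"),   // month 01-12
        ("DD", "%d"),   // day of month
        ("MI", "%M"),   // minute
        ("SS", "%S"),   // second
    ];
    let mut out = String::with_capacity(pg.len() * 2);
    let mut rest = pg;
    'outer: while !rest.is_empty() {
        for &(pat, spec) in MAP {
            if let Some(tail) = rest.strip_prefix(pat) {
                out.push_str(spec);
                rest = tail;
                continue 'outer;
            }
        }
        // No pattern matched: copy one character, escaping chrono's '%'.
        let ch = rest.chars().next().unwrap();
        if ch == '%' {
            out.push_str("%%");
        } else {
            out.push(ch);
        }
        rest = &rest[ch.len_utf8()..];
    }
    out
}

fn main() {
    println!("{}", pg_to_chrono("YYYY-MM-DD HH24:MI:SS")); // %Y-%m-%d %H:%M:%S
    println!("{}", pg_to_chrono("DDD")); // %dD: day-of-year is not supported
}
```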

Additional context

@jhorstmann has notes about Postgres: #5398 (comment)
@Omega359 notes that the spark format library is entirely different still: #5398 (comment)


Labels: enhancement (New feature or request)
