Skip to content

Minor: Document timestamp with/without cast behavior#5826

Merged
tustvold merged 5 commits intoapache:masterfrom
alamb:alamb/cast_docs2
Jun 3, 2024
Merged

Minor: Document timestamp with/without cast behavior#5826
tustvold merged 5 commits intoapache:masterfrom
alamb:alamb/cast_docs2

Conversation

@alamb
Copy link
Copy Markdown
Contributor

@alamb alamb commented May 31, 2024

Which issue does this PR close?

Closes #.

Rationale for this change

The behavior of casting timestamps to/from timezones is quite subtle and I spent quite some time testing them out in the context of apache/datafusion#10602

Thus I thought it would be good to document this behavior in the arrow crate itself so I don't have to do that next time (and hopefully) others can benefit from it as well.

What changes are included in this PR?

Document, with examples, what happens when one casts timestamp with timezone to/from timestamp without timezone

Are there any user-facing changes?

Documentation. No changes to code

@github-actions github-actions Bot added the arrow Changes to the arrow crate label May 31, 2024
/// However, note that when casting from a timestamp with timezone BACK to a
/// timestamp without timezone the cast kernel does not adjust the values.
///
/// Thus round trip casting a timestamp without timezone to a timestamp with
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The behavior that round trip casting the timestamp CHANGES the underlying timestamp I think caused a lot of confusion (at least for me) in the context of apache/datafusion#10602

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I filed #5827 to discuss changing the behavior

@alamb
Copy link
Copy Markdown
Contributor Author

alamb commented May 31, 2024

CI integration is failing due to #5815

Copy link
Copy Markdown
Contributor

@Abdullahsab3 Abdullahsab3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for documenting this. Made the behavior much clearer. Minor remark

Comment thread arrow-cast/src/cast/mod.rs Outdated
Comment thread arrow-cast/src/cast/mod.rs
//! use arrow_array::types::Float64Type;
//! use arrow_array::cast::AsArray;
//!
//! # use arrow_array::*;
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this just makes the existing example less verbose

Comment thread arrow-cast/src/cast/mod.rs Outdated
///
/// When casting from a timestamp without timezone to a timestamp with
/// timezone, the cast kernel treats the underlying timestamp values as being in
/// UTC and adjusts them to the provided timezone.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is correct, it interprets the timestamp as being in the destination timezone and then adjusts the value to UTC as required.

From the docs on DataType

One possibility is to assume that the original timestamp values are relative to the epoch of the timezone being set; timestamp values should then adjusted to the Unix epoch (for example, changing the timezone from empty to “Europe/Paris” would require converting the timestamp values from “Europe/Paris” to “UTC”, which seems counter-intuitive but is nevertheless correct).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Than you -- fixed

Comment thread arrow-cast/src/cast/mod.rs Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants