
Proposal: change casting of timestamps with a timezone to timestamps without a timezone (also parsing of timestamps without a timezone) #5827

@alamb


Is your feature request related to a problem or challenge? Please describe what you are trying to do.
This is in the context of implementing date_bin for timestamps with timezones: apache/datafusion#10602

While writing #5826 to document the behavior of casting timestamps, I found that behavior very confusing. Specifically, when you cast from Timestamp(None) to Timestamp(Some(tz)) and then back to Timestamp(None), the underlying timestamp values change, as shown in this example:

```rust
use arrow_array::cast::AsArray;
use arrow_array::types::TimestampSecondType;
use arrow_array::Int64Array;
use arrow_cast::{cast, display};
use arrow_schema::{DataType, TimeUnit};

fn main() {
    let data_type = DataType::Timestamp(TimeUnit::Second, None);
    let data_type_tz = DataType::Timestamp(TimeUnit::Second, Some("-05:00".into()));
    let a = Int64Array::from(vec![1_000_000_000, 2_000_000_000, 3_000_000_000]);
    let b = cast(&a, &data_type).unwrap(); // cast to timestamp without timezone
    let b = b.as_primitive::<TimestampSecondType>(); // downcast to result type
    assert_eq!(2_000_000_000, b.value(1)); // values are still the same

    // Convert timestamps without a timezone to timestamps with a timezone
    let c = cast(b, &data_type_tz).unwrap();
    let c = c.as_primitive::<TimestampSecondType>(); // downcast to result type
    assert_eq!(2_000_018_000, c.value(1)); // value has been adjusted by offset

    // Convert from timestamp with timezone back to timestamp without timezone
    let d = cast(c, &data_type).unwrap();
    let d = d.as_primitive::<TimestampSecondType>(); // downcast to result type
    assert_eq!(2_000_018_000, d.value(1)); // <---- **** THIS VALUE IS DIFFERENT THAN IT WAS INITIALLY
    assert_eq!("2033-05-18T08:33:20", display::array_value_to_string(d, 1).unwrap());
}
```
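To make the arithmetic explicit, here is a minimal plain-Rust model of what the example observes, with no arrow dependency. It assumes a fixed offset for "-05:00" (-18_000 seconds east of UTC); the constant `OFFSET_SECS` and the functions `naive_to_tz` and `tz_to_naive_current` are hypothetical names that only mirror the observed values, not arrow-cast's actual implementation.

```rust
/// Hypothetical fixed offset of the "-05:00" timezone, in seconds east of UTC.
pub const OFFSET_SECS: i64 = -5 * 3600;

/// Naive -> timezone-aware (as observed): the naive value is treated as
/// local wall-clock time and converted to a UTC-based value by
/// subtracting the offset (local = utc + offset, so utc = local - offset).
pub fn naive_to_tz(local: i64, offset_secs: i64) -> i64 {
    local - offset_secs
}

/// Timezone-aware -> naive (as observed today): the stored UTC-based value
/// is kept unchanged and only the timezone is dropped, so the offset is unused.
pub fn tz_to_naive_current(utc: i64, _offset_secs: i64) -> i64 {
    utc
}
```

With this model, `tz_to_naive_current(naive_to_tz(2_000_000_000, OFFSET_SECS), OFFSET_SECS)` yields 2_000_018_000, reproducing the value of `d` above: the round trip is not the identity.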

Thus I would like to discuss whether we should change this behavior to make it less surprising, or whether there is a reason to keep the current behavior.

Describe the solution you'd like

I propose that casting a timestamp with a timezone to a timestamp without a timezone do the inverse of casting a timestamp without a timezone to a timestamp with a timezone.

This would mean the final value of d in the above example would be 2_000_000_000, not 2_000_018_000.
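Under the same kind of hypothetical fixed-offset model (`OFFSET_SECS`, `naive_to_tz`, `tz_to_naive_proposed`, and `round_trip` are illustrative names, not arrow-cast APIs), the proposal amounts to adding the offset back when the timezone is dropped, so casting naive to timezone-aware and back returns the starting value:

```rust
/// Hypothetical fixed offset of the "-05:00" timezone, in seconds east of UTC.
pub const OFFSET_SECS: i64 = -5 * 3600;

/// Existing direction: naive (local wall-clock) -> timezone-aware (UTC-based).
pub fn naive_to_tz(local: i64, offset_secs: i64) -> i64 {
    local - offset_secs
}

/// Proposed direction: timezone-aware (UTC-based) -> naive (local wall-clock),
/// the exact inverse of `naive_to_tz` for a fixed offset.
pub fn tz_to_naive_proposed(utc: i64, offset_secs: i64) -> i64 {
    utc + offset_secs
}

/// Naive -> timezone-aware -> naive under the proposed semantics.
pub fn round_trip(local: i64, offset_secs: i64) -> i64 {
    tz_to_naive_proposed(naive_to_tz(local, offset_secs), offset_secs)
}
```

With a fixed offset the two conversions are exact inverses. For named timezones with daylight saving time the offset depends on the instant, so this sketch only illustrates the intended semantics, not a complete implementation.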

Describe alternatives you've considered
Leave the existing behavior unchanged.

Additional context
