Is your feature request related to a problem or challenge? Please describe what you are trying to do.
This is in the context of implementing date_bin for timestamps with timezones: apache/datafusion#10602
I made #5826 to document the behavior of casting timestamps, and I found it very confusing. Specifically, when you cast from Timestamp(None) to Timestamp(Some(tz)) and then back to Timestamp(None), the underlying timestamp values are changed, as shown in this example:
use arrow_array::Int64Array;
use arrow_array::types::{TimestampSecondType};
use arrow_cast::{cast, display};
use arrow_array::cast::AsArray;
use arrow_schema::{DataType, TimeUnit};
let data_type = DataType::Timestamp(TimeUnit::Second, None);
let data_type_tz = DataType::Timestamp(TimeUnit::Second, Some("-05:00".into()));
let a = Int64Array::from(vec![1_000_000_000, 2_000_000_000, 3_000_000_000]);
let b = cast(&a, &data_type).unwrap(); // cast to timestamp without timezone
let b = b.as_primitive::<TimestampSecondType>(); // downcast to result type
assert_eq!(2_000_000_000, b.value(1)); // values are still the same
// Convert timestamps without a timezone to timestamps with a timezone
let c = cast(&b, &data_type_tz).unwrap();
let c = c.as_primitive::<TimestampSecondType>(); // downcast to result type
assert_eq!(2_000_018_000, c.value(1)); // value has been adjusted by offset
// Convert from timestamp with timezone back to timestamp without timezone
let d = cast(&c, &data_type).unwrap();
let d = d.as_primitive::<TimestampSecondType>(); // downcast to result type
assert_eq!(2_000_018_000, d.value(1)); // <---- **** THIS VALUE IS DIFFERENT THAN IT WAS INITIALLY
assert_eq!("2033-05-18T08:33:20", display::array_value_to_string(&d, 1).unwrap());
Thus I wanted to discuss whether we should change the behavior to make it less surprising, or whether there is a reason to keep the current behavior.
Describe the solution you'd like
I propose making the cast from a timestamp with a timezone to a timestamp without a timezone do the inverse of the cast from a timestamp without a timezone to a timestamp with a timezone.
This would mean the final value of d in the above example is 2_000_000_000, not 2_000_018_000.
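To make the proposal concrete, the round trip can be sketched with plain integer arithmetic, independent of arrow. This is only an illustration: the helper names `naive_to_tz` and `tz_to_naive` and the constant `OFFSET_SECS` are hypothetical and not part of arrow_cast, and the sketch assumes a fixed offset such as "-05:00" rather than a named timezone.

```rust
/// Offset for "-05:00" in seconds (hypothetical constant for this sketch).
const OFFSET_SECS: i64 = -5 * 60 * 60;

/// Treat a timezone-less value as local wall-clock time at the given offset
/// and return the corresponding UTC value. This mirrors what the example
/// above observes for Timestamp(None) -> Timestamp(Some(tz)).
fn naive_to_tz(local_secs: i64, offset_secs: i64) -> i64 {
    local_secs - offset_secs
}

/// Proposed inverse for Timestamp(Some(tz)) -> Timestamp(None):
/// turn the UTC value back into local wall-clock time.
fn tz_to_naive(utc_secs: i64, offset_secs: i64) -> i64 {
    utc_secs + offset_secs
}
```

Under these assumptions, `naive_to_tz(2_000_000_000, OFFSET_SECS)` gives 2_000_018_000 (the value of c above), and applying `tz_to_naive` to that result gives back 2_000_000_000, which is the proposed final value of d.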
Describe alternatives you've considered
Leave existing behavior
Additional context