-
Notifications
You must be signed in to change notification settings - Fork 4k
GH-35138: [Java] Fix ArrowFlightJdbcTimeStampVectorAccessor to deal with calendar #35139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||
|---|---|---|---|---|
|
|
@@ -36,7 +36,7 @@ private DateTimeUtils() { | |||
| } | ||||
|
|
||||
| /** | ||||
| * Subtracts given Calendar's TimeZone offset from epoch milliseconds. | ||||
| * Adds given Calendar's TimeZone offset from epoch milliseconds. | ||||
| */ | ||||
| public static long applyCalendarOffset(long milliseconds, Calendar calendar) { | ||||
| if (calendar == null) { | ||||
|
|
@@ -47,7 +47,7 @@ public static long applyCalendarOffset(long milliseconds, Calendar calendar) { | |||
| final TimeZone defaultTz = TimeZone.getDefault(); | ||||
|
|
||||
| if (tz != defaultTz) { | ||||
| milliseconds -= tz.getOffset(milliseconds) - defaultTz.getOffset(milliseconds); | ||||
| milliseconds += tz.getOffset(milliseconds) - defaultTz.getOffset(milliseconds); | ||||
|
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The problem is that I made this line of change in order to fix the test failures of this case: Line 292 in eaa1a1e
@Test
public void testShouldGetStringBeConsistentWithVarCharAccessorWithCalendar() throws Exception {
// Ignore for TimeStamp vectors with TZ, as VarChar accessor won't consider their TZ
Assume.assumeTrue(
vector instanceof TimeStampNanoVector || vector instanceof TimeStampMicroVector ||
vector instanceof TimeStampMilliVector || vector instanceof TimeStampSecVector);
Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone(AMERICA_VANCOUVER));
assertGetStringIsConsistentWithVarCharAccessor(calendar);
}
private void assertGetStringIsConsistentWithVarCharAccessor(Calendar calendar) throws Exception {
try (VarCharVector varCharVector = new VarCharVector("",
rootAllocatorTestRule.getRootAllocator())) {
varCharVector.allocateNew(1);
ArrowFlightJdbcVarCharVectorAccessor varCharVectorAccessor =
new ArrowFlightJdbcVarCharVectorAccessor(varCharVector, () -> 0, (boolean wasNull) -> {
});
accessorIterator.iterate(vector, (accessor, currentRow) -> {
final String string = accessor.getString();
varCharVector.set(0, new Text(string));
varCharVector.setValueCount(1);
Timestamp timestampFromVarChar = varCharVectorAccessor.getTimestamp(calendar);
Timestamp timestamp = accessor.getTimestamp(calendar);
collector.checkThat(timestamp, is(timestampFromVarChar));
collector.checkThat(accessor.wasNull(), is(false));
});
}
}
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In the test above, the user-supplied timezone is
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To summarize, we have two bugs for now.
For the fix of jdbc implementation of getDate/getTimestamp with a calendar as input, I’d propose following:
IIUC, the timezone info in your derby example comes from the SimpleDateFormat but not the resultset itself. WDYT? @lidavidm
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
That's midnight in Vancouver expressed as GMT so it's consistent with the other case described above.
It seems unintuitive, but at least consistent: it will take that date at midnight in the given time zone, and express it in GMT, which seems to match the docstring of java.sql.Date.
That may also work, but I suppose if the user is explicitly passing a Calendar, the current behavior is what they actually want? This is more a flaw of the original Java date/time APIs in that Date itself doesn't carry the timezone so the display value is weird.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right, at least these behaviors are consistent to each other. It is unintuitive since timestamp values are always returned in the GMT regardless of what calendar is supplied. I cannot say this is wrong because it seems to be aligned with the SQL There was a discussion in the Hadoop community to work on the consistency of timestamp semantics across different SQL-on-Hadoop engines. It proposed a
Most of the SQL engines in the aforementioned doc now support Please feel free to close this PR if you think the current behavior is not an issue, or involve more people to the discussion if a fix is required. Thanks for the discussion in detail!
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for bearing with me. The mess of date/time APIs here is confusing... For Arrow: So the question here is still the semantics of I do see things like this: https://stackoverflow.com/questions/9202857/timezones-in-sql-date-vs-java-sql-date
So that also seems to match the existing behavior, by my reading. I looked at Postgres's JDBC driver: and
I think that also matches with the current behavior, as unintuitive as it is.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for the additional links! It is really a notorious topic :) |
||||
| } | ||||
|
|
||||
| return milliseconds; | ||||
|
|
||||
| Original file line number | Diff line number | Diff line change | ||
|---|---|---|---|---|
|
|
@@ -510,14 +510,23 @@ public void testShouldGetDateReturnValidDateWithCalendar() throws Exception { | |||
| Text value = new Text("2021-07-02"); | ||||
| when(getter.get(0)).thenReturn(value.copyBytes()); | ||||
|
|
||||
| Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone("America/Sao_Paulo")); | ||||
| Date result = accessor.getDate(calendar); | ||||
| { | ||||
| Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone("America/Sao_Paulo")); | ||||
| Date result = accessor.getDate(calendar); | ||||
| calendar.setTime(result); | ||||
|
|
||||
| calendar = Calendar.getInstance(TimeZone.getTimeZone("Etc/UTC")); | ||||
| calendar.setTime(result); | ||||
| collector.checkThat(dateTimeFormat.format(calendar.getTime()), | ||||
| equalTo("2021-07-01T21:00:00.000Z")); | ||||
|
Comment on lines
+518
to
+519
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK. So the assumption is that the underlying value is in UTC, and are converting to the destination time zone? Derby does something different, though: SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSXXX");
try (final Connection connection = DriverManager.getConnection("jdbc:derby:memory:foo;create=true")) {
try (final Statement statement = connection.createStatement()) {
statement.executeUpdate("CREATE TABLE foo (bar DATE)");
statement.executeUpdate("INSERT INTO foo (bar) VALUES '2021-07-02'");
try (final ResultSet rs = statement.executeQuery("SELECT * FROM foo")) {
assertThat(rs.next()).isTrue();
Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone("America/Sao_Paulo"));
final Date date = rs.getDate(1, calendar);
System.out.println(date);
calendar.setTime(date);
System.out.println(sdf.format(calendar.getTime()));
}
}
}Derby appears to just attach the timezone to the value, instead of adjusting it from an assumed source time zone. The output is:
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
It depends on the whether the timezone field is set in the TimeStampVector.
If user does not request a different timezone, then the result is good. Otherwise the problem emerges.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This seems a bit off though, since we don't know the timezone in that case specifically. (Not that the current behavior seems any better!) It seems like the right behavior here would be to either treat this as an error, or behave like Derby and treat the user-supplied calendar to mean, "assume the timestamp is a timestamp as viewed in this time zone".
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Here is the code where it assumes UTC if timezone is missing from the vector: Line 180 in eaa1a1e
Yes, I agree with you that we shouldn't blindly assume timezone to be UTC if unset. In Apache Hive, this usually means timestamp without timezone where the values are assumed to be in the default timezone in the JVM (which is the local timezone in the usual case).
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. IMO, any behavior change may need a discussion as it will affect downstream users. This patch only focuses on the bug fix.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fair, though arguably this is replacing buggy behavior with buggy behavior 🙂 Let's put a caveat in the documentation page for the driver? We should consider aligning with what Derby appears to do, too. Either that, or check a few other drivers to see the expected behavior here. I'm still not sure the conversion here is correct, though. From the docs, java.sql.Date is the millisecond value (in GMT) of midnight of a date in some time zone. So if we have the date "2021-07-02" in America/Sao_Paulo, then the GMT timestamp should be 2021-07-02 03:00. So the new behavior doesn't seem correct and the old behavior is what's expected. Or if we are assuming that the date is "2021-07-02" in UTC, and converting that to Sao Paulo, that makes no sense - Date isn't supposed to store a time, so we shouldn't treat it as one.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The
Therefore I agree with you that we should assume the underlying values we are reading do not have timezone information and the behavior would be much easier to understand if we simply attach the user-supplied timezone to the values to return. The original buggy code that this PR tries to solve actually is not here. The modification here was a result of that fix. Please see my other comments above. Let me double check if there is something wrong.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Right, I'm just trying to use this to understand what's happening here. |
||||
| } | ||||
|
|
||||
| collector.checkThat(dateTimeFormat.format(calendar.getTime()), | ||||
| equalTo("2021-07-02T03:00:00.000Z")); | ||||
| { | ||||
| Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone("Etc/UTC")); | ||||
| Date result = accessor.getDate(calendar); | ||||
| calendar.setTime(result); | ||||
|
|
||||
| collector.checkThat(dateTimeFormat.format(calendar.getTime()), | ||||
| equalTo("2021-07-02T00:00:00.000Z")); | ||||
| } | ||||
| } | ||||
|
|
||||
| @Test | ||||
|
|
@@ -547,13 +556,21 @@ public void testShouldGetTimeReturnValidDateWithCalendar() throws Exception { | |||
| Text value = new Text("02:30:00"); | ||||
| when(getter.get(0)).thenReturn(value.copyBytes()); | ||||
|
|
||||
| Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone("America/Sao_Paulo")); | ||||
| Time result = accessor.getTime(calendar); | ||||
| { | ||||
| Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone("America/Sao_Paulo")); | ||||
| Time result = accessor.getTime(calendar); | ||||
| calendar.setTime(result); | ||||
|
|
||||
| calendar = Calendar.getInstance(TimeZone.getTimeZone("Etc/UTC")); | ||||
| calendar.setTime(result); | ||||
| collector.checkThat(timeFormat.format(calendar.getTime()), equalTo("23:30:00.000Z")); | ||||
| } | ||||
|
|
||||
| collector.checkThat(timeFormat.format(calendar.getTime()), equalTo("05:30:00.000Z")); | ||||
| { | ||||
| Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone("Etc/UTC")); | ||||
| Time result = accessor.getTime(calendar); | ||||
| calendar.setTime(result); | ||||
|
|
||||
| collector.checkThat(timeFormat.format(calendar.getTime()), equalTo("02:30:00.000Z")); | ||||
| } | ||||
| } | ||||
|
|
||||
| @Test | ||||
|
|
@@ -584,14 +601,23 @@ public void testShouldGetTimestampReturnValidDateWithCalendar() throws Exception | |||
| Text value = new Text("2021-07-02 02:30:00.000"); | ||||
| when(getter.get(0)).thenReturn(value.copyBytes()); | ||||
|
|
||||
| Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone("America/Sao_Paulo")); | ||||
| Timestamp result = accessor.getTimestamp(calendar); | ||||
| { | ||||
| Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone("America/Sao_Paulo")); | ||||
| Timestamp result = accessor.getTimestamp(calendar); | ||||
| calendar.setTime(result); | ||||
|
|
||||
| calendar = Calendar.getInstance(TimeZone.getTimeZone("Etc/UTC")); | ||||
| calendar.setTime(result); | ||||
| collector.checkThat(dateTimeFormat.format(calendar.getTime()), | ||||
| equalTo("2021-07-01T23:30:00.000Z")); | ||||
| } | ||||
|
|
||||
| collector.checkThat(dateTimeFormat.format(calendar.getTime()), | ||||
| equalTo("2021-07-02T05:30:00.000Z")); | ||||
| { | ||||
| Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone("Etc/UTC")); | ||||
| Timestamp result = accessor.getTimestamp(calendar); | ||||
| calendar.setTime(result); | ||||
|
|
||||
| collector.checkThat(dateTimeFormat.format(calendar.getTime()), | ||||
| equalTo("2021-07-02T02:30:00.000Z")); | ||||
| } | ||||
| } | ||||
|
|
||||
| private void assertGetBoolean(Text value, boolean expectedResult) throws SQLException { | ||||
|
|
||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original buggy code I want to solve is here. It tries to convert local date time from vector timezone to user-supplied timezone. We should add the timezone offset but not subtract it. A number of test cases need to be fixed as their expected values are computed with the same issue.