ARROW-2026: [C++] Enforce use_deprecated_int96_timestamps to all time… #3173
As a note for the reviewer, I could not track down the exact cause (memory corruption) of the segfault in https://issues.apache.org/jira/browse/PARQUET-1274, which I reverted in this PR in order to implement the desired behavior. But I noted that there was a mismatch between the |
|
@fsaintjacques does the build fail here if you do not revert PARQUET-1274? I can look into that bug and try to fix it |
|
I'll just try to repro the segfault in that JIRA on this branch |
|
The segfault is not present anymore. If you want to trigger it, revert the schema.cc changes, then The top stack will be corrupted and meaningless, the real issue is the |
|
Got it. I read the comments in the JIRA and it makes sense (there was an invalid int64 -> int96 type cast happening). I'm going to quickly check the original example |
|
As @joshuastorck recommended, we might prefer a |
|
Yep, that would be it, I get a nullptr. |
|
Sounds good. Can you make all the writer type casts checked_cast here? |
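The nullptr observation and the request for checked casts can be illustrated with a minimal sketch. Everything here is hypothetical and written for illustration only: `checked_cast_sketch`, `ColumnWriterBase`, `Int64WriterSketch` and `Int96WriterSketch` are stand-in names, not the code from this PR or Arrow's own helper. The idea is that a debug build verifies the dynamic type before downcasting, while a release build pays nothing extra.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical writer hierarchy, standing in for typed column writers.
struct ColumnWriterBase {
  virtual ~ColumnWriterBase() = default;
};
struct Int64WriterSketch : ColumnWriterBase {
  int value_width() const { return 8; }
};
struct Int96WriterSketch : ColumnWriterBase {
  int value_width() const { return 12; }
};

// Hypothetical checked downcast: verified in debug builds, plain static_cast
// when NDEBUG is defined.
template <typename To, typename From>
To checked_cast_sketch(From* ptr) {
  static_assert(std::is_pointer<To>::value, "To must be a pointer type");
#ifndef NDEBUG
  // dynamic_cast returns nullptr when the dynamic type does not match,
  // so a mismatched cast fails loudly here instead of corrupting memory later.
  To result = dynamic_cast<To>(ptr);
  assert(ptr == nullptr || result != nullptr);
  return result;
#else
  return static_cast<To>(ptr);
#endif
}
```

With this shape, casting an INT64 writer to an INT96 writer trips the assertion in a debug build, whereas an unchecked `static_cast` would compile and silently produce undefined behavior.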
wesm
left a comment
I also confirmed that the failure in PARQUET-1274 isn't present with this patch so no issues reverting that patch
cpp/src/parquet/arrow/schema.cc
Outdated
Must be mindful of parquet v1/v2 issues here -- @xhochy could you take a look at this?
The Python unit tests found another issue; I'll add a C++ test to catch this earlier.
This was fixed in GetTimestampMetadata, it's now closer to the original.
…stamps fields. This changed the behavior of use_deprecated_int96_timestamps to support all timestamp fields regardless of the time unit. Previously, the conversion applied only to fields with nanosecond resolution. People will only use this option when they use a system that only supports INT96 timestamps; systems that also support INT64 timestamps in other resolutions would not need the option. A notable API change is that this option now takes precedence over the coerce_timestamps option.
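The precedence rule described in the commit message can be sketched as a small decision function. This is a simplified model under stated assumptions, not the code in schema.cc: `TimeUnit`, `PhysicalType`, `WriterOptionsSketch`, `TimestampStorage` and `ChooseTimestampStorage` are hypothetical names, and the real writer applies further rules to INT64 timestamps that are left out here.

```cpp
#include <optional>

// Hypothetical stand-ins for illustration; not the actual Arrow/Parquet types.
enum class TimeUnit { Milli, Micro, Nano };
enum class PhysicalType { Int64, Int96 };

struct WriterOptionsSketch {
  bool use_deprecated_int96_timestamps = false;
  std::optional<TimeUnit> coerce_timestamps;  // empty: no coercion requested
};

struct TimestampStorage {
  PhysicalType physical_type;
  std::optional<TimeUnit> stored_unit;  // empty when written as INT96
};

// Models the behavior after this PR: the INT96 option applies to every
// timestamp field regardless of its unit and wins over coerce_timestamps.
TimestampStorage ChooseTimestampStorage(TimeUnit field_unit,
                                        const WriterOptionsSketch& opts) {
  if (opts.use_deprecated_int96_timestamps) {
    return {PhysicalType::Int96, std::nullopt};
  }
  // Otherwise store as INT64, in the coerced unit if one was requested.
  return {PhysicalType::Int64, opts.coerce_timestamps.value_or(field_unit)};
}
```

In this model, a microsecond field written with both options set comes out as INT96, and the coercion request is ignored.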
7a88806 to
2897a72
Compare
Codecov Report
@@            Coverage Diff             @@
##           master    #3173     +/-   ##
==========================================
+ Coverage   86.39%   86.39%   +<.01%
==========================================
  Files         505      505
  Lines       69612    69620       +8
==========================================
+ Hits        60140    60149       +9
+ Misses       9371     9370       -1
  Partials      101      101
|
wesm
left a comment
+1. Thank you!
|
BTW there's no need to squash your commits in a PR. It's actually better if you push additional commits because then I see e-mail notifications that the branch has been updated, which prompts me to review again. If you force push, I do not get a notification. The other way would be to comment asking me to look again =) |
|
Didn't know about force push and notifications! |