Fix casting issues when joining two Iceberg tables together on Tez #1740
    return new Date((Date) o);
  } else {
    return o;
  }
Why would this be called to copy a non-Date object?
On Tez, this is called with LocalDate. On MR we get Date, which is just a wrapper around LocalDate.
Why not support both representations that are possibly passed in by Hive?
Never mind, I see that you did.
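For readers following the thread, here is a minimal sketch of the pattern being discussed: a copy method that accepts both representations. `HiveDateStandIn` is a hypothetical, mutable stand-in for `org.apache.hadoop.hive.common.type.Date` so that the example compiles without Hive on the classpath, and `DateCopySketch` is likewise an illustrative name. This is not the code from this PR.

```java
import java.time.LocalDate;

public class DateCopySketch {

  // Hypothetical stand-in for org.apache.hadoop.hive.common.type.Date,
  // assumed here to be a mutable wrapper around LocalDate.
  public static final class HiveDateStandIn {
    private LocalDate value;

    public HiveDateStandIn(LocalDate value) {
      this.value = value;
    }

    // Copy constructor, mirroring the `new Date((Date) o)` call in the diff.
    public HiveDateStandIn(HiveDateStandIn other) {
      this.value = other.value;
    }

    public LocalDate get() {
      return value;
    }
  }

  // Copies the wrapper when one is passed in (the MR case in the thread)
  // and returns any other object unchanged (the Tez case, where the value
  // is a LocalDate; LocalDate is immutable, so no copy is needed).
  public static Object copyObject(Object o) {
    if (o instanceof HiveDateStandIn) {
      return new HiveDateStandIn((HiveDateStandIn) o);
    } else {
      return o;
    }
  }
}
```

Returning the `LocalDate` as-is is safe only because `java.time.LocalDate` is immutable; a mutable value would need its own copy branch.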
    joinTables("decimaltable", "decimal_col", Types.DecimalType.of(3, 1));
    joinTables("timestamptable", "timestamp_col", Types.TimestampType.withZone());
    joinTables("binarytable", "binary_col", Types.BinaryType.get());
    joinTables("datetable", "date_col", Types.DateType.get());
Could you rewrite this as a new test case with a list of types to test?
We consider it a best practice to start new test cases rather than adding to existing, complete cases. Each test method is run independently so you see more of the failures that way. By making longer test methods with more than one case, failures can prevent other tests from even running.
I think it would be fine to use a loop over types in a single new case, since most of the code is the same for these.
Thanks @rdblue for the review. I created a new test case.
f41d5fa to 217b0b2
217b0b2 to f667f64
The following casting exceptions were observed when joining two Iceberg tables on Tez:
java.math.BigDecimal cannot be cast to org.apache.hadoop.hive.common.type.HiveDecimal
java.time.OffsetDateTime cannot be cast to org.apache.hadoop.hive.common.type.Timestamp
java.nio.HeapByteBuffer cannot be cast to [B
java.time.LocalDate cannot be cast to org.apache.hadoop.hive.common.type.Date
This does not happen when joining an Iceberg table with a non-Iceberg table, or with ORDER BY queries on a single table.
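To illustrate the third exception in the list above using only the JDK: a `ByteBuffer` cannot be cast to `byte[]` (`[B`), so the value has to be converted explicitly. The sketch below is an assumption about what such a conversion could look like, not the change made in this PR; `BinaryValueSketch` and `toByteArray` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class BinaryValueSketch {

  // Copies the remaining bytes of the buffer into a new array.
  // Reading goes through duplicate() so the caller's position and limit
  // are left untouched.
  public static byte[] toByteArray(ByteBuffer buffer) {
    ByteBuffer view = buffer.duplicate();
    byte[] bytes = new byte[view.remaining()];
    view.get(bytes);
    return bytes;
  }

  public static void main(String[] args) {
    Object value = ByteBuffer.wrap(new byte[] {1, 2, 3});

    try {
      // Same kind of failure as in the report: a ByteBuffer is not a byte[].
      byte[] broken = (byte[]) value;
    } catch (ClassCastException e) {
      System.out.println("Cast failed: " + e.getMessage());
    }

    // Explicit conversion instead of a cast.
    byte[] converted = toByteArray((ByteBuffer) value);
    System.out.println(Arrays.toString(converted)); // prints [1, 2, 3]
  }
}
```

The other three exceptions follow the same shape (a Java standard-library value arriving where a Hive type is expected), but converting them requires Hive classes that are not shown in this thread.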