@lcspinter
Contributor

The following casting exceptions were observed when joining two Iceberg tables on Tez:

  • java.math.BigDecimal cannot be cast to org.apache.hadoop.hive.common.type.HiveDecimal

  • java.time.OffsetDateTime cannot be cast to org.apache.hadoop.hive.common.type.Timestamp

  • java.nio.HeapByteBuffer cannot be cast to [B

  • java.time.LocalDate cannot be cast to org.apache.hadoop.hive.common.type.Date

This does not happen when joining an Iceberg table with a non-Iceberg table, or with ORDER BY queries on a single Iceberg table.
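The binary failure can be reproduced with plain JDK classes ([B is the JVM's internal name for byte[]). The sketch below is illustrative only and is not necessarily how the patch converts binary values. The class and method names (ByteBufferToBytes, asBytes) are hypothetical. It shows why the direct cast fails and one way to copy a ByteBuffer's remaining bytes into a byte[] without moving the original buffer's position.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class ByteBufferToBytes {
    // Copies the remaining bytes of a buffer into a new array. duplicate()
    // has its own position, so the caller's buffer is left untouched.
    static byte[] toByteArray(ByteBuffer buffer) {
        ByteBuffer view = buffer.duplicate();
        byte[] bytes = new byte[view.remaining()];
        view.get(bytes);
        return bytes;
    }

    // Hypothetical helper: accepts either representation a binary value
    // might arrive in and always returns a byte[].
    static byte[] asBytes(Object value) {
        if (value instanceof byte[]) {
            return (byte[]) value;
        } else if (value instanceof ByteBuffer) {
            return toByteArray((ByteBuffer) value);
        }
        throw new IllegalArgumentException("Unsupported binary value: " + value.getClass());
    }

    public static void main(String[] args) {
        Object value = ByteBuffer.wrap(new byte[] {1, 2, 3}); // a HeapByteBuffer
        try {
            byte[] direct = (byte[]) value; // same failure as in the report
            System.out.println(direct.length);
        } catch (ClassCastException e) {
            System.out.println("cast failed: " + e.getClass().getSimpleName());
        }
        System.out.println(Arrays.toString(asBytes(value))); // [1, 2, 3]
    }
}
```

Copying through duplicate() matters if the same buffer is read more than once, for example when a join re-reads a row.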

return new Date((Date) o);
} else {
return o;
}
Contributor

Why would this be called to copy a non-Date object?

Contributor Author

On Tez, this is called with LocalDate. On MR we get Date, which is just a wrapper around LocalDate.

Contributor

Why not support both representations that are possibly passed in by Hive?

Contributor

Never mind, I see that you did.
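Supporting both representations amounts to dispatching on the runtime type inside the copy method. The sketch below illustrates that idea with JDK classes only. WrappedDate is a hypothetical stand-in for Hive's org.apache.hadoop.hive.common.type.Date (described in the thread as a wrapper around LocalDate) and is assumed to be mutable, so copying it creates a new object. A bare LocalDate is immutable and can be returned as-is, which matches the else branch in the diff excerpt.

```java
import java.time.LocalDate;

final class DateCopySketch {
    // Hypothetical stand-in for Hive's Date, which wraps a LocalDate.
    // Assumed mutable here, so a copy must be a separate instance.
    static final class WrappedDate {
        private LocalDate value;

        WrappedDate(LocalDate value) {
            this.value = value;
        }

        // Copy constructor, analogous to new Date((Date) o) in the diff.
        WrappedDate(WrappedDate other) {
            this.value = other.value;
        }

        LocalDate get() {
            return value;
        }

        void set(LocalDate value) {
            this.value = value;
        }
    }

    // Copies a date in either representation: the wrapper (seen on MR)
    // or a bare LocalDate (seen on Tez).
    static Object copyDate(Object o) {
        if (o instanceof WrappedDate) {
            return new WrappedDate((WrappedDate) o);
        }
        // LocalDate is immutable, so sharing the reference is safe.
        // null also falls through and is returned unchanged.
        return o;
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2020, 11, 10);
        WrappedDate original = new WrappedDate(day);
        WrappedDate copy = (WrappedDate) copyDate(original);
        original.set(day.plusDays(1));               // mutate the source
        System.out.println(copy.get());              // 2020-11-10
        System.out.println(copyDate(day) == day);    // true
    }
}
```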

joinTables("decimaltable", "decimal_col", Types.DecimalType.of(3, 1));
joinTables("timestamptable", "timestamp_col", Types.TimestampType.withZone());
joinTables("binarytable", "binary_col", Types.BinaryType.get());
joinTables("datetable", "date_col", Types.DateType.get());
Contributor

Could you rewrite this as a new test case with a list of types to test?

We consider it a best practice to start new test cases rather than add to existing, complete ones. Each test method runs independently, so you see more of the failures that way. In a longer test method that covers more than one case, an early failure can prevent the remaining cases from even running.

I think it would be fine to use a loop over types in a single new case, since most of the code is the same for these.
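A single test case that loops over types might be structured as below. This is a JDK-only sketch: TypeCase, runAll, and the check passed to it are hypothetical and stand in for the real joinTables(...) helper and Iceberg's Types, which are not on the sketch's classpath. The type is kept as a plain label. Collecting failures instead of stopping at the first one is one way to keep a loop from hiding later failures, which addresses the concern about longer test methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class JoinTypesSketch {
    // One entry per column type exercised by the join test.
    static final class TypeCase {
        final String table;
        final String column;
        final String type; // plain label, not an Iceberg Type

        TypeCase(String table, String column, String type) {
            this.table = table;
            this.column = column;
            this.type = type;
        }
    }

    static final List<TypeCase> CASES = List.of(
            new TypeCase("decimaltable", "decimal_col", "decimal(3, 1)"),
            new TypeCase("timestamptable", "timestamp_col", "timestamp with zone"),
            new TypeCase("binarytable", "binary_col", "binary"),
            new TypeCase("datetable", "date_col", "date"));

    // Runs the check for every case and collects failures, so one broken
    // type does not prevent the remaining types from being tested.
    static List<String> runAll(List<TypeCase> cases, Consumer<TypeCase> check) {
        List<String> failures = new ArrayList<>();
        for (TypeCase c : cases) {
            try {
                check.accept(c);
            } catch (RuntimeException | AssertionError e) {
                failures.add(c.type + ": " + e.getMessage());
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Hypothetical check standing in for joinTables(table, column, type).
        List<String> failures = runAll(CASES, c -> {
            if (c.type.equals("binary")) {
                throw new ClassCastException("simulated failure for " + c.column);
            }
        });
        System.out.println(failures); // [binary: simulated failure for binary_col]
    }
}
```

In a real JUnit test, the final step would assert that the failure list is empty so the test reports every failing type at once.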

Contributor Author

Thanks @rdblue for the review. I created a new test case.

@lcspinter force-pushed the CDPD-18710 branch 2 times, most recently from f41d5fa to 217b0b2 on November 10, 2020 15:02