-
Notifications
You must be signed in to change notification settings - Fork 3k
Flink: add toString, equals, hashCode overrides for RowDataProjection. #7493
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@hililiwei @chenjunjiedada @yegangy0718 can you help review? |
127c82a to
955b8ec
Compare
| private final Schema icebergSchema = | ||
| new Schema( | ||
| Types.NestedField.required(1, "partition_field", Types.StringType.get()), | ||
| Types.NestedField.required(1, "row_id", Types.StringType.get()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renaming the field id to also match what the RowDataProjection usage (like idOnly projected schema)
flink/v1.17/flink/src/main/java/org/apache/iceberg/flink/data/RowDataProjection.java
Show resolved
Hide resolved
| for (int pos = 0; pos < getArity(); pos++) { | ||
| if (!isNullAt(pos)) { | ||
| // Arrays.deepHashCode handles array object properly | ||
| result = 31 * result + Arrays.deepHashCode(new Object[] {getValue(pos)}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious, when calculating hashcode, do we need to take Map/List as a special case? Will Arrays.deepHashCode be able to handle that?
Or it's because This projection will not project the nested children types of repeated types like lists and maps, so we can ignore them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not so sure, but I prefer the first one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yegangy0718 Java List and Map objects implements hashCode properly. here we are just leveraging java.util.Arrays to handle the hashCode for array type field.
@hililiwei what do you mean by preferring the first one?
| @Override | ||
| public RowKind getRowKind() { | ||
| return rowData.getRowKind(); | ||
| // rowData can be null for nested struct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate a bit on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dug out this issue: #2738
If the nested struct is null, right now both StructProjection and RowDataProjection returns a Projection object wrapping a null struct. I actually think it is probably better to just return a null Projection object in this case.
Let me create a separate issue to follow up on this and see if the community agrees with the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created a new issue to discuss how to handle null nested struct. #7507
| import org.apache.avro.SchemaBuilder; | ||
| import org.apache.avro.generic.GenericData; | ||
| import org.apache.avro.util.Utf8; | ||
| import org.apache.commons.lang3.NotImplementedException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: we use UnsupportedOperationException in other places, should we keep consistency?
e57a252 to
c11cec3
Compare
hililiwei
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late reply, it looks good.
Flink RowData implementation classes (like GenericRowData) all implement proper overrides for those methods. We also intend to use RowDataProjection as map key for MapDataStatistics from sink shuffling work. Hence this change is also required.
c11cec3 to
70dd3a5
Compare
…merged. added Preconditions check for non-null root oject, as RowDataProjection never allowed it.
ed74599 to
674ec82
Compare
|
@pvary @hililiwei @yegangy0718 can you take another look? I made small adjustment in the last commit after rebased with PR #7517 . |
…des for RowDataProjection
…r RowDataProjection (#7631)

Flink RowData implementation classes (like GenericRowData) all implement proper overrides for those methods. We also intend to use RowDataProjection as map key for MapDataStatistics from sink shuffling work. Hence this change is also required.