@stevenzwu
Contributor

Flink RowData implementation classes (like GenericRowData) all implement proper overrides for those methods. We also intend to use RowDataProjection as a map key for MapDataStatistics from the sink shuffling work, so this change is required there as well.
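
To illustrate why a map key needs value-based overrides, here is a minimal sketch. It assumes "those methods" include `equals` and `hashCode` (the `hashCode` loop reviewed later in this thread suggests so). `IdentityKey`, `ValueKey`, and `countFor` are hypothetical stand-ins, not Iceberg or Flink classes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class MapKeyDemo {
  /** Hypothetical key that inherits identity-based equals/hashCode from Object. */
  static final class IdentityKey {
    final Object[] fields;

    IdentityKey(Object... fields) {
      this.fields = fields;
    }
  }

  /** Hypothetical key that compares and hashes by its field values. */
  static final class ValueKey {
    final Object[] fields;

    ValueKey(Object... fields) {
      this.fields = fields;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof ValueKey)) {
        return false;
      }
      return Arrays.deepEquals(fields, ((ValueKey) o).fields);
    }

    @Override
    public int hashCode() {
      return Arrays.deepHashCode(fields);
    }
  }

  /** Looks up the count recorded for a key, or 0 if the key is absent. */
  static long countFor(Map<Object, Long> stats, Object key) {
    return stats.getOrDefault(key, 0L);
  }

  public static void main(String[] args) {
    Map<Object, Long> stats = new HashMap<>();

    // Two identity-based keys with the same content land in two separate entries.
    stats.merge(new IdentityKey("a", 1), 1L, Long::sum);
    stats.merge(new IdentityKey("a", 1), 1L, Long::sum);

    // Two value-based keys with the same content are merged into one entry.
    stats.merge(new ValueKey("a", 1), 1L, Long::sum);
    stats.merge(new ValueKey("a", 1), 1L, Long::sum);

    System.out.println(stats.size()); // 3
    System.out.println(countFor(stats, new ValueKey("a", 1))); // 2
    System.out.println(countFor(stats, new IdentityKey("a", 1))); // 0
  }
}
```

Without the overrides, a statistics map keyed by such an object would never aggregate counts for equal keys, which is the situation the description wants to avoid.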

@github-actions github-actions bot added the flink label May 1, 2023
@stevenzwu
Contributor Author

@hililiwei @chenjunjiedada @yegangy0718 can you help review?

@stevenzwu stevenzwu force-pushed the RowDataProjection branch from 127c82a to 955b8ec Compare May 1, 2023 21:46
private final Schema icebergSchema =
new Schema(
Types.NestedField.required(1, "partition_field", Types.StringType.get()),
Types.NestedField.required(1, "row_id", Types.StringType.get()),
Contributor Author

Renaming the field id to also match the RowDataProjection usage (like the idOnly projected schema).

for (int pos = 0; pos < getArity(); pos++) {
if (!isNullAt(pos)) {
// Arrays.deepHashCode handles array object properly
result = 31 * result + Arrays.deepHashCode(new Object[] {getValue(pos)});
Contributor

Curious: when calculating the hash code, do we need to treat Map/List as a special case? Will Arrays.deepHashCode be able to handle them?
Or is it because "This projection will not project the nested children types of repeated types like lists and maps", so we can ignore them?

Contributor

Not so sure, but I prefer the first one.

Contributor Author

@yegangy0718 Java List and Map objects implement hashCode properly. Here we are just leveraging java.util.Arrays to handle the hashCode for array-type fields.

@hililiwei what do you mean by preferring the first one?
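
The point about `java.util.Arrays` can be checked with a small sketch. It uses plain Java values only (a `byte[]`, a `List`, a `Map`), not Flink's own data classes, and `fieldHash` is a hypothetical helper that repeats the per-field step from the loop under review.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DeepHashDemo {
  /**
   * Wraps a single value in an Object[] so that Arrays.deepHashCode picks the
   * content-based hash for array values and falls back to value.hashCode()
   * for everything else.
   */
  static int fieldHash(Object value) {
    return Arrays.deepHashCode(new Object[] {value});
  }

  public static void main(String[] args) {
    byte[] a = {1, 2, 3};
    byte[] b = {1, 2, 3};

    // Arrays inherit Object.hashCode, which is identity-based, so these two
    // values are not guaranteed to match even though the contents are equal.
    System.out.println(a.hashCode() == b.hashCode());

    // Wrapped in deepHashCode, equal contents always give equal hashes.
    System.out.println(fieldHash(a) == fieldHash(b)); // true

    // List and Map already define content-based hashCode, so no special case is needed.
    List<Integer> list1 = Arrays.asList(1, 2);
    List<Integer> list2 = new ArrayList<>(list1);
    System.out.println(fieldHash(list1) == fieldHash(list2)); // true

    Map<String, Integer> map1 = Collections.singletonMap("k", 1);
    Map<String, Integer> map2 = new HashMap<>(map1);
    System.out.println(fieldHash(map1) == fieldHash(map2)); // true
  }
}
```

One caveat of this approach: a `List` whose elements are themselves arrays would still hash those elements by identity, because `List.hashCode` calls `hashCode()` on each element rather than recursing through `Arrays`.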

@Override
public RowKind getRowKind() {
return rowData.getRowKind();
// rowData can be null for nested struct
Collaborator

Could you elaborate a bit on this?

Contributor Author

@stevenzwu stevenzwu May 3, 2023

When the nested struct is null, a RowDataProjection instance is created that wraps the null RowData. Let me think about whether there is another way to fix this, e.g. maybe we should return a null RowDataProjection in this case.

Contributor Author

Dug out this issue: #2738

If the nested struct is null, right now both StructProjection and RowDataProjection return a Projection object wrapping a null struct. I actually think it is probably better to just return a null Projection object in this case.

Let me create a separate issue to follow up on this and see if the community agrees with the change.
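
The two behaviors being compared can be sketched as follows. This is only an illustration under stated assumptions: `Projection`, `wrapAlways`, and `wrapOrNull` are hypothetical names, and a plain `Object[]` stands in for the nested struct. It is not the actual StructProjection or RowDataProjection code.

```java
class NullNestedDemo {
  /** Hypothetical projection wrapper; the wrapped row may be null. */
  static final class Projection {
    private Object[] wrapped;

    Projection wrap(Object[] row) {
      this.wrapped = row;
      return this;
    }

    boolean wrapsNull() {
      return wrapped == null;
    }

    Object get(int pos) {
      // Throws NullPointerException when the wrapped row is null.
      return wrapped[pos];
    }
  }

  /** Behavior described above: always return a wrapper, even around a null struct. */
  static Projection wrapAlways(Object[] nested) {
    return new Projection().wrap(nested);
  }

  /** Alternative under discussion: return null when the nested struct is null. */
  static Projection wrapOrNull(Object[] nested) {
    return nested == null ? null : new Projection().wrap(nested);
  }

  public static void main(String[] args) {
    // Callers of wrapAlways get a non-null object that fails only when accessed.
    System.out.println(wrapAlways(null).wrapsNull()); // true

    // Callers of wrapOrNull can detect the missing struct with a plain null check.
    System.out.println(wrapOrNull(null) == null); // true
  }
}
```

With the first variant every accessor has to guard against a null wrapped value; with the second, the null is visible at the point where the nested field is read.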

Contributor Author

@stevenzwu stevenzwu May 3, 2023

Created a new issue to discuss how to handle null nested struct. #7507

import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.util.Utf8;
import org.apache.commons.lang3.NotImplementedException;
Collaborator

Nit: we use UnsupportedOperationException in other places, should we keep consistency?

Contributor

@hililiwei hililiwei left a comment

Sorry for the late reply, it looks good.

stevenzwu added 2 commits May 12, 2023 12:29
Flink RowData implementation classes (like GenericRowData) all implement proper overrides for those methods. We also intend to use RowDataProjection as map key for MapDataStatistics from sink shuffling work. Hence this change is also required.
@stevenzwu stevenzwu force-pushed the RowDataProjection branch from c11cec3 to 70dd3a5 Compare May 12, 2023 19:31
…merged. Added a Preconditions check for a non-null root object, as RowDataProjection never allowed it.
@stevenzwu stevenzwu force-pushed the RowDataProjection branch from ed74599 to 674ec82 Compare May 12, 2023 20:34
@stevenzwu
Contributor Author

@pvary @hililiwei @yegangy0718 can you take another look? I made a small adjustment in the last commit after rebasing onto PR #7517.

@stevenzwu stevenzwu merged commit 198fb72 into apache:master May 15, 2023
stevenzwu added a commit to stevenzwu/iceberg that referenced this pull request May 17, 2023
stevenzwu added a commit that referenced this pull request May 18, 2023
5 participants