Core, Spark 3.4: Write properties of PositionDeletesTable should respect ones of BaseTable #8428
szehon-ho merged 6 commits into apache:master
BaseMetadataTable should respect the properties of BaseTable
I think this option works for me. @aokolnychyi, any concerns?
Changed the title to: BaseMetadataTable should respect properties of BaseTable
@jerqi sorry about this, I was re-thinking about it and am not 100% sure it makes sense, as some table properties look weird on all metadata tables. What do you think about first trying to solve it in [SparkBinPackPositionDeletesRewriter](https://github.com/apache/iceberg/blob/master/spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/actions/SparkBinPackPositionDeletesRewriter.java) (using DS write options)? I think that has a bit less impact.
We would need to set every write option on the data frame. It may be difficult for users to use.
@jerqi ok, that sounds good to me, let's go with that approach. Mostly write properties, I assume? Writing position deletes through the position_deletes table is not exposed to the user, as we don't support that outside rewrite_position_deletes. But I see your point that doing it inside the action makes the code harder, as there are quite a few possible properties.
Maybe we should include read properties and commit properties too; let me check what properties rewrite_position_deletes uses.
Yes, you're right. We need mostly write properties. |
Changed the title to: PositionDeletesTable should respect properties of BaseTable
```java
// these properties should respect the ones of BaseTable
return Collections.unmodifiableMap(
    table().properties().entrySet().stream()
        .filter(entry -> entry.getKey().startsWith("write."))
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue)));
```
I find that all the write properties are needed by our PositionDeletesRewriteAction, so I chose to match the key prefix here instead of copying specific entries.
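The prefix-matching approach discussed above can be sketched in isolation as follows. This is an illustrative standalone class (the class name, method name, and property values here are hypothetical, not Iceberg's actual code): it copies only the `write.`-prefixed entries from a table's property map, so read and commit properties are left out.

```java
import java.util.Collections;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of the key-prefix filter: keep only "write." properties.
public class WritePropertyFilter {

  static Map<String, String> writeProperties(Map<String, String> tableProps) {
    return Collections.unmodifiableMap(
        tableProps.entrySet().stream()
            .filter(entry -> entry.getKey().startsWith("write."))
            .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue)));
  }

  public static void main(String[] args) {
    // Example property names in the style of Iceberg table properties.
    Map<String, String> props =
        Map.of(
            "write.format.default", "orc",
            "write.target-file-size-bytes", "134217728",
            "read.split.target-size", "268435456",
            "commit.retry.num-retries", "4");

    Map<String, String> writeProps = writeProperties(props);
    // Only the two "write."-prefixed entries survive the filter.
    System.out.println(writeProps.size()); // 2
    System.out.println(writeProps.get("write.format.default")); // orc
  }
}
```

Filtering by prefix avoids maintaining a hard-coded list of write property keys, at the cost of forwarding any future `write.`-prefixed property automatically.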
Changed the title to: PositionDeletesTable should respect ones of BaseTable
@szehon-ho Could you review this PR again if you have time?
Merged, thanks @jerqi. Can you please update the PR description to make it clearer what problem we are fixing? And do we need to make the fix for other Spark versions?
@szehon-ho I have raised a new PR for Spark 3.5: https://github.com/apache/iceberg/pull/8584/files
What changes were proposed in this pull request?

Make the write properties of `PositionDeletesTable` respect those of `BaseTable`.

Why are the changes needed?

When we use `PositionDeletesRewriteAction`, we use the properties of `PositionDeletesTable`, but those properties were empty before this PR, so the default properties were always used. As a result, the write file format of `PositionDeletesRewriteAction` would be Parquet even though the table uses ORC as its write format, which is unreasonable. For more information, see #8313 (comment).

Does this PR introduce any user-facing change?

No.

How was this patch tested?

UT
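The fallback behavior described in the "Why are the changes needed?" section can be sketched as follows. The helper below is illustrative, not Iceberg's actual code; it only assumes that `write.format.default` defaults to Parquet when absent, which is why an empty `PositionDeletesTable` property map made the rewrite action write Parquet files for an ORC table.

```java
import java.util.Map;

// Illustrative sketch (hypothetical class, not Iceberg's code): with an empty
// property map, a lookup of "write.format.default" falls back to the default
// "parquet" even though the base table is configured for ORC.
public class FormatFallback {

  // Hypothetical helper mirroring a defaulted property lookup.
  static String writeFormat(Map<String, String> props) {
    return props.getOrDefault("write.format.default", "parquet");
  }

  public static void main(String[] args) {
    Map<String, String> before = Map.of(); // empty properties before this PR
    Map<String, String> after = Map.of("write.format.default", "orc"); // copied from BaseTable

    System.out.println(writeFormat(before)); // parquet (wrong for an ORC table)
    System.out.println(writeFormat(after)); // orc
  }
}
```

After the fix, `PositionDeletesTable` exposes the base table's `write.*` properties, so the second case applies and the configured format wins.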