Spark 3.4: Fix write and SQL options to override delete file compression config #8438
szehon-ho merged 22 commits into apache:master
Conversation
…ase table properties
This reverts commit b5fce2c.
Code context:

    });
    }

    private Object[][] getOverridePropertiesSuites() {
case 1: Spark properties override the write properties.
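As a rough illustration of the layering this case exercises (a hypothetical sketch, not the actual `SparkWriteConf`/`SparkConfParser` code): a write option, when present, wins over a Spark session conf, which wins over a table property, with a final default as fallback.

```java
public class PrecedenceDemo {
  // Hypothetical resolver sketching config precedence: write option >
  // Spark session conf > table property > default. The real resolution
  // happens inside SparkConfParser; this only illustrates the layering.
  static String resolve(String writeOption, String sessionConf, String tableProp, String dflt) {
    if (writeOption != null) {
      return writeOption;
    }
    if (sessionConf != null) {
      return sessionConf;
    }
    if (tableProp != null) {
      return tableProp;
    }
    return dflt;
  }

  public static void main(String[] args) {
    // Session conf overrides the table property when no write option is given.
    System.out.println(resolve(null, "zstd", "zlib", "gzip")); // zstd
    // A write option beats everything else.
    System.out.println(resolve("snappy", "zstd", "zlib", "gzip")); // snappy
  }
}
```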
Code context:

    }

    private Object[][] getNonOverridePropertiesSuite() {
      return new Object[][] {
case 2: We reuse the data file format properties when the delete file format properties are absent.
Code context:

    },
    {
      ImmutableMap.of(),
      ImmutableMap.of(
case 3: The delete file format properties exist.
Code context:

    break;

    case ORC:
      String orcCodec = deleteOrcCompressionCodec();
Nit: would a helper method be better for re-use?

    setWithFallback(writeProperties, DELETE_ORC_COMPRESSION, deleteOrcCompressionCodec(), orcCompressionCodec());
Code context:

    writeProperties.put(
        DELETE_ORC_COMPRESSION, orcCodec != null ? orcCodec : orcCompressionCodec());
    String strategy = deleteOrcCompressionStrategy();
    if (strategy != null) {
Similarly, maybe:

    setWithFallback(writeProperties, DELETE_ORC_COMPRESSION_STRATEGY, deleteOrcCompressionStrategy(), orcCompressionStrategy())

(I feel the null handling is similar in the two cases.)
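The proposed helper could look roughly like the sketch below. `setWithFallback` is a name suggested in this review, not an existing Iceberg method, and the map keys here are illustrative placeholders rather than the real property constants.

```java
import java.util.HashMap;
import java.util.Map;

public class FallbackDemo {
  // Sketch of the reviewer-proposed helper: write the preferred value when it
  // is set, otherwise fall back. Both call sites (compression codec and
  // compression strategy) would share this null handling.
  static void setWithFallback(
      Map<String, String> writeProperties, String key, String preferred, String fallback) {
    writeProperties.put(key, preferred != null ? preferred : fallback);
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    // Placeholder keys, not actual Iceberg table property names.
    setWithFallback(props, "delete.codec", null, "zlib");              // falls back
    setWithFallback(props, "delete.strategy", "speed", "compression"); // preferred wins
    System.out.println(props.get("delete.codec"));    // zlib
    System.out.println(props.get("delete.strategy")); // speed
  }
}
```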
szehon-ho left a comment:

Thanks, it looks good to me.
szehon-ho left a comment:

I think we should get this in. I left some comments on the test, as it was a bit hard for me to read.
Code context:

    }
    }

    private void testWritePropertiesBySuite(Object[] propertiesSuite) {
Is there any point in making it Object[], and not just Map<String, String>[]?
Also, not sure 'bySuite' adds any value here; maybe just testWriteProperties.
> is there any point to make it Object[]

Fixed.
> is there any point to make it Object[], and not just Map<String, String>[]?

We can't create a generic type array or cast the array, so I used Object[] here. That seemed confusing, so I changed the type from an array to a List.
> also, not sure 'bySuite' adds any value here, maybe just testWriteProperties

Fixed.
Code context:

    (Map<String, String>) propertiesSuite[0],
    () -> {
      Table table = validationCatalog.loadTable(tableIdent);
      Map<String, String> writeOptions = ImmutableMap.of();
This definition seems quite far from where it's used. How about just inlining it below?

    SparkWriteConf writeConf = new SparkWriteConf(spark, table, ImmutableMap.of());
Code context:

    }

    @Test
    public void testSparkWriteConfWriteProperties() {
While this is good for code re-use and test coverage, it's hard to read. Iceberg tests are typically split to be as fine-grained as possible.

I see you have at least three cases; can we make a separate test for each of them? We should also inline the properties in the tests themselves to make them easier to read.

    @Test
    testSparkConfOverride() {
      Map<String, String>[] properties = // inline properties here
      testWriteProperties(properties)...
    }

    testDataPropsDefaultsAsDeleteProps() {
      ...
    }

    testDeleteFileWriteConf() {
      ...
    }
    ...

Feel free to clarify the titles.
Code context:

    updateProperties.commit();

    Map<String, String> writeOptions = ImmutableMap.of();
    SparkWriteConf writeConf = new SparkWriteConf(spark, table, writeOptions);
Still slightly prefer to inline writeOptions here, but not a blocker.
I will raise a follow-up PR to fix this.
Merged, thanks @jerqi!

Thanks for your careful review.

No description provided.