Skip to content

Flink: Add Sink options to override the compression properties of the Table#6049

Merged
stevenzwu merged 2 commits intoapache:masterfrom
pvary:compression
Oct 26, 2022
Merged

Flink: Add Sink options to override the compression properties of the Table#6049
stevenzwu merged 2 commits intoapache:masterfrom
pvary:compression

Conversation

@pvary
Copy link
Copy Markdown
Contributor

@pvary pvary commented Oct 25, 2022

Based on my performance tests using snappy codec to write out files is 40% more performant that writing with gzip.
Since with Flink there is an additional compaction step which overwrites the original data files anyway, it would be useful to first write out the files with snappy and use the table codec only at the compaction time.

To provide this functionality additional FlinkWriteOptions are added to modify the table default settings for:

  • compression-codec
  • compression-level - For Parquet and Avro formats
  • compression-strategy - For ORC formats

Copy link
Copy Markdown
Contributor

@nastra nastra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes LGTM and make sense, but would be good to have somebody with more Flink knowledge review this as well

Comment thread flink/v1.15/flink/src/main/java/org/apache/iceberg/flink/sink/FlinkSink.java Outdated
writeProperties.put(AVRO_COMPRESSION, conf.avroCompressionCodec());
level = conf.avroCompressionLevel();
if (level != null) {
writeProperties.put(AVRO_COMPRESSION_LEVEL, conf.avroCompressionLevel());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can use level in this line.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hate when the blocks in the switch are not independent.
I refactored to use a different variable instead inside each block

@stevenzwu stevenzwu merged commit e35f0d2 into apache:master Oct 26, 2022
@stevenzwu
Copy link
Copy Markdown
Contributor

Thanks @pvary for the contribution and @nastra for the review

@pvary pvary deleted the compression branch October 26, 2022 19:28
@pvary
Copy link
Copy Markdown
Contributor Author

pvary commented Oct 26, 2022

Thanks @stevenzwu and @nastra for the review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants