@rbalamohan (Contributor)

PR to use snappy as default compression for parquet instead of gzip. #5658

@sumeetgajjar (Contributor)

org.apache.iceberg.TestSplitScan > test[format = parquet] FAILED

Hi @rbalamohan, the above test is failing because of the change in compression codec: the data file is now ~77 MB, and with a 16 MB split size it is planned as 5 scan tasks.


Reducing the number of records to 2000000, which produces roughly the same file size as gzip did, resolves the test failure.
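
For context, here is a minimal sketch of the arithmetic behind the failure, assuming the test controls the split size through Iceberg's `read.split.target-size` table property (`TableProperties.SPLIT_SIZE`); the class and method names below are hypothetical:

```java
import org.apache.iceberg.CombinedScanTask;
import org.apache.iceberg.Table;
import org.apache.iceberg.TableProperties;
import org.apache.iceberg.io.CloseableIterable;

public class SplitPlanningSketch {

  // Set a 16 MB target split size, then count the scan tasks Iceberg plans.
  // A single ~77 MB parquet file is planned as ceil(77 / 16) = 5 tasks;
  // the smaller gzip-compressed file produced fewer.
  static void printScanTaskCount(Table table) throws Exception {
    table.updateProperties()
        .set(TableProperties.SPLIT_SIZE, String.valueOf(16 * 1024 * 1024))
        .commit();

    int count = 0;
    try (CloseableIterable<CombinedScanTask> tasks = table.newScan().planTasks()) {
      for (CombinedScanTask ignored : tasks) {
        count++;
      }
    }
    System.out.println("planned scan tasks: " + count);
  }
}
```

The point is that the task count is a function of file size over split size, so any codec change that grows the file shifts the expected count.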

github-actions bot added the data label on Aug 30, 2022
@rdblue (Contributor) commented on Aug 31, 2022

From the fairly broad testing that I've done, snappy is never a good choice for compression, whichever dimension you optimize for. Snappy is often fast, but gets very poor compression rates. LZ4 is a much better choice if you're optimizing for write speed, because it is usually both faster and smaller than snappy. But if you're optimizing for compression ratio, I probably wouldn't choose LZ4 either.
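
For reference, the default under discussion is the per-table `write.parquet.compression-codec` property (`TableProperties.PARQUET_COMPRESSION`). A minimal sketch of overriding it on an existing table, assuming a loaded `Table` handle; the class and method names are hypothetical:

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.TableProperties;

public class CompressionCodecSketch {

  // Change the parquet codec used for newly written data files.
  // Files already in the table keep whatever codec they were written with.
  static void setCodec(Table table, String codec) {
    table.updateProperties()
        .set(TableProperties.PARQUET_COMPRESSION, codec) // "write.parquet.compression-codec"
        .commit();
  }
}
```

For example, `setCodec(table, "snappy")` applies what this PR proposes as the default, while `setCodec(table, "gzip")` restores the previous behavior, so the trade-off rdblue describes can be made per table without changing the global default.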
