PARQUET-2037: Write INT96 with parquet-avro #901
Conversation
|
In the ticket, you mentioned there are two ways to solve this issue. I see you implemented the second one. What was the reason behind that choice? I do not favor one over the other; I just want to know the pros and cons of each. |
|
@shangxinli, |
|
Yeah, agreed. Just one concern: for a deeply nested schema, it might not be straightforward for the user to know the exact path to set manually in the configuration. I remember the last time I worked on an Avro schema, the field was nested 20+ layers deep, with 'Type' elements in the middle that had a 'name' in them. That is not very human-readable, and it is easy to make a mistake. But I have less experience working with schemas, so I am not certain this is a real issue. |
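To make the nested-path concern concrete, here is a minimal sketch of how the paths could be listed mechanically instead of being typed by hand. It assumes the configuration takes dot-separated field paths that point at 12-byte fixed fields and that several paths can be joined with commas. The `Field` class and `findFixed12Paths` are hypothetical stand-ins written for this illustration, not the Avro or parquet-avro API, and the sketch ignores unions, arrays, and maps.

```java
import java.util.ArrayList;
import java.util.List;

public class Int96PathFinder {

    /** Hypothetical, simplified stand-in for a schema field (NOT the Avro API). */
    static final class Field {
        final String name;
        final int fixedSize;         // > 0 for fixed types, 0 for everything else
        final List<Field> children;  // non-empty only for nested records

        Field(String name, int fixedSize, List<Field> children) {
            this.name = name;
            this.fixedSize = fixedSize;
            this.children = children;
        }

        static Field fixed(String name, int size) {
            return new Field(name, size, List.of());
        }

        static Field leaf(String name) {
            return new Field(name, 0, List.of());
        }

        static Field record(String name, Field... children) {
            return new Field(name, 0, List.of(children));
        }
    }

    /** Collects the dot-separated path of every 12-byte fixed field, depth first. */
    static List<String> findFixed12Paths(List<Field> topLevelFields) {
        List<String> paths = new ArrayList<>();
        for (Field field : topLevelFields) {
            collect(field, "", paths);
        }
        return paths;
    }

    private static void collect(Field field, String prefix, List<String> out) {
        String path = prefix.isEmpty() ? field.name : prefix + "." + field.name;
        if (field.fixedSize == 12) {
            out.add(path);
        }
        for (Field child : field.children) {
            collect(child, path, out);
        }
    }

    public static void main(String[] args) {
        // Assumed example schema: only the two 12-byte fixed fields qualify.
        List<Field> fields = List.of(
            Field.leaf("id"),
            Field.record("event",
                Field.fixed("created_at", 12),
                Field.record("audit",
                    Field.fixed("updated_at", 12),
                    Field.fixed("checksum", 16))));

        // Prints: event.created_at,event.audit.updated_at
        System.out.println(String.join(",", findFixed12Paths(fields)));
    }
}
```

A helper along these lines would only reduce typing mistakes; the user would still have to decide which of the listed fields should actually be written as INT96.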
|
@shangxinli, I agree this is not a perfect solution, but I could not come up with a better one. Meanwhile, this feature will not be used widely since INT96 is deprecated. Maybe it is even better that this feature is not always easy to use :) |
|
LGTM |
* 'master' of https://github.com/apache/parquet-mr: (222 commits)
  PARQUET-2052: Integer overflow when writing huge binary using dictionary encoding (apache#910)
  PARQUET-2041: Add zstd to `parquet.compression` description of ParquetOutputFormat Javadoc (apache#899)
  PARQUET-2050: Expose repetition & definition level from ColumnIO (apache#908)
  PARQUET-1761: Lower Logging Level in ParquetOutputFormat (apache#745)
  PARQUET-2046: Upgrade Apache POM to 23 (apache#904)
  PARQUET-2048: Deprecate BaseRecordReader (apache#906)
  PARQUET-1922: Deprecate IOExceptionUtils (apache#825)
  PARQUET-2037: Write INT96 with parquet-avro (apache#901)
  PARQUET-2044: Enable ZSTD buffer pool by default (apache#903)
  PARQUET-2038: Upgrade Jackson version used in parquet encryption. (apache#898)
  Revert "[WIP] Refactor GroupReadSupport to unuse deprecated api (apache#894)"
  PARQUET-2027: Fix calculating directory offset for merge (apache#896)
  [WIP] Refactor GroupReadSupport to unuse deprecated api (apache#894)
  PARQUET-2030: Expose page size row check configurations to ParquetWriter.Builder (apache#895)
  PARQUET-2031: Upgrade to parquet-format 2.9.0 (apache#897)
  PARQUET-1448: Review of ParquetFileReader (apache#892)
  PARQUET-2020: Remove deprecated modules (apache#888)
  PARQUET-2025: Update Snappy version to 1.1.8.3 (apache#893)
  PARQUET-2022: ZstdDecompressorStream should close `zstdInputStream` (apache#889)
  PARQUET-1982: Random access to row groups in ParquetFileReader (apache#871)
  ...

# Conflicts:
#	parquet-column/src/main/java/org/apache/parquet/example/data/simple/SimpleGroup.java
#	parquet-hadoop/pom.xml
#	parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
#	parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java