PARQUET-1335: Logical type names in parquet-mr are not consistent with parquet-format #496
  String message =
  "message StringMessage {\n" +
- " required binary string (UTF8);\n" +
+ " required binary string (STRING);\n" +
Why is this test change needed?
@gszadovszky, I think we might need to remove this commit. It looks like this changes the Parquet schema format. Is that correct?
@rdblue, this is not a breaking change; it introduces the new logical types. Both
@rdblue, as @gszadovszky said, this shouldn't be a breaking change, since the "old" types are still honored. However, the change I made to the test is misleading: I should have added a new test case for STRING and left UTF8 untouched. I have created a new PR, #503.
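The point that the "old" types are still honored can be illustrated with a minimal sketch. This is not parquet-mr code: the class `AnnotationAliases`, its lookup table, and the `resolve` method are hypothetical names chosen for illustration. It only models the idea that the legacy annotation `UTF8` and the new annotation `STRING` resolve to the same logical type, so schema text using either name keeps its meaning.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class AnnotationAliases {
    // Hypothetical lookup table: legacy and new annotation names side by side.
    private static final Map<String, String> CANONICAL = new HashMap<>();
    static {
        CANONICAL.put("UTF8", "STRING");   // legacy name, still accepted
        CANONICAL.put("STRING", "STRING"); // new logical type name
    }

    // Returns the canonical logical type for an annotation, or empty if unknown.
    static Optional<String> resolve(String annotation) {
        return Optional.ofNullable(CANONICAL.get(annotation));
    }

    public static void main(String[] args) {
        // Both spellings should resolve to the same canonical type.
        System.out.println(resolve("UTF8").equals(resolve("STRING")));
    }
}
```

Under this model, a test that covers both spellings (one case for `UTF8`, one for `STRING`) checks that neither name regresses, which is the shape of test the comment above suggests.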
Thanks, it's good to hear the old types still work. Since the new logical type code changes schema serialization, is this a forward-incompatible change? Will old readers still be able to read files written after this change?
The new API writes both the logicalType and converted_type fields for each SchemaElement. Therefore old readers, which only know about converted_type, will be able to read files written by new writers. What old Parquet versions won't be able to interpret are the changes in the schema language, that is, the text representation parsed by MessageTypeParser. New logical types, such as timestamps, have new type parameters that the old parser can't parse. Fortunately, as far as I know, the text schema representation is not written into the file, so files written by new writers should be readable by old readers. @rdblue, does this answer your concern?
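The dual-write argument above can be sketched as follows. This is a simplified model under stated assumptions, not the parquet-mr implementation: `DualAnnotationSketch`, the nested `SchemaElement` class, and all three methods are hypothetical stand-ins, and the real SchemaElement is a Thrift-generated structure with more fields. The sketch only shows why a reader that looks at `converted_type` alone still sees a usable annotation when a newer writer fills in both fields.

```java
public class DualAnnotationSketch {
    // Hypothetical stand-in for the Thrift SchemaElement; only two fields are modeled.
    static final class SchemaElement {
        final String name;
        final String convertedType; // legacy field, the only one old readers know
        final String logicalType;   // new field, ignored by old readers

        SchemaElement(String name, String convertedType, String logicalType) {
            this.name = name;
            this.convertedType = convertedType;
            this.logicalType = logicalType;
        }
    }

    // Assumed writer behavior: populate both fields for a string column.
    static SchemaElement writeStringColumn(String name) {
        return new SchemaElement(name, "UTF8", "STRING");
    }

    // An old reader consults only the legacy field.
    static String oldReaderAnnotation(SchemaElement element) {
        return element.convertedType;
    }

    // A new reader prefers the new field and falls back to the legacy one.
    static String newReaderAnnotation(SchemaElement element) {
        return element.logicalType != null ? element.logicalType : element.convertedType;
    }

    public static void main(String[] args) {
        SchemaElement column = writeStringColumn("string");
        System.out.println("old reader sees: " + oldReaderAnnotation(column));
        System.out.println("new reader sees: " + newReaderAnnotation(column));
    }
}
```

The fallback in `newReaderAnnotation` models the reverse direction as well: a file written by an old writer carries only the legacy field, and a new reader can still interpret it.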
…h parquet-format (apache#496)