PARQUET-2094: Handle negative values in page headers #933
gszadovszky merged 4 commits into apache:master
TravisCI runs contain multiple
@plygrnd, yes, this is expected. The test class TestCorruptThriftRecords verifies that the correct exceptions are thrown. It uses MR to execute the tests, and it seems MR logs the exceptions before they are thrown to the caller.
Okay, LGTM then.
LGTM
 * A specific IOException thrown when invalid values are found in the Parquet file metadata (including the footer,
 * page header etc.).
 */
public static class InvalidParquetMetadataException extends IOException {
I don't think that this is an IOException. What is the value of making this a checked exception? Why not just make this a RuntimeException? Or use some existing one like IllegalStateException or ParquetDecodingException?
The module parquet-format-structures is the one all the others depend on. Parquet exceptions are implemented in another module, so I cannot use them here. Since we already throw IOExceptions, I felt extending it would be a good idea. But you might be right. I am happy to extend RuntimeException instead of IOException.
    return pageHeader;
  }

  private static <T> void validateValue(Predicate<? super T> validator, T value, String metaName)
Why accept a predicate? Most check methods like this use a boolean. I would expect it to work like a Precondition:
  if (!isValid) {
    throw new ParquetDecodingException(...);
  }

  int size = pageHeader.getCompressed_page_size();
  validateValue(size >= 0, String.format("Compressed page size must be positive, but was: %s", size));
I am not sure why I implemented it this way. I'm fine with rewriting it to use a simple boolean.
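The boolean-based check suggested above could look roughly like the following. This is a minimal sketch, not the merged code: the class name `MetadataValidatorSketch`, the nested `InvalidMetadataException`, and the helper `validateCompressedPageSize` are hypothetical stand-ins chosen so the example is self-contained.

```java
// Hypothetical sketch of a precondition-style validator; all names are illustrative.
public class MetadataValidatorSketch {

  /** Unchecked exception standing in for the exception discussed in this PR. */
  public static class InvalidMetadataException extends RuntimeException {
    public InvalidMetadataException(String message) {
      super(message);
    }
  }

  /** Throws if the already-evaluated condition is false; otherwise does nothing. */
  public static void validateValue(boolean valid, String message) {
    if (!valid) {
      throw new InvalidMetadataException(message);
    }
  }

  /** Checks a size value as it might be read from a page header and returns it unchanged. */
  public static int validateCompressedPageSize(int size) {
    validateValue(size >= 0,
        String.format("pageHeader.compressed_page_size must not be negative, but was: %s", size));
    return size;
  }

  public static void main(String[] args) {
    // A valid size passes through untouched.
    System.out.println(validateCompressedPageSize(128));
    // A negative size is rejected with a message naming the offending field.
    try {
      validateCompressedPageSize(-1);
    } catch (InvalidMetadataException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

One trade-off of the boolean form is that the message is formatted eagerly at every call, even when the value is valid; for checks that run once per page header this is probably negligible.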
  public static PageHeader readPageHeader(InputStream from,
      BlockCipher.Decryptor decryptor, byte[] AAD) throws IOException {
-   return read(from, new PageHeader(), decryptor, AAD);
+   return validate(read(from, new PageHeader(), decryptor, AAD));
I think it would be clearer if you called MetadataValidator.validate rather than just validate.
      fail("Expected exception but did not thrown");
    } catch (InvalidParquetMetadataException e) {
      assertTrue("Exception message does not contain the expected parts",
          e.getMessage().contains("pageHeader.compressed_page_size"));
Isn't there an assertion helper so you don't need to catch the exception? Something like assertThrows in the codebase?
Yes, there is something already implemented but in another module and I cannot use it here.
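When a shared assertion helper lives in a module that cannot be depended on, a small local equivalent is one option. The sketch below is an assumption about what such a helper might look like; `ExpectThrowsSketch`, `ThrowingRunnable`, and `expectThrows` are hypothetical names, not the helper that exists elsewhere in the codebase.

```java
// Hypothetical local helper that replaces the try / fail / catch pattern in tests.
public class ExpectThrowsSketch {

  /** A block of code that is allowed to throw anything. */
  @FunctionalInterface
  public interface ThrowingRunnable {
    void run() throws Throwable;
  }

  /**
   * Runs the body and returns the thrown exception if it has the expected type.
   * Fails with an AssertionError if nothing is thrown or the type does not match.
   */
  public static <E extends Throwable> E expectThrows(Class<E> expected, ThrowingRunnable body) {
    try {
      body.run();
    } catch (Throwable t) {
      if (expected.isInstance(t)) {
        return expected.cast(t);
      }
      throw new AssertionError("Expected " + expected.getName() + " but got " + t, t);
    }
    throw new AssertionError("Expected " + expected.getName() + " but nothing was thrown");
  }

  public static void main(String[] args) {
    // The returned exception can be inspected further, e.g. its message.
    IllegalStateException e = expectThrows(IllegalStateException.class, () -> {
      throw new IllegalStateException("pageHeader.compressed_page_size is negative");
    });
    System.out.println(e.getMessage().contains("pageHeader.compressed_page_size"));
  }
}
```

Returning the caught exception keeps the message check from the original test possible without an explicit catch block.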
 * A specific RuntimeException thrown when invalid values are found in the Parquet file metadata (including the
 * footer, page header etc.).
 */
public static class InvalidParquetMetadataException extends RuntimeException {
Minor: I'd prefer it if the exception weren't an inner class since that makes it harder to reference. But this isn't a blocker.
It is a fair point anyway. I'll move it before merging. Thanks a lot for the review.
 * page header etc.).
 */
public class InvalidParquetMetadataException extends RuntimeException {
  <T> InvalidParquetMetadataException(String message) {
What is the type parameter for?
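For context on the question above: Java allows a constructor to declare its own type parameter, but if that parameter is never used in the signature or body it has no effect on callers. The following sketch illustrates this with two hypothetical exception classes (`WithUnusedTypeParameter` and `Plain`); neither is the class from this PR.

```java
// Hypothetical sketch: a generic constructor whose type parameter is never used.
public class GenericConstructorSketch {

  /** Compiles, but <T> plays no role: nothing in the signature or body refers to it. */
  public static class WithUnusedTypeParameter extends RuntimeException {
    public <T> WithUnusedTypeParameter(String message) {
      super(message);
    }
  }

  /** Equivalent class without the redundant type parameter. */
  public static class Plain extends RuntimeException {
    public Plain(String message) {
      super(message);
    }
  }

  public static void main(String[] args) {
    // Both constructors are invoked the same way and behave the same way.
    System.out.println(new WithUnusedTypeParameter("invalid metadata").getMessage());
    System.out.println(new Plain("invalid metadata").getMessage());
  }
}
```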
import org.apache.parquet.format.Util.DefaultFileMetaDataConsumer;
import org.junit.Test;

import org.apache.parquet.format.Util.DefaultFileMetaDataConsumer;
(cherry picked from commit 1695d92)
@gszadovszky, thanks for getting this done.
Make sure you have checked all steps below.
Jira
Tests
Commits
Documentation