Description
Describe the bug, including details regarding any error messages, version, and platform.
When reading an encrypted Parquet file with a plaintext footer, the Parquet reader is able to verify footer integrity by comparing the signature in the file with the one computed by encrypting the footer.
However, it does this by first re-serializing the deserialized footer using Thrift. This has several issues:
- it's inefficient
- it's not obvious that it will always produce the same Thrift encoding as the original, leading to spurious signature verification failures
- if the original footer deserializes to invalid enum values, attempting to serialize it again will lead to undefined behavior
The third issue is what allowed this to be uncovered by OSS-Fuzz (see https://oss-fuzz.com/testcase-detail/4740205688193024).
For these reasons, it would be better to reuse the existing serialized metadata from the footer.
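To illustrate the second issue and the proposed fix, here is a minimal, self-contained sketch. It is not the Arrow/Parquet implementation: `ToyTag`, `DecodeVarint`, `EncodeVarint` and `VerifyFooterFromOriginalBytes` are hypothetical names, and FNV-1a stands in for the real signature computation only so the example runs without a crypto library. It shows that a decode/re-encode round trip of a varint-style encoding need not reproduce the original bytes, whereas checking the signature against the bytes as read from the file avoids the round trip entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for the real footer signature computation. FNV-1a is used only to
// keep the sketch dependency-free; it is NOT a secure MAC.
inline uint64_t ToyTag(const uint8_t* data, size_t len) {
  uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
  for (size_t i = 0; i < len; ++i) {
    h ^= data[i];
    h *= 1099511628211ULL;  // FNV-1a 64-bit prime
  }
  return h;
}

// Toy LEB128-style varint decoder. Like many varint decoders, it accepts
// non-canonical (over-long) encodings such as {0x85, 0x00} for the value 5.
inline uint64_t DecodeVarint(const std::vector<uint8_t>& buf, size_t* pos) {
  uint64_t value = 0;
  int shift = 0;
  while (*pos < buf.size() && shift < 64) {
    const uint8_t byte = buf[(*pos)++];
    value |= static_cast<uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) break;  // high bit clear: last byte of the varint
    shift += 7;
  }
  return value;
}

// Toy varint encoder that always emits the shortest (canonical) form.
inline std::vector<uint8_t> EncodeVarint(uint64_t value) {
  std::vector<uint8_t> out;
  while (value >= 0x80) {
    out.push_back(static_cast<uint8_t>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<uint8_t>(value));
  return out;
}

// Hypothetical helper: checks the stored signature against the footer bytes
// exactly as they were read from the file, with no re-serialization step.
inline bool VerifyFooterFromOriginalBytes(const uint8_t* footer_bytes,
                                          size_t footer_len,
                                          uint64_t stored_signature) {
  assert(footer_bytes != nullptr || footer_len == 0);
  return ToyTag(footer_bytes, footer_len) == stored_signature;
}
```

In this sketch, a "footer" consisting of the bytes `{0x85, 0x00}` decodes to 5 but re-encodes as `{0x05}`, so a signature computed over the re-encoded bytes would not match one computed over the original. Verifying against the original buffer is also cheaper, and it never has to serialize values (such as out-of-range enums) that came from untrusted input.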
Component(s)
C++, Parquet