[C++][Parquet] Plaintext footer signature verification re-serializes footer #48858

@pitrou

Description

Describe the bug, including details regarding any error messages, version, and platform.

When reading an encrypted Parquet file with a plaintext footer, the Parquet reader is able to verify footer integrity by comparing the signature in the file with the one computed by encrypting the footer.

However, it does this by first re-serializing the deserialized footer using Thrift. This has several issues:

  1. it's inefficient
  2. it's not obvious that it will always produce the same Thrift encoding as the original, leading to spurious signature verification failures
  3. if the original footer deserializes to invalid enum values, attempting to serialize it again will lead to undefined behavior
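
To illustrate issue 2, here is a minimal sketch using a toy varint codec. It is not Thrift itself, and the function names are made up for this example. It only shows the general hazard: a lenient decoder can accept a non-canonical encoding that the encoder would never emit, so a decode/encode round trip need not reproduce the original bytes, and a signature computed over the re-encoded bytes would then not match.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy LEB128-style varint decoder (hypothetical, for illustration only).
// It accepts redundant continuation bytes, e.g. {0x81, 0x00} for the value 1.
std::uint64_t DecodeVarint(const std::vector<std::uint8_t>& buf) {
  std::uint64_t value = 0;
  int shift = 0;
  for (std::size_t pos = 0; pos < buf.size() && shift < 64; ++pos) {
    const std::uint8_t byte = buf[pos];
    value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) break;  // last byte of this varint
    shift += 7;
  }
  return value;
}

// Toy encoder that always emits the shortest (canonical) form.
std::vector<std::uint8_t> EncodeVarint(std::uint64_t value) {
  std::vector<std::uint8_t> out;
  while (value >= 0x80) {
    out.push_back(static_cast<std::uint8_t>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out.push_back(static_cast<std::uint8_t>(value));
  return out;
}

// Returns true if decoding and re-encoding reproduces the input bytes exactly.
bool RoundTripIsByteStable(const std::vector<std::uint8_t>& original) {
  assert(!original.empty());
  return EncodeVarint(DecodeVarint(original)) == original;
}
```

With this sketch, the canonical input `{0x01}` is byte-stable, while `{0x81, 0x00}` decodes to the same value but re-encodes as `{0x01}`.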

Reason 3 is what allowed this to be uncovered by OSS-Fuzz (see https://oss-fuzz.com/testcase-detail/4740205688193024).

For these reasons, it would be better to reuse the existing serialized metadata from the footer.
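
A rough sketch of the suggested approach, under loud assumptions: `ToyTag`, `SignFooter`, `VerifyFooter`, and the 8-byte trailing tag layout are all hypothetical and do not reflect the Parquet C++ API or the real file layout. The tag is a plain FNV-1a hash standing in for the actual keyed computation. The point is only that verification runs over the original serialized bytes, with no deserialize/re-serialize step in between.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical tag size for this sketch (not the real signature size).
constexpr std::size_t kTagSize = 8;

// Stand-in for the real keyed tag computation: 64-bit FNV-1a.
// NOT cryptographically secure; it only keeps the sketch self-contained.
std::uint64_t ToyTag(const std::uint8_t* data, std::size_t len) {
  assert(data != nullptr || len == 0);
  std::uint64_t hash = 14695981039346656037ULL;  // FNV offset basis
  for (std::size_t i = 0; i < len; ++i) {
    hash ^= data[i];
    hash *= 1099511628211ULL;  // FNV prime
  }
  return hash;
}

// Writer side: append a little-endian tag computed over the serialized footer.
std::vector<std::uint8_t> SignFooter(std::vector<std::uint8_t> footer) {
  const std::uint64_t tag = ToyTag(footer.data(), footer.size());
  for (std::size_t i = 0; i < kTagSize; ++i) {
    footer.push_back(static_cast<std::uint8_t>(tag >> (8 * i)));
  }
  return footer;
}

// Reader side: compute the tag directly over the footer bytes as read from
// the file and compare it with the stored tag. Nothing is re-serialized.
bool VerifyFooter(const std::vector<std::uint8_t>& file_tail) {
  if (file_tail.size() < kTagSize) return false;  // too short to hold a tag
  const std::size_t footer_len = file_tail.size() - kTagSize;
  std::uint64_t stored = 0;
  for (std::size_t i = 0; i < kTagSize; ++i) {
    stored |= static_cast<std::uint64_t>(file_tail[footer_len + i]) << (8 * i);
  }
  return ToyTag(file_tail.data(), footer_len) == stored;
}
```

Because the check never goes through the deserialized structures, it sidesteps all three issues listed above: there is no extra serialization pass, no dependence on the encoder being byte-stable, and no serialization of potentially invalid enum values.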

Component(s)

C++, Parquet
