[GLUTEN-8455][VL] Support encrypted parquet fallback for 3.5 #8560

Merged
Yohahaha merged 7 commits into apache:main from ArnavBalyan:arnavb/parquet-3.5 on Jan 22, 2025

@ArnavBalyan
Member

  • Adds encrypted Parquet fallback support for Spark 3.5.
  • Earlier Spark versions use an older Parquet dependency that does not expose metadata for checking encryption directly.
  • For Spark 3.5, uses the EncryptionType provided by Parquet 1.13.

@github-actions bot added the CORE (works for Gluten Core) and VELOX labels on Jan 17, 2025
@github-actions

#8455

@github-actions

Run Gluten ClickHouse CI on ARM

update

@ArnavBalyan changed the title from "[GLUTEN-8455][VL] Encrypted parquet fallback for 3.5" to "[GLUTEN-8455][VL] Support encrypted parquet fallback for 3.5" on Jan 17, 2025
Comment on lines +566 to +573
    fileMetaData.getEncryptionType match {
      case EncryptionType.UNENCRYPTED =>
        false
      case EncryptionType.PLAINTEXT_FOOTER =>
        true
      case _ =>
        false
    }
Contributor

Why is the file reported as encrypted when the type is EncryptionType.PLAINTEXT_FOOTER?

Could you link the official Parquet documentation for the encryption types?

Member Author

Sorry, I accidentally deleted the comment. Here is the code reference: https://github.com/apache/parquet-java/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java#L1766. We still rely on the exception to detect files where both the footer and the file are encrypted. EncryptionType.PLAINTEXT_FOOTER covers the case of an encrypted file with a plaintext footer. There does not seem to be any documentation on this, but I hope the code reference helps. Thanks!
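
As a rough illustration of the split described in this reply, the Scala sketch below models the two detection paths: a metadata check for files with a plaintext footer, and an exception for files whose footer is itself encrypted. To stay runnable without parquet-hadoop on the classpath, `EncryptionType`, `FooterDecryptionException`, and `readFooter` are hypothetical stand-ins, not Parquet or Gluten APIs; the code under review matches on `fileMetaData.getEncryptionType` instead.

```scala
object EncryptedParquetSketch {
  // Hypothetical stand-in for Parquet's EncryptionType values.
  sealed trait EncryptionType
  case object Unencrypted extends EncryptionType
  case object PlaintextFooter extends EncryptionType
  case object EncryptedFooter extends EncryptionType

  // Hypothetical stand-in for the error raised when an encrypted footer
  // is read without decryption keys.
  final class FooterDecryptionException(msg: String) extends RuntimeException(msg)

  // readFooter is a hypothetical function: it returns the encryption type
  // recorded in a readable footer, or throws if the footer cannot be read.
  def isEncrypted(readFooter: () => EncryptionType): Boolean =
    try {
      readFooter() match {
        // Plain file: nothing to fall back for.
        case Unencrypted => false
        // Footer is readable, but the metadata says the data is encrypted.
        case PlaintextFooter => true
        // Mirrors the `case _ => false` branch in the snippet under review.
        case _ => false
      }
    } catch {
      // Encrypted footer: detection relies on the read failing.
      case _: FooterDecryptionException => true
    }
}
```

With this shape, a caller that sees `true` would fall back to vanilla Spark for the scan; how the fallback itself is wired up is outside the scope of this sketch.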

Contributor

Could you add some comments to it? They should describe each EncryptionType and how we handle it.

Member Author

Added comments

      testWithSpecifiedSparkVersion(
        "Detect encrypted Parquet with encrypted footer",
    -   Array("3.2", "3.3", "3.4")) {
    +   Array("3.2", "3.3", "3.4", "3.5")) {
Contributor

It's time to replace testWithSpecifiedSparkVersion with test, right?

Member Author

Sure, let me update it.

Contributor

@Yohahaha left a comment

LGTM! Only one comment.

@Yohahaha merged commit 0f4489a into apache:main on Jan 22, 2025
baibaichen pushed a commit to baibaichen/gluten that referenced this pull request Feb 1, 2025