[GLUTEN-8385][VL] Support write compatible-hive bucket table for Spark3.4 and Spark3.5. #8386

Merged
JkSelf merged 1 commit into apache:main from yikf:bucket-write on Jan 8, 2025

yikf (Contributor) commented Jan 1, 2025

What changes were proposed in this pull request?

This PR aims to support writing Hive-compatible bucketed tables for Spark 3.4 and Spark 3.5. Fixes #8385.

How was this patch tested?

unit tests.

github-actions bot added the CORE, VELOX, CLICKHOUSE, and DOCS labels Jan 1, 2025

github-actions bot commented Jan 1, 2025

#8385

github-actions bot commented Jan 1, 2025

Run Gluten Clickhouse CI on x86

yikf (Contributor, Author) commented Jan 1, 2025

@JkSelf @jackylee-ch @zhouyuan Could you please take a look when you have time? Thanks!

JkSelf (Contributor) commented Jan 2, 2025

cc @ulysses-you

targetFileName = fmt::format("gluten-part-{}{}{}", makeUuid(), compressionFileNameSuffix(compression), ".parquet");
}
return std::make_shared<connector::hive::LocationHandle>(
targetDirectory, writeDirectory.value_or(targetDirectory), tableType, targetFileName);
Contributor

I think we should use the Hive-style file name only for bucketed writes, and keep the previous behavior for non-bucketed tables.

Contributor Author

Thank you for your suggestion.

For non-bucketed tables, we can keep the previous approach.

For bucketed tables, it seems the bucket ID can only be computed during the write itself, so the final file name is only determined at that point, and that step is controlled by Velox.
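
To illustrate why the file name depends on the bucket ID, here is a minimal sketch of how a Hive-style bucketed file name could be derived once a row's hash is known. It assumes the convention Spark uses for Hive-compatible bucketed writes: mask the hash to a non-negative value, take it modulo the bucket count, and prefix the file name with the zero-padded bucket ID followed by `_0_`. The function names `bucketIdFor`, `bucketFileNamePrefix`, and `bucketedFileName` are hypothetical and are not the Velox or Gluten API; the hash is taken as an input rather than computed here.

```cpp
#include <cstdint>
#include <cstdio>
#include <limits>
#include <string>

// Hypothetical helper: map a precomputed row hash to a bucket id.
// Assumes numBuckets > 0. Masking with INT32_MAX clears the sign bit,
// so a negative hash still yields a bucket id in [0, numBuckets).
int32_t bucketIdFor(int32_t hash, int32_t numBuckets) {
  return (hash & std::numeric_limits<int32_t>::max()) % numBuckets;
}

// Hypothetical helper: build the assumed Hive-style prefix,
// i.e. the bucket id zero-padded to five digits followed by "_0_".
std::string bucketFileNamePrefix(int32_t bucketId) {
  char buf[16];
  std::snprintf(buf, sizeof(buf), "%05d_0_", bucketId);
  return std::string(buf);
}

// Hypothetical helper: the file name can only be completed after the
// bucket id is known, which is why it cannot be fixed before the write.
std::string bucketedFileName(
    int32_t hash, int32_t numBuckets, const std::string& suffix) {
  return bucketFileNamePrefix(bucketIdFor(hash, numBuckets)) + suffix;
}
```

Under these assumptions, a row whose hash is 10 in a table with 8 buckets lands in bucket 2, so its file name starts with `00002_0_`. A caller that fixes the whole file name up front, as the non-bucketed path does, has no bucket ID to put in that prefix.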

Comment thread on docs/developers/SubstraitModifications.md (outdated)

github-actions bot commented Jan 3, 2025

Run Gluten Clickhouse CI on x86

yikf (Contributor, Author) commented Jan 7, 2025

@JkSelf @ulysses-you Could you please review this PR again when you have time?

yikf changed the title from "[GLUTEN-8385][VL]Support write compatible-hive bucket table for Spark3.4 and Spark3.5." to "[GLUTEN-8385][VL] Support write compatible-hive bucket table for Spark3.4 and Spark3.5." Jan 7, 2025

JkSelf (Contributor) left a comment

LGTM. Thanks for your work.

JkSelf merged commit 7dda686 into apache:main Jan 8, 2025
yikf deleted the bucket-write branch January 8, 2025 01:40

Development

Successfully merging this pull request may close these issues.

[VL] Support write compatible-hive bucket table for Spark3.4 and Spark3.5.

3 participants