[GLUTEN-8385][VL] Support write compatible-hive bucket table for Spark3.4 and Spark3.5. by yikf · Pull Request #8386 · apache/gluten

yikf · 2025-01-01T05:32:58Z

What changes were proposed in this pull request?

This PR aims to support writing Hive-compatible bucket tables for Spark 3.4 and Spark 3.5. Fixes #8385.

How was this patch tested?

unit tests.

github-actions · 2025-01-01T05:33:16Z

#8385

github-actions · 2025-01-01T05:33:30Z

Run Gluten Clickhouse CI on x86

yikf · 2025-01-01T05:33:36Z

@JkSelf @jackylee-ch @zhouyuan Could you please take a look when you have time? Thanks!

JkSelf · 2025-01-02T01:12:59Z

cc @ulysses-you

ulysses-you · 2025-01-02T01:56:25Z

-    targetFileName = fmt::format("gluten-part-{}{}{}", makeUuid(), compressionFileNameSuffix(compression), ".parquet");
-  }
  return std::make_shared<connector::hive::LocationHandle>(
-      targetDirectory, writeDirectory.value_or(targetDirectory), tableType, targetFileName);


I think we should use the Hive-style file name only for bucketed writes, and keep the previous behavior for non-bucketed tables.

yikf · Jan 2, 2025

Thank you for your suggestion.

For non-bucketed tables, we can keep the previous approach.

For bucketed tables, it seems the bucketId can only be computed during the write itself, so the final file name is determined at that point, and that step is controlled by Velox.
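
To illustrate why the file name for a bucketed write cannot be fixed up front, here is a minimal sketch of how a bucket id might be derived from a row hash and turned into a Hive-style file name prefix. It assumes the convention I recall from Spark's Hive-compatible bucketed write path (mask the hash to a non-negative value, take it modulo the bucket count, and use a zero-padded `NNNNN_0_` prefix); please verify that against the Spark source. The function names `bucketIdFromHash` and `bucketFilePrefix` are hypothetical, are not part of Velox or Gluten, and the hash itself is taken as given rather than computed.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper, not a Velox or Gluten API.
// Maps a precomputed 32-bit row hash to a bucket id in [0, numBuckets).
// Masking with INT32_MAX clears the sign bit so a negative hash cannot
// produce a negative remainder. Assumes numBuckets > 0.
int32_t bucketIdFromHash(int32_t hash, int32_t numBuckets) {
  return (hash & INT32_MAX) % numBuckets;
}

// Hypothetical helper, not a Velox or Gluten API.
// Builds the assumed Hive-style prefix: five-digit zero-padded bucket id
// followed by "_0_". The rest of the file name would be appended by the writer.
std::string bucketFilePrefix(int32_t bucketId) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "%05d_0_", bucketId);
  return std::string(buf);
}
```

Because the bucket id depends on the hash of each row's bucket columns, the prefix is only known once rows are being routed to buckets, which is consistent with leaving the final name to the writer for bucketed tables.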

github-actions · 2025-01-03T09:54:10Z

Run Gluten Clickhouse CI on x86

github-actions · 2025-01-03T16:14:31Z

Run Gluten Clickhouse CI on x86

github-actions · 2025-01-04T01:20:27Z

Run Gluten Clickhouse CI on x86

yikf · 2025-01-07T02:14:44Z

@JkSelf @ulysses-you Could you please review this PR again when you have time?

github-actions · 2025-01-07T06:02:17Z

Run Gluten Clickhouse CI on x86

github-actions · 2025-01-07T06:04:20Z

Run Gluten Clickhouse CI on x86

github-actions · 2025-01-07T08:25:24Z

Run Gluten Clickhouse CI on x86

JkSelf

LGTM. Thanks for your work.

github-actions bot added the CORE, VELOX, CLICKHOUSE, and DOCS labels Jan 1, 2025

ulysses-you reviewed Jan 2, 2025

yikf force-pushed the bucket-write branch from a57f370 to c44b367 Compare January 3, 2025 09:53

JkSelf reviewed Jan 7, 2025

Comment thread backends-velox/src/test/scala/org/apache/spark/sql/execution/VeloxParquetWriteSuite.scala

yikf force-pushed the bucket-write branch from cfb515e to 3f9800e Compare January 7, 2025 06:03

yikf changed the title ~~[GLUTEN-8385][VL]Support write compatible-hive bucket table for Spark3.4 and Spark3.5.~~ [GLUTEN-8385][VL] Support write compatible-hive bucket table for Spark3.4 and Spark3.5. Jan 7, 2025

Native writer support compatible hive bucket write with dynamic partition (ee6ae32)

yikf force-pushed the bucket-write branch from 3f9800e to ee6ae32 Compare January 7, 2025 08:24

JkSelf approved these changes Jan 7, 2025

ulysses-you approved these changes Jan 7, 2025

JkSelf merged commit 7dda686 into apache:main Jan 8, 2025

yikf deleted the bucket-write branch January 8, 2025 01:40
