[GLUTEN-8385][VL] Support write compatible-hive bucket table for Spark3.4 and Spark3.5 #8386
JkSelf merged 1 commit into apache:main
Conversation
Run Gluten Clickhouse CI on x86
@JkSelf @jackylee-ch @zhouyuan Could you please take a look when you have time? Thanks!
cc @ulysses-you
    targetFileName = fmt::format("gluten-part-{}{}{}", makeUuid(), compressionFileNameSuffix(compression), ".parquet");
  }
  return std::make_shared<connector::hive::LocationHandle>(
      targetDirectory, writeDirectory.value_or(targetDirectory), tableType, targetFileName);
I think we should only use the hive-style file name for bucketed writes; for non-bucketed tables, keep the previous behavior.
Thank you for your suggestion.
For non-bucketed tables, we can keep the previous naming.
For bucketed tables, the bucketId can only be computed during the write itself, so the final file name is determined at write time and is controlled by Velox.
@JkSelf @ulysses-you Could you please review this PR again when you have time?
JkSelf left a comment
LGTM. Thanks for your work.
What changes were proposed in this pull request?
This PR aims to support writing compatible-hive bucket tables for Spark 3.4 and Spark 3.5. Fixes #8385.
How was this patch tested?
Unit tests.