Skip to content

[GLUEN-10107] Introduce NeedCustomColumnarBatchSerializer trait to make columnarBatchSerializerClass custom by rss implementation#10201

Merged
zhztheplayer merged 6 commits intoapache:mainfrom
zjuwangg:m_refactorColumnarShuffleExchangeExec
Aug 19, 2025
Merged

[GLUEN-10107] Introduce NeedCustomColumnarBatchSerializer trait to make columnarBatchSerializerClass custom by rss implementation#10201
zhztheplayer merged 6 commits intoapache:mainfrom
zjuwangg:m_refactorColumnarShuffleExchangeExec

Conversation

@zjuwangg
Copy link
Copy Markdown
Contributor

@zjuwangg zjuwangg commented Jul 16, 2025

What changes were proposed in this pull request?

(Fixes: #10107)

How was this patch tested?

Existing tests.

@github-actions
Copy link
Copy Markdown

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@github-actions github-actions bot added CORE works for Gluten Core VELOX RSS labels Jul 16, 2025
@zjuwangg
Copy link
Copy Markdown
Contributor Author

@zhztheplayer Do you think it's proper way to decouple rss related config from common module?

@zhztheplayer zhztheplayer self-requested a review July 23, 2025 06:12
Copy link
Copy Markdown
Member

@zhztheplayer zhztheplayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @zjuwangg.

The solution looks promising, except that it adds some layers of complexity (e.g., more reflections) which needs extra effort from reader / user. I may need to have a deeper look to understand the cost of refactoring the existing code. Let me know if you have relevant insights.

@zjuwangg
Copy link
Copy Markdown
Contributor Author

zjuwangg commented Jul 24, 2025

Thanks for working on this @zjuwangg.

The solution looks promising, except that it adds some layers of complexity (e.g., more reflections) which needs extra effort from reader / user. I may need to have a deeper look to understand the cost of refactoring the existing code. Let me know if you have relevant insights.

@zhztheplayer
Sure, it indeed introduces complexity. Now we have gluten-common/ backend-specific-impl / rss-specific-impl, and backend depends on gluten-common, while rss-specific-impl depends on backends.
Current design VeloxCelebornColumnarBatchSerlizerFactory works just like the way of VeloxCelebornColumnarShuffleWriterFactory does. Besides that, we can also leverage CelebornShuffleWriterFactory interface, add the columnarBatchSeriliazer interface and change the interface name to reduce reflections.

In the long run, we may also can extract RSS-related configuration and behavior like backend. We may need introduce CelebornShuffleBackend and UniffleShuffleBackend and pass current job shuffle backend to execution-backend to do more specific backend-related and rss-related config.

@zjuwangg zjuwangg requested a review from zhztheplayer July 29, 2025 02:55
@zhztheplayer
Copy link
Copy Markdown
Member

Hi @zjuwangg, sorry for the delayed response. Are you going to modify GlutenConfig in this PR?

@zjuwangg
Copy link
Copy Markdown
Contributor Author

zjuwangg commented Jul 30, 2025

Hi @zjuwangg, sorry for the delayed response. Are you going to modify GlutenConfig in this PR?
@zhztheplayer

No, It's just a poc of my idea. I plan to modify related config in following PR.

@zjuwangg
Copy link
Copy Markdown
Contributor Author

@zhztheplayer
Currently, RSS-specific configurations exist beyond just createColumnarBatchSerializer (e.g., this genShuffleWriterType). We should introduce more middle interfaces to decouple these tightly bound dependencies.

@zjuwangg zjuwangg force-pushed the m_refactorColumnarShuffleExchangeExec branch from 5d65f09 to 4c73fb3 Compare July 30, 2025 03:23
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@zhztheplayer
Copy link
Copy Markdown
Member

@zjuwangg CI is failing

@zjuwangg
Copy link
Copy Markdown
Contributor Author

zjuwangg commented Aug 4, 2025

@zjuwangg CI is failing

Got. Will address it soon.

@zjuwangg zjuwangg force-pushed the m_refactorColumnarShuffleExchangeExec branch from 4c73fb3 to be0cbe4 Compare August 13, 2025 07:32
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

1 similar comment
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@zjuwangg zjuwangg changed the title [wip] try to decouple rss dependency from GlutenConfig [GLUEN-10107] Make ColumnarShuffleExchangeExec extensible for different serializers Aug 13, 2025
@zjuwangg zjuwangg changed the title [GLUEN-10107] Make ColumnarShuffleExchangeExec extensible for different serializers [GLUEN-10107] Introduce NeedCustomColumnarBatchSerializer trait to make columnarBatchSerializerClass custom by rss implmentation Aug 13, 2025
@zjuwangg zjuwangg changed the title [GLUEN-10107] Introduce NeedCustomColumnarBatchSerializer trait to make columnarBatchSerializerClass custom by rss implmentation [GLUEN-10107] Introduce NeedCustomColumnarBatchSerializer trait to make columnarBatchSerializerClass custom by rss implementation Aug 13, 2025
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

1 similar comment
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@zjuwangg zjuwangg force-pushed the m_refactorColumnarShuffleExchangeExec branch from 76d5442 to bb8d9d5 Compare August 13, 2025 09:22
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@zjuwangg zjuwangg requested a review from zhztheplayer August 14, 2025 05:09
Copy link
Copy Markdown
Member

@zhztheplayer zhztheplayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@zjuwangg
Copy link
Copy Markdown
Contributor Author

@zhztheplayer Can you help merge this one?
I plan to change ck backend use this way to decouple the rss related config.

@zjuwangg zjuwangg requested a review from zhztheplayer August 19, 2025 06:30
Copy link
Copy Markdown
Member

@zhztheplayer zhztheplayer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@zhztheplayer zhztheplayer merged commit 0a86df3 into apache:main Aug 19, 2025
53 checks passed
@zjuwangg zjuwangg deleted the m_refactorColumnarShuffleExchangeExec branch August 20, 2025 03:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CORE works for Gluten Core RSS VELOX

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[VL] Make ColumnarShuffleExchangeExec extensible for different serializers

2 participants