Skip to content

[VL] Rss use row-based sort#6671

Closed
marin-ma wants to merge 1 commit intoapache:mainfrom
marin-ma:rss-sort-patch
Closed

[VL] Rss use row-based sort#6671
marin-ma wants to merge 1 commit intoapache:mainfrom
marin-ma:rss-sort-patch

Conversation

@marin-ma
Copy link
Copy Markdown
Contributor

@marin-ma marin-ma commented Aug 1, 2024

No description provided.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Aug 1, 2024

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

@github-actions
Copy link
Copy Markdown

github-actions bot commented Aug 1, 2024

Run Gluten Clickhouse CI

@github-actions github-actions bot added CORE works for Gluten Core RSS labels Aug 1, 2024
@github-actions
Copy link
Copy Markdown

github-actions bot commented Aug 1, 2024

Run Gluten Clickhouse CI

@github-actions
Copy link
Copy Markdown

github-actions bot commented Aug 6, 2024

Run Gluten Clickhouse CI

@github-actions github-actions bot removed the CORE works for Gluten Core label Aug 28, 2024
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI

@marin-ma
Copy link
Copy Markdown
Contributor Author

@kerwin-zk Could you help to check whether this patch could work? Besides, I noticed it requires some extra configurations to be set for celeborn. Here's what I found and please help to check if there are anything else needed to fully enable sort-based shuffle with celeborn. Thanks!

--conf spark.shuffle.manager=celeborn \
--conf spark.celeborn.client.spark.shuffle.writer=sort \
--conf spark.celeborn.client.shuffle.compression.codec=zstd

cc: @FelixYBW

@kerwin-zk
Copy link
Copy Markdown
Contributor

@kerwin-zk Could you help to check whether this patch could work? Besides, I noticed it requires some extra configurations to be set for celeborn. Here's what I found and please help to check if there are anything else needed to fully enable sort-based shuffle with celeborn. Thanks!

--conf spark.shuffle.manager=celeborn \
--conf spark.celeborn.client.spark.shuffle.writer=sort \
--conf spark.celeborn.client.shuffle.compression.codec=zstd

cc: @FelixYBW

@marin-ma I'll try to test it first.

@FelixYBW
Copy link
Copy Markdown
Contributor

@marin-ma I'll try to test it first.

Is there any more config? I'm testing it in a customer case.

@kerwin-zk
Copy link
Copy Markdown
Contributor

--conf spark.shuffle.manager=celeborn
--conf spark.celeborn.client.spark.shuffle.writer=sort

@FelixYBW @marin-ma I haven't tested row-based sort + Celeborn yet. For columnar-based sort + Celeborn, the following settings are needed:

spark.shuffle.manager: org.apache.spark.shuffle.gluten.celeborn.CelebornShuffleManager
spark.celeborn.master.endpoints: master-1-1:9097
spark.shuffle.service.enabled: false
spark.celeborn.client.spark.push.sort.memory.threshold: 128m
spark.celeborn.client.spark.shuffle.writer: sort
spark.celeborn.client.shuffle.compression.codec: none
spark.sql.adaptive.localShuffleReader.enabled: false

@FelixYBW
Copy link
Copy Markdown
Contributor

@floesing_pins you may take a look of this. You may set spark.gluten.sql.columnar.shuffle.sort.partitions.threshold=4000

@jingyuanzhang_pins FYI.

@clay4megtr
Copy link
Copy Markdown
Contributor

fiele isSort of VeloxCelebornColumnarShuffleWriter need update

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@github-actions
Copy link
Copy Markdown

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the stale stale label Dec 29, 2024
@github-actions
Copy link
Copy Markdown

github-actions bot commented Jan 9, 2025

This PR was auto-closed because it has been stalled for 10 days with no activity. Please feel free to reopen if it is still valid. Thanks.

@github-actions github-actions bot closed this Jan 9, 2025
@marin-ma marin-ma reopened this Mar 18, 2025
@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@github-actions
Copy link
Copy Markdown

Run Gluten Clickhouse CI on x86

@marin-ma marin-ma marked this pull request as ready for review March 18, 2025 20:04
@marin-ma
Copy link
Copy Markdown
Contributor Author

cc @kerwin-zk

@github-actions github-actions bot removed the stale stale label Mar 19, 2025
@kerwin-zk
Copy link
Copy Markdown
Contributor

@marin-ma Since our internal version depends on GLUTEN_RSS_SORT_SHUFFLE_WRITER, I suggest adding a configuration to control whether to use GLUTEN_SORT_SHUFFLE_WRITER or GLUTEN_RSS_SORT_SHUFFLE_WRITER.

@FelixYBW
Copy link
Copy Markdown
Contributor

@marin-ma Since our internal version depends on GLUTEN_RSS_SORT_SHUFFLE_WRITER, I suggest adding a configuration to control whether to use GLUTEN_SORT_SHUFFLE_WRITER or GLUTEN_RSS_SORT_SHUFFLE_WRITER.

Is your internal version the same as upstream? We picked this up because one customer met OOM issue using upstream. Did you try GLUTEN_SORT_SHUFFLE_WRITER? Is there performance gap?

@kerwin-zk
Copy link
Copy Markdown
Contributor

kerwin-zk commented Mar 20, 2025

@marin-ma Since our internal version depends on GLUTEN_RSS_SORT_SHUFFLE_WRITER, I suggest adding a configuration to control whether to use GLUTEN_SORT_SHUFFLE_WRITER or GLUTEN_RSS_SORT_SHUFFLE_WRITER.

Is your internal version the same as upstream? We picked this up because one customer met OOM issue using upstream. Did you try GLUTEN_SORT_SHUFFLE_WRITER? Is there performance gap?

@FelixYBW The internal version is somewhat different from the upstream version, and ShuffleRead may have memory issues(#9069). I previously tested GLUTEN_SORT_SHUFFLE_WRITER, and there were cases where it failed to process very large datasets. Therefore, I think it would be more appropriate to control whether to use GLUTEN_SORT_SHUFFLE_WRITER or GLUTEN_RSS_SORT_SHUFFLE_WRITER based on a parameter.

@github-actions
Copy link
Copy Markdown

github-actions bot commented May 5, 2025

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the stale stale label May 5, 2025
@FelixYBW
Copy link
Copy Markdown
Contributor

FelixYBW commented May 5, 2025

@marin-ma Let's add a config to do the switch.

@FelixYBW
Copy link
Copy Markdown
Contributor

FelixYBW commented May 5, 2025

@kerwin-zk The OOM issue is fixed #9221

@github-actions github-actions bot removed the stale stale label May 6, 2025
@github-actions
Copy link
Copy Markdown

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the stale stale label Jun 20, 2025
@github-actions
Copy link
Copy Markdown

This PR was auto-closed because it has been stalled for 10 days with no activity. Please feel free to reopen if it is still valid. Thanks.

@github-actions github-actions bot closed this Jun 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants