Skip to content

[CH] Add a comment to explain why the endpoint uses a single thread#9257

Merged
zhouyuan merged 1 commit intoapache:mainfrom
dcoliversun:threadsaferpc
Apr 10, 2025
Merged

[CH] Add a comment to explain why the endpoint uses a single thread#9257
zhouyuan merged 1 commit intoapache:mainfrom
dcoliversun:threadsaferpc

Conversation

@dcoliversun
Copy link
Copy Markdown
Contributor

@dcoliversun dcoliversun commented Apr 8, 2025

What changes were proposed in this pull request?

This PR aims to add a comment to explain why the endpoint uses a single thread. We don't need to set thread count from SparkContext[1].

[1] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rpc/RpcEndpoint.scala#L166-L177

How was this patch tested?

No need to add tests.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 8, 2025

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 8, 2025

Run Gluten Clickhouse CI on x86

@dcoliversun
Copy link
Copy Markdown
Contributor Author

cc @zhouyuan Sorry to bother you. When I was reading the code, I thought there was no need to set the threadCount additionally here. Could you help me confirm this?

@zhouyuan
Copy link
Copy Markdown
Member

zhouyuan commented Apr 8, 2025

@dcoliversun thanks for checking, it looks like this will break the CK backend, would you please fix this issue?

https://opencicd.kyligence.com/job/gluten/job/gluten-ci/15547/

[2025-04-08T10:01:19.987Z] [WARNING] Applicable -Wconf / @nowarn filters for this warning: msg=<part of the message>, cat=deprecation
[2025-04-08T10:01:25.577Z] [ERROR] /home/jenkins/agent/workspace/gluten/gluten-ci/ut-stage-3/backends-clickhouse/src/main/scala/org/apache/spark/rpc/GlutenDriverEndpoint.scala:35: error: not found: type IsolatedThreadSafeRpcEndpoint
[2025-04-08T10:01:25.577Z] [ERROR] class GlutenDriverEndpoint extends IsolatedThreadSafeRpcEndpoint with Logging {
[2025-04-08T10:01:25.577Z] [ERROR]                                    ^
[2025-04-08T10:01:25.577Z] [ERROR] /home/jenkins/agent/workspace/gluten/gluten-ci/ut-stage-3/backends-clickhouse/src/main/scala/org/apache/gluten/backendsapi/clickhouse/CHListenerApi.scala:48: error: value self is not a member of org.apache.spark.rpc.GlutenDriverEndpoint
[2025-04-08T10:01:25.577Z] [ERROR]     GlutenDriverEndpoint.glutenDriverEndpointRef = (new GlutenDriverEndpoint).self
[2025-04-08T10:01:25.577Z] [ERROR]                                                                               ^
[2025-04-08T10:01:25.577Z] [ERROR] /home/jenkins/agent/workspace/gluten/gluten-ci/ut-stage-3/backends-clickhouse/src/main/scala/org/apache/spark/rpc/GlutenExecutorEndpoint.scala:30: error: not found: type IsolatedThreadSafeRpcEndpoint
[2025-04-08T10:01:25.577Z] [ERROR]   extends IsolatedThreadSafeRpcEndpoint
[2025-04-08T10:01:25.577Z] [ERROR]           ^
[2025-04-08T10:01:25.610Z] [ERROR] /home/jenkins/agent/workspace/gluten/gluten-ci/ut-stage-1/backends-clickhouse/src/main/scala/org/apache/spark/rpc/GlutenDriverEndpoint.scala:35: error: not found: type IsolatedThreadSafeRpcEndpoint
[2025-04-08T10:01:25.617Z] [ERROR] class GlutenDriverEndpoint extends IsolatedThreadSafeRpcEndpoint with Logging {
[2025-04-08T10:01:25.617Z] [ERROR]                                    ^
[2025-04-08T10:01:25.617Z] [ERROR] /home/jenkins/agent/workspace/gluten/gluten-ci/ut-stage-1/backends-clickhouse/src/main/scala/org/apache/gluten/backendsapi/clickhouse/CHListenerApi.scala:48: error: value self is not a member of org.apache.spark.rpc.GlutenDriverEndpoint
[2025-04-08T10:01:25.617Z] [ERROR]     GlutenDriverEndpoint.glutenDriverEndpointRef = (new GlutenDriverEndpoint).self
[2025-04-08T10:01:25.617Z] [ERROR]                                                                               ^
[2025-04-08T10:01:25.617Z] [ERROR] /home/jenkins/agent/workspace/gluten/gluten-ci/ut-stage-1/backends-clickhouse/src/main/scala/org/apache/spark/rpc/GlutenExecutorEndpoint.scala:30: error: not found: type IsolatedThre

rpcEnv.setupEndpoint(GlutenRpcConstants.GLUTEN_DRIVER_ENDPOINT_NAME, this)

// TODO(yuan): get thread cnt from spark context
override def threadCount(): Int = 1
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I recall correctly this piece of code is left to make it compile on different Spark versions, we did not use the thread count here for real work

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I just noticed that IsolatedThreadSafeRpcEndpoint was imported from spark 3.4. I suggest to remove TODO, all we need is just a thread-safe endpoint.

@dcoliversun dcoliversun changed the title [CH] Mark endpoint as thread-safe rpc endpoint [CH] Add a comment to explain why the endpoint uses a single thread Apr 9, 2025
@github-actions
Copy link
Copy Markdown

github-actions bot commented Apr 9, 2025

Run Gluten Clickhouse CI on x86

Copy link
Copy Markdown
Member

@zhouyuan zhouyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the check!

@zhouyuan zhouyuan merged commit 4ad257f into apache:main Apr 10, 2025
8 checks passed
@dcoliversun dcoliversun deleted the threadsaferpc branch April 11, 2025 00:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants