[GLUTEN-8889][CORE] Bump Spark version from 3.5.2 to 3.5.5 for CH backend #9031

Closed
jlfsdtc wants to merge 2 commits into apache:main from jlfsdtc:upgrade_spark_3.5.5

Contributor

jlfsdtc commented Mar 17, 2025

What changes were proposed in this pull request?

Support Spark 3.5.5 for CH backend

How was this patch tested?

GA

1. move iceberg version in profiles
2. ignore UT for CH backend
3. exclude jdk.tools in pom
4. ignore spark warehouse in gluten-ut
github-actions bot added the CORE and CLICKHOUSE labels Mar 17, 2025
@github-actions

#8889

@github-actions

Run Gluten Clickhouse CI on x86

jlfsdtc changed the title from "[GLUTEN-8889][CORE] Bump Spark version from 3.5.2 to 3.5.5" to "[GLUTEN-8889][CORE] Bump Spark version from 3.5.2 to 3.5.5 for CH backend" Mar 17, 2025
Comment thread pom.xml
</activation>
<properties>
<java.version>1.8</java.version>
<iceberg.version>1.5.0</iceberg.version>
Contributor

Does it work well on the CH backend? For the Velox backend, the Iceberg version must be 1.8.0.

Member

For the Velox backend tests, we have already upgraded to JDK 17. I think the intention here is to enable JDK 8 + Spark 3.5.5.

Contributor

If we build with JDK 8 and Spark 3.5.5, Iceberg should not be included, so the changes here don't seem necessary to me.
Besides, we need to define iceberg.version in the Spark configuration rather than the JDK configuration, so that we can use different Iceberg versions for different Spark versions.
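
As a rough sketch of that suggestion, iceberg.version could move out of the JDK-activated profile and into a per-Spark profile. The profile ids `java-8` and `spark-3.5` and the overall layout are assumptions for illustration, not the actual Gluten pom.xml; 1.8.0 is simply the version mentioned above for the Velox backend.

```xml
<profiles>
  <!-- Hypothetical JDK profile: only JDK-related properties live here. -->
  <profile>
    <id>java-8</id>
    <activation>
      <jdk>1.8</jdk>
    </activation>
    <properties>
      <java.version>1.8</java.version>
    </properties>
  </profile>

  <!-- Hypothetical Spark profile: the Iceberg version is chosen per Spark version. -->
  <profile>
    <id>spark-3.5</id>
    <properties>
      <iceberg.version>1.8.0</iceberg.version>
    </properties>
  </profile>
</profiles>
```

With this layout, each Spark profile would carry its own iceberg.version, and switching the JDK profile would not change it.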

Contributor Author

If the Iceberg and Spark versions are bound together, then after upgrading Spark to 3.5.5 it will no longer be possible to use JDK 8. However, Spark 3.5.5 itself still supports JDK 8.
The idea is to use the combination of JDK and Spark versions to select the Iceberg version.

Contributor

jackylee-ch commented Mar 18, 2025

I don't quite follow. I have run Iceberg 1.5.0 with vanilla Spark 3.5.5 before, and it doesn't work. So if we want to use vanilla Spark 3.5.5 with JDK 8, Iceberg cannot be included.

Member

If we try to build with JD8 and Spark 355, iceberg should not be included. The changes here don't seem necessary to me. Besides, we need to define iceberg.version in spark config file instead of jdk config file so that we can use different iceberg versions for different sparks.

+1
It seems we should not bind the JDK version to the Iceberg version, as this would also affect Spark 3.4.4.

Contributor

@jackylee-ch I made a quick patch to test iceberg 1.5 + Spark-355 + JDK8, #9001

and it looks like all the unit tests passed: https://github.com/apache/incubator-gluten/actions/runs/13859404046/job/38785108628

It seems you were using #8890 to test it. Sorry about that; I had removed -Piceberg from the Spark 3.5.5 and JDK 8 GA job.

BTW, the current PR has already hit the problem I described before: vanilla Spark core dumps in VeloxTPCHIcebergSuite.
https://github.com/apache/incubator-gluten/actions/runs/13916861429/job/38941504564?pr=9031

Member

@jackylee-ch Thanks for pointing that out, I didn't realize this change. So it looks like the Velox backend does not support the combination JDK 8 + Spark 3.5.5 + Iceberg 1.5.

Not sure if this also applies to the CH backend. If the CH backend can work with this combination, could we add a new dedicated profile so it won't impact the Velox backend?

Cc: @baibaichen

Contributor

jackylee-ch commented Mar 18, 2025

Unfortunately, this core dump issue occurs in the vanilla Spark 3.5.5 environment with Iceberg 1.5.0, and it is unrelated to the Velox backend. Similar problems can also be encountered when using the CH backend.
@jlfsdtc you can double-check this.

Contributor Author

If we try to build with JD8 and Spark 355, iceberg should not be included. The changes here don't seem necessary to me. Besides, we need to define iceberg.version in spark config file instead of jdk config file so that we can use different iceberg versions for different sparks.

+1 it seems we should not bind JDK version with iceberg version as this will impact with Spark-344 also.

Yes, I missed that. Spark 3.4.4 uses a different Iceberg version.

@github-actions

Run Gluten Clickhouse CI on x86

jlfsdtc force-pushed the upgrade_spark_3.5.5 branch from 4c38607 to 8732836 March 18, 2025 06:29
@github-actions

Run Gluten Clickhouse CI on x86

jlfsdtc force-pushed the upgrade_spark_3.5.5 branch from 8732836 to 506c725 March 18, 2025 06:29
@github-actions

Run Gluten Clickhouse CI on x86

@jackylee-ch
Contributor

@jlfsdtc
BTW, it seems you have hit the failure in show create_table.sql. You need to reinstall the SQL tests for Spark 3.5.5, just like I did in another PR.

github-actions bot commented May 5, 2025

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions bot added the stale label May 5, 2025
@github-actions

This PR was auto-closed because it has been stalled for 10 days with no activity. Please feel free to reopen if it is still valid. Thanks.

github-actions bot closed this May 16, 2025
Labels

CLICKHOUSE, CORE, stale

3 participants