Skip to content

Upgrade spark from 3.2 to 3.5#11702

Merged
xiangfu0 merged 2 commits intoapache:masterfrom
gortiz:spark-3.5
Oct 13, 2023
Merged

Upgrade spark from 3.2 to 3.5#11702
xiangfu0 merged 2 commits intoapache:masterfrom
gortiz:spark-3.5

Conversation

@gortiz
Copy link
Copy Markdown
Contributor

@gortiz gortiz commented Sep 28, 2023

This PR tries to blindly update from Spark 3.2.4 to 3.5.0.

See #11701

@gortiz gortiz marked this pull request as draft September 28, 2023 08:55
@gortiz gortiz mentioned this pull request Sep 28, 2023
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Sep 28, 2023

Codecov Report

Merging #11702 (78278da) into master (bfa03fe) will decrease coverage by 48.67%.
The diff coverage is n/a.

@@              Coverage Diff              @@
##             master   #11702       +/-   ##
=============================================
- Coverage     63.12%   14.45%   -48.67%     
+ Complexity     1140      201      -939     
=============================================
  Files          2343     2343               
  Lines        126306   126306               
  Branches      19419    19419               
=============================================
- Hits          79733    18260    -61473     
- Misses        40905   106497    +65592     
+ Partials       5668     1549     -4119     
Flag Coverage Δ
integration ?
integration1 ?
integration2 ?
java-11 14.41% <ø> (-48.67%) ⬇️
java-17 14.44% <ø> (-48.53%) ⬇️
java-20 14.40% <ø> (-48.60%) ⬇️
temurin 14.45% <ø> (-48.67%) ⬇️
unittests 14.45% <ø> (-48.67%) ⬇️
unittests1 ?
unittests2 14.45% <ø> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
.../batch/spark3/SparkSegmentGenerationJobRunner.java 47.44% <ø> (ø)
...atch/spark3/SparkSegmentMetadataPushJobRunner.java 0.00% <ø> (ø)
...ion/batch/spark3/SparkSegmentTarPushJobRunner.java 0.00% <ø> (ø)
...ion/batch/spark3/SparkSegmentUriPushJobRunner.java 0.00% <ø> (ø)

... and 1518 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@gortiz gortiz force-pushed the spark-3.5 branch 5 times, most recently from ca24c0e to b7aa865 Compare October 4, 2023 07:21
@gortiz gortiz marked this pull request as ready for review October 4, 2023 07:21
@gortiz gortiz changed the title [Draft] Upgrade spark from 3.2 to 3.5 Upgrade spark from 3.2 to 3.5 Oct 4, 2023
@gortiz
Copy link
Copy Markdown
Contributor Author

gortiz commented Oct 4, 2023

Marked as ready to review. Last commit seems to be running correctly but failed in what it looks a flaky test.

The main issue here is what should we do with the name of module pinot-batch-ingestion-spark-3.2. In this PR we don't use 3.2 but 3.5. These versions seem to be backward compatible, so semantics should be the same, but the name of the project may be misleading.

Copy link
Copy Markdown
Contributor

@xiangfu0 xiangfu0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please rebase, overall lgtm.

@Jackie-Jiang Jackie-Jiang added the dependencies Pull requests that update a dependency file label Oct 10, 2023
@xiangfu0
Copy link
Copy Markdown
Contributor

can you rebase the PR?

@gortiz
Copy link
Copy Markdown
Contributor Author

gortiz commented Oct 13, 2023

Rebased

@xiangfu0 xiangfu0 merged commit cedac51 into apache:master Oct 13, 2023
@mcvsubbu
Copy link
Copy Markdown
Contributor

@gortiz is bumping to log4j-slf4j2 absolutely necessary? Can we revert that?

The reason is that this pulls in later version of log4j library (like you have changed), and they seem to have thread contention at high loads (we have seen that in LinkedIn multiple times).

@gortiz
Copy link
Copy Markdown
Contributor Author

gortiz commented Oct 30, 2023

I've replied in #11903 (comment). The TL;DR: is that it was necessary in order to correctly use the slf4j version Spark 3.5 expects in the classpath.

As you can see in #11903, spark 3.5 tests fail when using older slf4j versions. As explained in my comment we could:

  • Keep log4j2 in 2.17.1 and use log4j-slf4j18, which may support slf4j v2. I don't like this approach, but may be the easiest to implement.
  • Keep log4j2 in 2.20 (or update it to 2.21) and use log4j-slf4j2. The contention problem is new to me, we should report it to log4j2 trying to fix it. Keep using log4j2 in v 2.17.1 (released 2 years ago) is problematic. Imagine a new vulnerability is found in that version, we would like to be able to upgrade to the newest version that fixes it.
  • Try to make the test work somehow. This is my least preferred option because it would just silence a possible runtime error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dependencies Pull requests that update a dependency file

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants