[BEAM-11626] Guava version 25.1-jre for Hadoop/Cassandra and Guava version 30.1 (latest) for the rest #13740
Conversation
Run Java PostCommit
Codecov Report
@@ Coverage Diff @@
## master #13740 +/- ##
==========================================
- Coverage 82.75% 82.74% -0.01%
==========================================
Files 466 466
Lines 57527 57543 +16
==========================================
+ Hits 47607 47615 +8
- Misses 9920 9928 +8
Continue to review full report at Codecov.
Run Java_Examples_Dataflow PreCommit failed. Retrying.
Run Java_Examples_Dataflow PreCommit
Java PostCommit check failed. Retrying.
Run Java PostCommit
Another error.
Run Java PostCommit
The test passed. Where is the Cassandra problem now?
Run SQL PostCommit
Run Java HadoopFormatIO Performance Test
Run Dataflow ValidatesRunner
Run Spark ValidatesRunner
Run SQL Postcommit
Run SQL Postcommit
Run Spark ValidatesRunner
Run Dataflow ValidatesRunner
Run Java HadoopFormatIO Performance Test
Run Java PostCommit
Run Java_Examples_Dataflow PreCommit
Now "Run Java PreCommit" failed and shows what I was looking for: https://ci-beam.apache.org/job/beam_PreCommit_Java_Commit/15498/#showFailuresLink
Run Java PreCommit
Run SQL Postcommit
Run Spark ValidatesRunner
Run Dataflow ValidatesRunner
Run Java HadoopFormatIO Performance Test
Run Java PostCommit
Run Java_Examples_Dataflow PreCommit
Run Java PreCommit
Run Java PostCommit
Run Java PreCommit
Java PreCommit failed twice:
Run SQL Postcommit
Run Spark ValidatesRunner
Run Dataflow ValidatesRunner
Run Java HadoopFormatIO Performance Test
Run Java PostCommit
Run Java_Examples_Dataflow PreCommit
Run Java_Examples_Dataflow PreCommit
 // Try to keep grpc_version consistent with gRPC version in google_cloud_platform_libraries_bom
 def grpc_version = "1.32.2"
-def guava_version = "25.1-jre"
+def guava_version = guava25Projects.contains(project.path) ? "25.1-jre" : "30.1-jre"
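For context, `guava25Projects` here is a collection of Gradle project paths that must stay on Guava 25. A hedged sketch of how such a collection could be declared (the actual project paths in the PR may differ):

```groovy
// Hypothetical sketch: Gradle project paths assumed to stay on Guava 25.1-jre
// (the Hadoop/Cassandra-related modules named in BEAM-11626).
def guava25Projects = [
        ':sdks:java:io:cassandra',
        ':sdks:java:io:hadoop-common',
        ':sdks:java:io:hadoop-format',
]
def guava_version = guava25Projects.contains(project.path) ? "25.1-jre" : "30.1-jre"
```

Every module that references the shared `guava_version` would then resolve Guava 25.1-jre or 30.1-jre depending on its own project path.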
Wouldn't this be problematic, causing Beam to depend on two different versions? Which version will users of Beam depend on if they need to use Beam with one of these three projects?
Which version will users of Beam depend on if they need to use Beam with one of these three projects?
There's no impact to the Beam Cassandra and Hadoop artifacts. The Maven artifacts org.apache.beam:beam-sdks-java-io-hadoop-format:2.27.0, org.apache.beam:beam-sdks-java-io-cassandra:2.27.0, and org.apache.beam:beam-sdks-java-io-hadoop-file-system:2.27.0 do not declare a Guava dependency.
However, if Beam Cassandra / Hadoop users use Beam with beam-sdks-java-io-kinesis, beam-sdks-java-io-google-cloud-platform, or beam-sdks-java-extensions-sql-zetasql (which declare a Guava dependency), then the users need to pin the Guava version to 25.1-jre. They can use <dependencyManagement> for Maven and force for Gradle.
If the Beam users don't depend on any of beam-sdks-java-io-kinesis, beam-sdks-java-io-google-cloud-platform, or beam-sdks-java-extensions-sql-zetasql, then this change has no effect on them.
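For illustration, such a pin could look like the following Gradle sketch. This is a minimal, hypothetical example, not text from the PR; adapt the configuration scope to your build, and Maven users would instead declare com.google.guava:guava:25.1-jre under <dependencyManagement>:

```groovy
// Hypothetical user-side pin of Guava to 25.1-jre in build.gradle.
// resolutionStrategy.force overrides whatever Guava version transitive
// dependencies (e.g. beam-sdks-java-io-google-cloud-platform) request.
configurations.all {
    resolutionStrategy {
        force "com.google.guava:guava:25.1-jre"
    }
}
```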
Ack. I think this would be an undocumented hurdle for the impacted users. I am not sure what the best course of action is. Hopefully @kennknowles would have a recommendation.
Yes, I think I should document that condition ("if Beam Cassandra / Hadoop users use Beam with beam-sdks-java-io-kinesis, ...") somewhere.
* The Java artifacts "beam-sdks-java-io-kinesis", "beam-sdks-java-io-google-cloud-platform", and "beam-sdks-java-extensions-sql-zetasql" now declare a Guava 30.1-jre dependency (it was 25.1-jre in Beam 2.27.0). This new Guava version may introduce dependency conflicts if your project or its dependencies rely on removed APIs. If affected, pin an appropriate Guava version via `dependencyManagement` in Maven or `force` in Gradle.
@aaltay I added this note for the potential impact to Beam users. The risk described here is not specific to this Guava version: every dependency upgrade carries a risk of introducing dependency conflicts if a user relies on removed methods or classes. (Therefore this note might not be needed.)
 // Try to keep grpc_version consistent with gRPC version in google_cloud_platform_libraries_bom
 def grpc_version = "1.32.2"
-def guava_version = "25.1-jre"
+def guava_version = guava25Projects.contains(project.path) ? "25.1-jre" : "30.1-jre"
We always treat library.java as a global constant. In all existing cases where a project requires a library version that deviates from library.java, we don't use library.java and instead hard-code that dependency in the project's build.gradle.
IMO making library.java conditional on the project being compiled defeats the purpose of declaring a common version in the first place.
In all existing cases where a project requires a library version that deviates from library.java, we don't use library.java and instead hard-code that dependency in the project's build.gradle.
That's great information. Let me try that. I see hadoop-common does that with force. Thanks.
Memo for myself in hadoop-common:
hadoopVersions.each { kv ->
    configurations."hadoopVersion$kv.key" {
        resolutionStrategy {
            force "org.apache.hadoop:hadoop-client:$kv.value"
            force "org.apache.hadoop:hadoop-common:$kv.value"
            force "org.apache.hadoop:hadoop-mapreduce-client-core:$kv.value"
        }
    }
}
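The same resolutionStrategy pattern could, in principle, pin Guava for the affected modules. A hypothetical sketch, not the actual change in this PR:

```groovy
// Hypothetical sketch: pin Guava 25.1-jre per hadoopVersion configuration,
// mirroring the hadoop-common force pattern above.
hadoopVersions.each { kv ->
    configurations."hadoopVersion$kv.key" {
        resolutionStrategy {
            force "com.google.guava:guava:25.1-jre"
        }
    }
}
```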
Closing this in favor of #13804
Closed this in favor of #13804
This PR upgrades the non-vendored Guava version to the latest 30.1-jre, while keeping the version 25.1-jre for certain modules (Hadoop and Cassandra-related) that require the old version of Guava.
Why do I want the latest Guava?
When Beam publishes a recommended version of Guava for Dataflow users (#13737, WIP), I want the recommended version to be in line with the one in the GCP Libraries BOM (with the "-jre" suffix). This is because Google Cloud client libraries are built and tested with the newer version of Guava. I want Beam's Dataflow and Google Cloud Platform modules to be built, tested, and used with the same version of Guava as much as possible.
If we don't do this PR, we would end up in a situation where the GCP Libraries BOM recommends Guava 30 and Beam's GCP BOM recommends Guava 25.
What is the problem with Guava 25?
When a library touches classes or methods that only exist in a newer version of Guava, it fails with NoClassDefFoundError or NoSuchMethodError. For example, gcsio uses Uninterruptibles.sleepUninterruptibly(java.time.Duration), and Linkage Checker detects the usage. The method only exists in Guava 28 or higher. This might not be a problem for Dataflow-only users for now, but it may break other use cases of the library. Therefore, I want to recommend the newer version of Guava to GCP users.
Problem with newer Guava version in Hadoop/Cassandra
If I naively upgrade the Guava version to 30.1-jre, the tests fail with NoSuchMethodError for Futures.addCallback and NoSuchFieldError for DIGIT (CharMatcher). Details are in BEAM-11626. This PR fixes the problem by keeping the Guava version lower for the Hadoop/Cassandra-related modules.
Where is the Guava dependency declared?
The following Gradle modules declare a dependency through the guava variable:
Other than tests, the 3 modules declaring the Guava dependencies are sdks/java/io/kinesis, sdks/java/io/google-cloud-platform, and sdks/java/extensions/sql/zetasql.
The sdks/java/io/kinesis module has com.amazonaws:amazon-kinesis-client:1.13.0 (built with Guava 26.0-jre) and com.amazonaws:amazon-kinesis-producer:0.14.1 (built with Guava 24.1.1-jre). There is a conflict between org.apache.hadoop:hadoop-yarn-common:2.10.1 and Guava 30, but the conflict already exists with Guava 29. Therefore, there is no problem declaring the dependency with Guava 30.
The sdks/java/maven-archetypes/examples module is a tricky one. I want Hadoop/Cassandra users to use Guava 25.1 and others to use Guava 30.
What's the impact to Beam's Cassandra / Hadoop users?
There's no impact to the Beam Cassandra and Hadoop artifacts. The Maven artifacts org.apache.beam:beam-sdks-java-io-hadoop-format:2.27.0, org.apache.beam:beam-sdks-java-io-cassandra:2.27.0, and org.apache.beam:beam-sdks-java-io-hadoop-file-system:2.27.0 do not declare a Guava dependency.
Instruction for Hadoop / Cassandra Beam users
If Beam Cassandra / Hadoop users use Beam with beam-sdks-java-io-kinesis, beam-sdks-java-io-google-cloud-platform, or beam-sdks-java-extensions-sql-zetasql, then they need to pin the Guava version to 25.1-jre. They can use <dependencyManagement> for Maven and force for Gradle.
Linkage Checker
The sdks/java/build-tools/beam-linkage-check.sh script found a new conflict in a dependency of org.apache.hadoop:hadoop-client:2.10.1 (provided) in the beam-sdks-java-extensions-sql-zetasql module: https://gist.github.com/suztomo/e5fa71d8a0800dbbbc9cd2626d50730e
If the ZetaSQL module happens to be used with a YARN web app, then the "Instruction for Hadoop / Cassandra Beam users" section above applies to resolve the incompatibility.