[BEAM-124] Flink and Spark running Examples WordCountIT #703
Conversation
R: @kennknowles has a lot of existing state here, maybe you can take a look?
examples/java/pom.xml
Outdated
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-dependency-plugin</artifactId>
      <version>2.10</version>
These versions are managed in the root pom.xml in the pluginManagement section. But in fact it is already configured to ignoreNonCompile, isn't it?
Yes, but I actually removed the failOnWarning setting to restore its default (false). failOnWarning makes the project build fail due to the unused Spark and Flink runner dependencies.
Yes, that is caused by them being compile dependencies. If you make them runtime it will be fine. Or, if you do reference them in the code, mark them optional.
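A minimal sketch of what a runtime-scoped runner dependency in examples/java/pom.xml could look like, as suggested above (the artifactId and version shown are assumptions for illustration, not copied from this PR's diff):

```xml
<!-- Illustrative only: artifactId and version are assumptions, not this PR's diff.
     The runtime scope keeps the runner off the compile classpath, so
     maven-dependency-plugin no longer flags it as an unused compile dependency. -->
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-runners-spark</artifactId>
  <version>${project.version}</version>
  <scope>runtime</scope>
</dependency>
```

With runtime scope the dependency is still available when the integration test launches a pipeline, which is all WordCountIT needs.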
Took a look. At a high level:
The Jenkins failure looks like it was just canceled, maybe due to a Jenkins restart.
I think #539 took care of most of the changes applied to the Spark runner here, so a rebase is in order.
Force-pushed from d8581c7 to 72c5af7
Thanks @kennknowles @amitsela. Rebased from master. The only change is adding runtime dependencies to examples/java to support the Flink and Spark runners. PTAL
+R: @jasonkuster
LGTM |
+R: @lukecwik. Can you take a look? Thanks.
LGTM for Spark dependencies. |
     * <p>Input text document is available from the following sources:
     * <ul>
     * <li>Using GCS (default):
     * gs://dataflow-samples/shakespeare/kinglear.tx
kinglear.tx -> kinglear.txt
fixed
Force-pushed from ab37b8c to 1770eeb
PTAL. I don't have permission to put the test file at gs://dataflow-samples/apache/LICENSE; this needs to be done before merging to master.
  private static final Logger LOG = LoggerFactory.getLogger(WordCountOnSuccessMatcher.class);

- private static final String EXPECTED_CHECKSUM = "8ae94f799f97cfd1cb5e8125951b32dfb52e1f12";
+ private static final String EXPECTED_CHECKSUM = "c04722202dee29c442b55ead54c6000693e85e77";
If the input file is customizable, then the checksum needs to be customizable as well. Move this to the WordCountITOptions.
Good idea. In fact, I'm writing another PR that creates a FileChecksumMatcher to make this WordCountMatcher more general. I think I can make these changes in that PR.
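The 40-character EXPECTED_CHECKSUM values above have the shape of hex-encoded SHA-1 digests. As an illustration only (the class and method names here are hypothetical, not the PR's actual matcher code), computing such a digest in plain Java might look like:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ChecksumExample {

    // Computes the hex-encoded SHA-1 digest of the given text. The result has
    // the same 40-character shape as the EXPECTED_CHECKSUM constants above.
    static String sha1Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every JRE is required to provide SHA-1, so this cannot happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // SHA-1 of "hello" is the well-known digest below.
        System.out.println(sha1Hex("hello"));
        // aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d
    }
}
```

If the expected checksum moved into a pipeline options interface as suggested, a matcher could compare a digest computed this way against the configured value instead of a hard-coded constant.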
Be sure to do all of the following to help us incorporate your contribution quickly and easily:
- Make sure the PR title is formatted like: [BEAM-<Jira issue #>] Description of pull request
- Make sure tests pass via mvn clean verify. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes.)
- Replace <Jira issue #> in the title with the actual Jira issue number, if there is one.
- If this contribution is large, please file an Apache Individual Contributor License Agreement.
+R: @dhalperi @jasonkuster
This PR is an updated version of an old PR.
Adds Spark and Flink runner dependencies to the Beam Java examples so that WordCountIT can run successfully with the Spark and Flink runners. The following commands are used for the different runners:
Spark:
mvn clean verify -pl examples/java -DskipITs=false -Dit.test=WordCountIT -DintegrationTestPipelineOptions='[ "--tempRoot=/tmp", "--inputFile=/tmp/kinglear.txt", "--runner=org.apache.beam.runners.spark.SparkRunner" ]'

Flink:
mvn clean verify -pl examples/java -DskipITs=false -Dit.test=WordCountIT -DintegrationTestPipelineOptions='[ "--tempRoot=gs://clouddfe-testing-temp-storage", "--runner=org.apache.beam.runners.flink.FlinkRunner" ]'

Dataflow:
mvn clean verify -pl examples/java -DskipITs=false -Dit.test=WordCountIT -DintegrationTestPipelineOptions='[ "--tempRoot=gs://clouddfe-testing-temp-storage", "--runner=org.apache.beam.runners.dataflow.testing.TestDataflowRunner" ]'