
Conversation

@markflyhigh
Contributor

markflyhigh commented Jul 20, 2016

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

  • Make sure the PR title is formatted like:
    [BEAM-<Jira issue #>] Description of pull request
  • Make sure tests pass via mvn clean verify. (Even better, enable
    Travis-CI on your fork and ensure the whole test matrix passes).
  • Replace <Jira issue #> in the title with the actual Jira issue
    number, if there is one.
  • If this contribution is large, please file an Apache
    Individual Contributor License Agreement.

+R: @dhalperi @jasonkuster

This PR is an updated version of an old PR.

Removes the Spark and Flink runners' dependencies on the Beam Java examples module in order to run WordCountIT successfully with the Spark and Flink runners. The following commands are used for the different runners (a sketch of how the test picks up these options follows the list):

  • Spark runner:

mvn clean verify -pl examples/java -DskipITs=false -Dit.test=WordCountIT -DintegrationTestPipelineOptions='[ "--tempRoot=/tmp", "--inputFile=/tmp/kinglear.txt", "--runner=org.apache.beam.runners.spark.SparkRunner" ]'

  • Flink runner:

mvn clean verify -pl examples/java -DskipITs=false -Dit.test=WordCountIT -DintegrationTestPipelineOptions='[ "--tempRoot=gs://clouddfe-testing-temp-storage", "--runner=org.apache.beam.runners.flink.FlinkRunner" ]'

  • Dataflow test runner:

mvn clean verify -pl examples/java -DskipITs=false -Dit.test=WordCountIT -DintegrationTestPipelineOptions='[ "--tempRoot=gs://clouddfe-testing-temp-storage", "--runner=org.apache.beam.runners.dataflow.testing.TestDataflowRunner" ]'
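
For context, a minimal sketch of how an integration test can pick up the options passed via -DintegrationTestPipelineOptions above; the wiring in the real WordCountIT may differ, and the WordCountITOptions shape shown here is an assumption.

import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.testing.TestPipelineOptions;

public class PipelineOptionsSketch {

  /** Assumed options interface; the real WordCountIT defines its own. */
  public interface WordCountITOptions extends TestPipelineOptions {}

  public static void main(String[] args) {
    PipelineOptionsFactory.register(WordCountITOptions.class);

    // testingPipelineOptions() builds options from the JSON that the build
    // forwards into the test JVM, so the --runner and --tempRoot flags from
    // the commands above end up here.
    WordCountITOptions options =
        TestPipeline.testingPipelineOptions().as(WordCountITOptions.class);

    System.out.println("runner = " + options.getRunner().getSimpleName());
    System.out.println("tempRoot = " + options.getTempRoot());
  }
}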

@dhalperi
Contributor

R: @kennknowles has a lot of existing state here, maybe you can take a look?

<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<version>2.10</version>
Member


These versions are managed in the root pom.xml in the pluginManagement section. But in fact it is already configured to ignoreNonCompile, isn't it?
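
For reference, a sketch of the kind of pluginManagement entry being discussed; the exact configuration in the root pom.xml may differ, so treat the goal and values below as assumptions.

<pluginManagement>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-dependency-plugin</artifactId>
      <version>2.10</version>
      <executions>
        <execution>
          <goals>
            <goal>analyze-only</goal>
          </goals>
          <configuration>
            <!-- failOnWarning fails the build when the dependency analysis reports
                 warnings; ignoreNonCompile excludes runtime/provided/test/system
                 scoped dependencies from the unused-dependency analysis. -->
            <failOnWarning>true</failOnWarning>
            <ignoreNonCompile>true</ignoreNonCompile>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</pluginManagement>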

Contributor Author


Yes, but I actually removed the failOnWarning setting so that it falls back to the default (false). failOnWarning makes the project build fail due to the unused Spark and Flink runner dependencies.

Member


Yes, that is caused by them being compile dependencies. If you make them runtime it will be fine. Or, if you do reference them in the code, mark them optional.
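
For illustration, a sketch of the runtime-scoped declaration being suggested for examples/java; the artifact IDs and version property are assumptions, not taken from this PR.

<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-runners-spark</artifactId>
  <version>${project.version}</version>
  <!-- runtime scope keeps the runner off the compile classpath, so the
       dependency analysis does not flag it as an unused compile dependency
       while it remains available when the integration test runs. -->
  <scope>runtime</scope>
</dependency>
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-runners-flink_2.10</artifactId>
  <version>${project.version}</version>
  <scope>runtime</scope>
</dependency>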

@kennknowles
Member

Took a look. At a high level:

The Jenkins failure looks like it was just canceled, maybe due to a Jenkins restart.

@amitsela
Member

I think #539 took care of most of the changes applied to the Spark runner here, so a rebase is in order.
As for RunnableOnService coverage:

  • Streaming tests are definitely not covered for obvious reasons.
  • IO tests are unique as well.
I think the rest of the tests (those that should keep running continuously) are covered by RunnableOnService. There are also tests such as SerializationTest and SideEffectsTest which I'm not sure are covered by RunnableOnService, but I'm also not sure they should be running constantly.

@kennknowles for the Spark runner in Batch mode, is there anything else missing besides Read.Bound support in order for the runner to execute all Batch tests? I have this working in the Spark runner 2 branch and I'm thinking of backporting it into the runner's current version, so if that would help, I'll do that.

@markflyhigh force-pushed the wordcountit-spark-flink branch from d8581c7 to 72c5af7 on July 25, 2016 17:16
@markflyhigh
Contributor Author

Thanks @kennknowles @amitsela. Rebased from master. The only change is adding runtime dependencies to examples/java to support the Flink and Spark runners.

PTAL

@markflyhigh
Contributor Author

+R: @jasonkuster

@jasonkuster
Contributor

LGTM

@markflyhigh
Contributor Author

+R: @lukecwik. Can you take a look? Thanks.

@amitsela
Member

LGTM for Spark dependencies.

* <p>Input text document is available from the following sources:
* <ul>
* <li>Using GCS (default):
* gs://dataflow-samples/shakespeare/kinglear.tx
Member


kinglear.tx -> kinglear.txt

Contributor Author


Fixed.

@markflyhigh force-pushed the wordcountit-spark-flink branch from ab37b8c to 1770eeb on July 30, 2016 00:00
@markflyhigh
Contributor Author

markflyhigh commented Aug 1, 2016

PTAL

I don't have permission to put the test file at gs://dataflow-samples/apache/LICENSE. That needs to be done before merging to master.

@jasonkuster @kennknowles

private static final Logger LOG = LoggerFactory.getLogger(WordCountOnSuccessMatcher.class);

private static final String EXPECTED_CHECKSUM = "8ae94f799f97cfd1cb5e8125951b32dfb52e1f12";
private static final String EXPECTED_CHECKSUM = "c04722202dee29c442b55ead54c6000693e85e77";
Contributor


If the input file is customizable, then the checksum needs to be customizable as well. Move this to the WordCountITOptions

Contributor Author


Good idea. In fact, I'll write another PR that creates a FileChecksumMatcher to make this matcher more general. I think I can make that change in that PR.
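
For illustration, a minimal sketch of the reviewer's suggestion, assuming the option lives on WordCountITOptions; apart from the checksum value taken from this PR, the names and shape here are assumptions rather than the final implementation.

import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.testing.TestPipelineOptions;

/** Sketch: exposing the expected checksum as a pipeline option instead of a constant. */
public interface WordCountITOptions extends TestPipelineOptions {
  @Description("Expected checksum of the WordCountIT output for the chosen input file")
  @Default.String("c04722202dee29c442b55ead54c6000693e85e77")
  String getExpectedChecksum();
  void setExpectedChecksum(String value);
}

The matcher could then read getExpectedChecksum() from the options instead of hard-coding EXPECTED_CHECKSUM.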

@markflyhigh closed this Aug 8, 2016
@markflyhigh deleted the wordcountit-spark-flink branch November 7, 2016 22:48