[BEAM-124] Spark Running WordCountIT Example #769

markflyhigh · 2016-08-02T22:16:30Z

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

Make sure the PR title is formatted like:
[BEAM-<Jira issue #>] Description of pull request
Make sure tests pass via mvn clean verify. (Even better, enable
Travis-CI on your fork and ensure the whole test matrix passes).
Replace <Jira issue #> in the title with the actual Jira issue
number, if there is one.
If this contribution is large, please file an Apache
Individual Contributor License Agreement.

Add the Spark dependency so that SparkRunner can execute WordCountIT.
Change the default test file and add it to the project, which avoids the problem that SparkRunner can't resolve gs:// paths right now.
New test input is: gs://apache-beam-samples/apache/LICENSE

The following command runs WordCountIT with SparkRunner:

mvn clean verify -pl examples/java -DskipITs=false -Dit.test=WordCountIT -DintegrationTestPipelineOptions='[ "--tempRoot=/tmp", "--runner=org.apache.beam.runners.spark.SparkRunner" ]'

This PR duplicates PR #703, since we want Flink and Spark to be reviewed separately.

markflyhigh · 2016-08-02T22:18:58Z

+R: @amitsela @dhalperi

jasonkuster · 2016-08-02T22:21:25Z

Are we modifying wordcount so that it counts the words in the apache license instead of the shakespeare example?

markflyhigh · 2016-08-02T22:46:54Z

Yes, because SparkRunner can't resolve gs:// right now. There is also a licensing consideration: using the Apache license instead of the Shakespeare example avoids an Apache license issue if we want to include the file in the project build.

jasonkuster · 2016-08-02T22:49:55Z

Sure, makes sense to me.

jasonkuster · 2016-08-08T22:04:42Z

"Unable to find any files matching gs://dataflow-samples/apache/LICENSE" from the Jenkins output - looks like until we get that updated this will break presubmits.

markflyhigh · 2016-08-08T22:14:03Z

@jasonkuster Sorry about that, and thanks for pointing it out. I don't have write access to "gs://dataflow-samples/". Can you grant me access or help me upload the file?

lukecwik · 2016-08-08T22:28:47Z

examples/java/src/main/java/org/apache/beam/examples/WordCount.java

+    public static class InputFactory implements DefaultValueFactory<String> {
+      @Override
+      public String create(PipelineOptions options) {
+        if (options.getRunner().isAssignableFrom(SparkRunner.class)) {


Do we want to have everyone use GCS by default?

What if Dataflow was the only one that used the GCS one?

Also, this sets a poor precedent where there is "runner" specific configuration being done on a per test basis.

Yes, we want everyone to use GCS by default, and FlinkRunner already supports it. But as far as I know, WordCountIT can't use SparkRunner with a path starting with "gs://". This is a temporary solution so that this E2E test can be added to the pre/post-submit tests. Otherwise, the SparkRunner side will be a blocker.

Can we construct the input file path in WordCountIT, and pass it to WordCount?

Yes, use the --inputFile flag. Put this key-value pair inside -DintegrationTestPipelineOptions.
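The suggestion above can be sketched as follows. This is a minimal, Beam-free illustration of how the value for -DintegrationTestPipelineOptions might be assembled with an extra --inputFile entry. The class name, the helper method, and the path /tmp/LICENSE are hypothetical and are not taken from the PR; only the --tempRoot, --runner, and --inputFile flags come from the discussion.

```java
import java.util.ArrayList;
import java.util.List;

public class IntegrationOptionsSketch {

    /**
     * Renders pipeline arguments as the quoted, bracketed array format
     * used in the mvn command from the PR description.
     */
    public static String toOptionsProperty(List<String> args) {
        StringBuilder sb = new StringBuilder("[ ");
        for (int i = 0; i < args.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            // Escape backslashes and double quotes so each entry stays one quoted string.
            String escaped = args.get(i).replace("\\", "\\\\").replace("\"", "\\\"");
            sb.append('"').append(escaped).append('"');
        }
        return sb.append(" ]").toString();
    }

    public static void main(String[] unused) {
        List<String> options = new ArrayList<>();
        options.add("--tempRoot=/tmp");
        options.add("--runner=org.apache.beam.runners.spark.SparkRunner");
        // Assumption: a local copy of the input file; the path is hypothetical.
        options.add("--inputFile=/tmp/LICENSE");

        // Prints the system property in the form it would be appended to the mvn command line.
        System.out.println(
            "-DintegrationTestPipelineOptions='" + toOptionsProperty(options) + "'");
    }
}
```

Passing the input path this way keeps the runner-specific choice in the test invocation rather than in WordCount itself, which is the concern raised earlier in this thread.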

lukecwik · 2016-08-09T15:58:30Z

R: @lukecwik

markflyhigh · 2016-08-09T17:03:18Z

Working with @dhalperi to put new test data in a proper directory.

markflyhigh · 2016-08-09T22:57:55Z

PTAL @lukecwik

The new test input has been uploaded to GCS and verified.
The merge conflicts are resolved.

lukecwik · 2016-08-09T23:01:00Z

LGTM, will merge once jenkins/travis runs finish

This closes #769

lukecwik reviewed Aug 8, 2016

Mark Liu and others added 2 commits August 9, 2016 15:28

[BEAM-124] Spark Running WordCountIT Example

9983ff9

Change WordCount default input

e979c82

markflyhigh force-pushed the wordcount-e2e-spark-runner branch from 1e6ec6e to e979c82 Compare August 9, 2016 22:54

asfgit merged commit e979c82 into apache:master Aug 10, 2016

asfgit pushed a commit that referenced this pull request Aug 10, 2016

[BEAM-124] Spark Running WordCountIT Example

a035796

This closes #769

markflyhigh deleted the wordcount-e2e-spark-runner branch November 7, 2016 22:48

peihe Aug 19, 2016

markflyhigh Aug 19, 2016

5 participants
