[BEAM-124] Flink and Spark running Examples WordCountIT #345

jasonkuster · 2016-05-17T19:00:58Z

Be sure to do all of the following to help us incorporate your contribution quickly and easily:

- Make sure the PR title is formatted like: [BEAM-<Jira issue #>] Description of pull request
- Make sure tests pass via mvn clean verify. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes).
- Replace <Jira issue #> in the title with the actual Jira issue number, if there is one.
- If this contribution is large, please file an Apache Individual Contributor License Agreement.

…park as supported runners in examples. Signed-off-by: Jason Kuster <jason@google.com>

jasonkuster · 2016-05-17T19:06:06Z

Hey Max, Amit.

Looking for some feedback on this pull request. The purpose is to remove the Spark and Flink runners' dependencies on the Beam examples so that they can run the WordCountIT in examples, as Dataflow currently does. As things stood in the codebase, both the Spark and Flink runners depended on examples for some of the code in WordCount.java and TfIdf.java. In the Flink case I've removed the tests; in the Spark case I've copied the needed code into the runner's own tests. I'd love to hear your thoughts on the right approach going forward.

The benefit is that this appears to be where the new end-to-end tests are heading: they can be written in a runner-agnostic way and then run on any runner just by flipping a few flags (for example, see the commands below for running this test). Let me know your thoughts!

mvn clean verify -pl examples/java -am -rf :java-examples-all -DskipITs=false -DintegrationTestPipelineOptions='[ "--tempRoot=/tmp", "--inputFile=/tmp/kinglear.txt", "--runner=org.apache.beam.runners.spark.SparkPipelineRunner", "--sparkMaster=local" ]'

mvn clean verify -pl examples/java -am -rf :java-examples-all -DskipITs=false -DintegrationTestPipelineOptions='[ "--tempRoot=/tmp", "--inputFile=/tmp/kinglear.txt", "--runner=org.apache.beam.runners.flink.FlinkPipelineRunner" ]'

mvn clean verify -pl examples/java -am -rf :java-examples-all -DskipITs=false -DintegrationTestPipelineOptions='[ "--tempRoot=gs://clouddfe-testing-temp-storage", "--runner=org.apache.beam.sdk.testing.TestDataflowPipelineRunner" ]'
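The three commands above differ mainly in the JSON array passed through `-DintegrationTestPipelineOptions`, chiefly the `--runner=` entry plus a few runner-specific flags such as `--sparkMaster`. As a rough illustration of why flipping flags is enough, here is a minimal plain-Java sketch of how a runner-agnostic test harness might turn that property value into an ordinary args array. The class and method names (`TestOptionsParser`, `parseOptions`, `runnerOf`) are hypothetical, this is not Beam's actual `TestPipeline` code, and the parser assumes a flat JSON array of strings with no escaped quotes.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: turns a value such as
 * '[ "--tempRoot=/tmp", "--runner=..." ]' into a String[] of pipeline args.
 * Simplified on purpose: assumes a flat JSON array of strings without escaped quotes.
 */
class TestOptionsParser {

  static String[] parseOptions(String json) {
    String body = json.trim();
    if (!body.startsWith("[") || !body.endsWith("]")) {
      throw new IllegalArgumentException("Expected a JSON array: " + json);
    }
    List<String> args = new ArrayList<>();
    int end = body.length() - 1; // index of the closing ']'
    int i = 1;                   // skip the opening '['
    while (i < end) {
      int open = body.indexOf('"', i);
      if (open < 0 || open >= end) {
        break; // no more quoted strings
      }
      int close = body.indexOf('"', open + 1);
      if (close < 0 || close >= end) {
        throw new IllegalArgumentException("Unterminated string in: " + json);
      }
      args.add(body.substring(open + 1, close));
      i = close + 1;
    }
    return args.toArray(new String[0]);
  }

  /** Returns the value of the --runner= flag, or null if it is absent. */
  static String runnerOf(String[] args) {
    for (String arg : args) {
      if (arg.startsWith("--runner=")) {
        return arg.substring("--runner=".length());
      }
    }
    return null;
  }

  public static void main(String[] unused) {
    String spark = "[ \"--tempRoot=/tmp\", \"--inputFile=/tmp/kinglear.txt\", "
        + "\"--runner=org.apache.beam.runners.spark.SparkPipelineRunner\", \"--sparkMaster=local\" ]";
    String[] args = parseOptions(spark);
    System.out.println(args.length + " args, runner = " + runnerOf(args));
  }
}
```

Under this assumption, switching runners means changing only the `--runner=` string (plus any runner-specific flags); the test body itself stays the same.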

Jason

aljoscha · 2016-05-18T12:23:05Z

I think this is the right way to go. In #343 I'm also removing these two examples because all RunnableOnService tests will be executed on Flink with those changes.

amitsela · 2016-05-18T16:44:18Z

runners/spark/src/test/java/org/apache/beam/runners/spark/io/NumShardsTest.java

+   * Count) as a reusable PTransform subclass. Using composite transforms allows for easy reuse,
+   * modular testing, and an improved monitoring experience.
+   */
+  public static class CountWords extends PTransform<PCollection<String>,


You could use SimpleWordCountTest.CountWords instead; it may need a small change to the format function.

I'll take a look. Thanks!
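The review suggestion is to reuse a single counting implementation and adapt only the output format, rather than copying `CountWords` into each runner's tests. The sketch below illustrates that idea in plain Java instead of Beam's `PTransform` API: the counting step is shared and the format function is passed in as a parameter. The names (`SharedWordCount`, `countWords`, `format`) and the tokenizing regex are assumptions for illustration, not the actual code of `SimpleWordCountTest` or `WordCount`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;

/** Hypothetical plain-Java analogue of a shared CountWords step with a pluggable formatter. */
class SharedWordCount {

  // Assumed tokenizer: split on any run of characters that are not letters or apostrophes.
  private static final String TOKENIZER = "[^a-zA-Z']+";

  /** Counts occurrences of each word; TreeMap keeps the result in sorted key order. */
  static Map<String, Long> countWords(List<String> lines) {
    Map<String, Long> counts = new TreeMap<>();
    for (String line : lines) {
      for (String word : line.split(TOKENIZER)) {
        if (!word.isEmpty()) { // split() yields "" when a line starts with a delimiter
          counts.merge(word, 1L, Long::sum);
        }
      }
    }
    return counts;
  }

  /** Applies a caller-supplied format function, so each test can choose its own output shape. */
  static List<String> format(Map<String, Long> counts, BiFunction<String, Long, String> fmt) {
    List<String> out = new ArrayList<>();
    counts.forEach((word, count) -> out.add(fmt.apply(word, count)));
    return out;
  }

  public static void main(String[] args) {
    Map<String, Long> counts = countWords(Arrays.asList("hi there, hi", "there"));
    // Two callers share the same counts but format them differently.
    System.out.println(format(counts, (w, c) -> w + ": " + c));
    System.out.println(format(counts, (w, c) -> w + "," + c));
  }
}
```

Keeping the format function as a parameter is what would let one shared transform serve tests that expect slightly different output lines.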

amitsela · 2016-05-18T16:55:20Z

I generally agree with @aljoscha: once #294 is done and the RunnableOnService tests cover those use cases, these tests can be removed from the Spark runner tests as well.
@davorbonaci, your thoughts on the examples pom.xml?

mxm · 2016-05-23T15:33:55Z

Hi @jasonkuster! +1 for enabling end-to-end tests for all Runners. A couple of questions: why did you remove a Flink TfIdf integration test and add one for Spark? ☺️ Presumably because the RunnableOnService tests are not yet integrated with the Spark Runner?

jasonkuster · 2016-05-23T16:29:57Z

Hey @mxm! I removed the test in Flink and added it in Spark just to see what the two different ways of resolving the dependency issues would look like. I'm happy to use either approach for either runner, but based on the comments above it looks like the RunnableOnService tests are in progress for both Spark and Flink, so once those land, the right thing to do sounds like simply removing the offending tests. I'm flexible, though; my goal is just to get the E2E tests running everywhere. 😄

davorbonaci · 2016-05-24T12:29:02Z

(should be rebased, given relevant changes to the pom.)

mxm · 2016-05-24T12:44:33Z

@jasonkuster The RunnableOnService tests are integrated with the Flink Runner for batch execution, so removing the batch examples is fine. The streaming side still needs side-input support before it can run those tests.

+1 for merging from my side (needs rebasing though)

Remove Flink, Spark dependencies on Beam examples and add Flink and Spark as supported runners in examples. Signed-off-by: Jason Kuster <jason@google.com>

d09ee4c

jasonkuster changed the title ~~[BEAM-] Flink and Spark running Examples WordCountIT~~ [BEAM-124] Flink and Spark running Examples WordCountIT May 17, 2016

amitsela reviewed May 18, 2016

aljoscha mentioned this pull request May 18, 2016

[BEAM-286] Reorganize flink runner module to follow other runners str… #348

Closed

jasonkuster mentioned this pull request Jun 8, 2016

[BEAM-259] Configure RunnableOnService tests for Spark runner, batch mode #294

Merged

markflyhigh mentioned this pull request Jul 20, 2016

[BEAM-124] Flink and Spark running Examples WordCountIT #703

Closed

jasonkuster closed this Aug 8, 2016

dhalperi pushed a commit to dhalperi/beam that referenced this pull request Aug 23, 2016

Add KafkaIO to Contrib (apache#345)

411c84b

Add KafkaIO to Contrib KafkaIO is an Unbounded source for reading from Apache Kafka. Backports KafkaIO from Apache Beam. See apache/incubator-beam 7b175df

iemejia pushed a commit to iemejia/beam that referenced this pull request Jan 12, 2018

Merge pull request apache#345

8f31c78

Manually add portability page to content

5 participants
