
Conversation

@markflyhigh
Contributor

markflyhigh commented Jul 20, 2016

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

  • Make sure the PR title is formatted like:
    [BEAM-<Jira issue #>] Description of pull request
  • Make sure tests pass via mvn clean verify. (Even better, enable
    Travis-CI on your fork and ensure the whole test matrix passes).
  • Replace <Jira issue #> in the title with the actual Jira issue
    number, if there is one.
  • If this contribution is large, please file an Apache
    Individual Contributor License Agreement.

+R: @dhalperi @jasonkuster

This PR is an updated version of an old PR.

Removes the Spark and Flink runners' dependencies on the Beam Java examples module in order to run WordCountIT successfully with the Spark and Flink runners. The following commands are used for the different runners (a sketch of how the test picks up these options follows the list):

  • Spark runner:

mvn clean verify -pl examples/java -DskipITs=false -Dit.test=WordCountIT -DintegrationTestPipelineOptions='[ "--tempRoot=/tmp", "--inputFile=/tmp/kinglear.txt", "--runner=org.apache.beam.runners.spark.SparkRunner" ]'

  • Flink runner:

mvn clean verify -pl examples/java -DskipITs=false -Dit.test=WordCountIT -DintegrationTestPipelineOptions='[ "--tempRoot=gs://clouddfe-testing-temp-storage", "--runner=org.apache.beam.runners.flink.FlinkRunner" ]'

  • Dataflow test runner:

mvn clean verify -pl examples/java -DskipITs=false -Dit.test=WordCountIT -DintegrationTestPipelineOptions='[ "--tempRoot=gs://clouddfe-testing-temp-storage", "--runner=org.apache.beam.runners.dataflow.testing.TestDataflowRunner" ]'
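
For context, a minimal sketch of how an integration test can pick up the options passed via -DintegrationTestPipelineOptions above; the wiring in the real WordCountIT may differ, and the WordCountITOptions shape shown here is an assumption.

import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.testing.TestPipeline;
import org.apache.beam.sdk.testing.TestPipelineOptions;

public class PipelineOptionsSketch {

  /** Assumed options interface; the real WordCountIT defines its own. */
  public interface WordCountITOptions extends TestPipelineOptions {}

  public static void main(String[] args) {
    PipelineOptionsFactory.register(WordCountITOptions.class);

    // testingPipelineOptions() builds options from the JSON that the build
    // forwards into the test JVM, so the --runner and --tempRoot flags from
    // the commands above end up here.
    WordCountITOptions options =
        TestPipeline.testingPipelineOptions().as(WordCountITOptions.class);

    System.out.println("runner = " + options.getRunner().getSimpleName());
    System.out.println("tempRoot = " + options.getTempRoot());
  }
}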

@dhalperi
Contributor

R: @kennknowles has a lot of existing state here, maybe you can take a look?

<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<version>2.10</version>
Member


These versions are managed in the root pom.xml in the pluginManagement section. But in fact it is already configured to ignoreNonCompile, isn't it?
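
For reference, a sketch of the kind of pluginManagement entry being discussed; the exact configuration in the root pom.xml may differ, so treat the goal and values below as assumptions.

<pluginManagement>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-dependency-plugin</artifactId>
      <version>2.10</version>
      <executions>
        <execution>
          <goals>
            <goal>analyze-only</goal>
          </goals>
          <configuration>
            <!-- failOnWarning fails the build when the dependency analysis reports
                 warnings; ignoreNonCompile excludes runtime/provided/test/system
                 scoped dependencies from the unused-dependency analysis. -->
            <failOnWarning>true</failOnWarning>
            <ignoreNonCompile>true</ignoreNonCompile>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</pluginManagement>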

Contributor Author


Yes, but I actually removed the failOnWarning setting so that it falls back to the default (false). failOnWarning makes the project build fail due to the unused Spark and Flink runner dependencies.

Member


Yes, that is caused by them being compile dependencies. If you make them runtime it will be fine. Or, if you do reference them in the code, mark them optional.
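
For illustration, a sketch of the runtime-scoped declaration being suggested for examples/java; the artifact IDs and version property are assumptions, not taken from this PR.

<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-runners-spark</artifactId>
  <version>${project.version}</version>
  <!-- runtime scope keeps the runner off the compile classpath, so the
       dependency analysis does not flag it as an unused compile dependency
       while it remains available when the integration test runs. -->
  <scope>runtime</scope>
</dependency>
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-runners-flink_2.10</artifactId>
  <version>${project.version}</version>
  <scope>runtime</scope>
</dependency>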

@kennknowles
Member

Took a look. At a high level:

The Jenkins failure looks like it was just canceled, maybe due to a Jenkins restart.

@amitsela
Member

I think #539 took care of most of the changes applied to the Spark runner here, so a rebase is in order.
As for RunnableOnService coverage:

  • Streaming tests are definitely not covered for obvious reasons.
  • IO tests are unique as well.
I think the rest of the tests (those that should keep running continuously) are covered by RunnableOnService. There are also tests such as SerializationTest and SideEffectsTest which I'm not sure are covered by RunnableOnService, but I'm also not sure they should be running constantly.

@kennknowles for the Spark runner in Batch mode, is there anything else missing besides Read.Bound support in order for the runner to execute all Batch tests? I have this working in the Spark runner 2 branch and I'm thinking of backporting it into the runner's current version, so if that would help, I'll do that.

@markflyhigh force-pushed the wordcountit-spark-flink branch from d8581c7 to 72c5af7 on July 25, 2016 17:16
@markflyhigh
Contributor Author

Thanks @kennknowles @amitsela. Rebased from master. The only change is adding runtime dependencies to examples/java to support the Flink and Spark runners.

PTAL

@markflyhigh
Contributor Author

+R: @jasonkuster

@jasonkuster
Contributor

LGTM

@markflyhigh
Contributor Author

+R: @lukecwik. Can you take a look? Thanks.

@amitsela
Member

LGTM for Spark dependencies.

* <p>Input text document is available from the following sources:
* <ul>
* <li>Using GCS (default):
* gs://dataflow-samples/shakespeare/kinglear.tx
Member


kinglear.tx -> kinglear.txt

Contributor Author


Fixed.

@markflyhigh force-pushed the wordcountit-spark-flink branch from ab37b8c to 1770eeb on July 30, 2016 00:00
@markflyhigh
Contributor Author

markflyhigh commented Aug 1, 2016

PTAL

I don't have permission to put the test file at gs://dataflow-samples/apache/LICENSE. That needs to be done before merging to master.

@jasonkuster @kennknowles

private static final Logger LOG = LoggerFactory.getLogger(WordCountOnSuccessMatcher.class);

private static final String EXPECTED_CHECKSUM = "8ae94f799f97cfd1cb5e8125951b32dfb52e1f12";
private static final String EXPECTED_CHECKSUM = "c04722202dee29c442b55ead54c6000693e85e77";
Contributor


If the input file is customizable, then the checksum needs to be customizable as well. Move this to the WordCountITOptions

Contributor Author


Good idea. In fact, I'll write another PR that creates a FileChecksumMatcher to make this matcher more general. I think I can make that change in that PR.
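
For illustration, a minimal sketch of the reviewer's suggestion, assuming the option lives on WordCountITOptions; apart from the checksum value taken from this PR, the names and shape here are assumptions rather than the final implementation.

import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.testing.TestPipelineOptions;

/** Sketch: exposing the expected checksum as a pipeline option instead of a constant. */
public interface WordCountITOptions extends TestPipelineOptions {
  @Description("Expected checksum of the WordCountIT output for the chosen input file")
  @Default.String("c04722202dee29c442b55ead54c6000693e85e77")
  String getExpectedChecksum();
  void setExpectedChecksum(String value);
}

The matcher could then read getExpectedChecksum() from the options instead of hard-coding EXPECTED_CHECKSUM.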

@markflyhigh closed this Aug 8, 2016
@markflyhigh deleted the wordcountit-spark-flink branch November 7, 2016 22:48