Conversation

@peihe (Contributor) commented Jun 25, 2016

No description provided.

@peihe peihe force-pushed the spark-examples branch 2 times, most recently from 87a1307 to 99dd9f4, on June 25, 2016 00:57
@peihe (Contributor, Author) commented Jun 27, 2016

R: @amitsela
Had a test failure:

    ClassNotFoundException: org.apache.spark.api.java.JavaSparkContext

How is the spark-core library provided in the previous tests?

A review comment thread quoted these lines from the test:

    public void testE2ETfIdfSpark() throws Exception {
      SparkPipelineOptions options = PipelineOptionsFactory.as(SparkPipelineOptions.class);
      options.setRunner(SparkRunner.class);
      Pipeline pipeline = Pipeline.create(options);
A project Member commented on these lines:
Could you use PipelineOptionsFactory.fromArgs to be runner-agnostic? Is there a benefit in that? Maybe apply the same to all runners?
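For illustration only, a minimal sketch of the runner-agnostic pattern being suggested; the argument value and the surrounding scaffolding are assumptions, not code from this PR:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class RunnerAgnosticOptions {
  public static void main(String[] args) {
    // The caller passes e.g. --runner=SparkRunner (or any other registered runner),
    // so this code never hard-codes SparkRunner.class or SparkPipelineOptions.
    PipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().create();
    Pipeline pipeline = Pipeline.create(options);
    // ... apply transforms here ...
    pipeline.run();
  }
}
```

With this style, the Spark-specific choice lives only in the arguments supplied by the test harness or the command line, so the same code can be exercised against any runner.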

@amitsela (Member) commented:
I would still keep a duplicated WordCount copy in the Spark runner, like I did in #539, because it's widely used in the runner's unit tests. Maybe it could be removed once the runner is mature enough to rely only on the RunnableOnService tests.

And as I also said in #539, transitive dependencies are your enemy here; I can't come up with anything better than adding Spark as provided/runtime dependencies.

This could be resolved by removing the provided scope from the Spark dependencies in the Spark runner, but I don't think that's a good idea. Looping in @jbonofre, WDYT? This could make the Spark runner JAR very heavy, and what about different Spark distributions on clusters?
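For reference, the scope being debated is Maven's provided scope on the runner's Spark dependencies; a sketch of what that looks like in a pom (the artifact id and version property are illustrative assumptions, not the actual Beam pom):

```xml
<!-- 'provided': Spark classes are on the compile classpath but are NOT bundled
     into the runner jar; the cluster's own Spark distribution supplies them at
     runtime. Switching to the default 'compile' scope would pull Spark and its
     transitive dependencies into consumers' builds, making the artifact much heavier. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.10</artifactId>
  <version>${spark.version}</version>
  <scope>provided</scope>
</dependency>
```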

@peihe (Contributor, Author) commented Jun 28, 2016

add R: @davorbonaci

@amitsela (Member) commented Jul 7, 2016

@peihe, anything new here? #539 is passing tests now, but like you said, it doesn't eliminate code duplication.

I don't see this working if the runner doesn't have a compile-scope dependency on the engine, and at least for Spark, I'm not sure that's the best way to go.

Pinging @jbonofre: from your experience with customers, is Spark usually provided?

@amitsela (Member) commented Jul 7, 2016

While my point of view is that of a Spark (+YARN) cluster, I'm starting to get the feeling that there is a lot of interest in "out-of-the-box" packaging.

Let me raise this on the mailing list to get people's thoughts, and I might change the build to use compile scope, Maven profiles, or something along those lines.
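One way the "profiles" idea could look, as a rough sketch: the default build keeps Spark provided, and an opt-in profile flips it to compile scope for out-of-the-box packaging. The profile id, property name, and artifact coordinates below are invented for illustration, not taken from the Beam build.

```xml
<!-- Default: defer to the cluster's Spark distribution. -->
<properties>
  <spark.scope>provided</spark.scope>
</properties>

<profiles>
  <!-- Opt-in bundling, e.g.: mvn package -Pinclude-spark -->
  <profile>
    <id>include-spark</id>
    <properties>
      <spark.scope>compile</spark.scope>
    </properties>
  </profile>
</profiles>

<!-- The Spark dependencies then pick up whichever scope is active. -->
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>${spark.version}</version>
    <scope>${spark.scope}</scope>
  </dependency>
</dependencies>
```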

@peihe peihe closed this Aug 4, 2016
@peihe peihe deleted the spark-examples branch August 21, 2017 02:25
pl04351820 pushed a commit to pl04351820/beam that referenced this pull request Dec 20, 2023
Amar3tto added a commit to akvelon/beam that referenced this pull request Mar 20, 2025