
Conversation

@tgroh (Member) commented Apr 20, 2016

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

  • Make sure the PR title is formatted like:
    [BEAM-<Jira issue #>] Description of pull request
  • Make sure tests pass via mvn clean verify. (Even better, enable
    Travis-CI on your fork and ensure the whole test matrix passes).
  • Replace <Jira issue #> in the title with the actual Jira issue
    number, if there is one.
  • If this contribution is large, please file an Apache
    Individual Contributor License Agreement.

This allows existing pipelines to continue to function by keeping the
graph structure identical while replacing Create with a Read.

After BEAM-17 this override can be removed.

See #183 for more information.
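
The replacement transform keeps the graph shape identical by emitting its output directly as a primitive, with no expanded sub-transforms. The sketch below approximates that behavior against the 2016-era SDK API (PCollection.createPrimitiveOutputInternal, WindowingStrategy.globalDefault, IsBounded.BOUNDED); it is a rough reconstruction, and names, signatures, and error messages may not match the PR's code exactly.

import org.apache.beam.sdk.coders.CannotProvideCoderException;
import org.apache.beam.sdk.transforms.PTransform;
import org.apache.beam.sdk.util.WindowingStrategy;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollection.IsBounded;
import org.apache.beam.sdk.values.PInput;

// Rough sketch only: approximates the override's behavior with the 2016-era
// SDK; the actual class in this PR may differ in details.
public class SinglePrimitiveOutputPTransform<T> extends PTransform<PInput, PCollection<T>> {
  private final PTransform<PInput, PCollection<T>> original;

  public SinglePrimitiveOutputPTransform(PTransform<PInput, PCollection<T>> original) {
    this.original = original;
  }

  @Override
  public PCollection<T> apply(PInput input) {
    // Produce a primitive output directly, so the enclosing TransformTreeNode has
    // no expanded children and the runner translates the node itself.
    PCollection<T> output = PCollection.<T>createPrimitiveOutputInternal(
        input.getPipeline(), WindowingStrategy.globalDefault(), IsBounded.BOUNDED);
    try {
      output.setCoder(original.getDefaultOutputCoder(input, output));
    } catch (CannotProvideCoderException e) {
      throw new IllegalArgumentException("Unable to infer a coder for the output.", e);
    }
    return output;
  }
}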

@tgroh (Member, Author) commented Apr 20, 2016

R: @amitsela
CC: @kennknowles

@tgroh force-pushed the spark_override_create branch from 78afdb3 to f98addc on April 20, 2016, 16:07.
import org.apache.beam.sdk.values.PCollection.IsBounded;
import org.apache.beam.sdk.values.PInput;

public class SinglePrimitiveOutputPTransform<T> extends PTransform<PInput, PCollection<T>> {
Member commented:

Why the name change?

tgroh (Member, Author) replied:

?

This could be a more specific override (it probably should be, to facilitate quick removal), but as written it is a relatively general override.

Member commented:

I'm actually a bit confused as to how/whether this works. The translator is expecting Create.Values.

I think the two options are:

  1. Wait until the Spark runner supports Read.
  2. Override Create.Values to a Spark-specific clone of it as here, but alter the translator to translate the new class.

Am I missing something?

@amitsela (Member) commented Apr 20, 2016

If the objective is to allow existing pipeline implementations with Create.Values to construct the pipeline with Read instead, I guess that SinglePrimitiveOutputPTransform should be translated to the runner's create() instead of Create.Values.

@amitsela (Member) commented Apr 20, 2016

It seems as if https://github.com/apache/incubator-beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/Pipeline.java#L369 still gets Create.Values as the transform... or that's actually the entry point to translation.
I'm not completely sure this override applies; I'll take a deeper look.

@tgroh (Member, Author) commented Apr 20, 2016

Using super.apply(Override, PInput), as opposed to input.apply(PTransform), causes the Pipeline to replace the contents of the Create.Values apply method with the contents of the override, while keeping the transform recorded in the TransformTreeNode representing this application unchanged.
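
A minimal sketch of that runner-side hook, assuming the pre-first-stable-release PipelineRunner.apply(PTransform, InputT) override point inside a PipelineRunner subclass such as the Spark runner; the actual change in this PR may differ in names and in how it handles generics:

// Sketch only: the runner intercepts Create.Values and hands a replacement to
// super.apply(...), which expands the replacement while the TransformTreeNode
// still records the original Create.Values for the translator to match on.
@Override
public <OutputT extends POutput, InputT extends PInput> OutputT apply(
    PTransform<InputT, OutputT> transform, InputT input) {
  if (transform instanceof Create.Values) {
    @SuppressWarnings({"rawtypes", "unchecked"})
    PTransform<InputT, OutputT> replacement =
        (PTransform) new SinglePrimitiveOutputPTransform((PTransform) transform);
    return super.apply(replacement, input);
  }
  return super.apply(transform, input);
}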

@kennknowles (Member) commented:

Ah, that is excellent. It should also be done for the GroupByKey override.

@kennknowles (Member) commented:

(and it is also very confusing, hence the work to remove this code path)

@amitsela (Member) commented:

Oh, OK. Awesome.
So is Create.Values being deprecated? And is this a "transition" solution?

@tgroh (Member, Author) commented Apr 20, 2016

Create.Values is being removed from the list of model-primitive transforms and replaced with an implementation based on Read.Bounded.

This is a transition solution: it allows the implementation of Create.Values to remain a primitive that the SparkPipelineRunner supports, while allowing the default implementation of Create to move to a composite built on top of Read.Bounded (in #183).

When the SparkPipelineRunner supports the Read primitive, this override should be removed, and we can get rid of the create translation in the Spark TransformTranslator.
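
To make the translator point concrete: runner translators of that era looked up an evaluator keyed by the transform's class, which is why the graph node must keep reporting Create.Values until a Read evaluator exists. The snippet below is purely illustrative; the interface, registry, and stand-in class are hypothetical, not the actual Spark TransformTranslator API.

import java.util.HashMap;
import java.util.Map;

// Purely illustrative: class-keyed dispatch in the style runner translators used
// at the time. Every name below is a hypothetical stand-in, not real Spark code.
public class TranslatorDispatchSketch {

  // Stand-in for a per-transform evaluator.
  interface TransformEvaluator {
    void evaluate(String transformName);
  }

  // Stand-in for Create.Values, used only to key the registry in this sketch.
  static class CreateValuesStandIn {}

  private static final Map<Class<?>, TransformEvaluator> EVALUATORS = new HashMap<>();

  static {
    // Keyed by the class recorded in the graph node; because the override keeps
    // the node reporting Create.Values, this entry keeps matching. Once the
    // runner supports Read, a Read.Bounded entry would take its place.
    EVALUATORS.put(CreateValuesStandIn.class,
        name -> System.out.println("translating " + name + " via the runner's create()"));
  }

  static TransformEvaluator evaluatorFor(Class<?> transformClass) {
    TransformEvaluator evaluator = EVALUATORS.get(transformClass);
    if (evaluator == null) {
      throw new UnsupportedOperationException("No evaluator registered for " + transformClass);
    }
    return evaluator;
  }

  public static void main(String[] args) {
    evaluatorFor(CreateValuesStandIn.class).evaluate("Create.Values");
  }
}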

@kennknowles (Member) commented:

Yea, the primitive transforms are moving to the list here. We've been altering the SDK to match, while adding overrides to the runners so there is no behavioral change.

@amitsela (Member) commented:

Yep, I remember now :)
So +1 from me.
BTW @tgroh, will the behaviour you described for super.apply(Override, PInput) only work for PTransform<PInput, PCollection<T>>, or also for transformations like GroupByKey?

@tgroh (Member, Author) commented Apr 20, 2016

It works for arbitrary PTransforms.

@dhalperi (Contributor) commented:

@tgroh looks like you have LGTMs from everyone? I am happy to play the role of mergebot if you need a committer in order to unblock you on #183

@asfgit closed this in de601a8 on Apr 20, 2016.