Description
What needs to happen?
Currently the Flink runner has 4 alternative implementations to translate a pipeline.
- FlinkStreamingTransformTranslators: works on the "native" org.apache.beam.sdk.Pipeline class and is based on Flink's DataStream API. It currently only supports streaming workflows. It can also support batch workflows once "[Flink Runner] Add UseDataStreamForBatch option to Flink runner to enable batch execution on DataStream API" (#28614) is merged.
- FlinkStreamingPortablePipelineTranslator: used for portable streaming pipelines. It has a lot of overlap (copy/pasted code) with FlinkStreamingTransformTranslators but supports a slightly different set of transforms. Also based on Flink's DataStream API.
- FlinkBatchPortablePipelineTranslator: used for batch portable pipelines. Based on the deprecated DataSet API.
- FlinkBatchTransformTranslators: used for batch "native" Java pipelines. Based on the deprecated DataSet API.
FlinkBatchPortablePipelineTranslator and FlinkBatchTransformTranslators should both be deprecated, since they are implemented using the deprecated DataSet API, which Flink will eventually remove.
FlinkBatchTransformTranslators can be replaced by FlinkStreamingTransformTranslators (#28614).
Given the similarities between the classes, I think FlinkStreamingTransformTranslators and FlinkStreamingPortablePipelineTranslator could be merged to support both portable and non-portable pipelines with a single translation layer.
Since we can easily convert instances of org.apache.beam.sdk.Pipeline to RunnerApi.Pipeline, it should be possible to support all types of pipelines (native streaming, portable streaming, native batch, portable batch) with the same translation implementation.
The goal of this task is to introduce this new single implementation and to eventually remove all the other alternatives.
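To make the proposed shape concrete, here is a minimal sketch of the idea, not the actual implementation. Every type in it (NativePipeline, PortablePipeline, ExecutionMode, UnifiedTranslationSketch) is a hypothetical stand-in and not a Beam or Flink class. The only point is that native pipelines are first converted to the portable form, so one translation method can serve all four cases.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: all types here are hypothetical stand-ins, NOT Beam or Flink classes.
final class UnifiedTranslationSketch {

  /** Stand-in for a "native" org.apache.beam.sdk.Pipeline: an ordered list of transform names. */
  static final class NativePipeline {
    final List<String> transforms;
    NativePipeline(List<String> transforms) { this.transforms = transforms; }
  }

  /** Stand-in for the portable pipeline representation: an ordered list of transform URNs. */
  static final class PortablePipeline {
    final List<String> transformUrns;
    PortablePipeline(List<String> transformUrns) { this.transformUrns = transformUrns; }
  }

  enum ExecutionMode { STREAMING, BATCH }

  /** Stand-in for the native-to-portable conversion mentioned above (the URN prefix is made up). */
  static PortablePipeline toPortable(NativePipeline pipeline) {
    List<String> urns = new ArrayList<>();
    for (String transform : pipeline.transforms) {
      urns.add("urn:sketch:" + transform);
    }
    return new PortablePipeline(urns);
  }

  /** The single translation layer: it only ever sees the portable form. */
  static List<String> translate(PortablePipeline pipeline, ExecutionMode mode) {
    List<String> operators = new ArrayList<>();
    for (String urn : pipeline.transformUrns) {
      // A real translator would build a DataStream operator here; the sketch records a description.
      operators.add("DataStream[" + mode + "] <- " + urn);
    }
    return operators;
  }

  /** Native pipelines reuse the same layer by converting first. */
  static List<String> translate(NativePipeline pipeline, ExecutionMode mode) {
    return translate(toPortable(pipeline), mode);
  }
}
```

In this sketch, batch versus streaming is just a parameter of the one translator, and the native entry point adds nothing beyond the conversion step. Whether the real transforms can be unified this cleanly is exactly what the proposed work would need to establish.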
Assuming this proposal is of interest, my colleagues and I could implement it.
Issue Priority
Priority: 3 (nice-to-have improvement)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner