[BEAM-270] Support Timestamps/Windows in Flink Batch #335
Closed
aljoscha wants to merge 29 commits into apache:master from aljoscha:flink-windowed-value-batch-cleaned
Conversation
This is a forward-port of GoogleCloudPlatform/DataflowJavaSDK#232
This enables unprocessed elements to be handled in the watermark manager after they are added to the CommittedResult structure.
This will help users and developers gain insight into where time is spent, for all sinks. (Enabling DEBUG-level logging will provide more detail.)
…modules sync in terms of version
The pre-commit wordcount test will confirm that this does not break the Cloud Dataflow worker.
* Move PubsubClient and friends out of sdk.io and into sdk.util.
* Add PubsubApiaryClient since gRPC has onerous boot class path
requirements which I don't wish to inflict upon other runners.
* Add PubsubTestClient in preparation for unit testing PubsubUnbounded{Source,Sink} (a rough sketch of this pattern follows after this list).
* Unit tests for all of above.
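The PubsubTestClient bullet above describes a fake client for unit testing. Below is a rough, hypothetical sketch of that pattern, not the actual Beam API: the class names carry a "Sketch" suffix and the method signatures are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Rough, hypothetical sketch of the test-client pattern: an abstract client
// plus a fake implementation that records published messages instead of
// calling the Pub/Sub service. Names and signatures here are invented.
abstract class PubsubClientSketch {
  abstract void publish(String topic, byte[] payload);
}

class PubsubTestClientSketch extends PubsubClientSketch {
  final List<byte[]> published = new ArrayList<>();

  @Override
  void publish(String topic, byte[] payload) {
    // Record the message so a unit test can assert on it later.
    published.add(payload);
  }
}
```

A unit test for something like PubsubUnboundedSink can then assert on the recorded messages rather than talking to the real service.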
This is strictly creating the module and moving one easy class to it. Many of the utilities in org.apache.beam.util and subpackages should move as developments allow.
This SideInputReader allows callers to check whether a side input is available before attempting to read its contents.
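As an illustration of that readiness check, here is a minimal hypothetical interface; the interface name and exact signatures in the codebase may differ.

```java
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.apache.beam.sdk.values.PCollectionView;

// Hypothetical interface illustrating the readiness check described above;
// the actual reader in the codebase may use different names and signatures.
interface ReadinessCheckingSideInputReader {
  /** Returns true if the given side input has a value available in this window. */
  boolean isReady(PCollectionView<?> view, BoundedWindow window);

  /** Reads the side input for the window; callers should check isReady(...) first. */
  <T> T get(PCollectionView<T> view, BoundedWindow window);
}
```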
This DoFnRunner wraps another DoFnRunner and provides an additional method that processes an element in all of the windows in which all side inputs are ready, returning any elements it could not process.
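A hedged sketch of what such a wrapper's contract could look like, with an invented name and simplified generics:

```java
import org.apache.beam.sdk.util.WindowedValue;

// Hypothetical sketch of the wrapping runner's contract: process an element
// only in the windows whose side inputs are all ready, and hand back the
// per-window values that could not be processed yet ("pushed back").
interface PushbackDoFnRunnerSketch<InputT> {
  Iterable<WindowedValue<InputT>> processElementInReadyWindows(WindowedValue<InputT> elem);
}
```

The caller is expected to retry the returned values later, once the missing side inputs become available.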
The default and timerful completion callbacks are identical, excepting their calls to evaluationContext.commitResult; factor that code into a common location.
This makes the runner available for selection by integration tests.
Today Flink batch supports only global windows. This is a situation we intend our build to allow, eventually via JUnit category filtering. For now, all test classes that use non-global windows are excluded entirely via Maven configuration; in the future, exclusion should be on a per-test-method basis.
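As a sketch of what per-test-method filtering via JUnit categories could later look like (the category interface name is hypothetical):

```java
import org.junit.Test;
import org.junit.experimental.categories.Category;

// Hypothetical category marker; a runner's build could exclude this category
// (e.g. via the Surefire plugin's groups/excludedGroups settings) instead of
// excluding whole test classes.
interface UsesNonGlobalWindows {}

public class ExampleWindowingTest {
  @Test
  @Category(UsesNonGlobalWindows.class)
  public void respectsFixedWindows() {
    // a test body that relies on non-global windows would go here
  }
}
```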
Without it, the RunnableOnService tests do not seem to work.
With this change we always use WindowedValue<T> for the underlying Flink DataSets instead of just T, which allows us to support windowing. This also enables a number of other changes:
- Use WindowedValue throughout
- Add proper translation for Window.into()
- Make side inputs window aware
- Make GroupByKey and Combine transformations window aware, including support for merging windows. For simplicity, GroupByKey is implemented as a Combine with a concatenating CombineFn (sketched below).

This removes Flink-specific translations for things that are handled by built-in sources/sinks; among other things, this:
- Removes the special translation for AvroIO.Read/Write and TextIO.Read/Write
- Removes special support for Write.Bound, which was not working properly and is now handled by the Beam machinery that uses DoFns for this
- Removes the special translation for binary Co-Group; the code was still in there but was never used

With this change all RunnableOnService tests run on Flink Batch.
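The "GroupByKey as a Combine with a concatenating CombineFn" idea can be sketched as follows. This is a simplified illustration built on Beam's Combine.CombineFn API, not the runner's actual implementation; coders and lazy iteration are ignored here.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.beam.sdk.transforms.Combine;

// Simplified sketch of a "concatenating" CombineFn: grouping values per key
// becomes a Combine whose accumulator is just the list of values seen so far.
class ConcatenateFn<T> extends Combine.CombineFn<T, List<T>, List<T>> {
  @Override
  public List<T> createAccumulator() {
    return new ArrayList<>();
  }

  @Override
  public List<T> addInput(List<T> accumulator, T input) {
    accumulator.add(input);
    return accumulator;
  }

  @Override
  public List<T> mergeAccumulators(Iterable<List<T>> accumulators) {
    List<T> merged = new ArrayList<>();
    for (List<T> acc : accumulators) {
      merged.addAll(acc);
    }
    return merged;
  }

  @Override
  public List<T> extractOutput(List<T> accumulator) {
    return accumulator;
  }
}
```

Grouping per key then amounts to something like Combine.perKey(new ConcatenateFn<V>()) applied to the keyed, windowed input.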
All of the stuff in the removed ITCases is covered (in more detail) by the RunnableOnService tests.
aljoscha (Contributor, Author):
I think there are a lot of commits in here because syncing between the ASF git and GitHub doesn't work right now. This also removes the redundant Flink ITCases mentioned in #328 and fixes the last remaining …
aljoscha (Contributor, Author):
I opened the PR from the wrong branch.
This is a cleanup version of #328