
Conversation

@aljoscha
Contributor

This is a cleaned-up version of #328.

dhalperi and others added 29 commits May 13, 2016 11:01
This enables unprocessed elements to be handled in the Watermark manager
after they are added to the CommittedResult structure.
For all sinks, this will help users and developers gain insight into where time
is spent. (Enabling the DEBUG log level will provide even more insight.)
The pre-commit wordcount test will confirm that this does not break the
Cloud Dataflow worker.
* Move PubsubClient and friends out of sdk.io and into sdk.util.
* Add PubsubApiaryClient since gRPC has onerous boot class path
  requirements which I don't wish to inflict upon other runners.
* Add PubsubTestClient in preparation for unit testing
  PubsubUnbounded{Source,Sink}.
* Unit tests for all of the above.
This is strictly creating the module and moving one easy class to it.
Many of the utilities in org.apache.beam.util and subpackages should
move as developments allow.
This SideInputReader allows callers to check whether a side input is
available before attempting to read its contents.
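To illustrate the idea, here is a minimal sketch of such a reader. The type and method names are illustrative stand-ins, not the actual Beam interfaces:

```java
// Minimal sketch only -- ViewRef, WindowRef, and the method names here are
// illustrative stand-ins, not the actual Beam interfaces.
interface ViewRef<T> {}    // stand-in for a side-input view (e.g. a PCollectionView)
interface WindowRef {}     // stand-in for a window (e.g. a BoundedWindow)

interface SideInputReaderSketch {
  // Reads the contents of a side input for a given window.
  <T> T get(ViewRef<T> view, WindowRef window);
}

interface ReadyCheckingSideInputReaderSketch extends SideInputReaderSketch {
  // Callers check this first and only call get() once it returns true,
  // so they never read a side input that is not yet available.
  boolean isReady(ViewRef<?> view, WindowRef window);
}
```
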
This DoFnRunner wraps another DoFnRunner and provides an additional method
that processes an element in all the windows where all side inputs are ready,
returning any elements that it could not process.
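A rough, simplified sketch of that push-back contract, with made-up names and pushing back whole elements rather than splitting them per window:

```java
// Simplified sketch with made-up names; the real runner works per window,
// but the contract is the same: process what is ready, return the rest.
import java.util.Collections;
import java.util.List;

class PushbackRunnerSketch<InputT> {
  interface Runner<T> { void processElement(T element); }
  interface SideInputReadiness<T> { boolean allSideInputsReady(T element); }

  private final Runner<InputT> underlying;
  private final SideInputReadiness<InputT> readiness;

  PushbackRunnerSketch(Runner<InputT> underlying, SideInputReadiness<InputT> readiness) {
    this.underlying = underlying;
    this.readiness = readiness;
  }

  // Processes the element if its side inputs are ready; otherwise returns it
  // so the caller can retry once the side inputs become available.
  List<InputT> processElementInReadyWindows(InputT element) {
    if (readiness.allSideInputsReady(element)) {
      underlying.processElement(element);
      return Collections.emptyList();
    }
    return Collections.singletonList(element);
  }
}
```
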
The default and timerful completion callbacks are identical except for
their calls to evaluationContext.commitResult; factor that code into a
common location.
This makes the runner available for selection by integration tests.
Today Flink batch supports only global windows. This is a situation we
intend our build to allow, eventually via JUnit category filtering.

For now, all test classes that use non-global windows are excluded
entirely via Maven configuration. In the future, this should happen on a
per-test-method basis.
Without it, the RunnableOnService tests do not seem to work.
With this change we always use WindowedValue<T> for the underlying Flink
DataSets instead of just T. This allows us to support windowing as well.

This also changes a lot of other things, enabled by the above:

 - Use WindowedValue throughout
 - Add proper translation for Window.into()
 - Make side inputs window-aware
 - Make GroupByKey and Combine transformations window-aware; this
   includes support for merging windows. For simplicity, GroupByKey is
   implemented as a Combine with a concatenating CombineFn (see the
   sketch after this list)
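As a rough illustration of the concatenating-CombineFn idea, assuming the standard Beam Combine.CombineFn contract (the class name ConcatenateFn and its exact shape are illustrative, not necessarily what the Flink runner uses):

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.beam.sdk.transforms.Combine;

// Illustrative only: a CombineFn whose accumulator simply concatenates all
// values, so applying it per key and window yields the same output as a
// GroupByKey.
class ConcatenateFn<T> extends Combine.CombineFn<T, List<T>, List<T>> {
  @Override
  public List<T> createAccumulator() {
    return new ArrayList<>();
  }

  @Override
  public List<T> addInput(List<T> accumulator, T input) {
    accumulator.add(input);
    return accumulator;
  }

  @Override
  public List<T> mergeAccumulators(Iterable<List<T>> accumulators) {
    List<T> merged = new ArrayList<>();
    for (List<T> accumulator : accumulators) {
      merged.addAll(accumulator);
    }
    return merged;
  }

  @Override
  public List<T> extractOutput(List<T> accumulator) {
    return accumulator;
  }
}
```

Presumably this is also why the formulation helps with merging windows: merging two windows then only requires merging their list accumulators.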

This removes Flink-specific transformations for things that are handled
by built-in sources/sinks. Among other things, this:

 - Removes special translation for AvroIO.Read/Write and
   TextIO.Read/Write
 - Removes special support for Write.Bound; this was not working properly
   and is now handled by the Beam machinery that uses DoFns for it
 - Removes special translation for binary Co-Group; the code was still
   there but was never used

With this change, all RunnableOnService tests run on Flink Batch.
Everything in the removed ITCases is covered (in more detail) by the
RunnableOnService tests.
@aljoscha
Contributor Author

I think there are a lot of commits in here because syncing between the ASF Git repo and GitHub doesn't work right now. This also removes the redundant Flink ITCases mentioned in #328 and fixes the last remaining RunnableOnService test failure.

@aljoscha
Contributor Author

I opened the PR from the wrong branch.

@aljoscha aljoscha closed this May 17, 2016
@aljoscha aljoscha deleted the flink-windowed-value-batch-cleaned branch May 17, 2016 09:14
dhalperi pushed a commit to dhalperi/beam that referenced this pull request Aug 23, 2016