[BEAM-53] Java-only pub/sub source and sink (streaming only) #85
R=Ken
Note this has been extensively stress tested by running within a custom Google Dataflow runner.
R= @kennknowles (Copying from #85 (comment), so my scripts work)
Made PubsubGrpcClient public and refactored a bit so that its API is free of gRPC and protoc dependencies. This means it can be reused, and it will also make mocking easier.
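To illustrate why a transport-neutral API helps with reuse and mocking, here is a minimal sketch. It is not the code in this PR: the interface shape, the method names `publish` and `pull`, and the `InMemoryPubsubClient` fake are all hypothetical, and pulling directly from a topic is a simplification (the real service pulls from subscriptions). It sticks to Java 7 language features, given the JDK 7 requirement discussed in this thread.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical transport-neutral interface: no gRPC or protoc types appear in it. */
interface PubsubClient {
  /** Publishes the payloads to the topic and returns how many were accepted. */
  int publish(String topic, List<String> payloads);

  /** Removes and returns up to maxMessages payloads queued for the topic. */
  List<String> pull(String topic, int maxMessages);
}

/** In-memory fake for tests; a gRPC-backed class could implement the same interface. */
class InMemoryPubsubClient implements PubsubClient {
  private final Map<String, List<String>> queues = new HashMap<>();

  @Override
  public int publish(String topic, List<String> payloads) {
    List<String> queue = queues.get(topic);
    if (queue == null) {
      queue = new ArrayList<>();
      queues.put(topic, queue);
    }
    queue.addAll(payloads);
    return payloads.size();
  }

  @Override
  public List<String> pull(String topic, int maxMessages) {
    List<String> queue = queues.get(topic);
    List<String> out = new ArrayList<>();
    // Drain from the front so messages come back in publish order.
    while (queue != null && !queue.isEmpty() && out.size() < maxMessages) {
      out.add(queue.remove(0));
    }
    return out;
  }
}
```

Code written against the interface can be exercised with the fake in unit tests and wired to a real transport elsewhere, which is presumably the kind of mocking the comment above has in mind.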
Looking at the CI failure, the grpc-pubsub.jar (containing all the pubsub proto-derived classes) is JDK 8 only. Advice?
R: @davorbonaci
The two pubsub/grpc jars this depends on have been compiled with JDK 8 and thus break our JDK 7 support requirement. I'm taking this up with the Google team responsible. We can review this, but we can't merge until that is sorted out.
/**
 * Keep track of the minimum/maximum/sum of a set of timestamped long values.
 * For efficiency, bucket values by their timestamp.
Just to be super clear - the min/max/sum are meant as examples here, right? It seems this is a generic binning/bucketing map.
Those are the three functions I needed for watermark tracking and implemented in SimpleFunction. I'm trying to keep the scope as small as possible.
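For context, the bucketing idea in the Javadoc under review can be sketched roughly as follows. This is an illustration only, not the class from this PR: `BucketedStats`, its methods, and the fixed bucket width are assumptions, and the code avoids Java 8 APIs because of the JDK 7 requirement mentioned earlier in the thread.

```java
import java.util.TreeMap;

/** Hypothetical sketch of min/max/sum tracking over timestamp buckets. */
class BucketedStats {
  /** Running aggregates for the values that fell into one bucket. */
  private static final class Bucket {
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    long sum = 0;
  }

  private final long bucketWidthMs;
  // Sorted by bucket index so old buckets can be dropped as a range.
  private final TreeMap<Long, Bucket> buckets = new TreeMap<>();

  BucketedStats(long bucketWidthMs) {
    if (bucketWidthMs <= 0) {
      throw new IllegalArgumentException("bucketWidthMs must be positive");
    }
    this.bucketWidthMs = bucketWidthMs;
  }

  /** Maps a timestamp to its bucket index, rounding toward negative infinity. */
  private long bucketKey(long timestampMs) {
    long q = timestampMs / bucketWidthMs;
    if (timestampMs % bucketWidthMs < 0) {
      q--;
    }
    return q;
  }

  /** Folds one timestamped value into its bucket's aggregates. */
  void add(long timestampMs, long value) {
    long key = bucketKey(timestampMs);
    Bucket b = buckets.get(key);
    if (b == null) {
      b = new Bucket();
      buckets.put(key, b);
    }
    b.min = Math.min(b.min, value);
    b.max = Math.max(b.max, value);
    b.sum += value;
  }

  /** Drops every bucket that lies entirely before the bucket containing timestampMs. */
  void expireBefore(long timestampMs) {
    buckets.headMap(bucketKey(timestampMs), false).clear();
  }

  /** Minimum over retained values, or Long.MAX_VALUE if there are none. */
  long min() {
    long m = Long.MAX_VALUE;
    for (Bucket b : buckets.values()) {
      m = Math.min(m, b.min);
    }
    return m;
  }

  /** Maximum over retained values, or Long.MIN_VALUE if there are none. */
  long max() {
    long m = Long.MIN_VALUE;
    for (Bucket b : buckets.values()) {
      m = Math.max(m, b.max);
    }
    return m;
  }

  /** Sum over retained values, or 0 if there are none. */
  long sum() {
    long s = 0;
    for (Bucket b : buckets.values()) {
      s += b.sum;
    }
    return s;
  }
}
```

The efficiency gain in this sketch comes from storing one aggregate per bucket rather than one entry per value, so expiring old data and recomputing the aggregates both cost time proportional to the number of buckets.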
This is a very large PR. I am going to need to take another pass to continue to grok.
One thing you could do that would lighten the cognitive load would be to separate the source and sink into two PRs.
Move some pubsub source/sink machinery into PubSubGrpcClient so that its API can be gRPC and protoc neutral.
Factor PubsubClient iface out of PubsubGrpcClient impl.
This reverts commit 0befb34.
PTAL (really need to squash history now)
So now we could proceed as
Ok, I'll save this one for just the source, and will send in the pub/sub client and sink as separate PRs.
Noting that the first PR peeled off from this is #120.
apache#85 Move findbugs plugin execution to the process-classes phase
Improve the CombinedByKey translator in the DataStream-based Flink batch runner.
First step towards supporting pub/sub i/o in any Java runner.
Disclaimers:
But other than that we're ready to go.