
Conversation

@chadrik (Contributor) commented Aug 5, 2019

Add support for using PubSubIO as an external transform, so that it works from python on portable runners like Flink.


SimpleFunction<PubsubMessage, T> parseFn =
    (SimpleFunction<PubsubMessage, T>) new IdentityMessageFn();
setParseFn(parseFn);
// FIXME: call setCoder(). need to use PubsubMessage proto to be compatible with python
@chadrik (Contributor Author):

In the case that an external SDK has requested needsAttributes, this transform needs to produce PubsubMessage instances rather than byte arrays. There's a wrinkle here: Python deserializes PubsubMessage using protobufs, whereas Java encodes with PubsubMessageWithAttributesCoder.

I believe we want to use protobufs from Java. Can I get confirmation of that, please?

@chadrik (Contributor Author):

I serialized the PubsubMessage using protobufs. Since there's no cross-language coder for PubsubMessage, and I assumed it would be overreach to add one, I used the bytes coder and then handled converting to and from protobufs in code that lives close to the transforms.
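
For context, a minimal sketch of the kind of conversion that can live next to the transform on the Java side (the class name and wiring are illustrative, not taken from this PR; it assumes the google.pubsub.v1 protos that PubsubIO already depends on):

```java
import com.google.protobuf.ByteString;
import com.google.protobuf.InvalidProtocolBufferException;
import java.util.Collections;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage;
import org.apache.beam.sdk.transforms.SimpleFunction;

/** Hypothetical helper converting between Beam's PubsubMessage and proto bytes. */
public class PubsubMessageToProtoBytes extends SimpleFunction<PubsubMessage, byte[]> {

  @Override
  public byte[] apply(PubsubMessage msg) {
    // Re-encode as the google.pubsub.v1.PubsubMessage wire format, which the
    // Python SDK can parse back into its own PubsubMessage type.
    return com.google.pubsub.v1.PubsubMessage.newBuilder()
        .setData(ByteString.copyFrom(msg.getPayload()))
        .putAllAttributes(
            msg.getAttributeMap() == null
                ? Collections.<String, String>emptyMap()
                : msg.getAttributeMap())
        .build()
        .toByteArray();
  }

  /** The reverse direction, e.g. ahead of a cross-language write. */
  public static PubsubMessage fromProtoBytes(byte[] protoBytes)
      throws InvalidProtocolBufferException {
    com.google.pubsub.v1.PubsubMessage proto =
        com.google.pubsub.v1.PubsubMessage.parseFrom(protoBytes);
    return new PubsubMessage(proto.getData().toByteArray(), proto.getAttributesMap());
  }
}
```

Applied with something like MapElements.via(new PubsubMessageToProtoBytes()) followed by setCoder(ByteArrayCoder.of()), the PCollection that crosses the language boundary only ever carries bytes.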

args['id_label'] = _encode_str(self.id_label)

# FIXME: how do we encode a bool so that Java can decode it?
# args['with_attributes'] = _encode_bool(self.with_attributes)
@chadrik (Contributor Author):

What's the right way to encode a bool so that Java can decode it correctly?

@chadrik (Contributor Author):

I encoded it as an int and handled the cast in Java.
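
For illustration only, a sketch of what that cast might look like on a Java configuration object (the class and setter are hypothetical, not the actual code in this PR):

```java
// Hypothetical configuration POJO on the Java side of the external transform.
// The Python SDK sends the flag as an integer; Java treats any non-zero value as true.
public class PubsubReadConfiguration {

  private boolean withAttributes;

  public void setWithAttributes(Long withAttributes) {
    this.withAttributes = withAttributes != null && withAttributes != 0;
  }

  public boolean getWithAttributes() {
    return withAttributes;
  }
}
```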

@chadrik (Contributor Author) commented Aug 5, 2019

R: @robertwb
R: @mxm
R: @lukecwik
R: @chamikaramj

@chadrik force-pushed the external_gcp_pubsub branch from 45f3637 to 8f3ede3 on August 7, 2019 04:19
@chadrik changed the title from "[BEAM-7738] Add external transform support to PubSubIO" to "[BEAM-7738] Add external transform support to PubsubIO" on Aug 7, 2019
@mxm (Contributor) commented Aug 13, 2019

@chadrik This is awesome. Thanks a lot for your work! Let's rebase this once #9098 is in. I'll have a look at this very soon.

@chadrik (Contributor Author) commented Aug 13, 2019

Thanks @mxm. This is actually not working yet; I'm getting some mysterious errors deep within Flink with respect to serialization. I'm working on writing tests and gathering as much info as I can so that I can report back here. So far I'm pretty perplexed, but I'm new to Beam, Flink, and Java, so that's not a big surprise! Once I have more info, it would be great to have your input.

@chadrik (Contributor Author) commented Aug 16, 2019

Here's some info on where I am with this. I could really use some help to push this over the finish line.

The expansion service runs correctly and sends the expanded transforms back to Python, but the job fails inside Java on Flink because it's trying to use the wrong serializer. There's a good chance that I'm overlooking something very obvious.

Here's the stack trace:

2019-08-16 16:04:58,297 INFO  org.apache.flink.runtime.taskmanager.Task                     - Source: PubSubInflow/ExternalTransform(beam:external:java:pubsub:read:v1)/PubsubUnboundedSource/Read(PubsubSource) (1/1) (93e578d8ae737c7502685cd978413a98) switched from RUNNING to FAILED.
java.lang.RuntimeException: org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage cannot be cast to [B
	at org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:110)
	at org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:89)
	at org.apache.flink.streaming.runtime.io.RecordWriterOutput.collect(RecordWriterOutput.java:45)
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
	at org.apache.flink.streaming.api.operators.StreamSourceContexts$ManualWatermarkContext.processAndCollect(StreamSourceContexts.java:305)
	at org.apache.flink.streaming.api.operators.StreamSourceContexts$WatermarkContext.collect(StreamSourceContexts.java:394)
	at org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapper.emitElement(UnboundedSourceWrapper.java:342)
	at org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapper.run(UnboundedSourceWrapper.java:268)
	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:93)
	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:57)
	at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:97)
	at org.apache.flink.streaming.runtime.tasks.StoppableSourceStreamTask.run(StoppableSourceStreamTask.java:45)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassCastException: org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage cannot be cast to [B
	at org.apache.beam.sdk.coders.ByteArrayCoder.encode(ByteArrayCoder.java:41)
	at org.apache.beam.sdk.coders.LengthPrefixCoder.encode(LengthPrefixCoder.java:56)
	at org.apache.beam.sdk.values.ValueWithRecordId$ValueWithRecordIdCoder.encode(ValueWithRecordId.java:105)
	at org.apache.beam.sdk.values.ValueWithRecordId$ValueWithRecordIdCoder.encode(ValueWithRecordId.java:81)
	at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:578)
	at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:569)
	at org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.encode(WindowedValue.java:529)
	at org.apache.beam.runners.flink.translation.types.CoderTypeSerializer.serialize(CoderTypeSerializer.java:87)
	at org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.serialize(StreamElementSerializer.java:175)
	at org.apache.flink.streaming.runtime.streamrecord.StreamElementSerializer.serialize(StreamElementSerializer.java:46)
	at org.apache.flink.runtime.plugable.SerializationDelegate.write(SerializationDelegate.java:54)
	at org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.serializeRecord(SpanningRecordSerializer.java:78)
	at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:160)
	at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:128)
	at org.apache.flink.streaming.runtime.io.RecordWriterOutput.pushToRecordWriter(RecordWriterOutput.java:107)
	... 15 more

Here's the serializer that CoderTypeSerializer.serialize is using before the exception:

WindowedValue$FullWindowedValueCoder(
  ValueWithRecordId$ValueWithRecordIdCoder(
    LengthPrefixCoder(
      ByteArrayCoder)
  ),
  GlobalWindow$Coder
) 

Here's the value that's being serialized:

TimestampedValueInGlobalWindow{
  value=ValueWithRecordId{
    id=[54, 56, 54, 48, 48, 52, 48, 49, 49, 57, 55, 54, 52, 53, 48], 
    value=org.apache.beam.sdk.io.gcp.pubsub.PubsubMessage@64d3b2a
  }, 
  timestamp=2019-08-16T20:04:29.230Z, 
  pane=PaneInfo.NO_FIRING
}

Obviously ByteArrayCoder is not the right serializer for PubsubMessage.

Note that I get the same error whether with_attributes is enabled or disabled.

So why is ByteArrayCoder being used?

The graph that's generated after expansion looks right to me. Here's a table of the primitive transforms and their PCollections (note the last row is a python pardo).

| transform name | transform type | out collection | out collection coder |
| --- | --- | --- | --- |
| external_1root/PubsubUnboundedSource/Read(PubsubSource) | beam:transform:read:v1 | PubsubMessage | PubsubMessageWithAttributesCoder |
| external_1root/PubsubUnboundedSource/PubsubUnboundedSource.Stats/ParMultiDo(Stats) | beam:transform:pardo:v1 of StatsFn | PubsubMessage | PubsubMessageWithAttributesCoder |
| external_1root/MapElements/Map/ParMultiDo(Anonymous) | beam:transform:pardo:v1 of ParsePayloadAsPubsubMessageProto | byte[] | ByteArrayCoder |
| ref_AppliedPTransform_PubSubInflow/Map(_from_proto_str)_4 | beam:transform:pardo:v1 of _from_proto_str | PubsubMessage | Pickle |

So as far as the Pipeline definition is concerned, it seems like the output of Read(PubsubSource) is properly associated with the PubsubMessageWithAttributesCoder, but when the job runs it's using ByteArrayCoder. So maybe somehow the wires are getting crossed with ParMultiDo(Anonymous), which uses ByteArrayCoder.

Any ideas?

The last thing that's important to note is that I needed PubsubIO.Read to output a data type that is compatible with Python, so I wrote a custom parseFn that converts each message to a protobuf byte array. So the PCollection that bridges the divide between Java and Python uses a BytesCoder, but the transforms on either side do the work of converting to and from protobuf representation of a PubsubMessage.

@chadrik (Contributor Author) commented Aug 17, 2019

Ok, I now know how the PubsubMessageWithAttributesCoder is getting swapped with a BytesCoder, but it turns out it's intentional and I don't know why.

In FlinkStreamingPortablePipelineTranslator.translateRead, when the coder is instantiated it is passed through LengthPrefixUnknownCoders.addLengthPrefixedCoder, which silently replaces all non-model coders with LengthPrefixCoder(ByteArrayCoder). This seems like the sort of thing that should print a warning, since I assume a broken pipeline is a likely outcome.

I'm unclear on why this coder swap is necessary. The Java part of this pipeline is a Read -> ParDo -> ParDo; shouldn't this segment be able to use Java-only coders (i.e. non-model coders)?

What's the proper solution here? The last Java ParDo is the one that ensures we have a byte array for sending to Python, but evidently this needs to happen in the Read itself?
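
(For reference, a sketch of what that substitution amounts to; the coder classes are the real Beam ones, the wrapper class is just for illustration:)

```java
import org.apache.beam.sdk.coders.ByteArrayCoder;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.LengthPrefixCoder;

public class LengthPrefixSubstitutionSketch {
  public static void main(String[] args) {
    // Any coder the runner does not recognize as a model coder (such as
    // PubsubMessageWithAttributesCoder) gets replaced by this bytes coder...
    Coder<byte[]> wireCoder = LengthPrefixCoder.of(ByteArrayCoder.of());
    System.out.println(wireCoder);
    // ...while the source still emits PubsubMessage objects, which is why
    // ByteArrayCoder.encode throws the ClassCastException in the trace above.
  }
}
```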

@chadrik force-pushed the external_gcp_pubsub branch 2 times, most recently from 2b02422 to 6f41f12 on August 30, 2019 16:27
@chadrik (Contributor Author) commented Aug 30, 2019

PubsubIO is officially working on Flink!

Some notes:

There's one giant caveat: this actually does not work unless you have this special commit: chadrik@d12b990. It turns out the same is currently true for KafkaIO external transform support, which came as quite a surprise. The Jira issue for this problem is here: https://jira.apache.org/jira/browse/BEAM-7870. I'd love to see some movement on this. Happy to help where I can!

@chadrik (Contributor Author) commented Sep 5, 2019

I think this PR is ready to merge, but I could use a little help understanding how to resolve the test failures.

Java PreCommit has no test failures, so I'm not sure why it's marked as failed here. Is it because of the warnings? Jenkins seems to imply that anything over 12 warnings is an error, but I don't see any warnings in files that I changed:
https://builds.apache.org/job/beam_PreCommit_Java_Commit/7551/

Likewise, Python PreCommit lists no failures in Jenkins, but it is marked as failed here.

And Portable_Python PreCommit just seems to have timed out.

@chadrik (Contributor Author) commented Sep 5, 2019

Run Portable_Python PreCommit

@chadrik (Contributor Author) commented Sep 5, 2019

Run Python PreCommit

@chadrik (Contributor Author) commented Sep 10, 2019

Hi. Still hoping for a little help bringing this to a close.

@mxm (Contributor) commented Sep 24, 2019

Build results have expired: https://builds.apache.org/job/beam_PreCommit_Java_Commit/7551/

@mxm (Contributor) commented Sep 24, 2019

Retest this please

@mxm (Contributor) commented Sep 24, 2019

@chadrik Does this have to be rebased now that #9098 has been merged?

@chadrik (Contributor Author) commented Sep 24, 2019

> Does this have to be rebased now that #9098 has been merged?

It will work as is, but it should be converted to the new API to provide more real-world examples. I'll do that today.

If you could help me diagnose the Java test failure it would be much appreciated!

@mxm (Contributor) commented Sep 25, 2019

Will take a look ASAP, on a low time budget at the moment :)

@chamikaramj (Contributor):

Seems like the Java PreCommit failure is for the new test?

Stacktrace is:
org.apache.beam.sdk.io.gcp.pubsub.PubsubIOExternalTest > testConstructPubsubRead FAILED
java.lang.RuntimeException at PubsubIOExternalTest.java:97
Caused by: java.lang.RuntimeException at PubsubIOExternalTest.java:97
Caused by: java.lang.reflect.InvocationTargetException at PubsubIOExternalTest.java:97
Caused by: java.lang.IllegalStateException at PubsubIOExternalTest.java:97

@chamikaramj (Contributor):

Python failure seems to be a docs issue (:sdks:python:test-suites:tox:py2:docs).

Could be a flake. Retrying.

@chamikaramj (Contributor):

Run Python PreCommit

@chadrik force-pushed the external_gcp_pubsub branch from 6f41f12 to fcbffb7 on September 26, 2019 23:56
@mxm (Contributor) commented Sep 27, 2019

@chadrik Seems like you fixed the issue. Java tests are passing now: https://builds.apache.org/job/beam_PreCommit_Java_Commit/7889/ The one test failure is unrelated to your changes:

org.apache.beam.sdk.transforms.ParDoLifecycleTest.testTeardownCalledAfterExceptionInStartBundleStateful

@chadrik force-pushed the external_gcp_pubsub branch from 9f32963 to 53af6bf on October 2, 2019 23:18
@chadrik (Contributor Author) commented Oct 2, 2019

This is now waiting on the BooleanCoder in Python.

@chadrik force-pushed the external_gcp_pubsub branch from 53af6bf to 774373a on October 8, 2019 04:21
@chadrik (Contributor Author) commented Oct 9, 2019

Run Java PreCommit

@mxm (Contributor) commented Oct 16, 2019

Any update here?

@chadrik (Contributor Author) commented Oct 16, 2019 via email

@chadrik force-pushed the external_gcp_pubsub branch from 04f32a0 to 63cabe7 on October 18, 2019 06:17
@chadrik (Contributor Author) commented Oct 20, 2019

Run Python PreCommit

3 similar comments
@chadrik (Contributor Author) commented Oct 21, 2019

Run Python PreCommit

@chadrik (Contributor Author) commented Oct 21, 2019

Run Python PreCommit

@chadrik (Contributor Author) commented Oct 21, 2019

Run Python PreCommit

@chadrik (Contributor Author) commented Oct 21, 2019

@mxm This is tested on-prem and ready to go! (With the caveat that the special commit to add dependencies is still required for end users, as is the case with Kafka: chadrik/beam@d12b990.)

@mxm (Contributor) left a comment

Very nice. This is ready from my side, but I would follow up with docstrings in the Python API to list the limitations. This should also be done for ReadFromKafka/WriteToKafka. Perhaps we should even add the additional commits as part of Beam, at least until we have worked around the coder problems?

@chadrik (Contributor Author) commented Oct 22, 2019

> Perhaps we should even add the additional commits as part of Beam, at least until we have worked around the coder problems?

I think it would be great for this to work out of the box, but I'm not sure what the ramifications of adding those commits would be. They don't just modify the standard coders; they also change the standard Beam Java Docker containers by adding additional dependencies. Is it safe to add those even if people aren't using Pubsub/Kafka? Is the only downside larger container sizes?

@TheNeuralBit How close are we to having schema coders ready to use? IIUC, we should be able to register a converter between Row -> PubsubMessage, right? What's the mechanism for that? It would be great to put that to use here as soon as the schema coders are ready.

@TheNeuralBit (Member):

> How close are we to having schema coders ready to use? IIUC, we should be able to register a converter between Row -> PubsubMessage, right? What's the mechanism for that? It would be great to put that to use here as soon as the schema coders are ready.

@robertwb gave #9188 a LGTM, I just need to get CI passing and I think we can merge it. I'll work on that now.

Are you thinking you'd use beam:coder:row:v1 as the interface for the external transform, and the Java ExternalTransform implementations would handle the conversion of Row to/from PubsubMessage? There's no trivial way to register a converter between Row and PubsubMessage since the latter isn't structured, but of course on the Java side we could have code to serialize the Row to a variety of formats to put in the PubsubMessage payload: Avro, JSON, or the row serialization format itself (although I'm not sure we'd want to encourage using that outside of Beam), would be pretty simple to add. Maybe the format to use could be part of the external transform payload.

@chadrik (Contributor Author) commented Oct 22, 2019

> Are you thinking you'd use beam:coder:row:v1 as the interface for the external transform, and the Java ExternalTransform implementations would handle the conversion of Row to/from PubsubMessage?

There are two places that I see beam:coder:row:v1 being useful:

  1. as a way to declare the construction interface of an external transform, and encode its values. A schema coder would replace the configuration mapping in pipeline.ExternalConfiguration.ExternalConfigurationPayload proto.
  2. as a coder for structured elements that are exchanged between sdks

Right now I'm mostly thinking about the latter, which is when PubsubMessage comes into play.

> There's no trivial way to register a converter between Row and PubsubMessage since the latter isn't structured,

Maybe I'm thinking about this wrong, but I think the PubsubMessage is structured:

public class PubsubMessage {

  private byte[] message;
  private Map<String, String> attributes;
  private String messageId;

  /** Returns the main PubSub message. */
  public byte[] getPayload() {
    return message;
  }

  /** Returns the full map of attributes. This is an unmodifiable map. */
  public Map<String, String> getAttributeMap() {
    return attributes;
  }

  /** Returns the messageId of the message populated by Cloud Pub/Sub. */
  @Nullable
  public String getMessageId() {
    return messageId;
  }

I'm not a Java expert by any means, but this seems like a type that would work with AutoValue; we'd just need to rename message to payload and attributes to attributeMap.

What are the requirements for registering a Row converter?

> but of course on the Java side we could have code to serialize the Row to a variety of formats to put in the PubsubMessage payload: Avro, JSON, or the row serialization format itself (although I'm not sure we'd want to encourage using that outside of Beam), would be pretty simple to add.

I think the payload is not a concern when it comes to portability of external transforms: it gets encoded/decoded by another transform, not PubsubRead/Write. We can just assume that's a byte array.

My grasp on the Java side is a bit tenuous, so I'd like for @mxm to confirm or deny what I've written here.

@TheNeuralBit (Member):

> Right now I'm mostly thinking about the latter

Agreed, that's what I'm thinking about too.

> Maybe I'm thinking about this wrong, but I think the PubsubMessage is structured:

Ah ok, fair. I was referring specifically to the structure (or lack thereof) of the byte array payload, but you're right, the (Python SDK) user can handle creating a byte array themselves, and the row coder can just encode {byte[] payload, Map<String, String> attributes, String messageId}.

> What are the requirements for registering a Row converter?

Java

There are a variety of ways to do it. If it were a class that we had control over we could use the DefaultSchema annotation, as long as one of the included SchemaProvider implementations would work (I think JavaBeanSchema is the closest but wouldn't work because PubsubMessage doesn't have setters). I think what we'd want to do here is just implement a SchemaProvider and a SchemaProviderRegistrar for PubsubMessage and include it in Beam.

@reuvenlax may have a better suggestion.
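
A minimal sketch of the annotation route, assuming a message-like class we control (the class is illustrative, not the actual Beam PubsubMessage):

```java
import java.util.Map;
import org.apache.beam.sdk.schemas.JavaFieldSchema;
import org.apache.beam.sdk.schemas.annotations.DefaultSchema;

// Hypothetical schema-aware message type: with public fields, JavaFieldSchema
// can infer a Row schema along the lines of
// {payload: bytes, attributes: map<string, string>, messageId: string}.
@DefaultSchema(JavaFieldSchema.class)
public class SchemaPubsubMessage {
  public byte[] payload;
  public Map<String, String> attributes;
  public String messageId;
}
```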

Python

With my PR I think it could look like:

# this is py3 syntax for clarity, but we'd probably
# need to use the TypedDict('PubsubMessage', ...) version
class PubsubMessage(TypedDict):
   message: ByteString
   attributes: Mapping[unicode, unicode]
   messageId: unicode

coders.registry.register_coder(PubsubMessage, coders.RowCoder)

pcoll
  | 'make some messages' >> beam.Map(makeMessage).with_output_types(PubsubMessage)
  | 'write to pubsub' >> beam.io.WriteToPubsub(project, topic) # or something

@chadrik (Contributor Author) commented Oct 22, 2019

> If it were a class that we had control over we could use the DefaultSchema annotation, as long as one of the included SchemaProvider implementations would work (I think JavaBeanSchema is the closest but wouldn't work because PubsubMessage doesn't have setters).

It may reduce our overall technical debt if we just implement the full set of setters and getters on PubsubMessage and use DefaultSchema. It doesn't seem like it would be a bad thing. Who would have an opinion on that?

@TheNeuralBit (Member):

D'oh my bad, we do have control over PubsubMessage 🤦‍♂️ I assumed it was part of the pubsub client library. Yeah I vote we use DefaultSchema with either JavaBeanSchema or JavaFieldSchema, whichever works with fewer changes.

@chadrik (Contributor Author) commented Oct 24, 2019

@mxm I marked the Kafka and Pubsub external transforms as external. I think this is ready to merge.

@chadrik (Contributor Author) commented Oct 24, 2019

@TheNeuralBit On the Python side, as with the Java SDK, there is a custom PubsubMessage class (take a look in apache_beam.io.gcp.pubsub). The main thing it provides is methods for converting to/from protobuf.

In your example above, you registered the TypedDict sub-class: will there be a similar system in Python for converting between a row type and a custom type, like apache_beam.io.gcp.pubsub.PubsubMessage?

@TheNeuralBit (Member):

> Will there be a similar system in Python for converting between a row type and a custom type

It hasn't been designed, but that's definitely something I'd like to support. Would we actually need the PubsubMessage protobuf conversion methods for the external transform version of pubsubio/kafkaio, or are you just thinking it would be good to use the same type for consistency? I'm just wondering if we can get away with a simple TypedDict sub-class for now.

from apache_beam.transforms.external import ExternalTransform
from apache_beam.transforms.external import NamedTupleBasedPayloadBuilder

ReadFromPubsubSchema = typing.NamedTuple(
Contributor:

Doesn't have to be done in this PR, but in the future it will be great if we just move external PubSub and Kafka transforms to io/gcp/pubsub.py and io/kafka.py and configure/branch based on pipeline options.

@chadrik (Contributor Author):

I don't love that idea. The two sets of xforms do not have the same arguments, in part because of the need for the expansion service endpoint, but also because the Java versions are not guaranteed to be the same as the Python versions (though I would like that to be the case). Even if the expansion service argument could somehow be passed through the options, the two sets of xform classes may have different base classes: ideally the ones in io.external inherit from ExternalTransform (they currently do not, but they should be able to once we resolve the issue we're working on with @TheNeuralBit).

)


class ReadFromPubSub(beam.PTransform):
Contributor:

Should we rename to avoid conflicts with ReadFromPubSub/WriteToPubSub in io/gcp/pubsub.py?

@chadrik (Contributor Author):

I think it's a feature that they are named the same. I can take a pipeline designed to run on Dataflow and simply change the imports to get it to run on Flink.

@chadrik (Contributor Author) commented Oct 24, 2019

> Would we actually need the PubsubMessage protobuf conversion methods for the external transform version of pubsubio/kafkaio

We would not need to use the PubsubMessage protobuf type any more (and thus we wouldn't need the conversion methods), but we may still want the Beam PubsubMessage class. The latter is yielded by io.pubsub.ReadFromPubSub, so if we want io.external.pubsub.ReadFromPubSub to be a drop-in replacement (which I am in favor of), that means keeping io.external.pubsub.PubsubMessage and adding the ability to convert between it and a row type.

@chadrik (Contributor Author) commented Oct 24, 2019

@TheNeuralBit let's move the conversation about row coders over to https://jira.apache.org/jira/browse/BEAM-7870

This PR is ready to merge!

@chamikaramj (Contributor):

LGTM. Thanks.

Looks like Max already approved so I'll merge.

@chamikaramj merged commit 73642eb into apache:master on Oct 25, 2019
@chadrik (Contributor Author) commented Oct 27, 2019 via email

@mxm (Contributor) commented Oct 28, 2019

Congrats @chadrik! :) Also thanks for your help @chamikaramj and @TheNeuralBit.
