@alexvanboxel
Contributor

[BEAM-9044] Protobuf options to Schema options

Convert protobuf options to Row schema options. It gets the options from a proto message descriptor and converts them into Beam schema options on the Schema. It does the same for proto field descriptors, converting their options into Beam field options.

The converter creates typed options, as the proto options are also fully typed. For the conversion it uses the ProtoDynamicSchema to resolve the types.


@alexvanboxel
Contributor Author

Run Java_Examples_Dataflow PreCommit

@alexvanboxel
Contributor Author

@reuvenlax friendly ping

Contributor

We want to switch this metadata to use options. However you are translating proto option names directly to field names, which means there's no guarantee that these other options won't conflict. Maybe prefix all of these options with something to prevent potential conflict?

Contributor Author

> We want to switch this metadata to use options. However you are translating proto option names directly to field names, which means there's no guarantee that these other options won't conflict. Maybe prefix all of these options with something to prevent potential conflict?

What about beam:option:proto:? That would turn a proto option vptech.data.v1.description into beam:option:proto:vptech.data.v1.description, giving it a URI-like format.

beam:option is the fixed prefix for options, proto names the extension (it would be avro for Avro), vptech.data.v1 is the package of the extension, and description is the proto extension field.

Member

I think Reuven was referring to the metadata that's added in withMetaData, since that's the "special" proto metadata

Contributor Author

Is it? I don't think so, because I told Reuven I would do the metadata after this. Still, I think it's a good idea, and with your comment I want to revise the URI:

  • beam:option:proto:field:vptech.data.v1.description for a proto field option
  • beam:option:proto:message:vptech.data.v1.description for a proto message option
  • beam:option:proto:meta:number for a proto field number
  • beam:option:proto:meta:type_name for a proto type name

WDYT?
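
To make the proposed naming scheme concrete, here is a minimal sketch of how such option names could be assembled. The `ProtoOptionNames` class, its `Kind` enum, and the `of` method are hypothetical names used only for illustration; they are not part of Beam or of this PR.

```java
import java.util.Locale;

/** Hypothetical helper (not Beam API): builds option names in the proposed scheme. */
final class ProtoOptionNames {
  /** Well-known prefix shared by all options derived from protobuf. */
  static final String PREFIX = "beam:option:proto:";

  /** The three categories from the proposal: field option, message option, metadata. */
  enum Kind { FIELD, MESSAGE, META }

  /** Joins the prefix, the lower-cased kind, and the option name with ':' separators. */
  static String of(Kind kind, String name) {
    return PREFIX + kind.name().toLowerCase(Locale.ROOT) + ":" + name;
  }

  public static void main(String[] args) {
    System.out.println(of(Kind.FIELD, "vptech.data.v1.description"));
    System.out.println(of(Kind.MESSAGE, "vptech.data.v1.description"));
    System.out.println(of(Kind.META, "number"));
    System.out.println(of(Kind.META, "type_name"));
  }
}
```

Run as is, this prints the four names from the list above, one per line.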

Contributor

I don't want to force all Beam option names to have a prefix - I think that's the wrong approach. I'm suggesting that we create well-known prefixes for these proto options (when translated to Beam options), so we can distinguish them from other options. The prefixes you suggested sound fine to me.
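
As a rough illustration of why a well-known prefix helps, the sketch below separates proto-derived options from other options by name alone. It uses a plain `Map` as a stand-in for Beam schema options, and the `ProtoOptionFilter` class and its methods are hypothetical, not Beam API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not Beam API): picks out options that came from protobuf. */
final class ProtoOptionFilter {
  /** Assumed well-known prefix, taken from the names suggested in this thread. */
  static final String PROTO_PREFIX = "beam:option:proto:";

  /** True if the option name carries the proto prefix. */
  static boolean isProtoOption(String name) {
    return name.startsWith(PROTO_PREFIX);
  }

  /** Returns only the proto-derived entries, preserving insertion order. */
  static Map<String, Object> protoOptionsOnly(Map<String, Object> options) {
    Map<String, Object> result = new LinkedHashMap<>();
    for (Map.Entry<String, Object> entry : options.entrySet()) {
      if (isProtoOption(entry.getKey())) {
        result.put(entry.getKey(), entry.getValue());
      }
    }
    return result;
  }
}
```

Options without the prefix, such as a user-defined `description`, are left alone, so the two groups cannot collide by name.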

Contributor Author

Now all the options are prefixed. I moved the removal of the metadata to another ticket because it's more involved:

[BEAM-9604] BIP-1: Remove schema metadata usage for Protobuf extension

Contributor

As mentioned above, I think we should add a well-known prefix here.

@alexvanboxel
Contributor Author

Run Java PreCommit

@alexvanboxel
Contributor Author

Run Java PreCommit

@alexvanboxel alexvanboxel force-pushed the feature/BEAM-9044-options-proto branch 2 times, most recently from 496d77a to f7185f6 Compare March 27, 2020 13:17
@alexvanboxel
Contributor Author

@reuvenlax if you don't have further remarks (removing the metadata will be in another PR), I can make a private build of master. I have a pipeline full of schema-aware stuff to test this on.

@alexvanboxel
Contributor Author

Run Java PreCommit

@alexvanboxel
Contributor Author

I've decided to add the removal of metadata usage in the proto implementation as an additional commit. This makes the PR bigger, but it also makes it complete. Storing the proto number and message name as options is much cleaner than the removed metadata-based approach.

@alexvanboxel
Contributor Author

@reuvenlax friendly reminder to review the changes. It would be great if this could make the release (2.21) that will be cut in 6 days.

@reuvenlax
Contributor

LGTM

@alexvanboxel alexvanboxel force-pushed the feature/BEAM-9044-options-proto branch from a7d7492 to e741380 Compare April 3, 2020 11:37
@alexvanboxel alexvanboxel merged commit 06a120c into apache:master Apr 3, 2020
@alexvanboxel
Contributor Author

Thanks, now I can start working on the next parts and think about documentation. Hooray!

@alexvanboxel alexvanboxel deleted the feature/BEAM-9044-options-proto branch April 3, 2020 12:22