-
Notifications
You must be signed in to change notification settings - Fork 4.5k
Update portable schema representation and java SchemaTranslation #8853
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update portable schema representation and java SchemaTranslation #8853
Conversation
… SchemaTranslationTest
kennknowles
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great, nice and clear diff. Meaningful change. The only comment that is actionable here is that it might be an opportune time to reset the 0 value for the AtomicType enum.
| // Experimental: A representation of a Beam Schema. | ||
| message Schema { | ||
| enum TypeName { | ||
| enum AtomicType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't 0 in an enum supposed to be reserved for unknown? It is wise, because of defaulting in proto libs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, to my knowledge specifically adding an UNSPECIFIED with a value of 0 will make this clearer.
For example:
| UNSPECIFIED = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
| ArrayType array_type = 3; | ||
| MapType map_type = 4; | ||
| Schema row_schema = 5; | ||
| Schema row_type = 5; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proto best practice I think is to go ahead and have a RowType message with one field. It has overhead, yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Agreed this is much cleaner
| .put(TypeName.BYTES, RunnerApi.Schema.AtomicType.BYTES) | ||
| .build(); | ||
|
|
||
| private static final String URN_BEAM_LOGICAL_DATETIME = "urn:beam:logical:datetime"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two stylistic nits:
- We tend to omit the
urn, don't we? While it does make a valid URI out of the thing, it seems a bit silly. - I would leave out
logicalbut put in something liketypeorschema_typeorfieldtypeto namespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I went with fieldtype
| .put(TypeName.ROW, RunnerApi.Schema.TypeName.ROW) | ||
| .put(TypeName.LOGICAL_TYPE, RunnerApi.Schema.TypeName.LOGICAL_TYPE) | ||
|
|
||
| private static final BiMap<TypeName, RunnerApi.Schema.AtomicType> ATOMIC_TYPE_MAPPING = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH I find a switch clearer than a map lookup, and it takes the same amount of code space. Not for this PR, in which you are just editing the existing structure not restructuring.
| switch (typeName) { | ||
| case ROW: | ||
| fieldType = FieldType.row(fromProto(protoFieldType.getRowSchema())); | ||
| switch (protoFieldType.getTypeInfoCase()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another not-for-this PR comment that this would be cleaner with the switch in a function so the branches could all return.
|
Is the beam_runner_api.proto the right place to put all the schema stuff? |
|
I guess I don't have a strong opinion, I was just updating it in place. Do you think it should get it's own schema.proto file? |
|
I think it would be great to have a separate |
|
Agreed. I can follow-up with PR(s) for that move and the other code cleanup suggestions. |
|
LGTM too. Thanks. |
…ion (apache#8853)" This reverts commit e65c176.
…Translation (apache#8853)"" This reverts commit dbcb14c.
…ion (apache#8853)" This reverts commit e65c176.
Also adds tests in SchemaTranslationTest.
Things that are not currently included in this PR:
Schema.FieldTypeand they are mapped to the appropriate URNs when converting to/from the proto representation.Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.
R: @reuvenlax, @robertwb, @kennknowles