Update portable schema representation and java SchemaTranslation #8853

TheNeuralBit · 2019-06-13T23:01:50Z

Also adds tests in SchemaTranslationTest.

Things that are not currently included in this PR:

- LogicalType registration and resolution by URN. We cannot decode a Schema with a logical type.
- Representing datetime and decimal as logical types (see BEAM-7554). Instead, they are still represented as primitive types in Java's Schema.FieldType and are mapped to the appropriate URNs when converting to/from the proto representation.

R: @reuvenlax, @robertwb, @kennknowles

kennknowles

This is great: a nice, clear diff and a meaningful change. The only actionable comment here is that this might be an opportune time to reserve the 0 value of the AtomicType enum.

kennknowles · 2019-06-19T02:08:11Z

model/pipeline/src/main/proto/beam_runner_api.proto

 // Experimental: A representation of a Beam Schema.
 message Schema {
-  enum TypeName {
+  enum AtomicType {


Isn't 0 in an enum supposed to be reserved for unknown? It is wise, because proto libraries default an unset enum field to 0.

Yes. To my knowledge, explicitly adding an UNSPECIFIED value of 0 makes this clearer.
For example:

beam/model/pipeline/src/main/proto/beam_runner_api.proto, line 451 in eb5a8c2:

    UNSPECIFIED = 0;
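
To illustrate the defaulting concern: in proto3, an enum field that was never set reads back as the constant numbered 0, so whichever constant owns 0 cannot be told apart from "not set". The following is a plain-Java sketch of that behaviour, not generated protobuf code. `AtomicTypeSketch`, its constants and numbers, and `fromWireNumber` are hypothetical names chosen for illustration only.

```java
import java.util.Optional;

/** Hypothetical stand-in for a proto enum; not the generated RunnerApi class. */
enum AtomicTypeSketch {
  UNSPECIFIED(0), // reserved: the value an unset field reads back as
  BYTE(1),        // the remaining numbers are illustrative, not taken from the PR
  INT32(2),
  STRING(3);

  private final int number;

  AtomicTypeSketch(int number) {
    this.number = number;
  }

  int getNumber() {
    return number;
  }

  /** Maps a wire number to a constant, treating 0 (the default) as "not set". */
  static Optional<AtomicTypeSketch> fromWireNumber(int number) {
    for (AtomicTypeSketch t : values()) {
      if (t.number == number) {
        return t == UNSPECIFIED ? Optional.empty() : Optional.of(t);
      }
    }
    return Optional.empty(); // a number this reader does not know about
  }
}

class ZeroDefaultDemo {
  public static void main(String[] args) {
    int unsetField = 0; // an int left at its default, as an unset enum field would be
    System.out.println(AtomicTypeSketch.fromWireNumber(unsetField).isPresent());
    System.out.println(AtomicTypeSketch.fromWireNumber(2).isPresent());
  }
}
```

Had BYTE been given the number 0 in this sketch, an unset field would silently decode as BYTE, which is the ambiguity the reserved value avoids.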

kennknowles · 2019-06-19T02:09:54Z

model/pipeline/src/main/proto/beam_runner_api.proto

+      ArrayType array_type = 3;
      MapType map_type = 4;
-      Schema row_schema = 5;
+      Schema row_type = 5;


Proto best practice I think is to go ahead and have a RowType message with one field. It has overhead, yes.

Done. Agreed, this is much cleaner.

kennknowles · 2019-06-19T02:13:45Z

...truction-java/src/main/java/org/apache/beam/runners/core/construction/SchemaTranslation.java

+          .put(TypeName.BYTES, RunnerApi.Schema.AtomicType.BYTES)
          .build();

+  private static final String URN_BEAM_LOGICAL_DATETIME = "urn:beam:logical:datetime";


Two stylistic nits:

- We tend to omit the urn, don't we? While it does make a valid URI out of the thing, it seems a bit silly.
- I would leave out logical but put in something like type or schema_type or fieldtype to namespace.

Done. I went with fieldtype.
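
The fixup commit message for this PR records the resulting rename as the rule `urn:beam:logical:(.*) -> beam:fieldtype:\1`. As a rough sketch of what that rule does to a string such as the `URN_BEAM_LOGICAL_DATETIME` value in the diff, the same substitution can be written with `java.util.regex`. The class and method names are hypothetical, and the output follows from the rule as written, since the thread does not quote the final constants.

```java
import java.util.regex.Pattern;

/** Hypothetical helper; it only demonstrates the rename rule from the commit message. */
class UrnRenameSketch {
  // Pattern copied from the rule "urn:beam:logical:(.*) -> beam:fieldtype:\1".
  private static final Pattern OLD_URN = Pattern.compile("^urn:beam:logical:(.*)$");

  /** Rewrites an old-style URN; strings that do not match are returned unchanged. */
  static String rename(String urn) {
    // "$1" is Java's replacement syntax for the first capture group (\1 in the rule).
    return OLD_URN.matcher(urn).replaceFirst("beam:fieldtype:$1");
  }

  public static void main(String[] args) {
    System.out.println(rename("urn:beam:logical:datetime"));
  }
}
```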

kennknowles · 2019-06-19T02:14:37Z

...truction-java/src/main/java/org/apache/beam/runners/core/construction/SchemaTranslation.java

-          .put(TypeName.ROW, RunnerApi.Schema.TypeName.ROW)
-          .put(TypeName.LOGICAL_TYPE, RunnerApi.Schema.TypeName.LOGICAL_TYPE)
+
+  private static final BiMap<TypeName, RunnerApi.Schema.AtomicType> ATOMIC_TYPE_MAPPING =


TBH I find a switch clearer than a map lookup, and it takes the same amount of code space. Not for this PR, in which you are just editing the existing structure, not restructuring it.
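
For comparison, here is a minimal sketch of the two styles side by side. It is not the code in SchemaTranslation.java: the enums `SdkTypeName` and `WireAtomicType` are hypothetical stand-ins, and a plain `EnumMap` is used in place of the `BiMap` in the diff so that the example needs no dependencies.

```java
import java.util.EnumMap;
import java.util.Map;

class LookupStyles {
  // Hypothetical stand-ins for the SDK and proto enums.
  enum SdkTypeName { INT32, STRING, BYTES }
  enum WireAtomicType { INT32, STRING, BYTES }

  // Style 1: a static lookup table.
  private static final Map<SdkTypeName, WireAtomicType> MAPPING =
      new EnumMap<>(SdkTypeName.class);

  static {
    MAPPING.put(SdkTypeName.INT32, WireAtomicType.INT32);
    MAPPING.put(SdkTypeName.STRING, WireAtomicType.STRING);
    MAPPING.put(SdkTypeName.BYTES, WireAtomicType.BYTES);
  }

  static WireAtomicType viaMap(SdkTypeName typeName) {
    WireAtomicType result = MAPPING.get(typeName);
    if (result == null) {
      throw new IllegalArgumentException("Unmapped type: " + typeName);
    }
    return result;
  }

  // Style 2: a switch with one case per constant.
  static WireAtomicType viaSwitch(SdkTypeName typeName) {
    switch (typeName) {
      case INT32:
        return WireAtomicType.INT32;
      case STRING:
        return WireAtomicType.STRING;
      case BYTES:
        return WireAtomicType.BYTES;
      default:
        throw new IllegalArgumentException("Unmapped type: " + typeName);
    }
  }

  public static void main(String[] args) {
    for (SdkTypeName t : SdkTypeName.values()) {
      System.out.println(t + " -> " + viaMap(t) + " / " + viaSwitch(t));
    }
  }
}
```

One trade-off worth noting: a bidirectional map gives the reverse lookup for free, whereas the switch style needs a second switch for the opposite direction.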

kennknowles · 2019-06-19T02:15:21Z

...truction-java/src/main/java/org/apache/beam/runners/core/construction/SchemaTranslation.java

-    switch (typeName) {
-      case ROW:
-        fieldType = FieldType.row(fromProto(protoFieldType.getRowSchema()));
+    switch (protoFieldType.getTypeInfoCase()) {


Another not-for-this-PR comment: this would be cleaner with the switch in its own function, so that the branches could all return.
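
A small sketch of the suggested shape, under stated assumptions: `TypeInfoCaseSketch` is a hypothetical enum standing in for the value returned by `getTypeInfoCase()`, and the methods return a plain string rather than a `FieldType` to keep the example self-contained. The first method assigns a local in every branch; the second puts the switch in its own function so that each branch returns.

```java
class SwitchReturnSketch {
  // Hypothetical stand-in for the oneof case enum; constant names are illustrative.
  enum TypeInfoCaseSketch { ATOMIC_TYPE, ARRAY_TYPE, MAP_TYPE, ROW_TYPE, NOT_SET }

  /** Shape being criticised: a local assigned in each branch, then used afterwards. */
  static String describeWithLocal(TypeInfoCaseSketch typeInfoCase) {
    String kind;
    switch (typeInfoCase) {
      case ATOMIC_TYPE:
        kind = "atomic";
        break;
      case ARRAY_TYPE:
        kind = "array";
        break;
      case MAP_TYPE:
        kind = "map";
        break;
      case ROW_TYPE:
        kind = "row";
        break;
      default:
        throw new IllegalArgumentException("Unexpected case: " + typeInfoCase);
    }
    return kind;
  }

  /** Suggested shape: every branch returns, so no mutable local or break is needed. */
  static String describe(TypeInfoCaseSketch typeInfoCase) {
    switch (typeInfoCase) {
      case ATOMIC_TYPE:
        return "atomic";
      case ARRAY_TYPE:
        return "array";
      case MAP_TYPE:
        return "map";
      case ROW_TYPE:
        return "row";
      default:
        throw new IllegalArgumentException("Unexpected case: " + typeInfoCase);
    }
  }

  public static void main(String[] args) {
    System.out.println(describe(TypeInfoCaseSketch.ROW_TYPE));
  }
}
```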

lukecwik · 2019-06-19T16:16:46Z

Is beam_runner_api.proto the right place to put all the schema stuff?

TheNeuralBit · 2019-06-19T20:39:29Z

I guess I don't have a strong opinion; I was just updating it in place. Do you think it should get its own schema.proto file?

kennknowles · 2019-06-20T04:28:23Z

I think it would be great to have a separate schema.proto file. I wouldn't block merging on this. I would definitely like that to be a separate commit if you do add it here. Moving + editing in one commit would be bad form IMO.

TheNeuralBit · 2019-06-20T20:00:39Z

Agreed. I can follow up with PR(s) for that move and the other code cleanup suggestions.

robertwb · 2019-06-25T08:36:01Z

LGTM too. Thanks.

Update portable schema representation and java SchemaTranslation. Add…

1012a01

… SchemaTranslationTest

kennknowles requested review from kennknowles and reuvenlax June 19, 2019 02:07

kennknowles approved these changes Jun 19, 2019

!fixup

aaa70c8

- add UNSPECIFIED to AtomicType
- add RowType
- urn:beam:logical:(.*) -> beam:fieldtype:\1

robertwb merged commit e65c176 into apache:master Jun 25, 2019

TheNeuralBit mentioned this pull request Jun 25, 2019

Schema conversion cleanup #8943

Merged

TheNeuralBit added a commit to TheNeuralBit/beam that referenced this pull request Aug 28, 2019

Revert "Update portable schema representation and java SchemaTranslat…

dbcb14c

…ion (apache#8853)" This reverts commit e65c176.

TheNeuralBit added a commit to TheNeuralBit/beam that referenced this pull request Sep 18, 2019

Revert "Revert "Update portable schema representation and java Schema…

f629584

…Translation (apache#8853)"" This reverts commit dbcb14c.

soyrice pushed a commit to soyrice/beam that referenced this pull request Sep 19, 2019

Revert "Update portable schema representation and java SchemaTranslat…

3faf27f

…ion (apache#8853)" This reverts commit e65c176.

TheNeuralBit deleted the new-schema-representation branch October 10, 2019 00:03


4 participants
