Skip to content

Conversation

@TheNeuralBit
Copy link
Member

@TheNeuralBit TheNeuralBit commented Sep 6, 2019

  • Adds a typeDescriptor parameter to SchemaCoder, and plumbs through a descriptor everywhere instances are created (3bafcb0).
  • Modifies GetterBasedSchemaProvider so that the toRow/fromRow functions it creates have a meaningful equals and hashCode function (d415795).
  • Adds an equals function to SchemaCoder that checks schema, typeDescriptor, fromRow, and toRow.
  • Adds some tests for the above features
    • Each sub-class of GetterBasedSchemaProvider has at least one test checking that toRow and fromRow are equal after serialization (36a6014).
    • Adds SchemaCoderTest which checks some instances of SchemaCoder with CoderProperties.{coderSerializable, coderConsistentWithEquals} (ab18bd5).
  • This change resolves BEAM-8204 and BEAM-8205, so the PR also re-enables AvroSchemaTest.testAvroPipelineGroupBy for Flink and Apex (17c2646).

Post-Commit Tests Status (on master branch)

Lang SDK Apex Dataflow Flink Gearpump Samza Spark
Go Build Status --- --- Build Status --- --- Build Status
Java Build Status Build Status Build Status Build Status
Build Status
Build Status
Build Status Build Status Build Status
Build Status
Python Build Status
Build Status
Build Status
Build Status
--- Build Status
Build Status
Build Status --- --- Build Status
XLang --- --- --- Build Status --- --- ---

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website
Non-portable Build Status Build Status Build Status Build Status
Portable --- Build Status --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

@TheNeuralBit
Copy link
Member Author

Run Flink ValidatesRunner

@TheNeuralBit
Copy link
Member Author

Run Apex ValidatesRunner

@TheNeuralBit
Copy link
Member Author

Run Flink ValidatesRunner

@TheNeuralBit
Copy link
Member Author

Run Apex ValidatesRunner

@TheNeuralBit TheNeuralBit changed the title [BEAM-8146] Add equals and hashCode to SchemaCoder and RowCoder [BEAM-8146,BEAM-8204,BEAM-8205] Add equals and hashCode to SchemaCoder and RowCoder Sep 23, 2019
@TheNeuralBit
Copy link
Member Author

Run Apex ValidatesRunner

@TheNeuralBit
Copy link
Member Author

Run Flink ValidatesRunner

@TheNeuralBit
Copy link
Member Author

R: @reuvenlax

I know you said Flink/Apex shouldn't be relying on coder equality and that should be the real fix for BEAM-8204 and BEAM-8205, but I think this is helpful for testing anyway.

@TheNeuralBit
Copy link
Member Author

Run Flink ValidatesRunner

@TheNeuralBit
Copy link
Member Author

Run Apex ValidatesRunner

@reuvenlax
Copy link
Contributor

is this ready for review?

@TheNeuralBit
Copy link
Member Author

Yep!

@kennknowles kennknowles requested a review from reuvenlax October 1, 2019 21:07
@kennknowles
Copy link
Member

Thing to watch out for is update (in)compatibility. It happened once when a contribution made a good refactor.

@kennknowles
Copy link
Member

Since it is already a CustomCoder hence not really update compatible, this is g2g after you resolve the rebase.

@TheNeuralBit
Copy link
Member Author

Thanks @kennknowles, I resolved the merge conflicts so I think this is ready.

It may be good to get a review from @reuvenlax before merging though, at least an approval of the high-level approach. Plumbing through the type descriptor and changing the GetterBasedSchemaProvider interface is a pretty substantial change. The latter one will also cause a small conflict with #8690.

@kennknowles
Copy link
Member

Interesting. If toRow and fromRow are equal, is it possible for the type descriptor to be different? (other than, I suppose, loss of generics reflection information due to erasure)

@TheNeuralBit
Copy link
Member Author

That's a good point, I don't think so. Initially I added the type descriptor as something we can verify as a proxy for toRow/fromRow but now that I've made the functions created by our SchemaProvider implementations comparable, that check is redundant. Do you think I should back out the type descriptor changes to make this simpler? It's mostly its own commit so it wouldn't be too hard.

@kennknowles
Copy link
Member

I would say keep as-is. Being able to implement getEncodedTypeDescriptor adds value on its own.

@kennknowles
Copy link
Member

Given how long you've been waiting on this, I'm inclined to merge. Last chance to comment @reuvenlax ?

@TheNeuralBit
Copy link
Member Author

+1 on merging @kennknowles

@kennknowles kennknowles merged commit 9b30d7c into apache:master Nov 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants