[BEAM-8111] Enable CloudObjectsTest$DefaultCoders #9446
Run Dataflow ValidatesRunner

R: @reuvenlax would you mind reviewing this?
```java
SchemaCoder<?> that = (SchemaCoder<?>) o;
return rowCoder.equals(that.rowCoder)
    && toRowFunction.equals(that.toRowFunction)
    && fromRowFunction.equals(that.fromRowFunction);
```
Doesn't this just revert to object equality comparison on the to/from functions?
Yeah, I discussed this offline with @kennknowles, and he convinced me that it's better to have an equals() that might give false negatives (if toRowFunction and fromRowFunction don't implement a meaningful equals) than one that could give false positives (for example, if we only checked the schema and typeDescriptor and assumed that toRow/fromRow are the same).
I got CloudObjectsTest passing by adding a RowIdentity class with an equals() implementation here.
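To make the trade-off concrete, here is a minimal, Beam-independent sketch. `ToySchemaCoder` and `RowIdentitySketch` are hypothetical stand-ins (they are not Beam's `SchemaCoder` or the `RowIdentity` added in this PR), `java.util.function.Function` replaces Beam's serializable function types, and the schema is modeled as a plain string. Because `equals()` delegates to the functions, a stateless named function class that implements `equals()` lets two independently built coders compare equal.

```java
import java.util.Objects;
import java.util.function.Function;

// Hypothetical stand-in for a coder whose equality depends on its conversion functions.
final class ToySchemaCoder<T> {
  final String schema; // stands in for the Row schema / rowCoder
  final Function<T, String> toRowFunction;
  final Function<String, T> fromRowFunction;

  ToySchemaCoder(String schema, Function<T, String> toRow, Function<String, T> fromRow) {
    this.schema = schema;
    this.toRowFunction = toRow;
    this.fromRowFunction = fromRow;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof ToySchemaCoder)) return false;
    ToySchemaCoder<?> that = (ToySchemaCoder<?>) o;
    // Let the functions own their equals: a function without a meaningful
    // equals() can only cause a false negative here, never a false positive.
    return schema.equals(that.schema)
        && toRowFunction.equals(that.toRowFunction)
        && fromRowFunction.equals(that.fromRowFunction);
  }

  @Override
  public int hashCode() {
    return Objects.hash(schema, toRowFunction, fromRowFunction);
  }
}

// Hypothetical identity function: it is stateless, so all instances are interchangeable.
final class RowIdentitySketch implements Function<String, String> {
  @Override
  public String apply(String row) {
    return row;
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof RowIdentitySketch;
  }

  @Override
  public int hashCode() {
    return RowIdentitySketch.class.hashCode();
  }
}
```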
The way I would phrase this is: let the functions own their equals. If they say they are equal, they are. If they say they aren't, they aren't. So this equals() is relative to that.
Sounds good in theory. In practice these functions are usually lambdas, so we might have trouble making this work.
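The concern about lambdas is easy to see in isolation. In this minimal sketch (the class and method names are hypothetical), a lambda inherits `Object.equals`, so it is equal only to itself; two lambdas with identical bodies written as separate expressions are distinct objects on typical JVMs, although the JLS does not promise anything about lambda object identity.

```java
import java.util.function.Function;

// Illustrative only: lambdas use Object.equals, i.e. reference identity.
final class LambdaEqualityDemo {
  static Function<String, Integer> lengthFnA() {
    return s -> s.length();
  }

  static Function<String, Integer> lengthFnB() {
    return s -> s.length(); // same body, but a separate lambda expression
  }

  public static void main(String[] args) {
    Function<String, Integer> a = lengthFnA();
    Function<String, Integer> b = lengthFnB();
    System.out.println(a.equals(a)); // true: same reference
    System.out.println(a.equals(b)); // false on typical JVMs: distinct lambda objects
  }
}
```

So a coder whose `equals()` defers to lambda-based functions reports "not equal" for two otherwise identical instances: a false negative, not a false positive.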
That's true. I was thinking it's not such a big deal to get false negatives when lambdas are used, since I really just want the equality check to use in tests.
What do you think about updating the various schema providers to create Function sub-classes (with equals implemented) instead of using lambdas?
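A rough sketch of that suggestion, without Beam: a named function class whose `equals()` and `hashCode()` are defined by the state it captures. `GetterToRowFn` is hypothetical and is not Beam's API; it models a row as a `List<Object>` built by reflectively invoking public no-arg getters, and it assumes the target class and getter names fully determine the function's behavior.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical toRow function a schema provider could return instead of a lambda.
// Equality is defined by the captured state (target class + getter names).
final class GetterToRowFn<T> implements Function<T, List<Object>> {
  private final Class<T> clazz;
  private final List<String> getterNames;

  GetterToRowFn(Class<T> clazz, List<String> getterNames) {
    this.clazz = clazz;
    this.getterNames = Collections.unmodifiableList(new ArrayList<>(getterNames));
  }

  @Override
  public List<Object> apply(T value) {
    List<Object> row = new ArrayList<>();
    for (String name : getterNames) {
      try {
        Method getter = clazz.getMethod(name); // public no-arg getter
        row.add(getter.invoke(value));
      } catch (ReflectiveOperationException e) {
        throw new IllegalStateException("Cannot invoke getter " + name, e);
      }
    }
    return row;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof GetterToRowFn)) return false;
    GetterToRowFn<?> that = (GetterToRowFn<?>) o;
    return clazz.equals(that.clazz) && getterNames.equals(that.getterNames);
  }

  @Override
  public int hashCode() {
    return Objects.hash(clazz, getterNames);
  }
}
```

Because equality depends only on the captured class and getter list, two providers that independently build the function for the same type produce equal functions, which a lambda-based equivalent would not.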
Another alternative could be to add something like assertEquivalentSchemaCoder that just checks schema and type, rather than continuing down this rabbit hole.
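The suggested helper could look roughly like this. `assertEquivalentSchemaCoder` and `CoderView` are hypothetical; `CoderView` stands in for whatever accessors expose a coder's schema and encoded type, and the schema is modeled as a plain string.

```java
import java.util.Objects;

// Hypothetical, Beam-independent sketch of a test-only check that compares the
// reliably comparable parts of a schema coder and ignores toRow/fromRow.
final class SchemaCoderAssertions {
  // Minimal stand-in for the comparable parts of a schema coder.
  static final class CoderView {
    final String schema;
    final Class<?> type;

    CoderView(String schema, Class<?> type) {
      this.schema = schema;
      this.type = type;
    }
  }

  static void assertEquivalentSchemaCoder(CoderView expected, CoderView actual) {
    if (!Objects.equals(expected.schema, actual.schema)) {
      throw new AssertionError("schema mismatch: " + expected.schema + " vs " + actual.schema);
    }
    if (!Objects.equals(expected.type, actual.type)) {
      throw new AssertionError("type mismatch: " + expected.type + " vs " + actual.type);
    }
  }
}
```

This is the inverse of the `equals()` trade-off discussed earlier: by ignoring the conversion functions, the check can accept two coders whose toRow/fromRow behavior differs, which may be acceptable for a test assertion but not for a general `equals()`.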
Could we go ahead and merge this as is? I could follow up with more changes to the SchemaCoder equals: plumbing through a type descriptor and using it for the comparison, and possibly changing the toRow/fromRow functions created by the existing SchemaProviders to make them comparable.
I have a PR up now (#9493) that adds equals and hashCode to the fromRow and toRow functions created by all the GetterBasedSchemaProvider sub-classes.
BTW this is not just for tests. The Flink runner appears to rely on coder equality (even though you can argue it shouldn't).
R: @kennknowles
Adds a `@RunWith(Enclosed.class)` to `CloudObjectsTest` so that `DefaultCoders` actually runs. Since this test hasn't been running, it has a few issues, which I've also attempted to resolve here. A summary of the changes to that end:

- … `StructuredCoder` sub-class use the components list as the expected components, rather than the usual arguments list.
- … `DoubleCoder` to list of Dataflow known coders.
- … `PIPELINE_PROTO_CODER_ID` rather than all model coders.
- … `fromRow`, `toRow`, so it doesn't work as expected for instances created with lambdas.

This also adds a test case that would have caught BEAM-8111.
See .test-infra/jenkins/README for the trigger phrase, status, and link of each Jenkins job.