-
Notifications
You must be signed in to change notification settings - Fork 4.5k
[BEAM-9641] Support ZetaSQL DATE type as a Beam LogicalType #11272
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
3003dbb to
cca562d
Compare
|
What do you think about going ahead and defining the date logical type in cc: @reuvenlax |
32590e3 to
99bff44
Compare
Done. Thanks for the suggestion. I made the |
99bff44 to
7b89065
Compare
apilloud
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Can you clean up the style and call equals on the constant instead of logicalId (which could be null).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about using a switch statement? Is there any style guidance on using switch on a String in java?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I hope I could use a switch statement here, but unfortunately there is no constant IDENTIFIER defined in the LogicalType class. (I could add it to each concrete SQL logical type I create, but I don't think that is a good style.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Call equals on constant to avoid null issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This order of equals is awesome!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is worth documenting that the Long is an offset from an epoch (and what that epoch is).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm reading the correctly, LocalDate is the in memory type (a struct) and Long is the wire format (an offset from epoch)? This conversion could be quite expensive. It appears the Calc nodes both take an offset in this case, when we start to think about performance we might need to change the in memory type to be offset based.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(If changing the in memory type is going to be difficult in the future, consider doing that now.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is unfortunate... but what in-memory type should we use instead? joda.time.LocalDate uses a millisecond long, do we want to add another joda dependency?
We could access the base type (wire format type) directly in SQL with Row#getBaseValue, but unfortunately Rows store logical types as the input type (in memory format type), so that wouldn't actually avoid a conversion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess java.sql.Date is another option for a java type backed by millis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should consider not using a JVM, it adds performance overhead too. 🤓
I'm reasonably convinced the wire format is good and the conversion here is lossless, so if there isn't a easy drop-in replacement leave this as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Andrew is basically suggesting using a PassThroughLogicalType<Long> as a logical type for DATE. I think we could definitely consider this if performance becomes a problem in the future. (It's not easy to change the in-memory type for Date after it is made public, but we can easily define a new SqlDate.) For now I think we can leave it as is. It's more human readable (e.g. writing tests for DATE type in spec tests is simpler).
|
Oops, forgot to include in my comments: ZetaSQL's range is much smaller than the underlying type, can you add a test or two for that? How do out of range values fail? (Also worth asking, do we need any special treatment for boundary conditions ( |
|
Ah, just realized that the previous comments were not sent out. |
|
Could you help trigger the tests again? For the comment on range: Thanks for pointing it out. I overlooked this problem. I would like to create a separate PR to address it, along with range testing for other types as well. |
|
retest this please |
|
The failing test |
|
Run Java PreCommit |
|
Run SQL Postcommit |
|
Rebased against master. Please run precommit tests again. |
|
retest this please |
1 similar comment
|
retest this please |
|
Java PreCommit failed due to a build failure. Please help run again. |
|
Run Java PreCommit |
2 similar comments
|
Run Java PreCommit |
|
Run Java PreCommit |
|
Something just occurred to me - are there any tests that use the DATE Type in an aggregation (e.g. MAX)? I'd think that would run into the same issue I have in #11456 (processing logical types using their representation) |
|
Interesting question. You should probably add a test for JOIN as well, which will have a similar class of problems. |
No. Thanks for bringing this up. I think it is likely to run into the problem. |
This PR adds support of all ZetaSQL (BigQuery Standard SQL) DATE functions to BeamSQL:
r: @apilloud
cc: @TheNeuralBit @kennknowles
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
R: @username).[BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replaceBEAM-XXXwith the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.CHANGES.mdwith noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
Post-Commit Tests Status (on master branch)
Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.