-
Notifications
You must be signed in to change notification settings - Fork 4.5k
Another attempt to fix timestamp and decimal logical type. #35426
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
cc'ed @claudevdm |
- Used schema registry directly if it exists. - Fixed a bug on schema registry. - Triggered post commit tests for jdbcio. - Fixed failed tests and trigger dataflow test on xlang. - Fix schema test by adding field description in named_fields_to_schema
adc399a to
996fed9
Compare
|
|
||
| def add(self, typing, schema): | ||
| if not schema.id: | ||
| if schema.id: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know why we change this line in #28169 to disable the look up of schema registry.
Let's see if this breaks anythong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Abacn, do you know which test workflow covers this test?
sdks/java/io/google-cloud-platform/src/test/java/org/apache/beam/sdk/io/gcp/bigquery/providers/BigQueryStorageWriteApiSchemaTransformProviderTest.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Local tests passed with ./gradlew --info sdk:java:io:google-cloud-platform:test --tests 'org.apache.beam.sdk.io.gcp.bigquery.providers.*'.
This reverts commit 6ac5009.
d497915 to
19add94
Compare
|
Both PostCommit Python Xlang Gcp Direct and PostCommit Python Xlang Gcp Dataflow are failing on master for a day, so it may not be related. |
|
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #35426 +/- ##
============================================
+ Coverage 56.51% 56.54% +0.02%
Complexity 3319 3319
============================================
Files 1198 1199 +1
Lines 182839 183113 +274
Branches 3426 3426
============================================
+ Hits 103332 103535 +203
- Misses 76208 76279 +71
Partials 3299 3299
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
I am seeing the following error in the test(test_xlang_jdbc_write_read_0_postgres (apache_beam.io.external.xlang_jdbcio_it_test.CrossLanguageJdbcIOTest): |
|
The failed tests have been red for a few days and are not related to this change. |
|
Assigning reviewers: R: @liferoad for label python. Note: If you would like to opt out of this review, comment Available commands:
The PR bot will only process comments in the main thread (not review comments). |
|
Consider run https://github.com/apache/beam/actions/workflows/beam_PostCommit_Python_Xlang_Gcp_Direct.yml?query=event%3Aschedule which contains more relevant tests (e.g. bigquery IO write xlang test that also involves Decimal) |
I did run that.
|
Abacn
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, feel free to merge after tests passed
Thanks. Tests are not healthy at the moment. Hopefully I can push before 2.67 branch cut. |
|
PostCommit Python tests have been red for a while and Python Coverage failure is not related to the commit (Timeout). Xlang Gcp Direct (https://github.com/apache/beam/actions/runs/16331090330/job/46133582842?pr=35426) is green. Merging it one week before the branch cut so that we can see if there is any breakage before it is too late. |
fixes #35512
PR #35243 has fixed the logical type for
JdbcTimeType(TIME field in SQL) andJdbcDateType(DATE field in SQL). However, it fails when the TIMESTAMP field is used. Notice that when this happens, the schema proto of this field is of logical typeMillisInstant.This causes failures in one of the post commit xlang test #35285.
There is already a fix proposed #35400, and this PR is another attempt.
As stated in #35243 (comment), the root cause of this problem is that the same language type (e.g.
TimeStamp) can be associated to multiple logical types (MillisInstantandMicrosInstant). It becomes a problem because of how these types are passed through pipeline construction and submission.MillisInstant) from urn (beam:logical_type:millis_instant:v1) in the schema proto and convert it into the language type (Timestamp). The typehint of Beam Schema is generated for the pcollection as an element type.to_runner_apito convert the whole pipeline into a proto. When it tries to serialize the RowCoder, it tries to revert what has done in the pipeline expansion by generating the schema proto again based on the typehint from the RowCoder. In our case,Timestampis a component in the Row, but when the SDK trying to determine its logical type for the schema proto, it resolves it toMicrosInstantbecause it is defined afterMillisInstant.The proposed approach here has three parts:
ReadFromJdbc, the fix in 1. won't work because the schema is provided by the users (rather than coming from the Expansion service). There won't be a schema proto in the schema registry. In this case, we apply the workaround insideReadFromJdbconly in the scope of converting typehint to schema proto, making sure the language type ofTimestampwill be mapped to logical typeMillisInstantduring the conversion.