-
Notifications
You must be signed in to change notification settings - Fork 4.5k
#34009 avro generic record to beam row conversion added support for a… #34024
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ll logical types and conversions
|
Run Java PreCommit |
|
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment |
|
assign set of reviewers |
|
Assigning reviewers. If you would like to opt out of this review, comment R: @robertwb for label java. Available commands:
The PR bot will only process comments in the main thread (not review comments). |
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
| * limitations under the License. | ||
| */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need to include a license header here since we're not distributing this file (and it is under the RAT exclusion)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
buildSrc/build.gradle.kts
Outdated
|
|
||
| runtimeOnly("com.google.protobuf:protobuf-gradle-plugin:0.8.13") // Enable proto code generation | ||
| runtimeOnly("com.github.davidmc24.gradle.plugin:gradle-avro-plugin:1.9.1") // Enable Avro code generation | ||
| runtimeOnly("com.github.davidmc24.gradle.plugin:gradle-avro-plugin:1.1.0") // Enable Avro code generation. Version 1.1.0 is the last supporting avro 1.10.2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about downgrading this dependency. Why do we specifically need to target 1.1.0? It is a rather old dependency, and downgrading could introduce other problems
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
restored 1.9.1. My thinking is that since the earliest tested avro version is 1.10.2 (see sdks/java/extensions/avro/build.gradle) , it would be good to use a version of the avro plugin that uses avro 1.10.2.
In any case, the whole plugin seems no longer maintained, so maybe it would be a good idea to remove this altogether and just use the avro-tools jar, like elsewhere. seems like this could be addressed in a separate task though
...tensions/avro/src/main/java/org/apache/beam/sdk/extensions/avro/schemas/utils/AvroUtils.java
Show resolved
Hide resolved
|
Adding some folks who may have some more context here: @aromanenko-dev @Abacn |
…underlying type instead of types.Alias (#33868)
* add endpoint type to WorkerMetadataResponse proto * add default value to endpoint_type
* Add UUID support in Spanner Schema * Add test
|
Run Java Avro Versions PostCommit |
|
@aromanenko-dev I am not sure how to proceed, would you mind giving another quick review? thanks in advance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wollowizard Well, I'm not on the project anymore for about one year, so, tbh I didn't follow any recent Avro extension changes. So, maybe ask another person, who has more knowledge on this, to make sure that it doesn't break the things.
Though, on the first sight, it LGTM in case if all checks are passing and, especially, multiple versions Avro checks. Then I think we are good.
|
Thanks, I see there is already an approval. Added a minor comment regarding formatting, and could help merge after it is resolved. |
|
@Abacn thanks, I've made the change and resolved your comment |
|
Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment R: @Abacn for label java. Available commands:
|
… for a… (apache#34024) * apache#34009 avro generic record to beam row conversion added support for all logical types and conversions * using string comparison to avoid class not found issues with earlier versions of avro * using string comparison to avoid class not found issues with earlier versions of avro * com.github.davidmc24.gradle.plugin:gradle-avro-plugin:1.9.1 * using string comparison to avoid class not found issues with earlier versions of avro * com.github.davidmc24.gradle.plugin:gradle-avro-plugin:1.9.1 * Add `types.Unalias` to types assertions and types switches to get an underlying type instead of types.Alias (apache#33868) * Revert huggingface transformers to 4.30.0 (apache#34025) * add endpoint type to WorkerMetadataResponse proto (apache#33953) * add endpoint type to WorkerMetadataResponse proto * add default value to endpoint_type * add hashcode/equals to WaitTest helper classes to avoid log error (apache#34006) * Add enable_lineage experiment to Dataflow tests (apache#34027) * Add UUID support in SpannerSchema (apache#34034) * Add UUID support in Spanner Schema * Add test * fix dashboard link (apache#34023) * [Go SDK] Add missing type inspection case for Alias types. (apache#34039) * removed unneeded license header * remove unneeded license header * Added tests for specific records generated with avro 1.8.2 and 1.9.2, and to add custom conversions * Supporting different UUID representations in different avro versions * Spotless fixes * fix dependency typo --------- Co-authored-by: Alfredo Scaccialepre <alfredo.scaccialepre@edreamsodigeo.com> Co-authored-by: synenka <97878236+synenka@users.noreply.github.com> Co-authored-by: Vitaly Terentyev <vitaly.terentyev@akvelon.com> Co-authored-by: martin trieu <martinkt@google.com> Co-authored-by: scwhittle <scwhittle@users.noreply.github.com> Co-authored-by: Yi Hu <yathu@google.com> Co-authored-by: Luv Agarwal <luvagarwal.k@gmail.com> Co-authored-by: Ahmed Abualsaud <65791736+ahmedabu98@users.noreply.github.com> Co-authored-by: Robert Burke <lostluck@users.noreply.github.com>
#34009 avro generic record to beam row conversion added support for all logical types and conversions
The PR modifies the Avro extension, specifically enhancing how Avro logical types are handled when converting between Avro records and Beam rows. It introduces support for a broader range of Avro logical types (e.g., decimal, uuid, timestamp-micros, etc.), improves compatibility with Avro’s SpecificRecord and GenericRecord, and ensures proper serialization/deserialization of these types.
The primary goal is to make AvroUtils more robust in converting Avro records (both GenericRecord and SpecificRecord) to Beam rows by fully supporting Avro’s logical types. A new
convertLogicalTypemethod is introduced to handle logical type conversions dynamically using a GenericData instance. This method checks for registered conversions (e.g., DecimalConversion, UUIDConversion) and applies them to transform Avro data into Beam-compatible formats.Instead of hardcoding conversions for specific logical types (e.g., TimestampMillis to ReadableInstant), the code now delegates to Avro’s GenericData conversion system. This makes it extensible to any logical type with a registered conversion, including custom ones.
Just fyi, my initial interest in this was given by the fact that I have some
SpecificRecords (subclass ofGenericRecord) for which I have classes generated with theavro-maven-plugin. They have java time fields, uuids and bigdecimal fields and the generated code also adds conversion to the GenericData of the generated class. So it seems that the generated code is nice and complete but the conversion to a beam row failed because beam's AvroUtils' code does not support all logical types and has a predefined list of accepted input types (for example does not accept a GenericRecord with a BigDecimal field, it needs to be ByteBuffer, its raw type) . I thought a more flexible approach would be to use the avro conversions.GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.