@wollowizard
Contributor

@wollowizard wollowizard commented Feb 19, 2025

#34009 avro generic record to beam row conversion added support for all logical types and conversions

The PR modifies the Avro extension, specifically enhancing how Avro logical types are handled when converting between Avro records and Beam rows. It introduces support for a broader range of Avro logical types (e.g., decimal, uuid, timestamp-micros, etc.), improves compatibility with Avro’s SpecificRecord and GenericRecord, and ensures proper serialization/deserialization of these types.
The primary goal is to make AvroUtils more robust in converting Avro records (both GenericRecord and SpecificRecord) to Beam rows by fully supporting Avro’s logical types. A new convertLogicalType method is introduced to handle logical type conversions dynamically using a GenericData instance. This method checks for registered conversions (e.g., DecimalConversion, UUIDConversion) and applies them to transform Avro data into Beam-compatible formats.

Instead of hardcoding conversions for specific logical types (e.g., TimestampMillis to ReadableInstant), the code now delegates to Avro’s GenericData conversion system. This makes it extensible to any logical type with a registered conversion, including custom ones.
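To illustrate the delegation idea outside of Beam, here is a minimal standalone sketch that uses only the JDK. It is a model of "look up a conversion by logical type name, otherwise keep the raw value", not the code in this PR: `ConversionRegistry`, `register`, `convert`, and `withDefaults` are hypothetical names, and Avro's `GenericData` and Beam's `AvroUtils` are deliberately not used.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.function.Function;

/**
 * Standalone model of a conversion lookup keyed by logical type name.
 * All names here are hypothetical; this is not Avro's GenericData API.
 */
final class ConversionRegistry {
  private final Map<String, Function<Object, Object>> byLogicalType = new HashMap<>();

  /** Registers a conversion for a logical type name such as "uuid". */
  ConversionRegistry register(String logicalTypeName, Function<Object, Object> conversion) {
    byLogicalType.put(logicalTypeName, conversion);
    return this;
  }

  /** Applies the registered conversion, or returns the raw value unchanged if none exists. */
  Object convert(String logicalTypeName, Object rawValue) {
    Function<Object, Object> conversion = byLogicalType.get(logicalTypeName);
    return conversion == null ? rawValue : conversion.apply(rawValue);
  }

  /** A registry with two illustrative conversions (assumed raw types: String and Long). */
  static ConversionRegistry withDefaults() {
    return new ConversionRegistry()
        .register("uuid", raw -> UUID.fromString(raw.toString()))
        .register("timestamp-millis", raw -> Instant.ofEpochMilli((Long) raw));
  }
}
```

With this shape, supporting a custom logical type means registering one more entry rather than adding another hardcoded branch, and a value whose logical type has no registered conversion simply passes through in its raw form.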

Just FYI, my initial interest in this came from having some SpecificRecords (subclasses of GenericRecord) whose classes are generated with the avro-maven-plugin. They have java.time, UUID, and BigDecimal fields, and the generated code also registers the matching conversions on the GenericData of the generated class. So the generated code is complete, but the conversion to a Beam row failed because Beam's AvroUtils does not support all logical types and only accepts a predefined list of input types. For example, it does not accept a GenericRecord with a BigDecimal field; the field has to be a ByteBuffer, its raw type. I thought a more flexible approach would be to use the Avro conversions.
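For context on the BigDecimal versus ByteBuffer point: the Avro `decimal` logical type stores the unscaled value as big-endian two's-complement bytes, with the scale held in the schema. The sketch below shows that mapping with the JDK only. `DecimalBytes`, `toBytes`, and `fromBytes` are hypothetical names, and rescaling with `setScale` is an assumption made for this example, not necessarily what Avro's own conversion does.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;

/** Hypothetical helper showing how a decimal maps to and from its raw bytes form. */
final class DecimalBytes {

  /** Encodes the unscaled value as big-endian two's-complement bytes. */
  static ByteBuffer toBytes(BigDecimal value, int schemaScale) {
    // Assumption for this sketch: align to the schema scale first.
    // setScale without a rounding mode throws ArithmeticException if digits would be lost.
    BigDecimal scaled = value.setScale(schemaScale);
    return ByteBuffer.wrap(scaled.unscaledValue().toByteArray());
  }

  /** Decodes raw bytes back into a BigDecimal using the scale from the schema. */
  static BigDecimal fromBytes(ByteBuffer raw, int schemaScale) {
    byte[] bytes = new byte[raw.remaining()];
    raw.duplicate().get(bytes); // duplicate() leaves the caller's buffer position untouched
    return new BigDecimal(new BigInteger(bytes), schemaScale);
  }
}
```

A record that carries the BigDecimal directly and one that carries the ByteBuffer hold the same information; the conversion is what bridges the two representations.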

@wollowizard
Contributor Author

Run Java PreCommit

@github-actions
Contributor

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

@wollowizard
Contributor Author

assign set of reviewers

@github-actions
Contributor

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @robertwb for label java.
R: @damccorm for label build.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
Contributor

we don't need to include a license header here since we're not distributing this file (and it is under the RAT exclusion)

Contributor Author

removed


runtimeOnly("com.google.protobuf:protobuf-gradle-plugin:0.8.13") // Enable proto code generation
runtimeOnly("com.github.davidmc24.gradle.plugin:gradle-avro-plugin:1.9.1") // Enable Avro code generation
runtimeOnly("com.github.davidmc24.gradle.plugin:gradle-avro-plugin:1.1.0") // Enable Avro code generation. Version 1.1.0 is the last supporting avro 1.10.2
Contributor

I'm not sure about downgrading this dependency. Why do we specifically need to target 1.1.0? It is a rather old version, and downgrading could introduce other problems.

Contributor Author

Restored 1.9.1. My thinking was that since the earliest tested Avro version is 1.10.2 (see sdks/java/extensions/avro/build.gradle), it would be good to use a version of the Avro plugin that uses Avro 1.10.2.

In any case, the whole plugin no longer seems to be maintained, so it might be a good idea to remove it altogether and just use the avro-tools jar, like elsewhere. This could be addressed in a separate task, though.

@damccorm
Contributor

Adding some folks who may have some more context here: @aromanenko-dev @Abacn

@github-actions github-actions bot added gcp and removed io gcp labels Feb 25, 2025
@wollowizard wollowizard requested a review from damccorm February 26, 2025 15:59
@damccorm damccorm requested review from aromanenko-dev and removed request for damccorm February 26, 2025 19:42
@wollowizard
Contributor Author

Run Java Avro Versions PostCommit

@wollowizard
Contributor Author

@aromanenko-dev I am not sure how to proceed. Would you mind giving this another quick review? Thanks in advance.

Contributor

@aromanenko-dev aromanenko-dev left a comment

@wollowizard Well, I have not been on the project for about a year, so to be honest I haven't followed any recent Avro extension changes. It may be best to ask someone with more knowledge of this area, to make sure it doesn't break anything.

That said, at first sight it LGTM, provided all checks are passing, especially the multiple-Avro-version checks. In that case I think we are good.

@wollowizard
Contributor Author

@damccorm @Abacn following the last comment, would you be able to take a look at this?

@github-actions
Contributor

Reminder, please take a look at this pr: @robertwb @damccorm

@Abacn
Contributor

Abacn commented Mar 13, 2025

Thanks, I see there is already an approval. Added a minor comment regarding formatting, and could help merge after it is resolved.

@wollowizard
Contributor Author

@Abacn thanks, I've made the change and resolved your comment

@github-actions
Contributor

Reminder, please take a look at this pr: @robertwb @damccorm

@github-actions
Contributor

Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:

R: @Abacn for label java.
R: @Abacn for label build.

@Abacn Abacn merged commit 3ea39d9 into apache:master Mar 27, 2025
24 checks passed
liferoad pushed a commit to liferoad/beam that referenced this pull request Apr 4, 2025
… for a… (apache#34024)

* apache#34009 avro generic record to beam row conversion added support for all logical types and conversions

* using string comparison to avoid class not found issues with earlier versions of avro

* using string comparison to avoid class not found issues with earlier versions of avro

* com.github.davidmc24.gradle.plugin:gradle-avro-plugin:1.9.1

* using string comparison to avoid class not found issues with earlier versions of avro

* com.github.davidmc24.gradle.plugin:gradle-avro-plugin:1.9.1

* Add `types.Unalias` to types assertions and types switches to get an underlying type instead of types.Alias (apache#33868)

* Revert huggingface transformers to 4.30.0 (apache#34025)

* add endpoint type to WorkerMetadataResponse proto (apache#33953)

* add endpoint type to WorkerMetadataResponse proto

* add default value to endpoint_type

* add hashcode/equals to WaitTest helper classes to avoid log error (apache#34006)

* Add enable_lineage experiment to Dataflow tests (apache#34027)

* Add UUID support in SpannerSchema (apache#34034)

* Add UUID support in Spanner Schema

* Add test

* fix dashboard link (apache#34023)

* [Go SDK] Add missing type inspection case for Alias types. (apache#34039)

* removed unneeded license header

* remove unneeded license header

* Added tests for specific records generated with avro 1.8.2 and 1.9.2, and to add custom conversions

* Supporting different UUID representations in different avro versions

* Spotless fixes

* fix dependency typo

---------

Co-authored-by: Alfredo Scaccialepre <alfredo.scaccialepre@edreamsodigeo.com>
Co-authored-by: synenka <97878236+synenka@users.noreply.github.com>
Co-authored-by: Vitaly Terentyev <vitaly.terentyev@akvelon.com>
Co-authored-by: martin trieu <martinkt@google.com>
Co-authored-by: scwhittle <scwhittle@users.noreply.github.com>
Co-authored-by: Yi Hu <yathu@google.com>
Co-authored-by: Luv Agarwal <luvagarwal.k@gmail.com>
Co-authored-by: Ahmed Abualsaud <65791736+ahmedabu98@users.noreply.github.com>
Co-authored-by: Robert Burke <lostluck@users.noreply.github.com>