Skip to content

Conversation

@benkonz
Copy link
Contributor

@benkonz benkonz commented Mar 27, 2024

attempts to fix several issues regarding avro <-> beam schema conversion in the python sdk

  1. [Bug]: Python SDK avro to beam schema conversion defaults to Any type for nullable atomic union fields #30750
  2. VarInt.java expects ints and longs to be unsigned, however the current implementation just returns the signed number, which causes an IOException("varint overflow " + r) exception to get thrown when using the converted beam schema with a SqlTransform

NOTE: I'm not entirely sure about my logic here, since the resulting beam.Rows have the converted 2's complement number will get returned if the user doesn't apply a SqlTransform to their pipeline, but the previous implementation was also causing errors, so I'll need some help here

  1. in avroio.py's beam_value_to_avro_value function, it recurses on avro_value_to_beam_value, rather than the beam_value_to_avro_value function. I'm 99% sure this should be the other way around, however since the two functions are so similar this previously wasn't causing issues. I only noticed it when I added the custom logic to convert to/from signed and unsigned numbers.

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

@benkonz
Copy link
Contributor Author

benkonz commented Mar 27, 2024

CC: @robertwb , mentioned in the description but I'm not entirely sure about my logic for converting between signed and unsigned 2's complement numbers to satisfy VarInt.java. Would you be able to assist me here?

@benkonz
Copy link
Contributor Author

benkonz commented Mar 27, 2024

tests failing due to the SqlTransform shadowJar not being available in the test environment. Building the jar locally via ./gradlew :sdks:java:extensions:sql:expansion-service:shadowJar makes the tests pass. Not sure how to set this up in the CI environment.

@github-actions
Copy link
Contributor

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

@benkonz
Copy link
Contributor Author

benkonz commented Mar 28, 2024

assign set of reviewers

see comment above about the failing tests.

@github-actions
Copy link
Contributor

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @riteshghorse for label python.
R: @bvolpato for label io.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

Copy link
Contributor

@riteshghorse riteshghorse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit. Tagging our schema expert R: @ahmedabu98

Copy link
Contributor

@ahmedabu98 ahmedabu98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching and fixing these bugs!

Can we add some unit tests for the helper methods too? (ie. avro_union_type_to_beam_type, avro_atomic_value_to_beam_atomic_value, beam_atomic_value_to_avro_atomic_value)

@benkonz
Copy link
Contributor Author

benkonz commented Apr 4, 2024

@ahmedabu98 could you take another look at this PR?

@ahmedabu98
Copy link
Contributor

@benkonz thanks for making the changes. will try to take a look next week

@benkonz
Copy link
Contributor Author

benkonz commented Apr 15, 2024

@ahmedabu98 thanks for taking another look at my PR! I've been pretty busy lately, but I'll try to find some time this week to address your comments.

@benkonz
Copy link
Contributor Author

benkonz commented Apr 19, 2024

@ahmedabu98 I addressed your review comments, could you take another look? Thanks!

Copy link
Contributor

@ahmedabu98 ahmedabu98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for addressing the comments, this looks great! I appreciate the documentation too

We may need to add support for nullable arrays and maps later, but this PR is good to go

@ahmedabu98 ahmedabu98 merged commit 45e7857 into apache:master Apr 24, 2024
@Abacn
Copy link
Contributor

Abacn commented Apr 25, 2024

This breaks validatesCrossLanguageRunnerPythonUsingSql test:

e.g. https://github.com/apache/beam/actions/runs/8820828279/job/24215280605

FAILED apache_beam/io/avroio_test.py::TestFastAvro::test_avro_schema_to_beam_schema_with_nullable_atomic_fields - OSError: No files found based on the file pattern /tmp/tmphd4i_w_b*

@benkonz
Copy link
Contributor Author

benkonz commented Apr 25, 2024

This breaks validatesCrossLanguageRunnerPythonUsingSql test:

e.g. https://github.com/apache/beam/actions/runs/8820828279/job/24215280605

FAILED apache_beam/io/avroio_test.py::TestFastAvro::test_avro_schema_to_beam_schema_with_nullable_atomic_fields - OSError: No files found based on the file pattern /tmp/tmphd4i_w_b*

@Abacn oh, that was a test I added in this PR that was marked as skipped when the CI ran. It was working when I last tested on my machine. Let me do some debugging. We can revert my PR if necessary until we can get the tests passing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants