Skip to content

[WIP][BEAM-8715] Bump Avro version to 1.9.2#17372

Closed
aromanenko-dev wants to merge 4 commits intoapache:masterfrom
aromanenko-dev:BEAM-8715-bump-avro-1.19.2
Closed

[WIP][BEAM-8715] Bump Avro version to 1.9.2#17372
aromanenko-dev wants to merge 4 commits intoapache:masterfrom
aromanenko-dev:BEAM-8715-bump-avro-1.19.2

Conversation

@aromanenko-dev
Copy link
Contributor

@aromanenko-dev aromanenko-dev commented Apr 14, 2022

Bump Avro version to 1.9.2

The main changes:

  • Avro completely moved to java.time.* instead of org.joda.time.*. So, we need to adjust date/time conversions from/to Beam schema accordingly since Beam schema still uses joda.time.
  • It will require users to regenerate generated Java code with avro-compiler (if any)
  • There are several failing tests (regression) that should be investigated (WIP).

Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@aromanenko-dev aromanenko-dev marked this pull request as draft April 14, 2022 15:29
@codecov
Copy link

codecov bot commented Apr 14, 2022

Codecov Report

Merging #17372 (f3f28d6) into master (2aa24dc) will decrease coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #17372      +/-   ##
==========================================
- Coverage   73.99%   73.98%   -0.02%     
==========================================
  Files         685      686       +1     
  Lines       89727    89942     +215     
==========================================
+ Hits        66395    66542     +147     
- Misses      22172    22240      +68     
  Partials     1160     1160              
Flag Coverage Δ
python 83.63% <ø> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
sdks/python/apache_beam/utils/interactive_utils.py 87.80% <0.00%> (-7.32%) ⬇️
...pache_beam/runners/interactive/interactive_beam.py 77.81% <0.00%> (-4.00%) ⬇️
...python/apache_beam/runners/worker/worker_status.py 77.53% <0.00%> (-2.18%) ⬇️
...che_beam/runners/interactive/interactive_runner.py 90.84% <0.00%> (-2.12%) ⬇️
sdks/python/apache_beam/io/source_test_utils.py 88.01% <0.00%> (-1.39%) ⬇️
...ache_beam/runners/portability/expansion_service.py 91.83% <0.00%> (-1.35%) ⬇️
sdks/python/apache_beam/transforms/core.py 91.84% <0.00%> (-0.81%) ⬇️
...eam/runners/portability/fn_api_runner/execution.py 92.44% <0.00%> (-0.63%) ⬇️
...runners/interactive/display/pcoll_visualization.py 85.85% <0.00%> (-0.51%) ⬇️
...eam/runners/interactive/interactive_environment.py 90.18% <0.00%> (-0.34%) ⬇️
... and 9 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2aa24dc...f3f28d6. Read the comment docs.

@aromanenko-dev aromanenko-dev force-pushed the BEAM-8715-bump-avro-1.19.2 branch from ffe0881 to 236125a Compare April 14, 2022 15:59
@aromanenko-dev
Copy link
Contributor Author

R: @TheNeuralBit
R: @reuvenlax
I tried to upgrade gracefully Avro dependency to the next major version 1.9.2 (the main reason, finally, is to move to Avro 1.11,0 to avoid having CVEs that Avro brings). Could you take a brief look? Does it make sense for you?

@aromanenko-dev
Copy link
Contributor Author

Run Java PostCommit

@aromanenko-dev
Copy link
Contributor Author

CC:
R: @iemejia
R: @RyanSkraba

Copy link
Member

@TheNeuralBit TheNeuralBit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the approach for Beam schemas looks reasonable. I'm concerned that users using Avro APIs directly will be broken by the switch to java.time though. How can we ease the transition for them?

} else if (fieldValue instanceof java.time.Instant) {
return (T) org.joda.time.Instant.ofEpochMilli(((Instant) fieldValue).toEpochMilli());
}
return (T) fieldValue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this logic should be in the Avro getters, instead of branching on this instance check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it's related to only Avro case but since Beam schema uses joda time internally then java time should be converted in any case until we won't switch to java time (if we will, of course).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aromanenko-dev I think @TheNeuralBit is right. Based on benchmarks I've done just recently, branching in RowWithGetters doesn't perform well. In #17172 I'm suggesting to push all of the current code down into the getters.

The GetterBasedSchemaProviders (except the Avro one) only support Joda ReadableInstant (type is the method return type or field type) as DATETIME. Attempting to use java time would most likely fail during schema generation (or generate a row schema with nested internal fields)

For Avro there's already a conversion layer in place you could leverage for that. For DATETIME it's using these converters:

shadowTest library.java.avro_tests
shadowTest library.java.zstd_jni
shadowTest library.java.jamm
shadowTest library.java.xz_java
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to add this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because starting from Avro 1.9.0 this dependency became provided but we need this for tests

classesInPackage("org.apache.avro"),
classesInPackage("org.apache.beam"),
classesInPackage("org.apache.commons.logging"),
classesInPackage("org.codehaus.jackson"),
Copy link
Contributor Author

@aromanenko-dev aromanenko-dev Apr 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not an Avro's dependency anymore .

@aromanenko-dev
Copy link
Contributor Author

aromanenko-dev commented Apr 15, 2022

@TheNeuralBit Thanks for taking a look!
Unfortunately, Avro introduced some breaking changes so I'm not sure that we can do something with that if we want to move forward with Avro version bump. Another option, potentially, would be to make it as provided deps but I'm not sure we can allow to do that - iirc, #16271 also introduced some breaking changes for Beam users).

@aromanenko-dev
Copy link
Contributor Author

Run Spark ValidatesRunner

@aromanenko-dev
Copy link
Contributor Author

Run Spark StructuredStreaming ValidatesRunner

@aromanenko-dev
Copy link
Contributor Author

Run Python Spark ValidatesRunner

@aromanenko-dev
Copy link
Contributor Author

Run SQL PreCommit

@aromanenko-dev
Copy link
Contributor Author

Run Java PreCommit

@aromanenko-dev
Copy link
Contributor Author

Run Java PostCommit

@aromanenko-dev
Copy link
Contributor Author

Run Spark ValidatesRunner

@TheNeuralBit
Copy link
Member

Unfortunately, Avro introduced some breaking changes so I'm not sure that we can do something with that if we want to move forward with Avro version bump.

Fair point - I guess it's ok if we communicate it clearly to users. Should we raise this on the dev list?

@aromanenko-dev
Copy link
Contributor Author

@TheNeuralBit Yes, good point. I'm going to start a thread on this topic on mailing list once I'll be back from vacation.

@aromanenko-dev
Copy link
Contributor Author

I close this one since, based on email discussion, we won't follow this way in terms of Avro version update. Though, still useful to have these changes somewhere that are needed to do in Beam to support more recent Avro versions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants