[WIP][BEAM-8715] Bump Avro version to 1.9.2 #17372
aromanenko-dev wants to merge 4 commits into apache:master
Codecov Report

@@            Coverage Diff             @@
##           master   #17372      +/-   ##
==========================================
- Coverage   73.99%   73.98%    -0.02%
==========================================
  Files         685      686       +1
  Lines       89727    89942     +215
==========================================
+ Hits        66395    66542     +147
- Misses      22172    22240      +68
  Partials     1160     1160

Flags with carried forward coverage won't be shown.
Force-pushed ffe0881 to 236125a.
R: @TheNeuralBit

Run Java PostCommit

CC:
TheNeuralBit left a comment:
I think the approach for Beam schemas looks reasonable. I'm concerned, though, that users who call Avro APIs directly will be broken by the switch to java.time. How can we ease the transition for them?
} else if (fieldValue instanceof java.time.Instant) {
  return (T) org.joda.time.Instant.ofEpochMilli(((Instant) fieldValue).toEpochMilli());
}
return (T) fieldValue;
I think this logic should live in the Avro getters rather than in a branch on this instanceof check.
I'm not sure this is specific to the Avro case. Since Beam schemas use Joda-Time internally, java.time values need to be converted in any case, until Beam switches to java.time (if it ever does).
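To make the precision point concrete: Beam's DATETIME is backed by Joda-Time, which stores milliseconds since the epoch, while java.time.Instant carries nanoseconds, so the toEpochMilli() call in the snippet above drops anything below a millisecond. A minimal JDK-only sketch of that narrowing follows; the class and method names are hypothetical, and it only models the millisecond value a Joda Instant would hold rather than calling Joda itself.

```java
import java.time.Instant;

/** Hypothetical helper: narrows java.time values to the millisecond precision of a Joda-based DATETIME. */
public final class MillisNarrowing {
  private MillisNarrowing() {}

  /** Epoch millis as a Joda Instant would store them; excess nanoseconds are dropped. */
  public static long toJodaMillis(Instant javaInstant) {
    return javaInstant.toEpochMilli();
  }

  /** What survives a java.time -> millis -> java.time round trip. */
  public static Instant roundTrip(Instant javaInstant) {
    return Instant.ofEpochMilli(toJodaMillis(javaInstant));
  }

  public static void main(String[] args) {
    Instant withNanos = Instant.ofEpochSecond(1_600_000_000L, 123_456_789L);
    System.out.println(toJodaMillis(withNanos)); // 1600000000123
    System.out.println(roundTrip(withNanos)); // 2020-09-13T12:26:40.123Z
  }
}
```

Any user code that relied on sub-millisecond timestamps from Avro would see them truncated after passing through a Beam Row.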
@aromanenko-dev I think @TheNeuralBit is right. Based on benchmarks I ran recently, branching in RowWithGetters doesn't perform well. In #17172 I'm proposing to push all of the current code down into the getters.
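One way to read the "push it down into the getters" suggestion: choose the conversion once, when the getter for a field is built, so reading a row never needs a per-value instanceof check. The sketch below is hypothetical: Getter, converting and Event are illustrative names, not Beam's actual FieldValueGetter or RowWithGetters APIs, and it converts to epoch millis only to stay free of a Joda dependency.

```java
import java.time.Instant;
import java.util.function.Function;

/** Hypothetical sketch: each field's conversion is chosen once, when its getter is built. */
public final class ConvertingGetters {
  private ConvertingGetters() {}

  /** Hypothetical minimal getter abstraction (not Beam's actual FieldValueGetter). */
  public interface Getter<ObjectT, ValueT> {
    ValueT get(ObjectT object);
  }

  /** Wraps a raw getter with a fixed converter; nulls pass through unconverted. */
  public static <ObjectT, RawT, ValueT> Getter<ObjectT, ValueT> converting(
      Getter<ObjectT, RawT> raw, Function<RawT, ValueT> converter) {
    return object -> {
      RawT value = raw.get(object);
      return value == null ? null : converter.apply(value);
    };
  }

  /** Stand-in for an Avro 1.9 generated class with a java.time field. */
  public static final class Event {
    public final Instant timestamp;

    public Event(Instant timestamp) {
      this.timestamp = timestamp;
    }
  }

  public static void main(String[] args) {
    Getter<Event, Instant> rawTimestamp = e -> e.timestamp;
    // The Instant -> epoch-millis conversion is bound here, once per field.
    Getter<Event, Long> timestampMillis = converting(rawTimestamp, Instant::toEpochMilli);
    System.out.println(timestampMillis.get(new Event(Instant.ofEpochMilli(42L)))); // 42
  }
}
```

Because the converter is fixed per field, the read path is a getter call plus the conversion, with no instanceof chain over every possible value type.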
The GetterBasedSchemaProviders (except the Avro one) only support Joda's ReadableInstant as DATETIME, where the type checked is the getter's return type or the field's type. Attempting to use java.time would most likely fail during schema generation, or would generate a row schema with nested internal fields.
For Avro, there's already a conversion layer in place that you could leverage for this. For DATETIME, it uses these converters:
shadowTest library.java.avro_tests
shadowTest library.java.zstd_jni
shadowTest library.java.jamm
shadowTest library.java.xz_java
Why do we need to add this?
Because starting with Avro 1.9.0 this dependency became provided, but we need it for tests.
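Since a provided or optional dependency is not pulled in transitively, Avro's xz and Zstandard codecs only work in Beam's tests if the build adds those jars explicitly, as the shadowTest lines do. A small, hypothetical helper for checking what a test JVM can actually see; the two class names in main are assumed entry points of xz-java and zstd-jni, so treat them as examples rather than a definitive list.

```java
/** Hypothetical helper for checking that optional Avro codec libraries are present at runtime. */
public final class OptionalDeps {
  private OptionalDeps() {}

  /** Returns true if the class can be loaded from the current classpath, without initializing it. */
  public static boolean isOnClasspath(String className) {
    try {
      Class.forName(className, false, OptionalDeps.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // Assumed entry-point classes of xz-java and zstd-jni; absent unless those jars are on the classpath.
    for (String name : new String[] {"org.tukaani.xz.XZOutputStream", "com.github.luben.zstd.Zstd"}) {
      System.out.println(name + " present: " + isOnClasspath(name));
    }
  }
}
```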
classesInPackage("org.apache.avro"),
classesInPackage("org.apache.beam"),
classesInPackage("org.apache.commons.logging"),
classesInPackage("org.codehaus.jackson"),
This is no longer an Avro dependency (Avro 1.9 moved from Codehaus Jackson 1.x to FasterXML Jackson 2.x).
@TheNeuralBit Thanks for taking a look!

Run Spark ValidatesRunner

Run Spark StructuredStreaming ValidatesRunner

Run Python Spark ValidatesRunner

Run SQL PreCommit

Run Java PreCommit

Run Java PostCommit

Run Spark ValidatesRunner
Fair point. I guess it's OK if we communicate it clearly to users. Should we raise this on the dev list?

@TheNeuralBit Yes, good point. I'll start a thread on this topic on the mailing list once I'm back from vacation.
I'm closing this one since, based on the mailing-list discussion, we won't take this approach to updating the Avro version. Still, it's useful to keep these changes somewhere, since they show what Beam needs to change in order to support more recent Avro versions.
Bump Avro version to 1.9.2
The main changes:
- Avro 1.9 uses java.time.* instead of org.joda.time.* for date/time logical types. So, we need to adjust date/time conversions to and from Beam schemas accordingly, since Beam schemas still use joda.time.
- avro-compiler (if any)
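For reference, the Avro specification defines timestamp-millis as a long counting milliseconds since the Unix epoch (UTC) and date as an int counting days since the Unix epoch; only the Java types used to represent them changed with the java.time switch. A minimal JDK-only sketch of those encodings, with hypothetical class and method names, assuming Avro 1.9 surfaces the two logical types as java.time.Instant and java.time.LocalDate:

```java
import java.time.Instant;
import java.time.LocalDate;

/** Hypothetical sketch of the wire encodings behind two Avro logical types. */
public final class AvroLogicalTypeEncodings {
  private AvroLogicalTypeEncodings() {}

  /** timestamp-millis: long milliseconds since 1970-01-01T00:00:00Z; sub-millisecond precision is dropped. */
  public static long encodeTimestampMillis(Instant instant) {
    return instant.toEpochMilli();
  }

  public static Instant decodeTimestampMillis(long millis) {
    return Instant.ofEpochMilli(millis);
  }

  /** date: int days since 1970-01-01; throws ArithmeticException if the day count overflows an int. */
  public static int encodeDate(LocalDate date) {
    return Math.toIntExact(date.toEpochDay());
  }

  public static LocalDate decodeDate(int days) {
    return LocalDate.ofEpochDay(days);
  }

  public static void main(String[] args) {
    System.out.println(encodeDate(LocalDate.of(2020, 1, 1))); // 18262
    System.out.println(decodeTimestampMillis(0L)); // 1970-01-01T00:00:00Z
  }
}
```

Since the encoded values are unchanged, data written by older Avro versions stays readable; what changes for Beam is which Java type it has to convert to and from Joda.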