-
Notifications
You must be signed in to change notification settings - Fork 1.7k
AVRO-3048: Backport of performance fix to Record Builders to avro 1.10 branch #1333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: branch-1.10
Are you sure you want to change the base?
Conversation
* AVRO-3048: Use SpecificRecord's MODEL$ in builder creation * AVRO-3048: fix formatting of code templates
|
@RyanSkraba or @Fokko -- I see you helped review the original PR #1206, would you be able to help take a look at this? |
RyanSkraba
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey -- I definitely don't mind having this backported to 1.10, but I hope to see some movement on the 1.11.0 front this week!
|
Thanks for looking @RyanSkraba ! Speaking as an enterprise user of Avro, there is a fair amount of FUD around Avro minor version upgrades. I won't comment on whether or not it's warranted, but it definitely exists. My organization would benefit greatly from a patch release of Avro 1.10 that has this performance regression addressed, allowing for an easy upgrade without switching minor versions, and I'm sure many other organizations would benefit similarly. IMO, the case for a backport is particularly strong here as this is a fix for a performance regression, as opposed to a net-new enhancement. |
|
@xkrogen Please do not involve corporate/enterprise thinking/talking into OSS project unless your enterprise wants to offer its help! |
This is a fair criticism! I brought it up on the mailing list, and your comments are welcome. I feel like the community would be open to making changes! There's also a very quiet #avro channel on the ASF slack if you want to chat openly or privately (but as always, decisions are made on the mailing list). |
|
Sorry folks, I feel I may have come off in a different way than I intended.
My point was simply that there are large bases of Avro users who could benefit from such a fix. My apologies if bringing an enterprise perspective into the discussion is contrary to the Avro project's normal thought processes. Noting that large-scale users (particularly if there are several) are interested in something is fairly common in the other Apache projects I've worked on, but I recognize that different communities can be very different. I completely understand that work is volunteer here and I am not expecting anything, only trying to provide perspective that may not have been considered.
Actually I did not mean this as a criticism of the Avro project, I think it can just as easily be a criticism of enterprise mindsets and upgrade paces :) But regardless, thank you for starting a discussion! Your mail has also made me see why my backport request might be a bit larger than I thought. It sounds like the Avro project has historically not really done patch releases on non-current minor versions, meaning the scope of the request has expanded beyond "backport this PR" to "backport this PR and cut a patch release specifically for this issue." Completely understood if that's too big of an ask, and I see now why you brought the 1.11.0 release into the picture. I will follow the dev list discussion. |
|
Oh, as a quick note -- the Avro convention (for historical reasons) is that All that being said, pragmatically for your immediate problem, I think you'll find |
|
@xkrogen Have you tried 1.11.0 ? |
|
Hi @martin-g, thanks for checking in. Not yet, we are still working through the process of bringing our ecosystem onto Avro 1.10 and have worked around this issue in the meantime with a change in the wrapper we use for generating SpecificRecord classes. I do hope/anticipate it will be easy for us to move users to Avro 1.11 once this is complete, but we're taking it one step at a time. |
|
I brought up the issue of supporting more than one version at a time on the mailing list again! If it would make your work easier, please go ahead and weigh in :D It looks like your avro-util repo is one of the only places that deals with interoperating between Java SpecificRecords that were generated with different versions of Avro. I remain impressed by the task! Is this something that the Avro project can or should be more involved in? Either making it easier for your utilities or doing some specific tests when validating a new release candidate? Is this something to bring up on the mailing list? |
|
Thanks @RyanSkraba ! The |
|
Hey folks. I'd like to elaborate a little here about why the avro-util project exists. we can then see if involving upstream avro makes sense in anything. TL;DR - when using avro in a large, diverse ecosystem, supporting multiple versions of avro simultaneously across the ecosystem is required. there a few things that would make this work easier listed near the bottom. the main reasons are code/schema reuse and portable libraries. Linkein uses avro for encoding across a lot of its "data plane" (kafka being a good example). a lot of these payloads are "shared models" - data produced by service X that is consumed by services Y and Z over a kafka topic, or any other storage medium or rpc. as such all parties involved must have "the same schema". this is especially true for avro - which requires the exact writer schema on-hand for decoding - vs other serialization formats. the best way to guarantee all parties have the same schema is to have the schema(s) in question in their own library, imported by all parties. schemas are not the only thing shared here, code is as well: its much more convenient for developers to operate on generated POJO classes than it is to operate on generic records, and there are libraries who's top level APIs accept/return IndexedRecords, meaning they expose avro. given the sheer number of codebases involved, and that some constraints on avro are beyond the ability of an organization to control (dictated by requirements of 3rd party external libraries), its not feasible to align on a single version of avro across the organization, and individual codebases change their version of avro on their own schedules. and so, even maintaining a library of generated record classes, not to mention writing a "portable" library that accepts/returns IndexedRecords requires navigating a very wide range of runtime avro versions. we do so by combination of adapters (different implementations of "the same thing" for different avro versions) and tweaking the generated specific classes. its obviously unreasonable to expect the avro API to never change. however, there have been previous breaking changes that could have been done "nicer". some examples are below if youre curious. things the avro project could change to make maintaining avro-util easier:
examples of past breaking changes that could have been less breaking:
|
|
Can this PR be merged please, so users can upgrade their minor version and enjoy from this perf. fix? |
Backport of PR #1206 to 1.10 branch. For details, also see there.