Skip to content

restore and deprecate AggregatorFactory methods#11917

Merged
clintropolis merged 3 commits intoapache:masterfrom
clintropolis:agg-factory-deprecate-sadness
Nov 19, 2021
Merged

restore and deprecate AggregatorFactory methods#11917
clintropolis merged 3 commits intoapache:masterfrom
clintropolis:agg-factory-deprecate-sadness

Conversation

@clintropolis
Copy link
Copy Markdown
Member

@clintropolis clintropolis commented Nov 12, 2021

Description

This PR renames AggregatorFactory.getType and AggregatorFactory.getFinalizedType to getIntermediateType and getResultType, and restores the previous 3 methods getType, getFinalizedType, and getComplexTypeName to their earlier incarnations before #11713, making default implementations of getIntermediateType and getResultType out of these previous methods. The previous methods all now have default implementation that explode so that they did not need restored to all existing AggregatorFactory implementations.

This is hella ugly, and now NONE of these methods are abstract, but it will be less disruptive to extension writers.

I am going to effectively revert this after the 0.23 release, but this at least gives slightly more warning to extension writers than release notes. I guess?


Key changed/added classes in this PR
  • AggregatorFactory

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Nov 13, 2021

Contrarian opinion: I don't see how deferring the breaking change for 1 release helps anyone. People still need to do the work before they can upgrade to some version, whether it's 0.23 or 0.24. And they won't be that far apart. So why not do the breaking change now?

(On the other hand, it would be helpful to defer the breaking change for a longer period, and use that time to batch it up with other breaking changes. Breaking a bunch of things all at once from time to time, and having all other releases be non-breaking, is easier for people to deal with than breaking a little bit each release. But I don't advocate this plan for reasons I get into in the rest of the comment.)

The rest of this comment is a bit of soapboxing, so feel free to ignore it if you aren't into that sort of thing.

The soapboxy statement is that I'm sympathetic to the Linux kernel driver school of thought when it comes to query extensions like new query types, filters, aggregators, etc. You can read about that school of thought here: https://www.kernel.org/doc/html/latest/process/stable-api-nonsense.html. There's a very abbreviated TLDR right there in the URL.

In our situation, maintaining stability for query extensions has upsides that affect a relatively small number of sites where there are site-specific query extensions in play, but has downsides that affect all sites.

The main downside of maintaining stable interfaces is that it creates extra work. For example: it's common for us to add a method that gets some new information that query engines will use to make better decisions. Supporting a stable interface means we need to include handling for that information being unknown. On the other hand, if we break the interface then we can also get rid of the code that handles the "unknown" case. It simplifies the core, reduces the surface area that needs to be tested and maintained, and makes the user experience more uniform, all of which are good things.

Being more free with changing these interfaces would help us evolve them faster, and would benefit the majority of sites that don't have their own site-specific extensions. And just like in Linux, if people with site-specific extensions get tired of maintaining them, that problem can be solved by contributing them to mainline in extensions-contrib.

Btw, a coda. Our equivalent of the Linux kernel syscall API (which is famously stable) is our HTTP APIs and query API. I am sympathetic to the Linux view there too, and I kind of wish they were more stable than they are.

Copy link
Copy Markdown
Contributor

@cheddar cheddar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM. In response to the contrarian opinions: changing the API suddenly doesn't allow any time nor any way to verify the changes for an extension author. By having a version that introduces the new API and identifies that the old one will be removed, it allows for extensions to adjust to the new API on an actively running version and then push up the Druid version. For organizations this is important as it allows them to actually schedule the work and do it in pieces during sprints rather than needing to come up with a big-bang replacement.

On the argument of increasing support burden, that argument doesn't apply here as the "unknown complex" is needed anyway, you'll note that this change isn't introducing UNKNOWN_COMPLEX it is just using it in a default implementation. Also, the AggregatorFactory is essentially our UDF interface. While the HTTP APIs are definitely "super stable", when it comes to the spectrum of API stability, UDFs are another API in a DB that deserve similar treatment. You might even be able to argue that they are part of the query API.

If we were jumping through significant hoops to maintain backwards compatibility on, say, the Segment interface, that would definitely not be meaningful. Even changes to the QueryToolChest or the QueryRunnerFactory could be argued to not need as strong of a guarantee. But, given that this is

  1. This is the single-most extended API of Druid's extensions
  2. we are adding a few methods whose only detractor is that they make the interface ugly
  3. this provides a landing place for changes that helps align with the realities of what it means to work inside of a company/organization/sprints
  4. It will exist for a single release

I think this is very low on the pain spectrum and high on the "user friendliness" spectrum.

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Nov 13, 2021

First I want to say that I am not intending to block this patch. If people think it's a good idea then please go for it. It's harmless for all the reasons that @cheddar mentioned. I was just trying to articulate an opinion about extension interface changes generally:

  1. Maintaining stable extension interfaces is sometimes more trouble than it's worth.
  2. It probably doesn't help very many people to defer breaking changes by 1 release, because my guess is most people don't upgrade through every single version. If people skip the next version they'll be back to getting a sudden surprise. If one wanted to solve this problem, a good solution is to batch up interface changes and make them once every year or two. But I'm not really advocating that; see (1).

It's a good point that UDFs and UDAFs are the extension points that have the strongest benefits from being stable. I agree. I'm not 100% sure if I agree that the benefit is enough to justify the maintenance overhead in general, but, it could be. And if anything does meet that bar it'd be UDFs and UDAFs.

@cheddar
Copy link
Copy Markdown
Contributor

cheddar commented Nov 14, 2021

Good to get clarity that you aren't intending to block the PR. I think we agree that sometimes the extra work on maintaining compatibility is just not worth it, but that's always on a case-by-case basis.

@clintropolis
Copy link
Copy Markdown
Member Author

Just to weigh in - despite raising this PR I'm actually in the camp that @gianm is in on this, else I probably would've done #11713 like this in the first place. I guess in my head I've been assuming whatever we eventually call Druid 1.0 is where I have been considering taking up more conservative stances on extension point changes (but maybe also 1.0 means nothing because people use Druid in production environments and have for years so isn't a real excuse).

Assuming that those who implement custom aggs are at least recompiling their extension every release (which they really really should be doing), then the changes in #11713 seem like not a very big deal to me, because it would fail to compile and so be noticed immediately and has a very easy fix that shouldn't actually require extensive testing. This isn't changing the interface of the aggregator implementations themselves, which I think would be more involved for extension writers and require more care, just requiring factories to provide additional information.

On the argument of increasing support burden, that argument doesn't apply here as the "unknown complex" is needed anyway, you'll note that this change isn't introducing UNKNOWN_COMPLEX it is just using it in a default implementation.

This isn't quite accurate, not implementing getFinalizedColumnType (as it has been renamed in this PR) is a loss of information when it has to fall back on UNKNOWN_COMPLEX compared to aggregators that properly implement the interface (UNKNOWN_COMPLEX was added in #11713 as a short term solution for serde compat stuff, and eventually should be removed). Prior to #11713 there was no possible way to get what actual complex type an aggregator would finalize to, since there was no getFinalizedColumnTypeName method (part of the work of #11713 was going through all existing aggregator implementations and filling in the correct type information).

The engine (and future changes I would like to do to it) would generally prefer to have the complete type information however, which is only available by actually implementing the new methods. I think it could be fairly argued that nothing is broken (yet) by falling back to unknown complex since we haven't heavily leaned into everything #11713 made possible (and since that PR introduced it it acknowledges that things are incomplete and stuff has to compensate for it). It does makes it harder to actually take advantage of this new information though, since it draws out the period that we still need to handle the unknown case.

On the subject of this PR only impacting a single release, I'm not entirely sure that is true either unless we just leave these functions named as they are renamed in this PR - getColumnType and getFinalizedColumnType. To get fully back into the state of #11713, e.g. with the methods named getType and getFinalizedType, I think it would take at least 3 releases to get done if we do it incrementally like this. 0.23 which would deprecate and add the new methods, 0.24 which could remove the old versions, and possibly make new versions of getType and getFinalizedType back to how they are in #11713 with default implementations that call getColumnType and getFinalizedColumnType which would then be deprecated, and then finally in 0.25 we can remove getColumnType and getFinalizedColumnType. This disrupts extension writers at least twice instead of once, which seems worse?

I guess I obviously don't feel so strongly as to block this change either, lest I wouldn't have opened this PR, but I still don't personally feel like we gain much to make it strictly necessary either, and it definitely isn't without pain, so I'm at best a +0 on this (talked up from a -1).

+1 for having this discussion though... should we take it to the dev list to try to get more feedback on this?

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Nov 17, 2021

After further reflection I'd like to advocate for an extension interface policy that backward compatibility and longer deprecation periods are preferred when they aren't too disruptive, but also that we shouldn't be hesitant to break compatibility immediately if that enables us to make some substantial improvement. In this case, I don't think a substantial improvement is being blocked, so I would stick with the change in this PR.

On the subject of this PR only impacting a single release, I'm not entirely sure that is true either unless we just leave these functions named as they are renamed in this PR - getColumnType and getFinalizedColumnType.

Faced with this fact I would say let's stick with getColumnType and getFinalizedColumnType, rather than planning to change them back to getType and getFinalizedType. It simplifies our lives (less work, less churn) and either name seems equally ok.

Copy link
Copy Markdown
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in light of the logic in #11917 (comment)

public ColumnType getColumnType()
{
if (getType() == ValueType.COMPLEX) {
return ColumnType.ofComplex(getComplexTypeName());
Copy link
Copy Markdown
Contributor

@gianm gianm Nov 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getComplexTypeName wouldn't necessarily have been implemented for aggregators that aren't ingestible, so this should catch UnsupportedOperationException and treat it as an unknown complex type. (Or have the default impl of getComplexTypeName return null and check for null here)

@clintropolis
Copy link
Copy Markdown
Member Author

Faced with this fact I would say let's stick with getColumnType and getFinalizedColumnType, rather than planning to change them back to getType and getFinalizedType. It simplifies our lives (less work, less churn) and either name seems equally ok.

I've changed these method names to getIntermediateType and getResultType to avoid room for confusion with input column types.

@clintropolis clintropolis merged commit f260bbe into apache:master Nov 19, 2021
@clintropolis clintropolis deleted the agg-factory-deprecate-sadness branch November 19, 2021 23:59
clintropolis pushed a commit that referenced this pull request Jan 11, 2022
This change mimics what was done in PR #11917 to
fix the incompatibilities produced by #11713. #11917
fixed it with AggregatorFactory by creating default
methods to allow for extensions built against old
jars to still work.  This does the same for PostAggregator
@abhishekagarwal87 abhishekagarwal87 added this to the 0.23.0 milestone May 11, 2022
sachinsagare pushed a commit to pinterest/druid that referenced this pull request Oct 28, 2022
This change mimics what was done in PR apache#11917 to
fix the incompatibilities produced by apache#11713. apache#11917
fixed it with AggregatorFactory by creating default
methods to allow for extensions built against old
jars to still work.  This does the same for PostAggregator

(cherry picked from commit eb0bae4)
sachinsagare pushed a commit to sachinsagare/druid that referenced this pull request Nov 2, 2022
This change mimics what was done in PR apache#11917 to
fix the incompatibilities produced by apache#11713. apache#11917
fixed it with AggregatorFactory by creating default
methods to allow for extensions built against old
jars to still work.  This does the same for PostAggregator

(cherry picked from commit eb0bae4)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants