Promoting LookupExtractor state and LookupExtractorFactory to be a first class druid state object.#2291
Are there any tests included that use MapLookupExtractor with isOneToOne == true?
Minor comment, would it be better to call this property "isInjective", to be more consistent with LookupExtractionFn and such?
Just injective is appropriate. isInjective() is the proper bean convention for the getter of a boolean field named injective.
@jon-wei and @drcrallen I think it is clearer to use OneToOne:
First, the actual property of the extraction type is already called ExtractionType.ONE_TO_ONE.
Second, I think OneToOne is clearer than the mathematical term injective.
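For readers following the naming debate, here is a minimal sketch of what "one-to-one" (injective) means for a map-backed lookup. OneToOneCheck and isOneToOne(Map) are hypothetical names used only for illustration; they are not part of Druid and this is not how the PR computes the flag.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Druid: checks whether a map-backed lookup is one-to-one.
final class OneToOneCheck {
  private OneToOneCheck() {}

  // A mapping is one-to-one (injective) when no two keys share the same value.
  static boolean isOneToOne(Map<String, String> lookup) {
    Set<String> seen = new HashSet<>();
    for (String value : lookup.values()) {
      if (!seen.add(value)) {
        return false; // this value is already produced by another key
      }
    }
    return true;
  }
}
```

When the mapping is one-to-one, a value identifies exactly one key, which is what makes it safe to rewrite a query on looked-up values into a query on the underlying keys.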
Hi b-slim, can you add more flavor to the master comment on how this differs from the dimension extraction approach, and what the performance ramifications are?
why not "injective" like the extraction function?
Ah, it's injective in LookupExtractionFn
@drcrallen we decided to move away from LookupExtractionFn in favor of implementing DimensionSpec and using the lookup delegator to do the apply/unapply.
This has a couple of advantages: lookups become less verbose, and optimization becomes easier to check for.
Because LookupExtractor exposes methods that are not included in the ExtractionFn API, it doesn't make sense to use lookups via the ExtractionFn API.
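To make the apply/unapply point concrete, here is a rough sketch of a lookup that supports both directions. ReversibleLookup is a hypothetical stand-in written for this illustration; the real LookupExtractor interface may differ in signatures and null handling.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a map-backed lookup; names are illustrative only.
final class ReversibleLookup {
  private final Map<String, String> forward;
  private final Map<String, List<String>> reverse = new HashMap<>();

  ReversibleLookup(Map<String, String> forward) {
    this.forward = new HashMap<>(forward);
    // Build the reverse index once so unapply does not scan the whole map.
    for (Map.Entry<String, String> e : this.forward.entrySet()) {
      reverse.computeIfAbsent(e.getValue(), v -> new ArrayList<>()).add(e.getKey());
    }
  }

  // key -> value, or null when the key is not mapped
  String apply(String key) {
    return forward.get(key);
  }

  // value -> every key that maps to it (empty list when there is none)
  List<String> unapply(String value) {
    return reverse.getOrDefault(value, Collections.emptyList());
  }
}
```

A plain extraction function only offers the forward direction. With unapply available, a filter on the looked-up value can in principle be rewritten as a filter on the matching keys.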
@drcrallen this needs to be merged ASAP; it is blocking the development of QTL. Even though I think QTL won't be done for 0.9, having this merged is very important.
b0f6908 to
55d5186
Compare
is binder.bind(..) necessary given that you are doing LifecycleModule.register(..) ?
yes, that's what docs said.
i'm wondering why it is not needed for DruidBroker.class then?
@himanshug as you can see here registering is not enough to create the object, you need to either bind it to a scope or explicitly ask for it. maybe @cheddar can give a better explanation.
@himanshug after testing via my IDE you are actually right, i hope @cheddar can provide a clarification ...
Is there a reason this needs to be in druid core and not an extension?
What do you think about making this
public LookupDimensionSpec(
    @JsonProperty("dimension") String dimension,
    @JsonProperty("outputName") String outputName,
    @JsonProperty("lookup") LookupExtractor lookupInput,
    @JsonProperty("name") String name,
    @JsonProperty("retainMissingValues") final boolean retainMissingValues,
    @JsonProperty("replaceMissingWith") final String replaceMissingWith,
    @JacksonInject LookupReferencesManager lookupReferencesManager
)
Where it is expected that you either specify "name" or "lookup", the one you do not specify is null.
If you specify lookup, then said lookup is used when getExtractionFn() is called. If you specify name, then said name is looked up in the manager when getExtractionFn() is called.
Structuring it this way should avoid issues with the lookup not existing on the router and completely eliminate the need for the LookupDelegator
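The "name or lookup" resolution described above could be sketched roughly as follows. Everything here is a simplified assumption: a lookup is modelled as Function<String, String>, LookupRegistry stands in for the injected manager, and LookupSpecSketch and resolve() are hypothetical names. Rejecting the case where both or neither are given is also an assumption of this sketch, not something stated in the comment.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for the injected manager; maps lookup names to lookups.
final class LookupRegistry {
  private final Map<String, Function<String, String>> byName = new HashMap<>();

  void put(String name, Function<String, String> lookup) {
    byName.put(name, lookup);
  }

  Function<String, String> get(String name) {
    return byName.get(name);
  }
}

// Hypothetical, simplified spec: holds either an inline lookup or a name, never both.
final class LookupSpecSketch {
  private final Function<String, String> inlineLookup; // null when "name" is given
  private final String name;                           // null when "lookup" is given
  private final LookupRegistry registry;

  LookupSpecSketch(Function<String, String> inlineLookup, String name, LookupRegistry registry) {
    // Assumption of this sketch: exactly one of the two must be specified.
    if ((inlineLookup == null) == (name == null)) {
      throw new IllegalArgumentException("specify exactly one of 'lookup' or 'name'");
    }
    this.inlineLookup = inlineLookup;
    this.name = name;
    this.registry = registry;
  }

  // Resolution is deferred until the lookup is needed, so a process that only
  // forwards the query never has to know the named lookup.
  Function<String, String> resolve() {
    if (inlineLookup != null) {
      return inlineLookup;
    }
    Function<String, String> found = registry.get(name);
    if (found == null) {
      throw new IllegalStateException("lookup [" + name + "] not found");
    }
    return found;
  }
}
```

Deferring the registry access to resolve() mirrors the idea of looking the name up only when getExtractionFn() is called.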
@drcrallen can you please check this one more time? I have changed the way to create lookups as @cheddar suggested.
Can this just use FunctionalExtraction to do these checks?
I left a few tidying up comments, once addressed I'm 👍
Should include parameters for the query time parameters.
oh that's a weird failure:
this is an incompatible change for configs that had relied on defaulting to false. Can you explain more on why this won't impact configs that were relying on that behavior?
@drcrallen lookups are an experimental feature, so changes like that are expected to happen.
I have set this to true after it had been tested.
LookupExtractor is not listed as experimental, and neither is the "optimize" flag (as far as I can tell).
Looks like it was added in 032d3bf, which is in 0.9.0. As such it can change just fine, but the default for an experimental feature should be legacy behavior.
well... it can be changed in 0.9.0 (which is not yet released). Does the default behavior need to change before we release 0.9.0?
@drcrallen please check the new changes!
c9d4d96 to
ee904ff
Compare
@drcrallen more comments?
@b-slim I'm still not sure about #2291 (diff) and if changing the default from false to true is going to be problematic. I'd like a second opinion from one of the other committers (maybe @cheddar since he's already in this PR) and will stand by whichever way they decide. 👍 after second opinion.
@drcrallen if [https://github.com//pull/2291#discussion-diff-52407458R74] is a blocker, I will revert it and send a separate PR to make it true by default. What do you think?
Fwiw, I think that given that the lookups are still pretty experimental, the change from the default to optimize doesn't seem so bad to me. The worst risk is that the optimization is broken and it breaks people who move up to this version. Hopefully their tests will show that it is broken and we will have a chance to fix before it's too horrible. So, given the risks and the fact that this is experimental, I'm fine with the change.
cool, 👍 then
ee904ff to
4e119b7
Compare
@drcrallen and @cheddar thanks for the review, I will merge after the build passes!
Promoting LookupExtractor state and LookupExtractorFactory to be a first class druid state object.
Currently druid doesn't have any reference manager to register or delete LookupExtractor objects. Also, currently the only way to use the Lookup extraction type is to wrap it in an ExtractionFn; this is very verbose and makes optimization very painful (Lookup exposes unapply and the extraction function does not).
This PR:
1 - Introduces a LookupExtractorFactory instance manager called LookupReferencesManager, allowing basic operations to register/un-register/listAll or remove LookupExtractorFactory instances.
2 - Provides an implementation of LookupExtractor that delegates the lookup functionality to a registered lookup. This implementation is the default, so any query that comes with an actual namespace will try to use the LookupReferencesManager.
3 - Defines a new way to use Lookup directly via an implementation of DimensionSpec called LookupDimensionSpec.
4 - LookupExtractorFactory will manage the lifecycle and the state of LookupExtractor.
5 - Adds to LookupExtractor the property isOneToOne to enable optimization at the broker level.
6 - Does not introduce any performance changes.
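As a rough illustration of items 1 and 4, a reference manager that registers, removes, and lists factories while driving their lifecycle might look like the sketch below. SketchLookupFactory, MapBackedFactory, and SketchReferencesManager are hypothetical names invented for this example, and the start/close contract is an assumption; the actual LookupReferencesManager and LookupExtractorFactory in the PR may differ.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical, simplified factory: owns the lifecycle and state of one lookup.
interface SketchLookupFactory {
  boolean start();                // acquire resources; true on success
  boolean close();                // release resources; true on success
  Function<String, String> get(); // the lookup itself, modelled as a function
}

// Hypothetical map-backed implementation used to exercise the manager.
final class MapBackedFactory implements SketchLookupFactory {
  private final Map<String, String> map;
  private volatile boolean started = false;

  MapBackedFactory(Map<String, String> map) {
    this.map = new HashMap<>(map);
  }

  @Override public boolean start() { started = true; return true; }
  @Override public boolean close() { started = false; return true; }
  @Override public Function<String, String> get() { return map::get; }

  boolean isStarted() { return started; }
}

// Hypothetical manager: register, remove, fetch, and list factories by name.
final class SketchReferencesManager {
  private final ConcurrentHashMap<String, SketchLookupFactory> factories = new ConcurrentHashMap<>();

  // Starts the factory, then registers it; returns false if start fails or the name is taken.
  boolean put(String name, SketchLookupFactory factory) {
    if (!factory.start()) {
      return false;
    }
    if (factories.putIfAbsent(name, factory) != null) {
      factory.close(); // name already taken; release what start() acquired
      return false;
    }
    return true;
  }

  // Unregisters and closes the factory; returns false if the name was unknown or close failed.
  boolean remove(String name) {
    SketchLookupFactory removed = factories.remove(name);
    return removed != null && removed.close();
  }

  SketchLookupFactory get(String name) {
    return factories.get(name);
  }

  // Snapshot of the registered names at the time of the call.
  Set<String> listNames() {
    return Collections.unmodifiableSet(new HashSet<>(factories.keySet()));
  }
}
```

Starting the factory before publishing it under its name is one possible design choice here; it avoids another thread fetching a factory that has not finished starting.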
FYI: We decided to move away from LookupExtractionFn in favor of implementing DimensionSpec and using the lookup delegator to do the apply/unapply.
This has a couple of advantages: lookups become less verbose, and optimization becomes easier to check for.
Because LookupExtractor exposes methods that are not included in the ExtractionFn API, it doesn't make sense to use lookups via the ExtractionFn API.
Here is an overview of the overall roadmap of QTL development