[QTL] Support multiple lookup maps within one namespace by sirpkt · Pull Request #2524 · apache/druid

sirpkt · 2016-02-23T08:34:47Z

This PR is related to #2523.

For a URI or JDBC source with multiple columns, users can define all the needed (key column, value column) mappings within one namespace configuration.
The impact on existing namespace implementations (like druid-Kafka-extraction-namespace) is minimized: just change "implements ExtractionNamespaceFunctionFactory" to "extends ExtractionNamespaceFunctionFactory".
NamespaceExtractor now has one more parameter, mapName, which indicates the lookup map name within the given namespace. For backward compatibility, it works as before when that parameter is omitted.
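To make the idea concrete, here is a minimal sketch of how one namespace could hold several named inner maps, each filled from its own (key column, value column) pair in a single pass over the source rows. This is not the PR's code: the class MultiMapNamespaceSketch and the method buildMaps are hypothetical, and KeyValueMap is only a stand-in for the PR's class of that name, with assumed fields. The map name "CtoD" mirrors the JSON example in the thread.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: one namespace, several (key column, value column) pairs,
 * each materialized as its own named inner map from a single pass over the rows.
 */
final class MultiMapNamespaceSketch
{
  /** Stand-in for a named (key column, value column) spec; fields are assumptions. */
  static final class KeyValueMap
  {
    final String mapName;
    final String keyColumn;
    final String valueColumn;

    KeyValueMap(String mapName, String keyColumn, String valueColumn)
    {
      this.mapName = mapName;
      this.keyColumn = keyColumn;
      this.valueColumn = valueColumn;
    }
  }

  /** Reads every row once and fills one inner map per spec. */
  static Map<String, Map<String, String>> buildMaps(List<Map<String, String>> rows, List<KeyValueMap> specs)
  {
    Map<String, Map<String, String>> maps = new HashMap<>();
    for (KeyValueMap spec : specs) {
      maps.put(spec.mapName, new HashMap<>());
    }
    for (Map<String, String> row : rows) {
      for (KeyValueMap spec : specs) {
        String key = row.get(spec.keyColumn);
        String value = row.get(spec.valueColumn);
        // Skip rows where either column is missing; a null key or value is not a usable mapping.
        if (key != null && value != null) {
          maps.get(spec.mapName).put(key, value);
        }
      }
    }
    return maps;
  }

  public static void main(String[] args)
  {
    List<Map<String, String>> rows = List.of(
        Map.of("A", "a1", "B", "b1", "C", "c1", "D", "d1"),
        Map.of("A", "a2", "B", "b2", "C", "c2", "D", "d2")
    );
    Map<String, Map<String, String>> maps = buildMaps(
        rows,
        List.of(new KeyValueMap("AtoB", "A", "B"), new KeyValueMap("CtoD", "C", "D"))
    );
    System.out.println(maps.get("CtoD").get("c2")); // d2
  }
}
```

The point of the design is that the source is fetched and parsed once, while each logical lookup still gets its own key space.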

fjy · 2016-02-23T17:43:50Z

b-slim · 2016-02-23T20:44:29Z

@sirpkt I am not sure how this can work with the current LookupDimensionSpec. How would you call it at query time?

sirpkt · 2016-02-24T05:41:45Z

As I replied on the issue page,
users can specify the lookup map name within the namespace like this:

{
  "type":"namespace",
  "namespace":"DB1",
  "mapName":"CtoD"
}

I added unit test code for explicit JSON usage of NamespacedExtraction.

fjy · 2016-02-24T21:12:04Z

@drcrallen @b-slim @cheddar can you review this and coordinate on development?

fjy · 2016-03-28T22:32:07Z

@drcrallen @b-slim

drcrallen · 2016-03-28T23:51:54Z

This violates offheap caching

drcrallen · 2016-04-12T17:32:17Z

@sirpkt I think this one needs more discussion among the community to make sure it fits overall expectations. As such I'm proposing punting it out of 0.9.1.

0.9.1 is slated for a major overhaul of Lookups to essentially be the first (hopefully) production-ready version for lookups.

This is an (important) feature addition for lookups, but it is outside the scope of what is "required for MVP".

fjy · 2016-06-15T23:04:56Z

@drcrallen @b-slim what's going on with this PR?

b-slim · 2016-06-16T00:55:34Z

Same opinion as @drcrallen: the feature needs more discussion, plus major changes to be compatible with the new lookup implementations. In addition, we have a pretty busy roadmap and I guess this feature is not a top priority, IMHO. The author can always start working on making it work with the new lookup module.

sirpkt · 2016-06-16T02:49:02Z

I'll try to make this work with the new lookup module.
I think it will take some time because lookups have changed considerably.

fjy · 2016-08-26T22:47:42Z

@sirpkt @b-slim @drcrallen can we submit an issue or a proposal for the list of changes described in this PR and discuss changes there?

sirpkt · 2016-09-01T04:34:16Z

I added a mapName field to LookupDimensionSpec and RegisteredLookupExtractionFn
and updated the docs to reflect the change to Globally Cached Lookups.
I also modified the description of the Lookup Extraction Function,
because LookupExtractor no longer supports the "namespace" type.

jon-wei

I had some comments, but I'm generally on board with the goal and approach taken by this PR.

jon-wei · 2016-10-03T18:39:16Z

Let's rename this to something like getCacheInnerMap() to differentiate the two functions, and note in the javadocs that it retrieves an inner map.

jon-wei · 2016-10-03T18:40:20Z

For thought, would it be easier/better to use a MultiKey for the composite namespace:ID key?

replaced by MultiKey
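The reason a composite key helps can be shown with a small sketch. A "namespace:ID" string collapses two different pairs into the same key whenever a component itself contains ":", while a key object that compares its components separately does not. The PR used commons-collections MultiKey here and later Pair; the CacheKey class and concatKey method below are hypothetical stand-ins that only illustrate the idea.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch: why a composite key object is safer than a "namespace:ID" string. */
final class CompositeKeySketch
{
  /** Illustrative stand-in for MultiKey/Pair: equality compares each component separately. */
  static final class CacheKey
  {
    final String namespace;
    final String id;

    CacheKey(String namespace, String id)
    {
      this.namespace = namespace;
      this.id = id;
    }

    @Override
    public boolean equals(Object o)
    {
      if (this == o) {
        return true;
      }
      if (!(o instanceof CacheKey)) {
        return false;
      }
      CacheKey that = (CacheKey) o;
      return namespace.equals(that.namespace) && id.equals(that.id);
    }

    @Override
    public int hashCode()
    {
      return Objects.hash(namespace, id);
    }
  }

  /** String concatenation: distinct (namespace, id) pairs can produce the same key. */
  static String concatKey(String namespace, String id)
  {
    return namespace + ":" + id;
  }

  public static void main(String[] args)
  {
    // ("a:b", "c") and ("a", "b:c") both become "a:b:c".
    System.out.println(concatKey("a:b", "c").equals(concatKey("a", "b:c"))); // true
    // The composite key keeps the two components apart.
    System.out.println(new CacheKey("a:b", "c").equals(new CacheKey("a", "b:c"))); // false

    Map<CacheKey, String> cache = new HashMap<>();
    cache.put(new CacheKey("DB1", "CtoD"), "inner map handle");
    System.out.println(cache.get(new CacheKey("DB1", "CtoD"))); // inner map handle
  }
}
```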

jon-wei · 2016-10-03T18:43:49Z

Suggest adding a bit more documentation detail along the lines of:

Key/Value column refer to columns within the lookup source; "columns" field refers to Druid columns whose values will be used as filtering criteria for retrieving the mapping row from the lookup source

jon-wei · 2016-10-03T18:44:15Z

typo: "kayValueMaps" -> "keyValueMaps"

fixed as "maps"

jon-wei · 2016-10-03T18:44:30Z

typo: "kayValueMaps"

fixed as maps

jon-wei · 2016-10-03T18:47:44Z

Let's add a note in the docs about how KafkaLookupExtractor only uses the default map name.

jon-wei · 2016-10-03T18:49:04Z

Can we add some javadocs explaining how this function differs from getMapCachePopulator()?

jon-wei · 2016-10-03T18:51:13Z

spelling: "swap" -> "swaps", "leave" -> "leaves"

jon-wei · 2016-10-03T18:52:25Z

Let's add javadocs for these two methods

jon-wei · 2016-10-03T18:53:12Z

Let's add a note on why the delete can be a no-op here (GC?)
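One plausible answer to the "GC?" question, sketched under the assumption of a plain on-heap cache: a swap publishes the newly loaded map by replacing a reference and leaves the old map untouched, so once no reader still holds it, the garbage collector reclaims it and there is nothing for an explicit delete to do. An off-heap store (the concern raised earlier in the thread) would need real cleanup instead. OnHeapSwapSketch and its methods are hypothetical and are not Druid's cache manager API.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical sketch of an on-heap cache that publishes a freshly loaded map by swapping
 * a reference. Readers see either the old or the new map, never a half-built one.
 */
final class OnHeapSwapSketch
{
  private final AtomicReference<Map<String, String>> current = new AtomicReference<>(Map.of());

  /** Swaps in the new map and returns the previous one, leaving it otherwise untouched. */
  Map<String, String> swap(Map<String, String> freshlyLoaded)
  {
    return current.getAndSet(freshlyLoaded);
  }

  String lookup(String key)
  {
    return current.get().get(key);
  }

  /**
   * No-op for heap maps: once nothing references the old map, the garbage collector
   * reclaims it. An off-heap implementation would have to release storage here.
   */
  void delete(Map<String, String> old)
  {
    // intentionally empty on heap
  }

  public static void main(String[] args)
  {
    OnHeapSwapSketch cache = new OnHeapSwapSketch();
    cache.swap(Map.of("k", "v1"));
    Map<String, String> old = cache.swap(Map.of("k", "v2"));
    cache.delete(old);
    System.out.println(cache.lookup("k")); // v2
  }
}
```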

gianm

Looks like a useful feature to reduce memory use and simplify management of lookups - although I have some concerns about the API. Specifically, we need to try to retain backwards compatibility.

gianm · 2016-10-03T23:55:09Z

I agree just maps is clearer.

gianm · 2016-10-03T23:55:32Z

Suggest:

keyName -> keyColumn

valueName -> valueColumn

(like the old configs)

gianm · 2016-10-03T23:57:23Z

We need to retain backwards compatibility.

Perhaps we should have a "default map" name that the keyColumn/valueColumn go into if you don't specify a maps list. And then that one also gets used at query time if you don't specify a mapName.

Ah, I see we already have this in DEFAULT_MAPNAME. Let's use that for this purpose.
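The backward-compatibility rule proposed here has two halves: at configuration time, an old-style spec with only keyColumn/valueColumn becomes a single map stored under the default name; at query time, an omitted mapName resolves to that same default map. A minimal sketch, assuming the "__default" name settled on later in this review; DefaultMapNameSketch, resolveMaps, lookup, and the KeyValueMap stand-in are hypothetical.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of backward-compatible defaulting for multi-map lookups. */
final class DefaultMapNameSketch
{
  // Value taken from the review thread ("__default"); the constant here is illustrative.
  static final String DEFAULT_MAPNAME = "__default";

  /** Hypothetical spec: one named (key column, value column) pair. */
  static final class KeyValueMap
  {
    final String mapName;
    final String keyColumn;
    final String valueColumn;

    KeyValueMap(String mapName, String keyColumn, String valueColumn)
    {
      this.mapName = mapName;
      this.keyColumn = keyColumn;
      this.valueColumn = valueColumn;
    }
  }

  /** Config time: a spec without a "maps" list becomes one map under the default name. */
  static List<KeyValueMap> resolveMaps(List<KeyValueMap> maps, String keyColumn, String valueColumn)
  {
    if (maps == null || maps.isEmpty()) {
      return List.of(new KeyValueMap(DEFAULT_MAPNAME, keyColumn, valueColumn));
    }
    return maps;
  }

  /** Query time: an omitted mapName falls back to the default map; unknown names yield null. */
  static String lookup(Map<String, Map<String, String>> innerMaps, String mapName, String key)
  {
    Map<String, String> inner = innerMaps.get(mapName == null ? DEFAULT_MAPNAME : mapName);
    return inner == null ? null : inner.get(key);
  }

  public static void main(String[] args)
  {
    List<KeyValueMap> legacy = resolveMaps(null, "id", "name");
    System.out.println(legacy.get(0).mapName); // __default

    Map<String, Map<String, String>> innerMaps = Map.of(
        DEFAULT_MAPNAME, Map.of("1", "one"),
        "CtoD", Map.of("c", "d")
    );
    System.out.println(lookup(innerMaps, null, "1"));   // one
    System.out.println(lookup(innerMaps, "CtoD", "c")); // d
  }
}
```

With this rule, existing single-map configs and queries keep working unchanged, and only users who want several maps need to learn the new fields.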

gianm · 2016-10-04T00:00:55Z

Would prefer List<KeyValueMap> here, it's generally easier to work with.

gianm · 2016-10-04T00:03:05Z

Any reason not to use the auto generated IntelliJ style?

changed to use auto generated one

gianm · 2016-10-04T00:04:04Z

These should be escaped; field names can have funny characters in them.

gianm · 2016-10-04T00:07:04Z

__default is more consistent with defaults in other Druid areas.

gianm

I didn't totally review the cache manager code or jdbc namespace yet. But I looked at the main apis, http stuff, parser stuff, and uri namespace code so far. Will look at the rest soon but I just wanted to get at least this part of the review out.

The biggest question for me at this point is, does it make sense to move "maps" into the parse spec? I think it does but would appreciate a second opinion.

gianm · 2016-10-18T02:55:13Z

missing "

gianm · 2016-10-18T02:59:44Z

keyValueMap is maps now

gianm · 2016-10-18T03:01:58Z

Suggest using __default in these examples as it is the actual default map name.

Please also document this as the default map name.

gianm · 2016-10-18T03:18:34Z

Hmm I wonder about backwards compatibility here. Will take a closer look at the actual http code.

gianm · 2016-10-18T03:19:59Z

mapName (spelling)

gianm · 2016-10-19T03:16:23Z

Is this necessary? I understand having a default mapName, but it seems strange to have a default key/value name (especially undocumented).

gianm · 2016-10-19T03:49:59Z

Changing String to Object means this is no longer a "flat" data parser! Maybe that's okay, but if it is okay, the name should definitely change.

gianm · 2016-10-19T04:11:26Z

When reading through the URIExtractionNamespace changes I now wonder if having the "maps" with their keyColumn / valueColumn out here is causing the dizziness and weirdness with simpleJson… because it has no keyColumn!

I wonder if it makes more sense to move "maps" into the namespaceParseSpec. That way the parser is in charge of what the map names and k/vs it returns are, and that should remove some of the weirdness in URIExtractionNamespace.

that seems like a reasonable change, it would express more directly that simpleJson parser doesn't use the "maps" field unlike the other parser types

I suppose the logic for map building from a set of KeyValueMaps in the URIExtractionNamespace's delegate parser could be moved to something shared by the CSV/TSV/customJson "FlatDataParsers" in URIExtractionNamespace

gianm · 2016-10-19T04:11:49Z

See comment above… I wonder if it'd be less dizzy to move "maps" into the namespaceParseSpec.

gianm · 2016-10-19T04:14:51Z

I don't think this is enough escaping. There could be backslashes and quotes and stuff in the field names. Maybe. Does JDBC/JDBI have a utility function to help with escaping?

Either that, or let's check the requiredFields against a whitelist of characters.
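Both options can be sketched briefly. The first rejects any identifier outside a character whitelist; the second emits an ANSI SQL delimited identifier, where an embedded double quote is written as two double quotes. The quote character is dialect-specific (MySQL uses backticks unless ANSI_QUOTES is enabled), which is one reason a library helper such as the Querydsl approach mentioned later in this thread may be preferable. SqlIdentifierSketch, its methods, and the whitelist pattern are hypothetical choices for illustration.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical sketch of two ways to keep column/table names from breaking generated SQL. */
final class SqlIdentifierSketch
{
  // Assumed whitelist: letters, digits and underscore, not starting with a digit.
  private static final Pattern SAFE_IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

  /** Option 1: reject anything outside the whitelist before it reaches the SQL string. */
  static String requireSafe(String identifier)
  {
    if (identifier == null || !SAFE_IDENTIFIER.matcher(identifier).matches()) {
      throw new IllegalArgumentException("Unsupported identifier: " + identifier);
    }
    return identifier;
  }

  /** Option 2: ANSI delimited identifier, doubling any embedded double quote. */
  static String quoteAnsi(String identifier)
  {
    return "\"" + identifier.replace("\"", "\"\"") + "\"";
  }

  /** Builds a SELECT over the given columns with every identifier quoted. */
  static String selectQuery(String table, List<String> columns)
  {
    String cols = columns.stream().map(SqlIdentifierSketch::quoteAnsi).collect(Collectors.joining(", "));
    return "SELECT " + cols + " FROM " + quoteAnsi(table);
  }

  public static void main(String[] args)
  {
    System.out.println(selectQuery("lookup_table", List.of("key", "odd\"name")));
    // SELECT "key", "odd""name" FROM "lookup_table"
  }
}
```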

gianm · 2016-10-19T04:28:08Z

@drcrallen @b-slim any thoughts on the general idea & API here?

Broadly: looks like the changes are all centered around having more than one lookup map per thing-we-load. So we may load a single json file that has many logical lookups in it. IMO the nice thing about doing it this way is we only have to poll and parse the file one time. It's also easier to configure loading multiple lookups from one file. I'm on board with the general idea and attempting to work out whether the API needs adjustments or not.

b-slim · 2016-10-19T15:58:10Z

Why is this changing the API? Not all lookups will have a map name.

b-slim · 2016-10-19T16:12:05Z

I like the idea of minimizing the number of fetches a lookup has to make, but the current API change makes it backward incompatible, plus it is unclear what mapName really means. I would highly recommend addressing that, for instance by using a namespacing convention like name.subname, where name is used to match the registered lookup and subname is equivalent to mapName.

sirpkt · 2016-11-24T05:57:11Z

Sorry for the late response.
I updated the code based on the review comments.

@b-slim I don't understand your point about backward compatibility, because mapName is an optional argument and users always build LookupDimensionSpec from JSON, so they can simply omit
mapName from their spec when their lookups do not have multiple maps.
And I still think having a separate mapName is better than combining name and mapName, because users may use the combining delimiter (e.g. .) in name or mapName.
However, I agree that mapName is unclear, so I changed it to innerMapName. Any suggestions are welcome.
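The delimiter concern can be illustrated with a short sketch: if "name.subname" is parsed from one string, splitting at the first dot and splitting at the last dot are both plausible rules, and they disagree as soon as a name contains a dot. Separate name and innerMapName fields avoid having to pick a rule at all. LookupNameSketch and its methods are hypothetical and exist only to show the ambiguity.

```java
/**
 * Hypothetical sketch: resolving one "name.subname" string into a registered lookup name
 * and an inner map name. Neither rule is part of Druid; they only show the ambiguity.
 */
final class LookupNameSketch
{
  /** Treats everything after the first '.' as the inner map name. */
  static String[] splitAtFirstDot(String qualified)
  {
    int i = qualified.indexOf('.');
    if (i < 0) {
      return new String[]{qualified, null};
    }
    return new String[]{qualified.substring(0, i), qualified.substring(i + 1)};
  }

  /** Treats everything before the last '.' as the registered lookup name. */
  static String[] splitAtLastDot(String qualified)
  {
    int i = qualified.lastIndexOf('.');
    if (i < 0) {
      return new String[]{qualified, null};
    }
    return new String[]{qualified.substring(0, i), qualified.substring(i + 1)};
  }

  public static void main(String[] args)
  {
    String qualified = "site.v2.CtoD";
    // Both rules are reasonable, yet they disagree once a name itself contains '.'.
    System.out.println(String.join(" | ", splitAtFirstDot(qualified))); // site | v2.CtoD
    System.out.println(String.join(" | ", splitAtLastDot(qualified)));  // site.v2 | CtoD
  }
}
```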

@gianm For escaping column and table names when creating the SQL query, I use the escape and quote methods of SQLTemplate in Querydsl. As I'm not familiar with SQL querying in Java, I'm not sure this makes sense.

Other updates:

KeyValueMap is moved to namespaceParseSpec from URIExtractionNamespace.
Dependency on commons-collections is removed by using Pair
FlatDataParser is refactored
NamespaceLookupIntrospectHandler is modified as suggested by @gianm

fjy · 2016-12-09T23:37:45Z

@b-slim @gianm can we finish this up?

gianm · 2017-02-28T18:04:43Z

Moving to 0.10.1 as review is not complete. @sirpkt please let us know if you're still interested and we will endeavor to take another look.

stale · 2019-02-28T07:14:59Z

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

stale · 2019-03-07T08:09:34Z

This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.


fjy added this to the 0.9.1 milestone Feb 23, 2016

fjy changed the title ~~Support multiple lookup maps within one namespace~~ [QTL] Support multiple lookup maps within one namespace Feb 24, 2016

sirpkt mentioned this pull request Mar 24, 2016

[QTL] Implement LookupExtractorFactory of namespaced lookup #2716

Closed

drcrallen reviewed Mar 28, 2016

drcrallen modified the milestones: 0.9.2, 0.9.1 Apr 12, 2016

drcrallen added the Discuss label Apr 27, 2016

drcrallen mentioned this pull request Apr 27, 2016

[QTL] Immediate future plans #2889

Closed

drcrallen mentioned this pull request May 5, 2016

[QTL] Implement LookupExtractorFactory of namespaced lookup #2926

Merged

fjy modified the milestones: 0.9.3, 0.9.2 Jun 16, 2016

sirpkt force-pushed the multi-column-lookup branch 2 times, most recently from 7258abb to 254fe3d on September 1, 2016 04:29

sirpkt force-pushed the multi-column-lookup branch from 254fe3d to 79ef586 on September 21, 2016 05:27

jon-wei requested changes Oct 3, 2016

gianm reviewed Oct 4, 2016

sirpkt force-pushed the multi-column-lookup branch from 79ef586 to 739290a on October 10, 2016 01:54

gianm reviewed Oct 19, 2016

b-slim reviewed Oct 19, 2016

gianm assigned fjy and gianm Nov 22, 2016

sirpkt added 3 commits November 24, 2016 13:35

rebased

eddc15f

reflect review comments

65364fe

test error fix

dceb5f8

sirpkt force-pushed the multi-column-lookup branch from 739290a to dceb5f8 on November 24, 2016 05:37

fjy assigned b-slim and unassigned fjy Dec 19, 2016

gianm modified the milestones: 0.10.1, 0.10.0 Feb 28, 2017

gianm removed this from the 0.10.1 milestone May 16, 2017

clambertus unassigned gianm and b-slim Jul 6, 2018

stale Bot added the stale label Feb 28, 2019

stale Bot closed this Mar 7, 2019
