[QTL] Support multiple lookup maps within one namespace #2524
sirpkt wants to merge 3 commits into apache:master from
Conversation
|
@sirpkt awesome! |
|
@sirpkt I am not sure how this can work with the actual LookupDimensionSpec; I am wondering how you will call it at query time. |
|
As I replied on the issue page, I added unit test code for explicit JSON usage of NamespacedExtraction. |
|
@drcrallen @b-slim @cheddar can you guys review this and coordinate over development? |
This violates off-heap caching
|
@sirpkt I think this one needs more discussion among the community to make sure it fits overall expectations. As such I'm proposing punting it out of 0.9.1. 0.9.1 is slated for a major overhaul of lookups, to essentially be the first (hopefully) production-ready version of lookups. This is an (important) feature addition for lookups, but it is outside the scope of "required for MVP". |
|
@drcrallen @b-slim what's going on with this PR? |
|
Same opinion as @drcrallen: the feature needs more discussion, plus major changes to be compatible with the new lookup impls. In addition, we have a pretty busy roadmap, and I guess this feature is not a top priority IMHO. The author can always start working on making it work with the new lookup module. |
|
I'll try to make this work with the new lookup module. |
|
@sirpkt @b-slim @drcrallen can we submit an issue or a proposal for the list of changes described in this PR and discuss changes there? |
Force-pushed: 7258abb to 254fe3d
|
I added |
Force-pushed: 254fe3d to 79ef586
jon-wei left a comment:
I had some comments, but I'm generally on board with the goal and approach taken by this PR.
let's rename this to something like getCacheInnerMap() to differentiate the two functions, and note in the javadocs that this retrieves an inner map
For thought, would it be easier/better to use a MultiKey for the composite namespace:ID key?
replaced by MultiKey
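The advantage of a composite key over a concatenated `namespace:ID` string can be sketched with plain JDK types. This is illustrative only, not the PR's actual code; `List.of(...)` stands in here for Commons-Collections' MultiKey, since both provide value-based `equals`/`hashCode`:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: a composite (namespace, mapName) key addressing inner lookup
// maps. Unlike a concatenated "namespace:mapName" string key, a composite key
// cannot collide when the names themselves contain the delimiter character.
class CompositeKeyCache {
    private final Map<List<String>, Map<String, String>> cache = new HashMap<>();

    // Returns the inner map for (namespace, mapName), creating it on demand.
    Map<String, String> getCacheInnerMap(String namespace, String mapName) {
        return cache.computeIfAbsent(List.of(namespace, mapName), k -> new HashMap<>());
    }
}
```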
Suggest adding a bit more documentation detail along the lines of:
- Key/Value column refer to columns within the lookup source; "columns" field refers to Druid columns whose values will be used as filtering criteria for retrieving the mapping row from the lookup source
typo: "kayValueMaps" -> "keyValueMaps"
Let's add a note in the docs about how KafkaLookupExtractor only uses the default mapname
Can we add some javadocs explaining how this function differs from getMapCachePopulator()?
spelling: "swap" -> "swaps", "leave" -> "leaves"
Let's add javadocs for these two methods
Let's add a note on why the delete can be a no-op here (GC?)
gianm left a comment:
Looks like a useful feature to reduce memory use and simplify management of lookups - although I have some concerns about the API. Specifically, we need to try to retain backwards compatibility.
I agree just maps is clearer.
Suggest:
keyName -> keyColumn
valueName -> valueColumn
(like the old configs)
We need to retain backwards compatibility.
Perhaps we should have a "default map" name that the keyColumn/valueColumn go into if you don't specify a maps list. And then that one also gets used at query time if you don't specify a mapName.
Ah, I see we already have this in DEFAULT_MAPNAME. Let's use that for this purpose.
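The fallback described here could look roughly like the following. This is a hedged sketch: DEFAULT_MAPNAME is named in the PR discussion, but the resolver class itself is illustrative and not from the PR:

```java
// Illustrative sketch of the default-map fallback: when a spec or query omits
// mapName, resolve it to the reserved default name so that old single-map
// behavior keeps working unchanged.
class MapNameResolver {
    static final String DEFAULT_MAPNAME = "__default";

    static String resolve(String requestedMapName) {
        return (requestedMapName == null || requestedMapName.isEmpty())
            ? DEFAULT_MAPNAME
            : requestedMapName;
    }
}
```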
Would prefer List<KeyValueMap> here, it's generally easier to work with.
Any reason not to use the auto-generated IntelliJ style?
changed to use the auto-generated one
These should be escaped; field names can have funny characters in them.
__default is more consistent with defaults in other Druid areas.
Force-pushed: 79ef586 to 739290a
gianm left a comment:
I didn't totally review the cache manager code or jdbc namespace yet. But I looked at the main apis, http stuff, parser stuff, and uri namespace code so far. Will look at the rest soon but I just wanted to get at least this part of the review out.
The biggest question for me at this point is, does it make sense to move "maps" into the parse spec? I think it does but would appreciate a second opinion.
Suggest using __default in these examples as it is the actual default map name.
Please also document this as the default map name.
Hmm I wonder about backwards compatibility here. Will take a closer look at the actual http code.
Is this necessary? I understand having a default mapName, but it seems strange to have a default key/value name (especially undocumented).
Changing String to Object means this is no longer a "flat" data parser! Maybe that's okay, but if it is okay, the name should definitely change.
When reading through the URIExtractionNamespace changes I now wonder if having the "maps" with their keyColumn / valueColumn out here is causing the dizziness and weirdness with simpleJson… because it has no keyColumn!
I wonder if it makes more sense to move "maps" into the namespaceParseSpec. That way the parser is in charge of what the map names and k/vs it returns are, and that should remove some of the weirdness in URIExtractionNamespace.
that seems like a reasonable change, it would express more directly that simpleJson parser doesn't use the "maps" field unlike the other parser types
I suppose the logic for map building from a set of KeyValueMaps in the URIExtractionNamespace's delegate parser could be moved to something shared by the CSV/TSV/customJson "FlatDataParsers" in URIExtractionNamespace
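For illustration, moving "maps" into the parse spec might look like this. The field names (maps, mapName, keyColumn, valueColumn, __default) come from the discussion above, but the exact shape is hypothetical, since the PR was never merged:

```json
{
  "type": "uri",
  "uri": "s3://bucket/lookups/countries.tsv",
  "namespaceParseSpec": {
    "format": "tsv",
    "columns": ["code", "name", "continent"],
    "maps": [
      { "mapName": "__default", "keyColumn": "code", "valueColumn": "name" },
      { "mapName": "continents", "keyColumn": "code", "valueColumn": "continent" }
    ]
  }
}
```

With this layout the parser owns the map definitions, so a format like simpleJson, which has no keyColumn, simply would not expose a "maps" field.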
See comment above… I wonder if it'd be less dizzy to move "maps" into the namespaceParseSpec.
I don't think this is enough escaping. There could be backslashes and quotes and stuff in the field names. Maybe. Does JDBC/JDBI have a utility function to help with escaping?
Either that, or let's check the requiredFields against a whitelist of characters.
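The whitelist option is the simpler of the two. A sketch (a hypothetical helper, not from the PR) of rejecting unsafe field names before they can be interpolated into SQL text:

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: instead of escaping arbitrary identifiers, reject any
// field name that is not a plain letter/digit/underscore token, so quotes,
// backslashes, and semicolons never reach generated SQL.
class IdentifierValidator {
    private static final Pattern SAFE = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    static void validateRequiredFields(List<String> requiredFields) {
        for (String field : requiredFields) {
            if (!SAFE.matcher(field).matches()) {
                throw new IllegalArgumentException("Unsafe field name: " + field);
            }
        }
    }
}
```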
|
@drcrallen @b-slim any thoughts on the general idea & API here? Broadly: looks like the changes are all centered around having more than one lookup map per thing-we-load. So we may load a single json file that has many logical lookups in it. IMO the nice thing about doing it this way is we only have to poll and parse the file one time. It's also easier to configure loading multiple lookups from one file. I'm on board with the general idea and attempting to work out whether the API needs adjustments or not. |
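To make the query-time side of this concrete: a hypothetical dimension spec, assuming a "mapName" field were added to the existing "lookup" DimensionSpec (this addition never shipped; the field name follows the discussion above):

```json
{
  "type": "lookup",
  "dimension": "country_code",
  "outputName": "continent",
  "name": "countries",
  "mapName": "continents"
}
```

Omitting "mapName" would fall back to the default map, keeping existing queries working unchanged.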
Why is this changing the API? Not all the lookups will have a map name?
|
I like the idea of minimizing the amount of fetching that a lookup has to make, but the current API change makes it backward incompatible; plus it is unclear what |
Force-pushed: 739290a to dceb5f8
|
Sorry for the late response. @b-slim I don't understand your point about backward compatibility because @gianm For escaping column and table names at SQL query creation, I use the escape and quote methods of SQLTemplate in Querydsl. As I'm not familiar with SQL querying in Java, I'm not sure this makes sense. Other updates: |
|
|
Moving to 0.10.1 as review is not complete. @sirpkt please let us know if you're still interested and we will endeavor to take another look. |
|
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@druid.apache.org list. Thank you for your contributions. |
|
This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
* Refactoring Appendertor Driver (apache#4292)
* Rename FiniteAppenderatorDriver to AppenderatorDriver (apache#4356)
* Add totalRowCount to appenderator
* add localhost as advertised hostname (apache#4689)
* kafkaIndexTask unannounce service in final block (apache#4736)
* warn if topic not found (apache#4834)
* Kafka: Fixes needlessly low interpretation of maxRowsInMemory. (apache#5034)
This PR is related with #2523.
Changes:
- changed "implements ExtractionNamespaceFunctionFactory" to "extends ExtractionNamespaceFunctionFactory"
- added mapName, which indicates the lookup map name in the given namespace; however, it works as before without that parameter, for backward compatibility.