Support multiple lookups within one namespace #2523

@sirpkt

Even when lookups share the same source (a JDBC table or a CSV file), each distinct (key column, value column) pair currently needs its own configuration.
For example, if a lookup DB table has four columns A, B, C, and D, and I need three lookups (A to B, C to D, and A to C), I have to define three separate namespaces, which repeat a lot of redundant information such as the URI, poll period, ID, and so on.

I think this approach is hard to maintain.
When the source configuration changes, for example a password change or a table/column rename, it is hard to tell which namespaces are affected, and tedious to update all of them by hand.

So I think it would be better to split the namespace into two levels: namespaces and lookup maps.
A namespace is defined at the data source level, and a lookup map is defined for each distinct (key, value) column or field pair within that data source.

For the example above, the changed configuration could look like the following:

{
  "type":"jdbc",
  "namespace":"DB1",
  "connectorConfig":{
    "createTables":true,
    "connectURI":"jdbc:mysql://localhost:3306/druid",
    "user":"druid",
    "password":"diurd"
  },
  "table":"some_lookup_table",
  "lookup maps":[
    {"name":"AtoB", "key":"A", "value":"B"},
    {"name":"CtoD", "key":"C", "value":"D"},
    {"name":"AtoC", "key":"A", "value":"C"}
  ],
  "tsColumn":"timestamp_column",
  "pollPeriod":600000
}
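
To make the relationship to today's one-namespace-per-pair setup concrete, here is a rough Python sketch that expands one two-level config of the shape above into one flat config per lookup map. Everything in it is an assumption for illustration: the function name `expand_lookup_maps` is made up, the `keyColumn` / `valueColumn` field names are my guess at what the existing flat JDBC namespace config uses, and the `DB1.AtoB` naming scheme is just one possible convention.

```python
import copy

# Key used by the proposed config above to list (key, value) pairs.
LOOKUP_MAPS_KEY = "lookup maps"


def expand_lookup_maps(namespace_config):
    """Return one flat config per lookup map (hypothetical helper).

    Shared settings (connector, table, poll period, ...) are copied into
    every flat config; only the namespace name and the key/value columns
    differ between them.
    """
    shared = {
        k: v for k, v in namespace_config.items() if k != LOOKUP_MAPS_KEY
    }
    flat_configs = []
    for lookup in namespace_config.get(LOOKUP_MAPS_KEY, []):
        config = copy.deepcopy(shared)
        # Assumed naming scheme: "<namespace>.<lookup map name>".
        config["namespace"] = shared["namespace"] + "." + lookup["name"]
        # Assumed field names for the flat, one-pair-per-namespace config.
        config["keyColumn"] = lookup["key"]
        config["valueColumn"] = lookup["value"]
        flat_configs.append(config)
    return flat_configs


proposed = {
    "type": "jdbc",
    "namespace": "DB1",
    "connectorConfig": {
        "createTables": True,
        "connectURI": "jdbc:mysql://localhost:3306/druid",
        "user": "druid",
        "password": "diurd",
    },
    "table": "some_lookup_table",
    "lookup maps": [
        {"name": "AtoB", "key": "A", "value": "B"},
        {"name": "CtoD", "key": "C", "value": "D"},
        {"name": "AtoC", "key": "A", "value": "C"},
    ],
    "tsColumn": "timestamp_column",
    "pollPeriod": 600000,
}

flat = expand_lookup_maps(proposed)
```

With this, a password or table change is made once in the two-level config, and the three flat configs are derived from it instead of being edited by hand.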

In addition, NamespacedExtractor may need one more parameter that indicates the lookup map name within the given namespace.
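
As a rough illustration of what that extra parameter would mean at lookup time, here is a small Python sketch of a two-level structure: namespace, then lookup map name, then key. This is not how NamespacedExtractor is implemented; the class `TwoLevelLookup`, its methods, and the sample rows are all hypothetical. The point is only that the source is read once per namespace and every lookup map is built from the same rows.

```python
class TwoLevelLookup:
    """Hypothetical store: namespace -> lookup map name -> {key: value}."""

    def __init__(self):
        self._namespaces = {}

    def load(self, namespace, rows, lookup_maps):
        """Build every lookup map of a namespace from one set of rows.

        rows is a list of dicts (one per source row); lookup_maps is a
        list of {"name", "key", "value"} specs as in the config above.
        """
        maps = {}
        for spec in lookup_maps:
            maps[spec["name"]] = {
                row[spec["key"]]: row[spec["value"]] for row in rows
            }
        self._namespaces[namespace] = maps

    def apply(self, namespace, lookup_map, key):
        """Look up key in the named map; return None if anything is missing."""
        return self._namespaces.get(namespace, {}).get(lookup_map, {}).get(key)


# Made-up sample rows standing in for the contents of some_lookup_table.
rows = [
    {"A": "a1", "B": "b1", "C": "c1", "D": "d1"},
    {"A": "a2", "B": "b2", "C": "c2", "D": "d2"},
]
lookup_maps = [
    {"name": "AtoB", "key": "A", "value": "B"},
    {"name": "CtoD", "key": "C", "value": "D"},
    {"name": "AtoC", "key": "A", "value": "C"},
]

store = TwoLevelLookup()
store.load("DB1", rows, lookup_maps)
```

Here `store.apply("DB1", "AtoB", "a1")` would resolve through the `AtoB` map of namespace `DB1`, so the extractor needs both the namespace and the lookup map name to pick the right mapping.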
