[Proposal] Support multiple dimensions in Dimension Spec #2908

@sirpkt

Description

Currently, Druid can transform the values of only one dimension at a time, because DimensionSpec accepts a single input dimension.
I want to extend DimensionSpec to accept inputs from multiple dimensions, mainly to enable a lookup-based join with a small table on multiple columns, as mentioned in https://groups.google.com/forum/#!searchin/druid-development/multiple/druid-development/OUnXaRgEZG0/PnOYxmgpBwAJ.
This is different from #2374, which targets generating multiple output values from one input value.

Since DimensionSpec is not limited to lookups, multi-dimensional support can bring other benefits as well.
For example, combined with #2292, it would allow query-time arithmetic over multiple dimensions, which is currently possible only for post-aggregated metric values.

The approach I have in mind for supporting multiple dimensions is as follows:

  1. Change DimensionSpec to accept a list of dimensions.
  2. Enable the StorageAdapters to provide a DimensionSelector for a multi-dimensional DimensionSpec.
    I think makeDimensionSelectorUndecorated() is the method to change, and
    IncrementalIndex, IncrementalIndexStorageAdapter, and QueryableIndexStorageAdapter are the classes to change.
  3. Force Cursor-based processing for queries with a multi-dimensional DimensionSpec.
    Currently, only the search query uses bitmap-based processing, so it would simply switch to cursor-based processing when a multi-dimensional spec is used.
  4. Enable ExtractionFn to accept a list of Strings as input.
    An ExtractionFn for multi-dimensional inputs would implement the already existing apply(Object) method, with a check that the Object is a List<String>.
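
To illustrate step 4, here is a minimal sketch of an apply(Object) implementation that checks whether its input is a list. It does not use Druid's actual ExtractionFn interface: ObjectExtractionFn and JoinExtractionFn are hypothetical names made up for this example, and joining the values with a separator is just one possible transformation.

```java
import java.util.Arrays;
import java.util.List;

public class MultiDimExtractionSketch {
    /** Hypothetical stand-in for the apply(Object) part of an extraction function. */
    interface ObjectExtractionFn {
        String apply(Object value);
    }

    /** Hypothetical function that joins the values of several dimensions. */
    static final class JoinExtractionFn implements ObjectExtractionFn {
        private final String separator;

        JoinExtractionFn(String separator) {
            this.separator = separator;
        }

        @Override
        public String apply(Object value) {
            if (value == null) {
                return null;
            }
            // Multi-dimensional case: one value per input dimension.
            if (value instanceof List) {
                List<?> parts = (List<?>) value;
                StringBuilder sb = new StringBuilder();
                for (int i = 0; i < parts.size(); i++) {
                    if (i > 0) {
                        sb.append(separator);
                    }
                    sb.append(String.valueOf(parts.get(i)));
                }
                return sb.toString();
            }
            // Single-dimension case keeps working as before.
            return value.toString();
        }
    }

    public static void main(String[] args) {
        ObjectExtractionFn fn = new JoinExtractionFn("|");
        System.out.println(fn.apply(Arrays.asList("US", "CA"))); // prints "US|CA"
        System.out.println(fn.apply("US"));                      // prints "US"
    }
}
```

Because the single-value branch is untouched, existing callers that pass one String would see no change in behavior under this scheme.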

I am also thinking about generalizing lookups to support not only String-to-String mappings but also List<String>-to-String mappings.
My approach for that is as follows:

  1. Change the lookup map from Map<String, String> to Map<Object, String>.
    I think this minimizes side effects on existing code while making multi-dimensional keys easy to support.
  2. Have multi-dimensional lookups use Map<MultiKey, String>.
    I am trying to use org.apache.commons.collections.keyvalue.MultiKey to represent the multi-dimensional lookup key.
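
The idea of one Map<Object, String> serving both single and composite keys can be sketched as below. To keep the example free of external dependencies, it uses an immutable java.util.List<String> as the composite key rather than commons-collections MultiKey; both provide value-based equals() and hashCode(), which is what the map needs. MultiKeyLookupSketch and its methods are hypothetical names, not part of Druid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultiKeyLookupSketch {
    // One map holds both plain String keys and composite keys.
    private final Map<Object, String> lookup = new HashMap<>();

    /** Registers an ordinary single-dimension mapping. */
    void put(String key, String value) {
        lookup.put(key, value);
    }

    /** Registers a multi-dimension mapping; the key is copied defensively. */
    void put(List<String> key, String value) {
        lookup.put(Collections.unmodifiableList(new ArrayList<>(key)), value);
    }

    /** Returns the mapped value, or null when the key is unknown. */
    String get(Object key) {
        return lookup.get(key);
    }

    public static void main(String[] args) {
        MultiKeyLookupSketch sketch = new MultiKeyLookupSketch();
        sketch.put("US", "United States");
        sketch.put(Arrays.asList("US", "CA"), "California");

        System.out.println(sketch.get("US"));                      // prints "United States"
        System.out.println(sketch.get(Arrays.asList("US", "CA"))); // prints "California"
        System.out.println(sketch.get(Arrays.asList("CA", "US"))); // prints "null"
    }
}
```

Key order matters for the composite key, so ("US", "CA") and ("CA", "US") are distinct entries; the same holds for MultiKey.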
