Currently, Druid supports value transformation of only one dimension at a time because DimensionSpec accepts a single input dimension.
I want to extend DimensionSpec to accept inputs from multiple dimensions, mainly to enable a lookup-based join with a small table on multiple columns, as mentioned in https://groups.google.com/forum/#!searchin/druid-development/multiple/druid-development/OUnXaRgEZG0/PnOYxmgpBwAJ.
This is different from #2374, which targets generating multiple output values from one input value.
As DimensionSpec is not limited to lookups, multi-dimensional support can bring other benefits.
For example, with #2292, query-time arithmetic over multiple dimensions becomes possible; this is currently supported only for post-aggregated metric values.
The approach I have in mind for supporting multiple dimensions is as follows:
- changing DimensionSpec to accept a list of dimensions
- enabling StorageAdapters to provide a dimensionSelector for a multi-dimensional DimensionSpec
  I think makeDimensionSelectorUndecorated() is the target method to change, and IncrementalIndex, IncrementalIndexStorageAdapter, and QueryableIndexStorageAdapter are the target classes to change.
- forcing Cursor-based processing for queries with a multi-dimensional DimensionSpec
  Currently, only the search query uses bitmap processing. The change is simply to switch it to cursor-based processing when a multi-dimensional spec is used.
- enabling ExtractionFn to accept a list of String as input
  An ExtractionFn for multi-dimensional inputs implements the already existing apply(Object) method, with a check that the Object is a List<String>.
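The last item can be illustrated with a minimal sketch. It assumes nothing about Druid's real classes: ObjectExtractionFn is a hypothetical stand-in that exposes only an apply(Object) method, and ConcatExtractionFn is a made-up example function that joins the values of several dimensions. The only point is the check that the incoming Object is a list of Strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the apply(Object) method this proposal relies on;
// it is not Druid's actual ExtractionFn interface.
interface ObjectExtractionFn {
    String apply(Object value);
}

// Hypothetical multi-dimensional extraction function: joins the values of
// several dimensions with a delimiter.
final class ConcatExtractionFn implements ObjectExtractionFn {
    private final String delimiter;

    ConcatExtractionFn(String delimiter) {
        this.delimiter = delimiter;
    }

    @Override
    public String apply(Object value) {
        // Check routine: multi-dimensional input must arrive as a List.
        if (!(value instanceof List)) {
            throw new IllegalArgumentException("expected List<String> but got: " + value);
        }
        List<?> raw = (List<?>) value;
        List<String> parts = new ArrayList<>(raw.size());
        for (Object element : raw) {
            // Generics are erased at runtime, so each element is checked too.
            // A null element fails this check and is rejected.
            if (!(element instanceof String)) {
                throw new IllegalArgumentException("expected String element but got: " + element);
            }
            parts.add((String) element);
        }
        return String.join(delimiter, parts);
    }
}

final class MultiDimExtractionSketch {
    public static void main(String[] args) {
        ObjectExtractionFn fn = new ConcatExtractionFn("|");
        // One value per input dimension, in the order the spec lists them.
        System.out.println(fn.apply(Arrays.asList("US", "CA"))); // prints US|CA
    }
}
```

Because the existing method signature is reused, single-dimension functions would be unaffected; only functions that opt in to list input need the check.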
I'm also thinking about generalizing lookup to support not only String to String mapping but also List<String> to String mapping.
My approach to support that is as follows:
- changing the lookup map from Map<String, String> to Map<Object, String>
  I think this is the way to minimize side effects on existing code while easily supporting multi-dimensional keys.
- multi-dimensional lookup uses Map<MultiKey, String>
  I'm trying to use org.apache.commons.collections.keyvalue.MultiKey to represent a multi-dimensional lookup key.
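A rough sketch of how a Map<Object, String> could hold both kinds of key follows. ObjectKeyLookup is a hypothetical class, not Druid's lookup implementation, and the sketch uses a List<String> as the composite key purely to stay free of external dependencies. MultiKey would take its place in the real change; as far as I know it also compares its component keys element by element, which is the property the map needs.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup holder keyed by Object so that a plain String key and a
// composite key can live in the same map.
final class ObjectKeyLookup {
    private final Map<Object, String> map = new HashMap<>();

    void put(Object key, String value) {
        map.put(key, value);
    }

    // Returns null when the key has no mapping.
    String apply(Object key) {
        return map.get(key);
    }
}

final class ObjectKeyLookupSketch {
    public static void main(String[] args) {
        ObjectKeyLookup lookup = new ObjectKeyLookup();

        // Existing single-dimension usage keeps working with String keys.
        lookup.put("US", "United States");

        // Composite key: a List stands in for MultiKey here (an assumption of
        // this sketch). Lists compare by content and order, so a freshly
        // built key with equal elements finds the entry.
        lookup.put(Arrays.asList("US", "CA"), "California");

        System.out.println(lookup.apply("US"));
        System.out.println(lookup.apply(Arrays.asList("US", "CA")));
        // Order matters: ("CA", "US") is a different key and has no mapping.
        System.out.println(lookup.apply(Arrays.asList("CA", "US")));
    }
}
```

Whatever key type is chosen, it has to implement value-based equals() and hashCode(), and the order of its components has to match the order of dimensions in the DimensionSpec.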