Motivation
We would like to use Druid's scanQuery for order-by operations, but so far no Druid version supports this.
To use order by, we can only use groupByQuery or topNQuery.
However, we are not satisfied with the performance of groupByQuery. Implementing it with topN performs better, but topN is limited: it can only order on a single dimension.
We have implemented an order-by algorithm for scanQuery.
The change mainly modifies the ScanQueryEngine and ScanQueryRunnerFactory classes.
I don't know whether anyone has tried this before. What problems might we encounter?
Step 1: extract the values of the columns to be sorted together with the corresponding row offset, and store them in the MultiColumnSorter, which retains only the top N entries. The sort-column values are the key and the offset is the value.
Step 2: fetch the full rows at the offsets held by the MultiColumnSorter and generate the final top-N result.
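
The two steps above can be sketched roughly as follows. This is only an illustration, not the actual patch and not Druid's API: the class ScanOrderBySketch, the Entry type, and the methods topOffsets and fetchRows are hypothetical names, an in-memory list of Object[] rows stands in for a segment, and a bounded heap is assumed as one possible way a MultiColumnSorter could keep only the top N entries.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch; none of these names come from Druid itself.
@SuppressWarnings({"rawtypes", "unchecked"})
public class ScanOrderBySketch {

    /** Key = values of the ordered columns, value = offset of the row. */
    static final class Entry {
        final Comparable[] key;
        final int offset;

        Entry(Comparable[] key, int offset) {
            this.key = key;
            this.offset = offset;
        }
    }

    /** Compares keys column by column, honouring each column's direction. */
    static Comparator<Entry> ordering(boolean[] ascending) {
        return (a, b) -> {
            for (int i = 0; i < ascending.length; i++) {
                int c = a.key[i].compareTo(b.key[i]);
                if (c != 0) {
                    return ascending[i] ? c : Integer.compare(0, c);
                }
            }
            return 0;
        };
    }

    /** Step 1: read only the ordered columns and keep the offsets of the best `limit` rows. */
    static List<Integer> topOffsets(List<Object[]> rows, int[] sortColumns,
                                    boolean[] ascending, int limit) {
        Comparator<Entry> order = ordering(ascending);
        // Max-heap on the requested order, so the worst retained entry is on top.
        PriorityQueue<Entry> heap = new PriorityQueue<>(Math.max(1, limit), order.reversed());
        for (int offset = 0; offset < rows.size() && limit > 0; offset++) {
            Comparable[] key = new Comparable[sortColumns.length];
            for (int i = 0; i < sortColumns.length; i++) {
                key[i] = (Comparable) rows.get(offset)[sortColumns[i]];
            }
            Entry candidate = new Entry(key, offset);
            if (heap.size() < limit) {
                heap.add(candidate);
            } else if (order.compare(candidate, heap.peek()) < 0) {
                heap.poll();          // evict the current worst entry
                heap.add(candidate);
            }
        }
        List<Entry> kept = new ArrayList<>(heap);
        kept.sort(order);             // final order of the retained entries
        List<Integer> offsets = new ArrayList<>();
        for (Entry e : kept) {
            offsets.add(e.offset);
        }
        return offsets;
    }

    /** Step 2: materialize the full rows for the retained offsets, in sorted order. */
    static List<Object[]> fetchRows(List<Object[]> rows, List<Integer> offsets) {
        List<Object[]> result = new ArrayList<>();
        for (int offset : offsets) {
            result.add(rows.get(offset));
        }
        return result;
    }
}
```

With this shape, step 1 touches only the ordered columns and holds at most `limit` entries in memory, and the remaining columns are read in step 2 only for rows that make it into the result.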