Affected Version
0.17.0
Description
The following query should return materialized sketches as base64-encoded strings, so that the results can be exported, forklifted, and later used in other contexts.
The Druid documentation states that the following three functions return sketches, but when they are used at the top level of a query, as in the following example, they return counts or unique counts instead of sketches:
SELECT
DS_HLL(countryName),
DS_THETA(countryName),
DS_QUANTILES_SKETCH(countryName)
FROM wikipedia
This yields:
105, 105.0, 2383
The documentation states:
DS_HLL(expr, [lgK, tgtHllType]) | Creates an HLL sketch on the values of expr
DS_THETA(expr, [size]) | Creates a Theta sketch on the values of expr
DS_QUANTILES_SKETCH(expr, [k]) | Creates a Quantiles sketch on the values of expr
Interestingly, the output of the above functions also depends on whether the query contains further projections.
The following query produces numeric output:
SELECT
DS_QUANTILES_SKETCH(countryName, 8),
DS_QUANTILES_SKETCH(countryName, 64),
DS_QUANTILES_SKETCH(countryName),
SUM(sum_added)
FROM wikipedia
|| col1 || col2 || col3 || col4 ||
| 2383 | 2383 | 2383 | 11774265 |
But when a further projection is added, the output of the first three projections changes:
SELECT
DS_QUANTILES_SKETCH(countryName, 8),
DS_QUANTILES_SKETCH(countryName, 64),
DS_QUANTILES_SKETCH(countryName),
DS_GET_QUANTILES(DS_QUANTILES_SKETCH(countryName), 0.1, 0.3, 0.5, 0.7, 0.9),
SUM(sum_added)
FROM wikipedia
Now the output is:
|| col1 || col2 || col3 || col4 || col5 ||
| AgMIGggAA... | AgMIGggAA... | AgMIGggAA... | [0.0,0.0,0.0,0.0,0.0] | 11774265 |
So the output of the first expression, "DS_QUANTILES_SKETCH(countryName, 8)", changed only because another column expression was added to the query.
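To make the difference between the two result shapes easier to check when reproducing this, here is a minimal Python sketch that labels each cell of a result row as a number, a base64 string, or an array. The helper names classify_cell and classify_row are hypothetical and not part of Druid, the sample base64 value is made up for illustration, and the base64 check only shows that a value is decodable, not that it is a valid sketch.

```python
import base64
import binascii


def classify_cell(value):
    """Roughly classify one result cell (hypothetical helper, not a Druid API)."""
    if isinstance(value, bool):
        return "other"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, (list, tuple)):
        return "array"
    if isinstance(value, str):
        # Check for a numeric string first: "2383" is also valid base64,
        # so the order of these two checks matters.
        try:
            float(value)
            return "number"
        except ValueError:
            pass
        try:
            base64.b64decode(value, validate=True)
            return "base64"
        except (binascii.Error, ValueError):
            return "other"
    return "other"


def classify_row(row):
    """Classify every cell of one result row."""
    return [classify_cell(cell) for cell in row]


# Made-up sample value standing in for a serialized sketch.
sample_sketch = base64.b64encode(b"\x02\x03\x08\x1a\x08\x00").decode("ascii")

# Shape of the first result in this report: counts only.
print(classify_row([105, 105.0, 2383]))
# Shape of the last result: base64 strings, an array, and a sum.
print(classify_row([sample_sketch, sample_sketch, sample_sketch,
                    [0.0, 0.0, 0.0, 0.0, 0.0], 11774265]))
```

Running the same classification over both queries should show the first three columns switching from "number" to "base64" once the DS_GET_QUANTILES projection is added.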