SQL: top-level sketch functions do not return sketches #9419

@sascha-coenen

Affected Version

0.17.0

Description

The following query should return materialized sketches as base64-encoded strings, so that the results can be exported, forklifted, and later used in other contexts.
The Druid documentation states that the following three functions return sketches, but when they are used at the top level, as in the example below, they return counts or unique counts instead of sketches:

SELECT
	DS_HLL(countryName),
	DS_THETA(countryName),
	DS_QUANTILES_SKETCH(countryName)
FROM wikipedia

This yields:
105, 105.0, 2383

The documentation states:

DS_HLL(expr, [lgK, tgtHllType]) | Creates an HLL sketch on the values of expr

DS_THETA(expr, [size]) | Creates a Theta sketch on the values of expr

DS_QUANTILES_SKETCH(expr, [k]) | Creates a Quantiles sketch on the values of expr

Interestingly, the output of the above functions also depends on which other projections are present in the query.

The following query produces outputs that are numbers:

SELECT
	DS_QUANTILES_SKETCH(countryName, 8),
	DS_QUANTILES_SKETCH(countryName, 64),
	DS_QUANTILES_SKETCH(countryName),
	SUM(sum_added)
FROM wikipedia

|| col1 || col2 || col3 || col4 ||
| 2383 | 2383 | 2383 | 11774265 |

But when a further projection is added, the output of the first three projections changes:

SELECT
	DS_QUANTILES_SKETCH(countryName, 8),
	DS_QUANTILES_SKETCH(countryName, 64),
	DS_QUANTILES_SKETCH(countryName),
	DS_GET_QUANTILES(DS_QUANTILES_SKETCH(countryName), 0.1, 0.3, 0.5, 0.7, 0.9),
	SUM(sum_added)
FROM wikipedia

Now the output is:

|| col1 || col2 || col3 || col4 || col5 ||
| AgMIGggAA... | AgMIGggAA... | AgMIGggAA... | [0.0,0.0,0.0,0.0,0.0] | 11774265 |

So the output of the first expression, "DS_QUANTILES_SKETCH(countryName, 8)", changed only because another column expression was added to the query.
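
To make the difference between the two result shapes easy to check in a script, here is a minimal Python sketch that flags which cells of a result row look like base64-encoded sketches. It assumes the rows have already been fetched and parsed from JSON into Python lists; the function names `looks_like_sketch` and `classify_row` and the sample sketch bytes are hypothetical and not part of Druid.

```python
import base64
import binascii


def looks_like_sketch(cell) -> bool:
    """Heuristic: True if the cell is a non-numeric string that is valid base64."""
    if not isinstance(cell, str) or not cell:
        return False  # numbers, lists and empty values are not sketches
    try:
        float(cell)
        return False  # a numeric string such as "2383" is a count, not a sketch
    except ValueError:
        pass
    try:
        base64.b64decode(cell, validate=True)
    except (binascii.Error, ValueError):
        return False  # characters outside the base64 alphabet, or bad padding
    return True


def classify_row(row):
    """Return one boolean per column: does the cell look like a sketch?"""
    return [looks_like_sketch(cell) for cell in row]


# Hypothetical stand-in for a real sketch; the bytes are made up for illustration.
fake_sketch = base64.b64encode(b"\x02\x03\x08 made-up sketch bytes").decode("ascii")

# Shape of the first result in this report: numbers only.
numeric_row = [105, 105.0, 2383]
# Shape of the last result: sketch, quantile array, plain sum.
mixed_row = [fake_sketch, [0.0, 0.0, 0.0, 0.0, 0.0], 11774265]

print(classify_row(numeric_row))
print(classify_row(mixed_row))
```

This only checks the form of each value. It does not verify that a string deserializes into a valid DataSketches sketch.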
