Skip to content

Always return sketches from DS_HLL, DS_THETA, DS_QUANTILES_SKETCH.#13247

Merged
gianm merged 11 commits intoapache:masterfrom
gianm:harmonize-ds-sketch-fns
Nov 3, 2022
Merged

Always return sketches from DS_HLL, DS_THETA, DS_QUANTILES_SKETCH.#13247
gianm merged 11 commits intoapache:masterfrom
gianm:harmonize-ds-sketch-fns

Conversation

@gianm
Copy link
Copy Markdown
Contributor

@gianm gianm commented Oct 20, 2022

These aggregation functions are documented as creating sketches. However, they are planned into native aggregators that include finalization logic to convert the sketch to a number of some sort. This creates an inconsistency: the functions sometimes return sketches, and sometimes return numbers, depending on where they lie in the native query plan.

This patch changes these SQL aggregators to never finalize, by using the "shouldFinalize" feature of the native aggregators. It already existed for theta sketches. This patch adds the feature for hll and quantiles sketches. [Side note: an alternate approach would have been to disable finalization entirely for the queries planned by SQL, and in cases where we want it, inject an explicit finalizing post-aggregator. I didn't do this, because it would have been a bigger change and I wanted to focus on fixing the behavior of these three functions first.]

As to impact, Druid finalizes aggregators in two cases:

  • When they appear in the outer level of a query (not a subquery).
  • When they are used as input to an expression or finalizing-field-access post-aggregator (not any other kind of post-aggregator).

With this patch, the functions will no longer be finalized in these cases.

The second item is not likely to matter much. The SQL functions all declare return type OTHER, which would not be usable as an input to any other function that makes sense and that would be planned into an expression.

So, the main effect of this patch is the first item. To provide backwards compatibility with anyone that was depending on the old behavior, the patch adds a "sqlFinalizeOuterSketches" query context parameter that restores the old behavior.

Other changes:

  1. Move various argument-checking logic from runtime to planning time in
    DoublesSketchListArgBaseOperatorConversion, by adding an OperandTypeChecker.

  2. Add various JsonIgnores to the sketches to simplify their JSON representations.

  3. Allow chaining of ExpressionPostAggregators and other PostAggregators
    in the SQL layer.

  4. Avoid unnecessary FieldAccessPostAggregator wrapping in the SQL layer,
    now that expressions can operate on complex inputs.

  5. Adjust return type to thetaSketch (instead of OTHER) in
    ThetaSketchSetBaseOperatorConversion.

These aggregation functions are documented as creating sketches. However,
they are planned into native aggregators that include finalization logic
to convert the sketch to a number of some sort. This creates an
inconsistency: the functions sometimes return sketches, and sometimes
return numbers, depending on where they lie in the native query plan.

This patch changes these SQL aggregators to _never_ finalize, by using
the "shouldFinalize" feature of the native aggregators. It already
existed for theta sketches. This patch adds the feature for hll and
quantiles sketches.

As to impact, Druid finalizes aggregators in two cases:

- When they appear in the outer level of a query (not a subquery).
- When they are used as input to an expression or finalizing-field-access
  post-aggregator (not any other kind of post-aggregator).

With this patch, the functions will no longer be finalized in these cases.

The second item is not likely to matter much. The SQL functions all declare
return type OTHER, which would be usable as an input to any other function
that makes sense and that would be planned into an expression.

So, the main effect of this patch is the first item. To provide backwards
compatibility with anyone that was depending on the old behavior, the
patch adds a "sqlFinalizeOuterSketches" query context parameter that
restores the old behavior.

Other changes:

1) Move various argument-checking logic from runtime to planning time in
   DoublesSketchListArgBaseOperatorConversion, by adding an OperandTypeChecker.

2) Add various JsonIgnores to the sketches to simplify their JSON representations.

3) Allow chaining of ExpressionPostAggregators and other PostAggregators
   in the SQL layer.

4) Avoid unnecessary FieldAccessPostAggregator wrapping in the SQL layer,
   now that expressions can operate on complex inputs.

5) Adjust return type to thetaSketch (instead of OTHER) in
   ThetaSketchSetBaseOperatorConversion.
@gianm
Copy link
Copy Markdown
Contributor Author

gianm commented Oct 20, 2022

Marked as incompatible due to the behavioral change. There is the sqlFinalizeOuterSketches parameter that people can use to restore the old behavior.

Copy link
Copy Markdown
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm 🤘

I think the previous behavior was confusing and inconsistent so this seems like a large improvement that more than makes up for the incompatibility.

@lgtm-com
Copy link
Copy Markdown

lgtm-com Bot commented Oct 21, 2022

This pull request introduces 1 alert when merging 5de1c21 into 86e6e61 - view on LGTM.com

new alerts:

  • 1 for Inconsistent equals and hashCode

@lgtm-com
Copy link
Copy Markdown

lgtm-com Bot commented Oct 24, 2022

This pull request introduces 1 alert when merging dac1662 into d98c808 - view on LGTM.com

new alerts:

  • 1 for Inconsistent equals and hashCode

@lgtm-com
Copy link
Copy Markdown

lgtm-com Bot commented Oct 25, 2022

This pull request introduces 1 alert when merging 07af49b into 1e39bc6 - view on LGTM.com

new alerts:

  • 1 for Inconsistent equals and hashCode

@lgtm-com
Copy link
Copy Markdown

lgtm-com Bot commented Oct 28, 2022

This pull request introduces 1 alert when merging dcbe0a3 into 3202024 - view on LGTM.com

new alerts:

  • 1 for Inconsistent equals and hashCode

@lgtm-com
Copy link
Copy Markdown

lgtm-com Bot commented Nov 1, 2022

This pull request introduces 1 alert when merging 64e2eb0 into 176934e - view on LGTM.com

new alerts:

  • 1 for Inconsistent equals and hashCode

@lgtm-com
Copy link
Copy Markdown

lgtm-com Bot commented Nov 3, 2022

This pull request introduces 1 alert when merging 7cad838 into 176934e - view on LGTM.com

new alerts:

  • 1 for Inconsistent equals and hashCode

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants