Skip to content

Add round support for DS-HLL#8023

Merged
gianm merged 4 commits intoapache:masterfrom
ccaominh:feature-446-add-round-to-ds-hll
Jul 5, 2019
Merged

Add round support for DS-HLL#8023
gianm merged 4 commits intoapache:masterfrom
ccaominh:feature-446-add-round-to-ds-hll

Conversation

@ccaominh
Copy link
Copy Markdown
Contributor

@ccaominh ccaominh commented Jul 4, 2019

Description

Since the Cardinality aggregator has a "round" option to round off estimated
values generated from the HyperLogLog algorithm, add the same "round" option to
the DataSketches HLL Sketch module aggregators to be consistent.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added unit tests or modified existing tests to cover new code paths.

For reviewers: the key changed/added classes in this PR are HllSketchAggregatorFactory, HllSketchBuildAggregatorFactory, HllSketchMergeAggregatorFactory, and HllSketchAggregatorTest.

ccaominh added 2 commits July 3, 2019 18:08
Since the Cardinality aggregator has a "round" option to round off estimated
values generated from the HyperLogLog algorithm, add the same "round" option to
the DataSketches HLL Sketch module aggregators to be consistent.
@gianm
Copy link
Copy Markdown
Contributor

gianm commented Jul 4, 2019

The similar patch for hyperUnique and cardinality aggregators is here: #4720

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Jul 5, 2019

The test failure looks legitimate. It happened while running in standards-compliant null handling mode, which I suppose changed the results a bit (2 rows came back instead of 1).

Failed tests: 
org.apache.druid.query.aggregation.datasketches.hll.sql.HllSketchSqlAggregatorTest.testApproxCountDistinctHllSketchIsRounded(org.apache.druid.query.aggregation.datasketches.hll.sql.HllSketchSqlAggregatorTest)
  Run 1: HllSketchSqlAggregatorTest.testApproxCountDistinctHllSketchIsRounded:419 expected:<1> but was:<2>
  Run 2: HllSketchSqlAggregatorTest.testApproxCountDistinctHllSketchIsRounded:419 expected:<1> but was:<2>
  Run 3: HllSketchSqlAggregatorTest.testApproxCountDistinctHllSketchIsRounded:419 expected:<1> but was:<2>
  Run 4: HllSketchSqlAggregatorTest.testApproxCountDistinctHllSketchIsRounded:419 expected:<1> but was:<2>

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Jul 5, 2019

By the way, the mode is controlled by a system property. You can set -Ddruid.generic.useDefaultValueForNull=false locally to get the same behavior in your IDE or local Maven runs.

Copy link
Copy Markdown
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM after the adjustment to the tests to handle the other null handling mode.

Copy link
Copy Markdown
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thanks @ccaominh!

@gianm gianm added this to the 0.16.0 milestone Jul 5, 2019
@gianm gianm merged commit 0ded0ce into apache:master Jul 5, 2019
gianm pushed a commit to implydata/druid-public that referenced this pull request Jul 5, 2019
* Add round support for DS-HLL

Since the Cardinality aggregator has a "round" option to round off estimated
values generated from the HyperLogLog algorithm, add the same "round" option to
the DataSketches HLL Sketch module aggregators to be consistent.

* Fix checkstyle errors

* Change HllSketchSqlAggregator to do rounding

* Fix test for standard-compliant null handling mode
@ccaominh ccaominh deleted the feature-446-add-round-to-ds-hll branch July 5, 2019 23:18
gianm pushed a commit to implydata/druid-public that referenced this pull request Jul 9, 2019
* Add round support for DS-HLL

Since the Cardinality aggregator has a "round" option to round off estimated
values generated from the HyperLogLog algorithm, add the same "round" option to
the DataSketches HLL Sketch module aggregators to be consistent.

* Fix checkstyle errors

* Change HllSketchSqlAggregator to do rounding

* Fix test for standard-compliant null handling mode
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants