[PROPOSAL] Add support for t-digest backed aggregators #7303

### Motivation

t-digest (https://github.com/tdunning/t-digest) is a popular data structure for accurate online accumulation of rank-based statistics such as quantiles and trimmed means. It is also designed for parallel use cases such as distributed aggregation and MapReduce jobs, because combining two intermediate t-digests is easy and efficient.

Several other projects, including Apache Mahout, stream-lib and Elasticsearch, have adopted t-digest. It would be good to add t-digest based aggregators to Druid as well. They would be complementary to the approximate sketch algorithms already available in Druid, such as moments sketches and Yahoo quantiles sketches.


### Proposed changes

A new module called druid-tdigestsketch will be added under extensions-contrib. The proposal is to add the following aggregators:
1) buildTDigestSketch - this aggregator generates t-digest based sketches over numeric values. It would generally be used during the indexing phase, where a pre-aggregated sketch over a metric's values is created. It could also be used to generate sketches on the fly at query time.
2) mergeTDigestSketch - this aggregator combines existing t-digest based sketches. It would generally be used at query time to combine sketches generated during the indexing phase by the buildTDigestSketch aggregator.
3) quantilesFromTDigestSketch - this post aggregator takes an array of fractions and computes the corresponding quantiles from the t-digest sketches produced by the two aggregators above.
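
To make the build / merge / quantile flow above concrete, here is a minimal, self-contained Python sketch. It is an illustration only: `ToyCentroidSketch` and `build_sketch` are hypothetical names, and the class is a simplified centroid sketch, not the t-digest algorithm and not the proposed Druid implementation. It only mirrors the three roles: building a sketch per segment, merging sketches, and reading quantiles for an array of fractions.

```python
from bisect import insort


class ToyCentroidSketch:
    """Simplified mergeable quantile sketch (a stand-in, NOT real t-digest)."""

    def __init__(self, max_centroids=100):
        if max_centroids < 1:
            raise ValueError("max_centroids must be at least 1")
        self.max_centroids = max_centroids
        self.centroids = []  # sorted list of (mean, weight) pairs

    def add(self, value, weight=1):
        """'build' role: fold one raw numeric value into the sketch."""
        insort(self.centroids, (float(value), weight))
        self._compress()

    def merge(self, other):
        """'merge' role: fold another sketch's centroids into this one."""
        for centroid in other.centroids:
            insort(self.centroids, centroid)
        self._compress()
        return self

    def _compress(self):
        # Collapse the closest adjacent pair until the size bound holds.
        while len(self.centroids) > self.max_centroids:
            i = min(
                range(len(self.centroids) - 1),
                key=lambda k: self.centroids[k + 1][0] - self.centroids[k][0],
            )
            (m1, w1), (m2, w2) = self.centroids[i], self.centroids[i + 1]
            w = w1 + w2
            self.centroids[i:i + 2] = [((m1 * w1 + m2 * w2) / w, w)]

    def quantile(self, q):
        """'quantiles' role: estimate the value at fraction q in [0, 1]."""
        if not 0.0 <= q <= 1.0:
            raise ValueError("q must be between 0 and 1")
        if not self.centroids:
            raise ValueError("empty sketch")
        target = q * sum(w for _, w in self.centroids)
        cumulative = 0
        for mean, weight in self.centroids:
            cumulative += weight
            if cumulative >= target:
                return mean
        return self.centroids[-1][0]


def build_sketch(values, max_centroids=100):
    """Hypothetical helper: one pre-aggregated sketch per segment."""
    sketch = ToyCentroidSketch(max_centroids)
    for v in values:
        sketch.add(v)
    return sketch


# Indexing time: one sketch per segment. Query time: merge, then read quantiles.
segment_a = build_sketch(range(0, 500))
segment_b = build_sketch(range(500, 1000))
merged = ToyCentroidSketch().merge(segment_a).merge(segment_b)
quantiles = [merged.quantile(f) for f in (0.25, 0.5, 0.95)]
```

The point of the example is the shape of the workflow: sketches built independently can be combined later without revisiting the raw values, which is what allows the build step to run at indexing time and the merge step at query time.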

### Rationale
At my work, various data engineering teams have been using t-digest based sketch aggregations both in and outside of Druid. They have found it to be a good fit for their various use cases. 

### Operational impact

No operational impact.

### Test plan (optional)

Existing literature already covers the performance and correctness of t-digest. Beyond unit tests, the plan is to verify on a dev Druid cluster that the results returned by these aggregators are similar to those of t-digest aggregations in other frameworks such as Spark and MapReduce.

### Future work (optional)
1) Add SQL support.
2) If a new version of the t-digest library changes the serialization format, it would be tricky to make sketches from the old and new versions interoperable. One option would be to write a new module every time the t-digest library is updated. Alternatively, we would need to devise a versioning scheme for the aggregators.
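
As one possible shape for the versioning option in item 2, the serialized sketch could carry a leading version byte that selects the decoder on read. The following Python sketch is an assumption-laden illustration: the constants `FORMAT_V1` and `FORMAT_V2`, both byte layouts, and all function names are hypothetical and do not reflect the t-digest library's actual serialization.

```python
import struct

FORMAT_V1 = 1  # hypothetical layout: float64 mean, int64 weight per centroid
FORMAT_V2 = 2  # hypothetical layout: float32 mean, int32 weight per centroid


def encode_v1(centroids):
    """Serialize (mean, weight) pairs with a leading version byte."""
    out = struct.pack(">BI", FORMAT_V1, len(centroids))
    for mean, weight in centroids:
        out += struct.pack(">dq", mean, weight)
    return out


def encode_v2(centroids):
    """Same data in a more compact, hypothetical newer layout."""
    out = struct.pack(">BI", FORMAT_V2, len(centroids))
    for mean, weight in centroids:
        out += struct.pack(">fi", mean, weight)
    return out


def _decode_pairs(payload, pair_format):
    # The payload starts with a 4-byte count, followed by fixed-size pairs.
    (count,) = struct.unpack_from(">I", payload, 0)
    size = struct.calcsize(pair_format)
    return [struct.unpack_from(pair_format, payload, 4 + size * i)
            for i in range(count)]


# One decoder per known version; old blobs stay readable after an upgrade.
DECODERS = {
    FORMAT_V1: lambda payload: _decode_pairs(payload, ">dq"),
    FORMAT_V2: lambda payload: _decode_pairs(payload, ">fi"),
}


def decode(blob):
    """Dispatch on the version byte; reject formats this reader does not know."""
    decoder = DECODERS.get(blob[0])
    if decoder is None:
        raise ValueError("unsupported sketch format version: %d" % blob[0])
    return decoder(blob[1:])
```

With a scheme like this, a reader can merge segments written under different formats by decoding each to a common in-memory form first; whether that is feasible here depends on how the library's own format evolves.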

