apache · kfaraz · Dec 19, 2022 · Dec 17, 2022
diff --git a/docs/development/extensions-core/datasketches-hll.md b/docs/development/extensions-core/datasketches-hll.md
@@ -23,29 +23,33 @@ title: "DataSketches HLL Sketch module"
   -->
 
 
-This module provides Apache Druid aggregators for distinct counting based on HLL sketch from [Apache DataSketches](https://datasketches.apache.org/) library. At ingestion time, this aggregator creates the HLL sketch objects to be stored in Druid segments. At query time, sketches are read and merged together. In the end, by default, you receive the estimate of the number of distinct values presented to the sketch. Also, you can use post aggregator to produce a union of sketch columns in the same row.
-You can use the HLL sketch aggregator on columns of any identifiers. It will return estimated cardinality of the column.
+This module provides Apache Druid aggregators for distinct counting based on the HLL sketch from the [Apache DataSketches](https://datasketches.apache.org/) library. At ingestion time, this aggregator creates the HLL sketch objects to store in Druid segments. At query time, Druid reads and merges the sketches. By default, the result is
+the estimate of the number of distinct values presented to the sketch. You can also use post aggregators to produce a union of sketch columns in the same row.
+You can use the HLL sketch aggregator on any column to estimate its cardinality.
 
 To use this aggregator, make sure you [include](../../development/extensions.md#loading-extensions) the extension in your config file:
 
 ```
 druid.extensions.loadList=["druid-datasketches"]
 ```
 
-### Aggregators
+For additional sketch types supported in Druid, see [DataSketches extension](datasketches-extension.md).
 
-|property|description|required?|
+## Aggregators
+
+|Property|Description|Required?|
 |--------|-----------|---------|
-|`type`|This String should be [`HLLSketchBuild`](#hllsketchbuild-aggregator) or [`HLLSketchMerge`](#hllsketchmerge-aggregator)|yes|
-|`name`|A String for the output (result) name of the calculation.|yes|
-|`fieldName`|A String for the name of the input field.|yes|
+|`type`|Either [`HLLSketchBuild`](#hllsketchbuild-aggregator) or [`HLLSketchMerge`](#hllsketchmerge-aggregator).|yes|
+|`name`|String representing the output column to store sketch values.|yes|
+|`fieldName`|The name of the input field.|yes|
 |`lgK`|log2 of K that is the number of buckets in the sketch, parameter that controls the size and the accuracy. Must be between 4 and 21 inclusively.|no, defaults to `12`|
 |`tgtHllType`|The type of the target HLL sketch. Must be `HLL_4`, `HLL_6` or `HLL_8` |no, defaults to `HLL_4`|
 |`round`|Round off values to whole numbers. Only affects query-time behavior and is ignored at ingestion-time.|no, defaults to `false`|
+|`shouldFinalize`|Return the final double type representing the estimate rather than the intermediate sketch type itself. In addition to controlling the finalization of this aggregator, you can control whether all aggregators are finalized with the query context parameters [`finalize`](../../querying/query-context.md) and [`sqlFinalizeOuterSketches`](../../querying/sql-query-context.md).|no, defaults to `true`|
 
 > The default `lgK` value has proven to be sufficient for most use cases; expect only very negligible improvements in accuracy with `lgK` values over `16` in normal circumstances.
 
-#### HLLSketchBuild Aggregator
+### HLLSketchBuild aggregator
 
 ```
 {
@@ -76,7 +80,7 @@ When applied at query time on an existing dimension, you can use the resulting c
 > ```
 >
 
-#### HLLSketchMerge Aggregator
+### HLLSketchMerge aggregator
 
 ```
 {
@@ -91,9 +95,9 @@ When applied at query time on an existing dimension, you can use the resulting c
 
 You can use the `HLLSketchMerge` aggregator to ingest pre-generated sketches from an input dataset. For example, you can set up a batch processing job to generate the sketches before sending the data to Druid. You must serialize the sketches in the input dataset to Base64-encoded bytes. Then, specify `HLLSketchMerge` for the input column in the native ingestion `metricsSpec`.
 
-### Post Aggregators
+## Post aggregators
 
-#### Estimate
+### Estimate
 
 Returns the distinct count estimate as a double.
 
@@ -106,7 +110,7 @@ Returns the distinct count estimate as a double.
 }
 ```
 
-#### Estimate with bounds
+### Estimate with bounds
 
 Returns a distinct count estimate and error bounds from an HLL sketch.
 The result will be an array containing three double values: estimate, lower bound and upper bound.
@@ -122,7 +126,7 @@ This must be an integer value of 1, 2 or 3 corresponding to approximately 68.3%,
 }
 ```
 
-#### Union
+### Union
 
 ```
 {
@@ -134,7 +138,7 @@ This must be an integer value of 1, 2 or 3 corresponding to approximately 68.3%,
 }
 ```
 
-#### Sketch to string
+### Sketch to string
 
 Human-readable sketch summary for debugging.
 

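Reviewer-style illustration, not part of the patch: the HLL aggregator properties in the table above can be assembled into a spec as in the minimal sketch below. The helper name `hll_sketch_build` and the column names `unique_users` and `user_id` are hypothetical. The property names, defaults, and allowed ranges are taken from the table; the validation is only an assumption about how a client might check them before sending a query.

```python
import json

# Allowed values copied from the `tgtHllType` row of the table.
VALID_HLL_TYPES = {"HLL_4", "HLL_6", "HLL_8"}


def hll_sketch_build(name, field_name, lg_k=12, tgt_hll_type="HLL_4",
                     round_=False, should_finalize=True):
    """Build an HLLSketchBuild aggregator spec as a plain dict.

    Hypothetical helper: it mirrors the documented properties and
    defaults, and rejects values outside the documented ranges.
    """
    if not 4 <= lg_k <= 21:
        raise ValueError("lgK must be between 4 and 21 inclusive")
    if tgt_hll_type not in VALID_HLL_TYPES:
        raise ValueError("tgtHllType must be HLL_4, HLL_6 or HLL_8")
    return {
        "type": "HLLSketchBuild",
        "name": name,
        "fieldName": field_name,
        "lgK": lg_k,
        "tgtHllType": tgt_hll_type,
        "round": round_,
        "shouldFinalize": should_finalize,
    }


# Keep the intermediate sketch instead of the finalized estimate.
spec = hll_sketch_build("unique_users", "user_id", should_finalize=False)
print(json.dumps(spec, indent=2))
```

Setting `shouldFinalize` to `false` here only affects this one aggregator; the query context parameters mentioned in the table apply to all aggregators in the query.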
diff --git a/docs/development/extensions-core/datasketches-kll.md b/docs/development/extensions-core/datasketches-kll.md
@@ -37,7 +37,9 @@ To use this aggregator, make sure you [include](../../development/extensions.md#
 druid.extensions.loadList=["druid-datasketches"]
 ```
 
-### Aggregator
+For additional sketch types supported in Druid, see [DataSketches extension](datasketches-extension.md).
+
+## Aggregator
 
 The result of the aggregation is a KllFloatsSketch or KllDoublesSketch that is the union of all sketches either built from raw data or read from the segments.
 
@@ -50,17 +52,17 @@ The result of the aggregation is a KllFloatsSketch or KllDoublesSketch that is t
  }
 ```
 
-|property|description|required?|
+|Property|Description|Required?|
 |--------|-----------|---------|
-|type|This String should be "KllFloatsSketch" or "KllDoublesSketch"|yes|
-|name|A String for the output (result) name of the calculation.|yes|
-|fieldName|A String for the name of the input field (can contain sketches or raw numeric values).|yes|
-|k|Parameter that determines the accuracy and size of the sketch. Higher k means higher accuracy but more space to store sketches. Must be from 8 to 65535. See [KLL Sketch Accuracy and Size](https://datasketches.apache.org/docs/KLL/KLLAccuracyAndSize.html).|no, defaults to 200|
-|maxStreamLength|This parameter defines the number of items that can be presented to each sketch before it may need to move from off-heap to on-heap memory. This is relevant to query types that use off-heap memory, including [TopN](../../querying/topnquery.md) and [GroupBy](../../querying/groupbyquery.md). Ideally, should be set high enough such that most sketches can stay off-heap.|no, defaults to 1000000000|
+|`type`|Either "KllFloatsSketch" or "KllDoublesSketch"|yes|
+|`name`|A String for the output (result) name of the calculation.|yes|
+|`fieldName`|String for the name of the input field, which may contain sketches or raw numeric values.|yes|
+|`k`|Parameter that determines the accuracy and size of the sketch. Higher k means higher accuracy but more space to store sketches. Must be from 8 to 65535. See [KLL Sketch Accuracy and Size](https://datasketches.apache.org/docs/KLL/KLLAccuracyAndSize.html).|no, defaults to 200|
+|`maxStreamLength`|This parameter defines the number of items that can be presented to each sketch before it may need to move from off-heap to on-heap memory. This is relevant to query types that use off-heap memory, including [TopN](../../querying/topnquery.md) and [GroupBy](../../querying/groupbyquery.md). Ideally, should be set high enough such that most sketches can stay off-heap.|no, defaults to 1000000000|
 
-### Post Aggregators
+## Post aggregators
 
-#### Quantile
+### Quantile
 
 This returns an approximation to the value that would be preceded by a given fraction of a hypothetical sorted version of the input stream.
 
@@ -73,7 +75,7 @@ This returns an approximation to the value that would be preceded by a given fra
 }
 ```
 
-#### Quantiles
+### Quantiles
 
 This returns an array of quantiles corresponding to a given array of fractions
 
@@ -86,7 +88,7 @@ This returns an array of quantiles corresponding to a given array of fractions
 }
 ```
 
-#### Histogram
+### Histogram
 
 This returns an approximation to the histogram given an array of split points that define the histogram bins or a number of bins (not both). An array of <i>m</i> unique, monotonically increasing split points divide the real number line into <i>m+1</i> consecutive disjoint intervals. The definition of an interval is inclusive of the left split point and exclusive of the right split point. If the number of bins is specified instead of split points, the interval between the minimum and maximum values is divided into the given number of equally-spaced bins.
 
@@ -100,7 +102,7 @@ This returns an approximation to the histogram given an array of split points th
 }
 ```
 
-#### Rank
+### Rank
 
 This returns an approximation to the rank of a given value that is the fraction of the distribution less than that value.
 
@@ -112,7 +114,7 @@ This returns an approximation to the rank of a given value that is the fraction
   "value" : <value>
 }
 ```
-#### CDF
+### CDF
 
 This returns an approximation to the Cumulative Distribution Function given an array of split points that define the edges of the bins. An array of <i>m</i> unique, monotonically increasing split points divide the real number line into <i>m+1</i> consecutive disjoint intervals. The definition of an interval is inclusive of the left split point and exclusive of the right split point. The resulting array of fractions can be viewed as ranks of each split point with one additional rank that is always 1.
 
@@ -125,7 +127,7 @@ This returns an approximation to the Cumulative Distribution Function given an a
 }
 ```
 
-#### Sketch Summary
+### Sketch summary
 
 This returns a summary of the sketch that can be used for debugging. This is the result of calling toString() method.
 

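Reviewer-style illustration, not part of the patch: the Histogram description above says that *m* split points give *m+1* bins, each inclusive of its left split point and exclusive of its right one. The sketch below computes an exact histogram over raw values with that convention so the bin boundaries are easy to see. It does not use a KLL sketch, and the function name `exact_histogram` is hypothetical; the post aggregator itself returns an approximation.

```python
from bisect import bisect_right


def exact_histogram(values, split_points):
    """Count values per bin for m split points (m + 1 bins).

    Bins are [left, right): a value equal to a split point is counted
    in the bin that starts at that split point.
    """
    if any(a >= b for a, b in zip(split_points, split_points[1:])):
        raise ValueError("split points must be unique and increasing")
    counts = [0] * (len(split_points) + 1)
    for v in values:
        # bisect_right returns how many split points are <= v,
        # which is exactly the index of the bin containing v.
        counts[bisect_right(split_points, v)] += 1
    return counts


counts = exact_histogram([1, 2, 3, 4, 5], [2, 4])
print(counts)  # bins: (-inf, 2), [2, 4), [4, +inf)
```

With split points `[2, 4]`, the value `2` lands in the middle bin and `4` in the last bin, which is the inclusive-left, exclusive-right behavior the text describes.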
diff --git a/docs/development/extensions-core/datasketches-quantiles.md b/docs/development/extensions-core/datasketches-quantiles.md
@@ -37,7 +37,9 @@ To use this aggregator, make sure you [include](../../development/extensions.md#
 druid.extensions.loadList=["druid-datasketches"]
 ```
 
-### Aggregator
+For additional sketch types supported in Druid, see [DataSketches extension](datasketches-extension.md).
+
+## Aggregator
 
 The result of the aggregation is a DoublesSketch that is the union of all sketches either built from raw data or read from the segments.
 
@@ -50,17 +52,18 @@ The result of the aggregation is a DoublesSketch that is the union of all sketch
  }
 ```
 
-|property|description|required?|
+|Property|Description|Required?|
 |--------|-----------|---------|
-|type|This String should always be "quantilesDoublesSketch"|yes|
-|name|A String for the output (result) name of the calculation.|yes|
-|fieldName|A String for the name of the input field (can contain sketches or raw numeric values).|yes|
-|k|Parameter that determines the accuracy and size of the sketch. Higher k means higher accuracy but more space to store sketches. Must be a power of 2 from 2 to 32768. See [accuracy information](https://datasketches.apache.org/docs/Quantiles/OrigQuantilesSketch) in the DataSketches documentation for details.|no, defaults to 128|
-|maxStreamLength|This parameter defines the number of items that can be presented to each sketch before it may need to move from off-heap to on-heap memory. This is relevant to query types that use off-heap memory, including [TopN](../../querying/topnquery.md) and [GroupBy](../../querying/groupbyquery.md). Ideally, should be set high enough such that most sketches can stay off-heap.|no, defaults to 1000000000|
+|`type`|This string should always be "quantilesDoublesSketch"|yes|
+|`name`|String representing the output column to store sketch values.|yes|
+|`fieldName`|A string for the name of the input field (can contain sketches or raw numeric values).|yes|
+|`k`|Parameter that determines the accuracy and size of the sketch. Higher k means higher accuracy but more space to store sketches. Must be a power of 2 from 2 to 32768. See [accuracy information](https://datasketches.apache.org/docs/Quantiles/OrigQuantilesSketch) in the DataSketches documentation for details.|no, defaults to 128|
+|`maxStreamLength`|This parameter defines the number of items that can be presented to each sketch before it may need to move from off-heap to on-heap memory. This is relevant to query types that use off-heap memory, including [TopN](../../querying/topnquery.md) and [GroupBy](../../querying/groupbyquery.md). Ideally, should be set high enough such that most sketches can stay off-heap.|no, defaults to 1000000000|
+|`shouldFinalize`|Return the final double type representing the estimate rather than the intermediate sketch type itself. In addition to controlling the finalization of this aggregator, you can control whether all aggregators are finalized with the query context parameters [`finalize`](../../querying/query-context.md) and [`sqlFinalizeOuterSketches`](../../querying/sql-query-context.md).|no, defaults to `true`|
 
-### Post Aggregators
+## Post aggregators
 
-#### Quantile
+### Quantile
 
 This returns an approximation to the value that would be preceded by a given fraction of a hypothetical sorted version of the input stream.
 
@@ -73,7 +76,7 @@ This returns an approximation to the value that would be preceded by a given fra
 }
 ```
 
-#### Quantiles
+### Quantiles
 
 This returns an array of quantiles corresponding to a given array of fractions
 
@@ -86,7 +89,7 @@ This returns an array of quantiles corresponding to a given array of fractions
 }
 ```
 
-#### Histogram
+### Histogram
 
 This returns an approximation to the histogram given an array of split points that define the histogram bins or a number of bins (not both). An array of <i>m</i> unique, monotonically increasing split points divide the real number line into <i>m+1</i> consecutive disjoint intervals. The definition of an interval is inclusive of the left split point and exclusive of the right split point. If the number of bins is specified instead of split points, the interval between the minimum and maximum values is divided into the given number of equally-spaced bins.
 
@@ -100,7 +103,7 @@ This returns an approximation to the histogram given an array of split points th
 }
 ```
 
-#### Rank
+### Rank
 
 This returns an approximation to the rank of a given value that is the fraction of the distribution less than that value.
 
@@ -112,7 +115,7 @@ This returns an approximation to the rank of a given value that is the fraction
   "value" : <value>
 }
 ```
-#### CDF
+### CDF
 
 This returns an approximation to the Cumulative Distribution Function given an array of split points that define the edges of the bins. An array of <i>m</i> unique, monotonically increasing split points divide the real number line into <i>m+1</i> consecutive disjoint intervals. The definition of an interval is inclusive of the left split point and exclusive of the right split point. The resulting array of fractions can be viewed as ranks of each split point with one additional rank that is always 1.
 
@@ -125,7 +128,7 @@ This returns an approximation to the Cumulative Distribution Function given an a
 }
 ```
 
-#### Sketch Summary
+### Sketch summary
 
 This returns a summary of the sketch that can be used for debugging. This is the result of calling toString() method.
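
Reviewer-style illustration, not part of the patch: the Rank and CDF descriptions above define rank as the fraction of the distribution less than a value, and the CDF as the ranks of each split point plus one final entry that is always 1. The sketch below computes both exactly on a small sorted list. It does not use a quantiles sketch, and the names `exact_rank` and `exact_cdf` are hypothetical; the post aggregators return approximations of these quantities.

```python
from bisect import bisect_left


def exact_rank(sorted_values, value):
    """Fraction of items strictly less than `value`."""
    # bisect_left counts the items that are < value in a sorted list.
    return bisect_left(sorted_values, value) / len(sorted_values)


def exact_cdf(sorted_values, split_points):
    """Rank of each split point, followed by a final 1.0."""
    return [exact_rank(sorted_values, s) for s in split_points] + [1.0]


data = sorted([5, 3, 1, 4, 2])
rank_of_3 = exact_rank(data, 3)
cdf = exact_cdf(data, [2, 4])
print(rank_of_3, cdf)
```

For *m* split points the result has *m+1* entries, matching the *m+1* intervals described in the CDF section.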