4 changes: 2 additions & 2 deletions docs/content/development/extensions-contrib/distinctcount.md
@@ -28,8 +28,8 @@ To use this Apache Druid (incubating) extension, make sure to [include](../../op

Additionally, follow these steps:

(1) First, use a single dimension hash-based partition spec to partition data by a single dimension. For example visitor_id. This to make sure all rows with a particular value for that dimension will go into the same segment, or this might over count.
(2) Second, use distinctCount to calculate the distinct count, make sure queryGranularity is divided exactly by segmentGranularity or else the result will be wrong.
1. First, use a single-dimension, hash-based partition spec to partition data by a single dimension, for example `visitor_id`. This ensures that all rows with a particular value for that dimension go into the same segment; otherwise the distinct count may be too high.
2. Second, use distinctCount to calculate the distinct count, making sure that queryGranularity is divided exactly by segmentGranularity, or else the result will be wrong.
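
For illustration, the hash-based partition spec from step 1 might look like the following sketch (the dimension `visitor_id` and the target size are placeholder values, not prescriptions):

```json
"partitionsSpec": {
    "type": "hashed",
    "targetPartitionSize": 5000000,
    "partitionDimensions": ["visitor_id"]
}
```

and a matching aggregator for step 2 could then be:

```json
{
    "type": "distinctCount",
    "name": "uv",
    "fieldName": "visitor_id"
}
```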

There are some limitations. When used with groupBy, the number of groupBy keys in each segment should not exceed maxIntermediateRows; if it does, the result will be wrong. When used with topN, numValuesPerPass should not be too large; if it is, distinctCount will use a lot of memory and might cause the JVM to run out of memory.

Expand Down
2 changes: 2 additions & 0 deletions docs/content/development/extensions-contrib/influx.md
@@ -35,6 +35,7 @@ A typical line looks like this:
```cpu,application=db,host=prdb123,region=us-east-1 usage_idle=99.24,usage_user=0.55 1520722030000000000```

which contains four parts:

- measurement: A string indicating the name of the measurement represented (e.g. cpu, network, web_requests)
- tags: zero or more key-value pairs (i.e. dimensions)
- measurements: one or more key-value pairs; values can be numeric, boolean, or string
@@ -43,6 +44,7 @@ which contains four parts:
The parser extracts these fields into a map, giving the measurement the key `measurement` and the timestamp the key `_ts`. The tag and measurement keys are copied verbatim, so users should take care to avoid name collisions. It is up to the ingestion spec to decide which fields should be treated as dimensions and which should be treated as metrics (typically tags correspond to dimensions and measurements correspond to metrics).
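
For instance, the sample line above would parse into a map along these lines (a sketch: the exact key set follows the line's tags and fields, and the parser may normalize the timestamp value rather than keeping raw nanoseconds):

```json
{
  "measurement": "cpu",
  "application": "db",
  "host": "prdb123",
  "region": "us-east-1",
  "usage_idle": 99.24,
  "usage_user": 0.55,
  "_ts": 1520722030000000000
}
```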

The parser is configured like so:

```json
"parser": {
"type": "string",
@@ -33,6 +33,7 @@ In materialized-view-maintenance, dataSources users ingest are called "base-data
The `derivativeDataSource` supervisor is used to keep the timeline of a derived-dataSource consistent with its base-dataSource. Each `derivativeDataSource` supervisor is responsible for one derived-dataSource.

A sample derivativeDataSource supervisor spec is shown below:

```json
{
"type": "derivativeDataSource",
@@ -90,6 +91,7 @@ A sample derivativeDataSource supervisor spec is shown below:
In materialized-view-selection, we implement a new query type `view`. When we request a view query, Druid will try its best to optimize the query based on query dataSource and intervals.

A sample view query spec is shown below:

```json
{
"queryType": "view",
@@ -124,6 +126,7 @@ A sample view query spec is shown below:
}
}
```

There are 2 parts in a view query:

|Field|Description|Required|
@@ -38,6 +38,7 @@ druid.extensions.loadList=["druid-momentsketch"]
The result of the aggregation is a momentsketch that is the union of all sketches either built from raw data or read from the segments.

The `momentSketch` aggregator operates over raw data while the `momentSketchMerge` aggregator should be used when aggregating pre-computed sketches.

```json
{
"type" : <aggregator_type>,
@@ -59,6 +60,7 @@ The `momentSketch` aggregator operates over raw data while the `momentSketchMerg
### Post Aggregators

Users can query for a set of quantiles using the `momentSketchSolveQuantiles` post-aggregator on the sketches created by the `momentSketch` or `momentSketchMerge` aggregators.

```json
{
"type" : "momentSketchSolveQuantiles",
@@ -69,6 +71,7 @@ Users can query for a set of quantiles using the `momentSketchSolveQuantiles` po
```

Users can also query for the min/max of a distribution:

```json
{
"type" : "momentSketchMin" | "momentSketchMax",
@@ -79,6 +82,7 @@ Users can also query for the min/max of a distribution:

### Example

As an example of a query with sketches pre-aggregated at ingestion time, one could set up the following aggregator at ingest:

```json
{
"type": "momentSketch",
@@ -88,7 +92,9 @@ As an example of a query with sketches pre-aggregated at ingestion time, one cou
"compress": true
}
```

and make queries using the following aggregator + post-aggregator:

```json
{
"aggregations": [{
@@ -33,6 +33,7 @@ These Aggregate Window Functions consume standard Druid Aggregators and outputs
Moving Average encapsulates the [groupBy query](../../querying/groupbyquery.html) (or [timeseries](../../querying/timeseriesquery.html) in the case of no dimensions) in order to rely on the maturity of these query types.

It runs the query in two main phases:

1. Runs an inner [groupBy](../../querying/groupbyquery.html) or [timeseries](../../querying/timeseriesquery.html) query to compute Aggregators (e.g. the daily count of events).
2. Passes over the aggregated results on the Broker in order to compute Averagers (e.g. the moving 7-day average of the daily count).

@@ -110,6 +111,7 @@ These are properties which are common to all Averagers:
#### Standard averagers

These averagers offer four functions:

* Mean (Average)
* MeanNoNulls (ignores empty buckets)
* Max
@@ -121,6 +123,7 @@ In that case, the first records will ignore missing buckets and average won't be
However, this also means that empty days in a sparse dataset will be ignored.

Example of usage:

```json
{ "type" : "doubleMean", "name" : <output_name>, "fieldName": <input_name> }
```
@@ -130,6 +133,7 @@ This optional parameter is used to calculate over a single bucket within each cy
A prime example would be weekly buckets, resulting in a Day of Week calculation. (Other examples: Month of year, Hour of day).

For example, when using these parameters (a matching averager spec is sketched after this list):

* *granularity*: period=P1D (daily)
* *buckets*: 28
* *cycleSize*: 7
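
A matching averager spec might look like the following sketch (the averager type and the `name`/`fieldName` values here are illustrative assumptions, not taken from the examples below):

```json
{
  "type": "doubleMeanNoNulls",
  "name": "dayOfWeekAvg",
  "fieldName": "delta",
  "buckets": 28,
  "cycleSize": 7
}
```
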
@@ -146,6 +150,7 @@ All examples are based on the Wikipedia dataset provided in the Druid [tutorials
Calculating a 7-bucket moving average for Wikipedia edit deltas.

Query syntax:

```json
{
"queryType": "movingAverage",
@@ -176,6 +181,7 @@ Query syntax:
```

Result:

```json
[ {
"version" : "v1",
@@ -217,6 +223,7 @@ Result:
Calculating a 7-bucket moving average for Wikipedia edit deltas, plus the ratio between the current period and the moving average.

Query syntax:

```json
{
"queryType": "movingAverage",
@@ -264,6 +271,7 @@ Query syntax:
```

Result:

```json
[ {
"version" : "v1",
@@ -306,6 +314,7 @@ Result:
Calculating an average of the first 10 minutes of every hour over the last 3 hours:

Query syntax:

```json
{
"queryType": "movingAverage",
@@ -58,7 +58,9 @@ The result of the aggregation is a T-Digest sketch that is built by ingesting numer
"compression": <parameter that controls size and accuracy>
}
```

Example:

```json
{
"type": "buildTDigestSketch",
@@ -95,6 +97,7 @@ The result of the aggregation is a T-Digest sketch that is built by merging pre-
|compression|Parameter that determines the accuracy and size of the sketch. Higher compression means higher accuracy but more space to store sketches.|no, defaults to 100|

Example:

```json
{
"queryType": "groupBy",
@@ -110,6 +113,7 @@ Example:
"intervals": ["2016-01-01T00:00:00.000Z/2016-01-31T00:00:00.000Z"]
}
```

### Post Aggregators

#### Quantiles
@@ -133,6 +137,7 @@ This returns an array of quantiles corresponding to a given array of fractions.
|fractions|Non-empty array of fractions between 0 and 1|yes|

Example:

```json
{
"queryType": "groupBy",
@@ -173,6 +173,7 @@ Return a list of all user names.
Return the name and role information of the user with name {userName}.

Example output:

```json
{
"name": "druid2",
@@ -183,9 +184,11 @@ Example output:
```

This API supports the following flags:

- `?full`: The response will also include the full information for each role currently assigned to the user.

Example output:

```json
{
"name": "druid2",
@@ -268,6 +271,7 @@ Return a list of all role names.
Return name and permissions for the role named {roleName}.

Example output:

```json
{
"name": "druidRole2",
@@ -299,6 +303,7 @@ This API supports the following flags:
- `?simplifyPermissions`: The permissions in the output will contain only a list of `resourceAction` objects, without the extraneous `resourceNamePattern` field. The `users` field will be null when `?full` is not specified.

Example output:

```json
{
"name": "druidRole2",
1 change: 1 addition & 0 deletions docs/content/development/extensions-core/druid-lookups.md
@@ -75,6 +75,7 @@ Same for Loading cache, developer can implement a new type of loading cache by i

##### Example of Polling On-heap Lookup

This example demonstrates a polling cache that updates its on-heap cache every 10 minutes.

```json
{
"type":"pollingLookup",
4 changes: 4 additions & 0 deletions docs/content/development/extensions-core/orc.md
@@ -269,13 +269,15 @@ This extension, first available in version 0.15.0, replaces the previous 'contri
ingestion task is *incompatible*, and will need to be modified to work with the newer 'core' extension.

To migrate to 0.15.0+:

* In `inputSpec` of `ioConfig`, `inputFormat` must be changed from `"org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat"` to
`"org.apache.orc.mapreduce.OrcInputFormat"`
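
For example, a migrated `inputSpec` might look like the following sketch (the `paths` value is a placeholder):

```json
"ioConfig": {
  "type": "hadoop",
  "inputSpec": {
    "type": "static",
    "inputFormat": "org.apache.orc.mapreduce.OrcInputFormat",
    "paths": "/path/to/example.orc"
  }
}
```
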
* The 'contrib' extension supported a `typeString` property, which provided the schema of the
ORC file. It was essentially required to have the types correct, but notably _not_ the column names, which
facilitated column renaming. In the 'core' extension, column renaming can be achieved with
[`flattenSpec` expressions](../../ingestion/flatten-json.html). For example, `"typeString":"struct<time:string,name:string>"`
with the actual schema `struct<_col0:string,_col1:string>` would, to preserve the Druid schema, need to be replaced with:

```json
"flattenSpec": {
"fields": [
@@ -293,10 +295,12 @@ with the actual schema `struct<_col0:string,_col1:string>`, to preserve Druid sc
...
}
```

* The 'contrib' extension supported a `mapFieldNameFormat` property, which provided a way to specify a dimension to
flatten `OrcMap` columns with primitive types. This functionality has also been replaced with
[`flattenSpec` expressions](../../ingestion/flatten-json.html). For example, `"mapFieldNameFormat": "<PARENT>_<CHILD>"`
for a dimension `nestedData_dim1` could, to preserve the Druid schema, be replaced with:

```json
"flattenSpec": {
"fields": [
6 changes: 6 additions & 0 deletions docs/content/querying/filters.md
@@ -282,6 +282,7 @@ greater than, less than, greater than or equal to, less than or equal to, and "b
Bound filters support the use of extraction functions, see [Filtering with Extraction Functions](#filtering-with-extraction-functions) for details.

The following bound filter expresses the condition `21 <= age <= 31`:

```json
{
"type": "bound",
@@ -293,6 +294,7 @@ The following bound filter expresses the condition `21 <= age <= 31`:
```

This filter expresses the condition `foo <= name <= hoo`, using the default lexicographic sorting order.

```json
{
"type": "bound",
@@ -303,6 +305,7 @@ This filter expresses the condition `foo <= name <= hoo`, using the default lexi
```

Using strict bounds, this filter expresses the condition `21 < age < 31`:

```json
{
"type": "bound",
@@ -316,6 +319,7 @@ The user can also specify a one-sided bound by omitting "upper" or "lower". This
```

The user can also specify a one-sided bound by omitting "upper" or "lower". This filter expresses `age < 31`.

```json
{
"type": "bound",
@@ -327,6 +331,7 @@ Likewise, this filter expresses `age >= 18`
```

Likewise, this filter expresses `age >= 18`:

```json
{
"type": "bound",
@@ -355,6 +360,7 @@ The interval filter supports the use of extraction functions, see [Filtering wit
If an extraction function is used with this filter, the extraction function should output values that are parseable as long milliseconds.

The following example filters on the time ranges of October 1-7, 2014 and November 15-16, 2014.

```json
{
"type" : "interval",