29 changes: 28 additions & 1 deletion docs/misc/math-expr.md
@@ -170,7 +170,6 @@ See javadoc of java.lang.Math for detailed explanation for each function.
|toradians|toradians(x) converts an angle measured in degrees to an approximately equivalent angle measured in radians|
|ulp|ulp(x) returns the size of an ulp of the argument x|


## Array functions

| function | description |
@@ -227,6 +226,34 @@ map((x) -> x + 1, x)
```
in this case, the `x` when evaluating `x + 1` is the lambda argument, thus an element of the multi-valued column `x`, rather than the column `x` itself.


## JSON functions
JSON functions provide facilities to extract, transform, and create `COMPLEX<json>` values.

| function | description |
|---|---|
| json_value(expr, path) | Extract a Druid literal (`STRING`, `LONG`, `DOUBLE`) value from `expr` using JSONPath syntax of `path` |
| json_query(expr, path) | Extract a `COMPLEX<json>` value from `expr` using JSONPath syntax of `path` |
| json_object(expr1, expr2[, expr3, expr4 ...]) | Construct a `COMPLEX<json>` with alternating 'key' and 'value' arguments|
| parse_json(expr) | Deserialize a JSON `STRING` into a `COMPLEX<json>`. If the input is not a `STRING` or it is invalid JSON, this function will result in an error.|
| try_parse_json(expr) | Deserialize a JSON `STRING` into a `COMPLEX<json>`. If the input is not a `STRING` or it is invalid JSON, this function will result in a `NULL` value. |
| to_json_string(expr) | Convert `expr` into a JSON `STRING` value |
| json_keys(expr, path) | Get array of field names from `expr` at the specified JSONPath `path`, or null if the data does not exist or does not have any fields |
| json_paths(expr) | Get array of all JSONPath paths available from `expr` |

### JSONPath syntax

Druid supports a small, simplified subset of the [JSONPath syntax](https://github.com/json-path/JsonPath/blob/master/README.md) operators, primarily limited to extracting individual values from nested data structures.
Suggested change
Druid supports a small, simplified subset of the [JSONPath syntax](https://github.com/json-path/JsonPath/blob/master/README.md) operators, primarily limited to extracting individual values from nested data structures.
Druid supports a subset of the [JSONPath syntax](https://github.com/json-path/JsonPath/blob/master/README.md) operators, primarily limited to extracting individual values from nested data structures.


|Operator|Description|
| --- | --- |
|`$`| Root element. All JSONPath expressions start with this operator. |
|`.<name>`| Child element in dot notation. |
|`['<name>']`| Child element in bracket notation. |
|`[<number>]`| Array index. |

See [SQL JSON documentation](../querying/sql-json-functions.md#jsonpath-syntax) for examples.

## Reduction functions

Reduction functions operate on zero or more expressions and return a single expression. If no expressions are passed as
16 changes: 15 additions & 1 deletion docs/querying/sql-data-types.md
@@ -33,7 +33,7 @@ Columns in Druid are associated with a specific data type. This topic describes

Druid natively supports five basic column types: "long" (64 bit signed int), "float" (32 bit float), "double" (64 bit
float), "string" (UTF-8 encoded strings and string arrays), and "complex" (catch-all for more exotic data types like
hyperUnique and approxHistogram columns).
json, hyperUnique, and approxHistogram columns).

Timestamps (including the `__time` column) are treated by Druid as longs, with the value being the number of
milliseconds since 1970-01-01 00:00:00 UTC, not counting leap seconds. Therefore, timestamps in Druid do not carry any
@@ -112,3 +112,17 @@ When `druid.expressions.useStrictBooleans = false` (the default mode), Druid use
When `druid.expressions.useStrictBooleans = true`, Druid uses three-valued logic for
[expressions](../misc/math-expr.md) evaluation, such as `expression` virtual columns or `expression` filters.
However, even in this mode, Druid uses two-valued logic for filter types other than `expression`.

## Nested columns
Druid supports storing nested data structures in segments using the native `COMPLEX<json>` type. You can interact
with this data using [JSON functions](sql-json-functions.md), which can extract nested values, parse from string,
serialize to string, and create new `COMPLEX<json>` structures.

`COMPLEX` types currently have limited functionality outside of the specialized functions that understand them, so
behavior is undefined when you:
* group on complex values
* filter directly on complex values, such as `WHERE json is NULL`
* use them as inputs to aggregators without specialized handling for a specific complex type

In many cases, functions are provided to translate `COMPLEX` value types to `STRING`, which serves as a workaround
until `COMPLEX` type functionality can be improved.
67 changes: 67 additions & 0 deletions docs/querying/sql-functions.md
@@ -647,6 +647,46 @@ Parses `address` into an IPv4 address stored as an integer.

Converts `address` into an IPv4 address in dot-decimal notation.

## JSON_KEYS

**Function type:** [JSON](sql-json-functions.md)

`JSON_KEYS(expr, path)`

Returns an array of field names from `expr` at the specified `path`.

## JSON_OBJECT

**Function type:** [JSON](sql-json-functions.md)

`JSON_OBJECT(KEY expr1 VALUE expr2[, KEY expr3 VALUE expr4, ...])`

Constructs a new `COMPLEX<json>` object. The `KEY` expressions must evaluate to string types. The `VALUE` expressions can be composed of any input type, including other `COMPLEX<json>` values. `JSON_OBJECT` can accept colon-separated key-value pairs. The following syntax is equivalent: `JSON_OBJECT(expr1:expr2[, expr3:expr4, ...])`.

## JSON_PATHS

**Function type:** [JSON](sql-json-functions.md)

`JSON_PATHS(expr)`

Returns an array of all paths which refer to literal values in `expr` in JSONPath format.

## JSON_QUERY

**Function type:** [JSON](sql-json-functions.md)

`JSON_QUERY(expr, path)`

Extracts a `COMPLEX<json>` value from `expr`, at the specified `path`.

## JSON_VALUE

**Function type:** [JSON](sql-json-functions.md)

`JSON_VALUE(expr, path [RETURNING sqlType])`

Extracts a literal value from `expr` at the specified `path`. If you specify `RETURNING` and an SQL type name (such as `VARCHAR`, `BIGINT`, or `DOUBLE`), the function plans the query using the suggested type. Otherwise, it attempts to infer the type based on the context. If it can't infer the type, it defaults to `VARCHAR`.

## LATEST

`LATEST(expr)`
@@ -899,6 +939,14 @@ Returns NULL if two values are equal, else returns the first value.

Returns `e2` if `e1` is null, else returns `e1`.

## PARSE_JSON

**Function type:** [JSON](sql-json-functions.md)

`PARSE_JSON(expr)`

Parses `expr` into a `COMPLEX<json>` object. This operator deserializes JSON values when processing them, translating stringified JSON into a nested structure. If the input is not a `VARCHAR` or it is invalid JSON, this function will result in an error.

## PARSE_LONG

`PARSE_LONG(<CHARACTER>, [<INTEGER>])`
@@ -1267,6 +1315,15 @@ Adds a certain amount of time to a given timestamp.

Takes the difference between two timestamps, returning the results in the given units.

## TO_JSON_STRING

**Function type:** [JSON](sql-json-functions.md)

`TO_JSON_STRING(expr)`

Serializes `expr` into a JSON string.


## TRIM

`TRIM([BOTH|LEADING|TRAILING] [<chars> FROM] expr)`
@@ -1291,6 +1348,16 @@ Alias for [`TRUNCATE`](#truncate).

Truncates a numerical expression to a specific number of decimal digits.


## TRY_PARSE_JSON

**Function type:** [JSON](sql-json-functions.md)

`TRY_PARSE_JSON(expr)`

Parses `expr` into a `COMPLEX<json>` object. This operator deserializes JSON values when processing them, translating stringified JSON into a nested structure. If the input is not a `VARCHAR` or it is invalid JSON, this function will result in a `NULL` value.
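
The difference between `PARSE_JSON` and `TRY_PARSE_JSON` is only in how failures surface. The following JavaScript sketch mimics that documented contract; it is not Druid code, and the helper names `parseJson` and `tryParseJson` are hypothetical stand-ins for the SQL functions.

```javascript
// Illustrative sketch of the documented contract only; not how Druid implements it.
// `parseJson` and `tryParseJson` are hypothetical names used for this example.

// Mirrors PARSE_JSON: non-string input or invalid JSON results in an error.
function parseJson(expr) {
  if (typeof expr !== 'string') {
    throw new TypeError('input must be a string');
  }
  return JSON.parse(expr); // throws SyntaxError when the text is not valid JSON
}

// Mirrors TRY_PARSE_JSON: the same failures produce null instead of an error.
function tryParseJson(expr) {
  try {
    return parseJson(expr);
  } catch (e) {
    return null;
  }
}

console.log(tryParseJson('{"x":1, "y":[1, 2, 3]}')); // a nested structure
console.log(tryParseJson('{"x":'));                  // null, the input is invalid JSON
console.log(tryParseJson(42));                       // null, the input is not a string
```

The lenient variant is the safer choice when a string column may contain rows that are not valid JSON, because a single bad row then yields `NULL` instead of failing the query.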


## UPPER

`UPPER(expr)`
71 changes: 71 additions & 0 deletions docs/querying/sql-json-functions.md
@@ -0,0 +1,71 @@
---
id: sql-json-functions
title: "SQL JSON functions"
sidebar_label: "JSON functions"
---

<!--
~ Licensed to the Apache Software Foundation (ASF) under one
~ or more contributor license agreements. See the NOTICE file
~ distributed with this work for additional information
~ regarding copyright ownership. The ASF licenses this file
~ to you under the Apache License, Version 2.0 (the
~ "License"); you may not use this file except in compliance
~ with the License. You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing,
~ software distributed under the License is distributed on an
~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
~ KIND, either express or implied. See the License for the
~ specific language governing permissions and limitations
~ under the License.
-->

<!--
The format of the tables that describe the functions and operators
should not be changed without updating the script create-sql-docs
in web-console/script/create-sql-docs, because the script detects
patterns in this markdown file and parses it into a TypeScript file for the web console
-->

Druid supports nested columns, which provide optimized storage and indexes for nested data structures. Use
the following JSON functions to extract, transform, and create `COMPLEX<json>` values.

| Function | Notes |
| --- | --- |
|`JSON_KEYS(expr, path)`| Returns an array of field names from `expr` at the specified `path`.|
|`JSON_OBJECT(KEY expr1 VALUE expr2[, KEY expr3 VALUE expr4, ...])` | Constructs a new `COMPLEX<json>` object. The `KEY` expressions must evaluate to string types. The `VALUE` expressions can be composed of any input type, including other `COMPLEX<json>` values. `JSON_OBJECT` can accept colon-separated key-value pairs. The following syntax is equivalent: `JSON_OBJECT(expr1:expr2[, expr3:expr4, ...])`.|
|`JSON_PATHS(expr)`| Returns an array of all paths which refer to literal values in `expr` in JSONPath format. |
|`JSON_QUERY(expr, path)`| Extracts a `COMPLEX<json>` value from `expr`, at the specified `path`. |
|`JSON_VALUE(expr, path [RETURNING sqlType])`| Extracts a literal value from `expr` at the specified `path`. If you specify `RETURNING` and an SQL type name (such as `VARCHAR`, `BIGINT`, or `DOUBLE`), the function plans the query using the suggested type. Otherwise, it attempts to infer the type based on the context. If it can't infer the type, it defaults to `VARCHAR`.|
|`PARSE_JSON(expr)`|Parses `expr` into a `COMPLEX<json>` object. This operator deserializes JSON values when processing them, translating stringified JSON into a nested structure. If the input is not a `VARCHAR` or it is invalid JSON, this function will result in an error.|
|`TRY_PARSE_JSON(expr)`|Parses `expr` into a `COMPLEX<json>` object. This operator deserializes JSON values when processing them, translating stringified JSON into a nested structure. If the input is not a `VARCHAR` or it is invalid JSON, this function will result in a `NULL` value.|
|`TO_JSON_STRING(expr)`|Serializes `expr` into a JSON string.|
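
The comment at the top of this file warns that `create-sql-docs` detects patterns in this table. The script's actual pattern is not shown here, so the following is only a rough sketch of that kind of extraction. The regular expression `FUNCTION_ROW` and the helper `extractFunctions` are assumptions made for illustration.

```javascript
// Hypothetical sketch: NOT the pattern used by web-console/script/create-sql-docs.
// Assumes each function row looks like: |`NAME(args)`| description |
const FUNCTION_ROW = /^\|\s*`([A-Z_0-9]+)\(([^`]*)\)`\s*\|\s*(.*?)\s*\|\s*$/;

// Collects one entry per table row whose first cell is a backticked function signature.
function extractFunctions(markdown) {
  const functions = [];
  for (const line of markdown.split('\n')) {
    const match = FUNCTION_ROW.exec(line);
    if (match) {
      functions.push({ name: match[1], args: match[2], description: match[3] });
    }
  }
  return functions;
}

const sample = [
  '| Function | Notes |',
  '| --- | --- |',
  '|`JSON_PATHS(expr)`| Returns an array of all paths which refer to literal values in `expr` in JSONPath format. |',
  '|`TO_JSON_STRING(expr)`|Serializes `expr` into a JSON string.|',
].join('\n');

console.log(extractFunctions(sample).map((f) => f.name)); // [ 'JSON_PATHS', 'TO_JSON_STRING' ]
```

A row that drops the backticks or moves the signature out of the first cell would not match this sketch, which shows why the table format should stay stable.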

### JSONPath syntax

Druid supports a subset of the [JSONPath syntax](https://github.com/json-path/JsonPath/blob/master/README.md) operators, primarily limited to extracting individual values from nested data structures.

|Operator|Description|
| --- | --- |
|`$`| Root element. All JSONPath expressions start with this operator. |
|`.<name>`| Child element in dot notation. |
|`['<name>']`| Child element in bracket notation. |
|`[<number>]`| Array index. |

Consider the following example input JSON:

```json
{"x":1, "y":[1, 2, 3]}
```

- To return the entire JSON object:<br>
`$` -> `{"x":1, "y":[1, 2, 3]}`
- To return the value of the key "x":<br>
`$.x` -> `1`
- For a key that contains an array, to return the entire array:<br>
`$['y']` -> `[1, 2, 3]`
- For a key that contains an array, to return an item in the array:<br>
`$.y[1]` -> `2`
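
The lookups above can be reproduced with a small evaluator for the four supported operators. This JavaScript sketch is illustrative and is not Druid's parser. The helper name `evaluateJsonPath` is hypothetical, and restricting dot-notation names to identifier-like strings is an assumption of the sketch.

```javascript
// Illustrative only: handles $, .<name>, ['<name>'], and [<number>].
// `evaluateJsonPath` is a hypothetical helper, not part of Druid.
function evaluateJsonPath(root, path) {
  if (!path.startsWith('$')) {
    throw new Error('JSONPath expressions must start with $');
  }
  // Sticky (y) flag: the pattern must match exactly at `lastIndex`.
  const token = /\.([A-Za-z_][A-Za-z0-9_]*)|\['([^']*)'\]|\[(\d+)\]/y;
  let current = root;
  let pos = 1; // skip the leading $
  while (pos < path.length) {
    token.lastIndex = pos;
    const match = token.exec(path);
    if (match === null) {
      throw new Error(`Unsupported JSONPath syntax at position ${pos}`);
    }
    if (current === null || current === undefined) {
      return undefined; // the path does not exist in the data
    }
    if (match[3] !== undefined) {
      // [<number>]: array index
      current = Array.isArray(current) ? current[Number(match[3])] : undefined;
    } else {
      // .<name> or ['<name>']: child element
      const key = match[1] !== undefined ? match[1] : match[2];
      const isObject = typeof current === 'object' && !Array.isArray(current);
      current = isObject ? current[key] : undefined;
    }
    pos = token.lastIndex;
  }
  return current;
}

const input = { x: 1, y: [1, 2, 3] };
console.log(JSON.stringify(evaluateJsonPath(input, '$')));      // {"x":1,"y":[1,2,3]}
console.log(JSON.stringify(evaluateJsonPath(input, '$.x')));    // 1
console.log(JSON.stringify(evaluateJsonPath(input, "$['y']"))); // [1,2,3]
console.log(JSON.stringify(evaluateJsonPath(input, '$.y[1]'))); // 2
```
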
111 changes: 109 additions & 2 deletions docs/querying/virtual-columns.md
@@ -31,7 +31,7 @@ Virtual columns are queryable column "views" created from a set of columns durin

A virtual column can potentially draw from multiple underlying columns, although a virtual column always presents itself as a single column.

Virtual columns can be used as dimensions or as inputs to aggregators.
Virtual columns can be referenced by their output names to be used as [dimensions](./dimensionspecs.md) or as inputs to [filters](./filters.md) and [aggregators](./aggregations.md).

Each Apache Druid query can accept a list of virtual columns as a parameter. The following scan query is provided as an example:

@@ -64,6 +64,8 @@ Each Apache Druid query can accept a list of virtual columns as a parameter. The
## Virtual column types

### Expression virtual column
Expression virtual columns use Druid's native [expression](../misc/math-expr.md) system to define query-time
transforms of inputs from one or more columns.

The expression virtual column has the following syntax:

@@ -78,6 +80,111 @@ The expression virtual column has the following syntax:

|property|description|required?|
|--------|-----------|---------|
|type|Must be `"expression"` to indicate that this is an expression virtual column.|yes|
|name|The name of the virtual column.|yes|
|expression|An [expression](../misc/math-expr.md) that takes a row as input and outputs a value for the virtual column.|yes|
|outputType|The expression's output will be coerced to this type. Can be LONG, FLOAT, DOUBLE, or STRING.|no, default is FLOAT|
|outputType|The expression's output will be coerced to this type. Can be LONG, FLOAT, DOUBLE, STRING, ARRAY types, or COMPLEX types.|no, default is FLOAT|


### Nested field virtual column

The nested field virtual column is an optimized virtual column that can provide direct access into various paths of
a `COMPLEX<json>` column, including using their indexes.

This virtual column is used for the SQL operators `JSON_VALUE` (if `processFromRaw` is set to false) or `JSON_QUERY`
(if `processFromRaw` is true), and accepts 'JSONPath' or 'jq' syntax string representations of paths, or a parsed
list of "path parts" in order to determine what should be selected from the column.

You can define a nested field virtual column with any of the following equivalent syntaxes. The examples all produce
the same output value, with each example showing a different way to specify how to access the nested value. The first
uses a JSONPath syntax `path`, the second uses a jq syntax `path`, and the third uses `pathParts`.

```json
{
"type": "nested-field",
"columnName": "shipTo",
"outputName": "v0",
"expectedType": "STRING",
"path": "$.phoneNumbers[1].number"
}
```

```json
{
"type": "nested-field",
"columnName": "shipTo",
"outputName": "v1",
"expectedType": "STRING",
"path": ".phoneNumbers[1].number",
"useJqSyntax": true
}
```

```json
{
"type": "nested-field",
"columnName": "shipTo",
"outputName": "v2",
"expectedType": "STRING",
"pathParts": [
{
"type": "field",
"field": "phoneNumbers"
},
{
"type": "arrayElement",
"index": 1
},
{
"type": "field",
"field": "number"
}
]
}
```

|property|description|required?|
|--------|-----------|---------|
|type|Must be `"nested-field"` to indicate that this is a nested field virtual column.|yes|
|columnName|The name of the `COMPLEX<json>` input column.|yes|
|outputName|The name of the virtual column.|yes|
|expectedType|The native Druid output type of the column. Druid coerces output to this type if it does not match the underlying data. This can be `STRING`, `LONG`, `FLOAT`, `DOUBLE`, or `COMPLEX<json>`. Extracting `ARRAY` types is not yet supported.|no, default `STRING`|
|pathParts|The parsed path parts used to locate the nested values. `path` will be translated into `pathParts` internally. One of `path` or `pathParts` must be set|no, if `path` is defined|
|processFromRaw|If set to true, the virtual column will process the "raw" JSON data to extract values rather than using an optimized "literal" value selector. This option allows extracting non-literal values (such as nested JSON objects or arrays) as a `COMPLEX<json>` at the cost of much slower performance.|no, default false|
|path|'JSONPath' (or 'jq') syntax path. One of `path` or `pathParts` must be set. |no, if `pathParts` is defined|
|useJqSyntax|If true, parse `path` using 'jq' syntax instead of 'JSONPath'.|no, default is false|

#### Nested path part
Specify `pathParts` as an array of objects that describe each component of the path to traverse. Each object can take the following properties:

|property|description|required?|
|--------|-----------|---------|
|type|Must be 'field' or 'arrayElement'. Use `field` when accessing a specific field in a nested structure. Use `arrayElement` when accessing a specific integer position of an array (zero based).|yes|
|field|The name of the 'field' in a 'field' `type` path part|yes, if `type` is 'field'|
|index|The array element index if `type` is `arrayElement`|yes, if `type` is 'arrayElement'|
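
To make the `pathParts` form concrete, the following JavaScript sketch walks a list of path parts over a nested value and renders the same list back as a JSONPath string. It is not Druid's implementation. The helpers `extractByPathParts` and `pathPartsToJsonPath` and the sample `shipTo` value are hypothetical and exist only for illustration.

```javascript
// Illustrative sketch; not Druid code. Helper names and sample data are hypothetical.

// Follows each path part in order, returning undefined when the path does not exist.
function extractByPathParts(value, pathParts) {
  let current = value;
  for (const part of pathParts) {
    if (current === null || current === undefined) {
      return undefined;
    }
    if (part.type === 'field') {
      const isObject = typeof current === 'object' && !Array.isArray(current);
      current = isObject ? current[part.field] : undefined;
    } else if (part.type === 'arrayElement') {
      current = Array.isArray(current) ? current[part.index] : undefined; // zero based
    } else {
      throw new Error(`Unknown path part type: ${part.type}`);
    }
  }
  return current;
}

// Renders path parts as JSONPath, using bracket notation for names that are not identifier-like.
function pathPartsToJsonPath(pathParts) {
  return pathParts.reduce((path, part) => {
    if (part.type === 'arrayElement') {
      return `${path}[${part.index}]`;
    }
    return /^[A-Za-z_][A-Za-z0-9_]*$/.test(part.field)
      ? `${path}.${part.field}`
      : `${path}['${part.field}']`;
  }, '$');
}

const pathParts = [
  { type: 'field', field: 'phoneNumbers' },
  { type: 'arrayElement', index: 1 },
  { type: 'field', field: 'number' },
];

// Made-up row value for the `shipTo` column.
const shipTo = { phoneNumbers: [{ number: '555-0100' }, { number: '555-0101' }] };

console.log(pathPartsToJsonPath(pathParts));        // $.phoneNumbers[1].number
console.log(extractByPathParts(shipTo, pathParts)); // 555-0101
```

The rendered string matches the `path` in the JSONPath example above, which is the sense in which the `path` and `pathParts` definitions are equivalent.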



### List filtered virtual column
This virtual column provides an alternative way to use the
['list filtered' dimension spec](./dimensionspecs.md#filtered-dimensionspecs) as a virtual column. It has optimized
access to the underlying column value indexes that can provide a small performance improvement in some cases.


```json
{
"type": "mv-filtered",
"name": "filteredDim3",
"delegate": "dim3",
"values": ["hello", "world"],
"isAllowList": true
}
```

|property|description|required?|
|--------|-----------|---------|
|type|Must be `"mv-filtered"` to indicate that this is a list filtered virtual column.|yes|
|name|The output name of the virtual column|yes|
|delegate|The name of the multi-value STRING input column to filter|yes|
|values|Set of STRING values to allow or deny|yes|
|isAllowList|If true, the output of the virtual column will be limited to the set specified by `values`, else it will provide all values _except_ those specified.|no, default true|
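
The effect of `values` and `isAllowList` on a single multi-value row can be sketched in JavaScript as follows. This only illustrates the documented allow and deny semantics and is not Druid's implementation. The helper `listFiltered` and the sample row are hypothetical.

```javascript
// Illustrative sketch of the allow/deny semantics; `listFiltered` is a hypothetical helper.
function listFiltered(rowValues, values, isAllowList = true) {
  const valueSet = new Set(values);
  // Allow list: keep values in the set. Deny list: keep values not in the set.
  return rowValues.filter((value) => valueSet.has(value) === isAllowList);
}

const dim3 = ['hello', 'foo', 'world']; // made-up multi-value row

console.log(listFiltered(dim3, ['hello', 'world'], true));  // [ 'hello', 'world' ]
console.log(listFiltered(dim3, ['hello', 'world'], false)); // [ 'foo' ]
```
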
3 changes: 2 additions & 1 deletion web-console/script/create-sql-docs.js
@@ -23,7 +23,7 @@ const snarkdown = require('snarkdown');

const writefile = 'lib/sql-docs.js';

const MINIMUM_EXPECTED_NUMBER_OF_FUNCTIONS = 150;
const MINIMUM_EXPECTED_NUMBER_OF_FUNCTIONS = 158;
const MINIMUM_EXPECTED_NUMBER_OF_DATA_TYPES = 14;

function hasHtmlTags(str) {
@@ -63,6 +63,7 @@ const readDoc = async () => {
await fs.readFile('../docs/querying/sql-scalar.md', 'utf-8'),
await fs.readFile('../docs/querying/sql-aggregations.md', 'utf-8'),
await fs.readFile('../docs/querying/sql-multivalue-string-functions.md', 'utf-8'),
await fs.readFile('../docs/querying/sql-json-functions.md', 'utf-8'),
await fs.readFile('../docs/querying/sql-operators.md', 'utf-8'),
].join('\n');
