diff --git a/docs/querying/sql-array-functions.md b/docs/querying/sql-array-functions.md
new file mode 100644
index 000000000000..ed7d51b9e82e
--- /dev/null
+++ b/docs/querying/sql-array-functions.md
@@ -0,0 +1,57 @@
+---
+id: sql-array-functions
+title: "SQL ARRAY functions"
+sidebar_label: "Array functions"
+---
+
+
+
+
+
+
+> Apache Druid supports two query languages: Druid SQL and [native queries](querying.md).
+> This document describes the SQL language.
+
+This page describes the operations you can perform on arrays using [Druid SQL](./sql.md). See [`ARRAY` data type documentation](./sql-data-types.md#arrays) for additional details.
+
+All array references in the array function documentation can refer to multi-value string columns or `ARRAY` literals. These functions are largely
+identical to the [multi-value string functions](sql-multivalue-string-functions.md), but use `ARRAY` types and behavior.
+
+|Function|Description|
+|--------|-----|
+|`ARRAY[expr1, expr2, ...]`|Constructs a SQL `ARRAY` literal from the expression arguments, using the type of the first argument as the output array type.|
+|`ARRAY_LENGTH(arr)`|Returns length of the array expression.|
+|`ARRAY_OFFSET(arr, long)`|Returns the array element at the 0-based index supplied, or null for an out of range index.|
+|`ARRAY_ORDINAL(arr, long)`|Returns the array element at the 1-based index supplied, or null for an out of range index.|
+|`ARRAY_CONTAINS(arr, expr)`|If `expr` is a scalar type, returns 1 if `arr` contains `expr`. If `expr` is an array, returns 1 if `arr` contains all elements of `expr`. Otherwise returns 0.|
+|`ARRAY_OVERLAP(arr1, arr2)`|Returns 1 if `arr1` and `arr2` have any elements in common, else 0.|
+|`ARRAY_OFFSET_OF(arr, expr)`|Returns the 0-based index of the first occurrence of `expr` in the array.
If no matching elements exist in the array, returns `-1`, or `null` if `druid.generic.useDefaultValueForNull=false`.|
+|`ARRAY_ORDINAL_OF(arr, expr)`|Returns the 1-based index of the first occurrence of `expr` in the array. If no matching elements exist in the array, returns `-1`, or `null` if `druid.generic.useDefaultValueForNull=false`.|
+|`ARRAY_PREPEND(expr, arr)`|Prepends `expr` to the beginning of `arr`. The resulting array type is determined by the type of `arr`.|
+|`ARRAY_APPEND(arr1, expr)`|Appends `expr` to `arr1`. The resulting array type is determined by the type of `arr1`.|
+|`ARRAY_CONCAT(arr1, arr2)`|Concatenates `arr2` to `arr1`. The resulting array type is determined by the type of `arr1`.|
+|`ARRAY_SLICE(arr, start, end)`|Returns the subarray of `arr` from the 0-based index `start` (inclusive) to `end` (exclusive). Returns `null` if `start` is less than 0, greater than the length of `arr`, or greater than `end`.|
+|`ARRAY_TO_STRING(arr, str)`|Joins all elements of `arr` by the delimiter specified by `str`.|
+|`STRING_TO_ARRAY(str1, str2)`|Splits `str1` into an array on the delimiter specified by `str2`.|
diff --git a/docs/querying/sql-data-types.md b/docs/querying/sql-data-types.md
index 9e2b6739c642..4e6286d032d6 100644
--- a/docs/querying/sql-data-types.md
+++ b/docs/querying/sql-data-types.md
@@ -31,9 +31,13 @@ Columns in Druid are associated with a specific data type. This topic describes
 
 ## Standard types
 
-Druid natively supports five basic column types: "long" (64 bit signed int), "float" (32 bit float), "double" (64 bit
-float) "string" (UTF-8 encoded strings and string arrays), and "complex" (catch-all for more exotic data types like
-json, hyperUnique, and approxHistogram columns).
+Druid natively supports the following basic column types:
+* LONG: 64-bit signed int
+* FLOAT: 32-bit float
+* DOUBLE: 64-bit float
+* STRING: UTF-8 encoded strings and string arrays
+* COMPLEX: non-standard data types, such as nested JSON, hyperUnique, approxHistogram, and DataSketches
+* ARRAY: arrays composed of any of these types
 
 Timestamps (including the `__time` column) are treated by Druid as longs, with the value being the number of
 milliseconds since 1970-01-01 00:00:00 UTC, not counting leap seconds. Therefore, timestamps in Druid do not carry any
@@ -65,6 +69,7 @@ The following table describes how Druid maps SQL types onto native types when ru
 |BIGINT|LONG|`0`|Druid LONG columns (except `__time`) are reported as BIGINT|
 |TIMESTAMP|LONG|`0`, meaning 1970-01-01 00:00:00 UTC|Druid's `__time` column is reported as TIMESTAMP. Casts between string and timestamp types assume standard SQL formatting, e.g. `2000-01-02 03:04:05`, _not_ ISO8601 formatting. For handling other formats, use one of the [time functions](sql-scalar.md#date-and-time-functions).|
 |DATE|LONG|`0`, meaning 1970-01-01|Casting TIMESTAMP to DATE rounds down the timestamp to the nearest day. Casts between string and date types assume standard SQL formatting, e.g. `2000-01-02`. For handling other formats, use one of the [time functions](sql-scalar.md#date-and-time-functions).|
+|ARRAY|ARRAY|`NULL`|Druid native array types work as SQL arrays, and multi-value strings can be converted to arrays. See the [`ARRAY` details](#arrays).|
 |OTHER|COMPLEX|none|May represent various Druid column types such as hyperUnique, approxHistogram, etc.|
 
 * Default value applies if `druid.generic.useDefaultValueForNull = true` (the default mode). Otherwise, the default value is `NULL` for all types.
@@ -73,9 +78,10 @@ The following table describes how Druid maps SQL types onto native types when ru
 
 Druid's native type system allows strings to potentially have multiple values.
 These [multi-value string dimensions](multi-value-dimensions.md) are reported in SQL as `VARCHAR` typed, and can be
-syntactically used like any other VARCHAR. Regular string functions that refer to multi-value string dimensions are
+syntactically used like any other `VARCHAR`. Regular string functions that refer to multi-value string dimensions are
 applied to all values for each row individually. Multi-value string dimensions can also be treated as arrays via special
-[multi-value string functions](sql-multivalue-string-functions.md), which can perform powerful array-aware operations.
+[multi-value string functions](sql-multivalue-string-functions.md), which can perform powerful array-aware operations, but retain
+their `VARCHAR` typing and behavior.
 
 Grouping by a multi-value expression observes the native Druid multi-value aggregation behavior, which is similar to
 an implicit SQL `UNNEST`. Refer to the documentation on [multi-value string dimensions](multi-value-dimensions.md)
@@ -85,8 +91,45 @@ for additional details.
 > they are handled in Druid SQL and in native queries. For example, expressions involving multi-value dimensions may be
 > incorrectly optimized by the Druid SQL planner: `multi_val_dim = 'a' AND multi_val_dim = 'b'` is optimized to
 > `false`, even though it is possible for a single row to have both "a" and "b" as values for `multi_val_dim`. The
-> SQL behavior of multi-value dimensions will change in a future release to more closely align with their behavior
-> in native queries.
+> SQL behavior of multi-value dimensions may change in a future release to more closely align with their behavior
+> in native queries, but the [multi-value string functions](./sql-multivalue-string-functions.md) should be able to provide
+> nearly all possible native functionality.
+
+## Arrays
+Druid supports `ARRAY` types constructed at query time, though it currently lacks the ability to store them in
+segments.
`ARRAY` types behave as standard SQL arrays, where results are grouped by matching entire arrays. This is in
+contrast to the implicit `UNNEST` that occurs when grouping on multi-value dimensions directly or when used with the
+multi-value functions. You can convert multi-value dimensions to standard SQL arrays either explicitly by converting
+them with `MV_TO_ARRAY` or implicitly when used within the [array functions](./sql-array-functions.md). Arrays may
+also be constructed from multiple columns using the array functions.
+
+## Multi-value strings behavior
+The behavior of Druid [multi-value string dimensions](multi-value-dimensions.md) varies depending on the context of
+their usage.
+
+When used with standard `VARCHAR` functions which expect a single input value per row, such as `CONCAT`, Druid will map
+the function across all values in the row. If the row is null or empty, the function receives `NULL` as its input.
+
+When used with the explicit [multi-value string functions](./sql-multivalue-string-functions.md), Druid processes the
+row values as if they were `ARRAY` typed. Any operations which produce null and empty rows are distinguished as
+separate values (unlike implicit mapping behavior). These multi-value string functions, typically denoted with an `MV_`
+prefix, retain their `VARCHAR` type after the computation is complete. Note that Druid multi-value columns do _not_
+distinguish between empty and null rows. An empty row will never appear natively as input to a multi-valued function,
+but any multi-value function which manipulates the array form of the value may produce an empty array, which is handled
+separately while processing.
+
+> Do not mix the usage of multi-value functions and normal scalar functions within the same expression, as the planner will be unable
+> to determine how to properly process the value given its ambiguous usage. A multi-value string must be treated consistently within
+> an expression.
+
+When converted to `ARRAY` or used with [array functions](./sql-array-functions.md), multi-value strings behave as standard SQL arrays and can no longer
+be manipulated with non-array functions.
+
+Druid serializes multi-value `VARCHAR` results as a JSON string of the array, if grouping was not applied on the value.
+If the value was grouped, due to the implicit `UNNEST` behavior, all results will always be standard single value
+`VARCHAR`. `ARRAY` typed results will be serialized into stringified JSON arrays if the context parameter
+`sqlStringifyArrays` is set, otherwise they remain in their array format.
+
 
 ## NULL values
 
diff --git a/docs/querying/sql-multivalue-string-functions.md b/docs/querying/sql-multivalue-string-functions.md
index 85bab5e6793c..d0d9040a2448 100644
--- a/docs/querying/sql-multivalue-string-functions.md
+++ b/docs/querying/sql-multivalue-string-functions.md
@@ -36,26 +36,28 @@ sidebar_label: "Multi-value string functions"
 Druid supports string dimensions containing multiple values. This page describes the operations you can perform on multi-value string dimensions using [Druid SQL](./sql.md).
-See [Multi-value dimensions](multi-value-dimensions.md) for more information.
+See [SQL multi-value strings](./sql-data-types.md#multi-value-strings) and native [Multi-value dimensions](multi-value-dimensions.md) for more information.
 
-All "array" references in the multi-value string function documentation can refer to multi-value string columns or
-`ARRAY` literals.
+All array references in the multi-value string function documentation can refer to multi-value string columns or
+`ARRAY` types. These functions are largely identical to the [array functions](./sql-array-functions.md), but use
+`VARCHAR` types and behavior. Multi-value strings can also be converted to `ARRAY` types using `MV_TO_ARRAY`. For
+additional details about `ARRAY` types, see [`ARRAY` data type documentation](./sql-data-types.md#arrays).
-|Function|Notes|
+|Function|Description|
 |--------|-----|
-|`ARRAY[expr1, expr2, ...]`|Constructs a SQL ARRAY literal from the expression arguments, using the type of the first argument as the output array type.|
 |`MV_FILTER_ONLY(expr, arr)`|Filters multi-value `expr` to include only values contained in array `arr`.|
 |`MV_FILTER_NONE(expr, arr)`|Filters multi-value `expr` to include no values contained in array `arr`.|
-|`MV_LENGTH(arr)`|Returns length of array expression.|
-|`MV_OFFSET(arr, long)`|Returns the array element at the 0 based index supplied, or null for an out of range index.|
-|`MV_ORDINAL(arr, long)`|Returns the array element at the 1 based index supplied, or null for an out of range index.|
-|`MV_CONTAINS(arr, expr)`|Returns 1 if the array contains the element specified by `expr`, or contains all elements specified by `expr` if `expr` is an array, else 0.|
-|`MV_OVERLAP(arr1, arr2)`|Returns 1 if arr1 and arr2 have any elements in common, else 0.|
-|`MV_OFFSET_OF(arr, expr)`|Returns the 0 based index of the first occurrence of `expr` in the array, or `-1` or `null` if `druid.generic.useDefaultValueForNull=false` if no matching elements exist in the array.|
-|`MV_ORDINAL_OF(arr, expr)`|Returns the 1 based index of the first occurrence of `expr` in the array, or `-1` or `null` if `druid.generic.useDefaultValueForNull=false` if no matching elements exist in the array.|
+|`MV_LENGTH(arr)`|Returns length of the array expression.|
+|`MV_OFFSET(arr, long)`|Returns the array element at the 0-based index supplied, or null for an out of range index.|
+|`MV_ORDINAL(arr, long)`|Returns the array element at the 1-based index supplied, or null for an out of range index.|
+|`MV_CONTAINS(arr, expr)`|If `expr` is a scalar type, returns 1 if `arr` contains `expr`. If `expr` is an array, returns 1 if `arr` contains all elements of `expr`.
Otherwise returns 0.|
+|`MV_OVERLAP(arr1, arr2)`|Returns 1 if `arr1` and `arr2` have any elements in common, else 0.|
+|`MV_OFFSET_OF(arr, expr)`|Returns the 0-based index of the first occurrence of `expr` in the array. If no matching elements exist in the array, returns `-1`, or `null` if `druid.generic.useDefaultValueForNull=false`.|
+|`MV_ORDINAL_OF(arr, expr)`|Returns the 1-based index of the first occurrence of `expr` in the array. If no matching elements exist in the array, returns `-1`, or `null` if `druid.generic.useDefaultValueForNull=false`.|
 |`MV_PREPEND(expr, arr)`|Adds `expr` to `arr` at the beginning, the resulting array type determined by the type of the array.|
 |`MV_APPEND(arr1, expr)`|Appends `expr` to `arr`, the resulting array type determined by the type of the first array.|
-|`MV_CONCAT(arr1, arr2)`|Concatenates 2 arrays, the resulting array type determined by the type of the first array.|
-|`MV_SLICE(arr, start, end)`|Returns the subarray of `arr` from the 0 based index start(inclusive) to end(exclusive), or `null`, if start is less than 0, greater than length of arr or less than end.|
+|`MV_CONCAT(arr1, arr2)`|Concatenates `arr2` to `arr1`.
The resulting array type is determined by the type of `arr1`.|
+|`MV_SLICE(arr, start, end)`|Returns the subarray of `arr` from the 0-based index `start` (inclusive) to `end` (exclusive). Returns `null` if `start` is less than 0, greater than the length of `arr`, or greater than `end`.|
 |`MV_TO_STRING(arr, str)`|Joins all elements of `arr` by the delimiter specified by `str`.|
 |`STRING_TO_MV(str1, str2)`|Splits `str1` into an array on the delimiter specified by `str2`.|
+|`MV_TO_ARRAY(str)`|Converts a multi-value string from a `VARCHAR` to a `VARCHAR ARRAY`.|
diff --git a/web-console/script/create-sql-docs.js b/web-console/script/create-sql-docs.js
index 561db7e2cc46..794afb06e556 100755
--- a/web-console/script/create-sql-docs.js
+++ b/web-console/script/create-sql-docs.js
@@ -67,9 +67,10 @@ const readDoc = async () => {
     await fs.readFile('../docs/querying/sql-data-types.md', 'utf-8'),
     await fs.readFile('../docs/querying/sql-scalar.md', 'utf-8'),
     await fs.readFile('../docs/querying/sql-aggregations.md', 'utf-8'),
+    await fs.readFile('../docs/querying/sql-array-functions.md', 'utf-8'),
     await fs.readFile('../docs/querying/sql-multivalue-string-functions.md', 'utf-8'),
     await fs.readFile('../docs/querying/sql-json-functions.md', 'utf-8'),
-    await fs.readFile('../docs/querying/sql-operators.md', 'utf-8'),
+    await fs.readFile('../docs/querying/sql-operators.md', 'utf-8')
   ].join('\n');
 
   const lines = data.split('\n');
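
As a reading aid for the function tables in this change, here is a minimal JavaScript sketch of the indexing, search, and slicing semantics as the `ARRAY_*` rows describe them. It is not Druid code: the helper names (`arrayOffset`, `arrayOrdinal`, `arrayOffsetOf`, `arraySlice`, `arrayContains`, `arrayOverlap`) are hypothetical, the `useDefaultValueForNull` parameter only stands in for `druid.generic.useDefaultValueForNull`, and clamping an `end` that runs past the array length is an assumption, since the table does not cover that case.

```javascript
// Hypothetical helpers mirroring the documented table rows; not part of Druid.

// ARRAY_OFFSET: 0-based lookup, null for an out of range index.
function arrayOffset(arr, index) {
  return index >= 0 && index < arr.length ? arr[index] : null;
}

// ARRAY_ORDINAL: 1-based lookup, null for an out of range index.
function arrayOrdinal(arr, ordinal) {
  return arrayOffset(arr, ordinal - 1);
}

// ARRAY_OFFSET_OF: 0-based index of the first match. With no match, the docs
// give -1 in default-value mode and null when
// druid.generic.useDefaultValueForNull=false.
function arrayOffsetOf(arr, value, useDefaultValueForNull = true) {
  const index = arr.indexOf(value);
  if (index >= 0) return index;
  return useDefaultValueForNull ? -1 : null;
}

// ARRAY_SLICE: start inclusive, end exclusive. Null when start is negative,
// greater than the array length, or greater than end.
// Assumption: an end beyond the array length is clamped.
function arraySlice(arr, start, end) {
  if (start < 0 || start > arr.length || start > end) return null;
  return arr.slice(start, end);
}

// ARRAY_CONTAINS: a scalar must be present; an array must be fully contained.
function arrayContains(arr, expr) {
  const needles = Array.isArray(expr) ? expr : [expr];
  return needles.every((value) => arr.includes(value)) ? 1 : 0;
}

// ARRAY_OVERLAP: 1 when the two arrays share at least one element, else 0.
function arrayOverlap(arr1, arr2) {
  return arr1.some((value) => arr2.includes(value)) ? 1 : 0;
}
```

Under these assumptions, `arrayOrdinal(['a', 'b'], 1)` and `arrayOffset(['a', 'b'], 0)` address the same element, and `arraySlice(['a', 'b', 'c'], 2, 1)` yields `null` because `start` is greater than `end`.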