document new filters and stuff#14760

Merged
vtlim merged 7 commits into apache:master from clintropolis:new-filter-docs
Aug 8, 2023

Conversation

Member

@clintropolis clintropolis commented Aug 5, 2023

Description

Adds documentation for the new filters and SQL query context added in #14542. Also rearranges some of the native filter documentation and makes it consistently use tables to specify filter grammar, similar to what I did in #14497.

Comment on lines -102 to -113
.getAPI {
color: #0073e6;
font-weight: bold;
}
.postAPI {
color: #00bf7d;
font-weight: bold;
}
.deleteAPI {
color: #f49200;
font-weight: bold;
}
No newline at end of file
Member Author

hmm... idk why this happened, must've been when running mvn commands to spellcheck before I committed

Member

do you want to just restore the changes to this file?

Contributor

@writer-jill writer-jill left a comment

Made some suggestions!

Comment thread docs/ingestion/schema-design.md Outdated
the [array functions](../querying/sql-array-functions.md) or [UNNEST](../querying/sql-functions.md#unnest). Nested
columns can be queried with the [JSON functions](../querying/sql-json-functions.md).

We also highly recommend setting `druid.generic.useDefaultValueForNull=false` when using these columns since it also enables out of the box `ARRAY` type filtering. If this is not set to true, setting `sqlUseBoundsAndSelectors` to `false` on the [SQL query context](../querying/sql-query-context.md) can enable `ARRAY` filtering.
Contributor

I don't understand this. First it says set useDefaultValueForNull to false to enable ARRAY filtering. Then it says if this is set to false (not set to true) you can set something else to enable ARRAY filtering.

Member Author

oops yeah, this was a mistake

Comment thread docs/querying/filters.md Outdated
| -------- | ----------- | -------- |
| `type` | Must be "selector".| Yes |
| `dimension` | Input column or virtual column name to filter. | Yes |
| `value` | String value to match. | No, if not specified the filter will match NULL values. |
Contributor

Suggested change
| `value` | String value to match. | No, if not specified the filter will match NULL values. |
| `value` | String value to match. | No, if not specified the filter matches NULL values. |

Comment thread docs/querying/filters.md Outdated

This is the equivalent of `WHERE <dimension_string> = '<dimension_value_string>'` or `WHERE <dimension_string> IS NULL`
(if the `value` is `null`).
The selector filter is limited to only being able to match against `STRING` (single and multi-valued), `LONG`, `FLOAT`,
Contributor

Suggested change
The selector filter is limited to only being able to match against `STRING` (single and multi-valued), `LONG`, `FLOAT`,
The selector filter can only match against `STRING` (single and multi-valued), `LONG`, `FLOAT`,

Comment thread docs/querying/filters.md Outdated
`DOUBLE` types. Use the newer null and equality filters to match against `ARRAY` or `COMPLEX` types.

The selector filter supports the use of extraction functions, see [Filtering with Extraction Functions](#filtering-with-extraction-functions) for details.
When the selector filter matches against numeric inputs, the string `value` will be best effort coerced into a numeric value.
Contributor

Suggested change
When the selector filter matches against numeric inputs, the string `value` will be best effort coerced into a numeric value.
When the selector filter matches against numeric inputs, the string `value` will be best-effort coerced into a numeric value.

Comment thread docs/querying/filters.md Outdated
## Equality Filter

## Logical expression filters
The equality filter is a replacement for the selector filter with the ability to match against any type of column. The equality filter intends to have more SQL compatible behavior than the selector filter and so cannot match NULL values, use the null filter instead.
Contributor

Suggested change
The equality filter is a replacement for the selector filter with the ability to match against any type of column. The equality filter intends to have more SQL compatible behavior than the selector filter and so cannot match NULL values, use the null filter instead.
The equality filter is a replacement for the selector filter with the ability to match against any type of column. The equality filter is designed to include more SQL-compatible behavior than the selector filter and so can't match null values. To match null values, use the null filter.
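
To make the equality/null split concrete, here is a minimal Python sketch that builds the native filter JSON. The `column`, `matchValueType`, and `matchValue` fields come from the grammar table in this PR. The shape of the null filter (only `type` and `column`), the helper names, and the column names are assumptions for illustration, not something quoted in this thread.

```python
import json


def equality_filter(column, match_value_type, match_value):
    """Build an equality filter. It cannot match null, so None is rejected."""
    if match_value is None:
        raise ValueError("equality filter cannot match null; use null_filter")
    return {
        "type": "equality",
        "column": column,
        "matchValueType": match_value_type,
        "matchValue": match_value,
    }


def null_filter(column):
    """Build a null filter (assumed shape: only `type` and `column`)."""
    return {"type": "null", "column": column}


# Hypothetical column names, for illustration only.
print(json.dumps(equality_filter("someColumn", "STRING", "hello")))
print(json.dumps(equality_filter("someArrayColumn", "ARRAY<LONG>", [1, 2, 3])))
print(json.dumps(null_filter("someColumn")))
```

Rejecting `None` in the helper mirrors the documented behavior: the equality filter never matches null, so a null check has to be expressed as a separate filter.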

Comment thread docs/querying/filters.md Outdated
Druid supports filtering on timestamp, string, long, and float columns.

Note that only string columns have bitmap indexes. Therefore, queries that filter on other column types will need to
Note that only string columns and columns produced with the ['auto' ingestion spec](../ingestion/ingestion-spec.md#dimension-objects) also used by [type aware schema discovery](../ingestion/schema-design.md#type-aware-schema-discovery) have bitmap indexes. Therefore, queries that filter on other column types will need to
Contributor

Suggested change
Note that only string columns and columns produced with the ['auto' ingestion spec](../ingestion/ingestion-spec.md#dimension-objects) also used by [type aware schema discovery](../ingestion/schema-design.md#type-aware-schema-discovery) have bitmap indexes. Therefore, queries that filter on other column types will need to
Note that only string columns and columns produced with the ['auto' ingestion spec](../ingestion/ingestion-spec.md#dimension-objects) also used by [type aware schema discovery](../ingestion/schema-design.md#type-aware-schema-discovery) have bitmap indexes. Queries that filter on other column types must

Comment thread docs/querying/filters.md Outdated

### Filtering on multi-value string columns

All filters will return true if any one of the dimension values is satisfies the filter.
Contributor

Suggested change
All filters will return true if any one of the dimension values is satisfies the filter.
All filters return true if any one of the dimension values satisfies the filter.

Comment thread docs/querying/sql-query-context.md Outdated
|`enableTimeBoundaryPlanning`|If true, SQL queries will get converted to TimeBoundary queries wherever possible. TimeBoundary queries are very efficient for min-max calculation on `__time` column in a datasource |`druid.query.default.context.enableTimeBoundaryPlanning` on the Broker (default: false)|
|`useNativeQueryExplain`|If true, `EXPLAIN PLAN FOR` will return the explain plan as a JSON representation of equivalent native query(s), else it will return the original version of explain plan generated by Calcite.<br /><br />This property is provided for backwards compatibility. It is not recommended to use this parameter unless you were depending on the older behavior.|`druid.sql.planner.useNativeQueryExplain` on the Broker (default: true)|
|`sqlFinalizeOuterSketches`|If false (default behavior in Druid 25.0.0 and later), `DS_HLL`, `DS_THETA`, and `DS_QUANTILES_SKETCH` return sketches in query results, as documented. If true (default behavior in Druid 24.0.1 and earlier), sketches from these functions are finalized when they appear in query results.<br /><br />This property is provided for backwards compatibility with behavior in Druid 24.0.1 and earlier. It is not recommended to use this parameter unless you were depending on the older behavior. Instead, use a function that does not return a sketch, such as `APPROX_COUNT_DISTINCT_DS_HLL`, `APPROX_COUNT_DISTINCT_DS_THETA`, `APPROX_QUANTILE_DS`, `DS_THETA_ESTIMATE`, or `DS_GET_QUANTILE`.|`druid.query.default.context.sqlFinalizeOuterSketches` on the Broker (default: false)|
|`sqlUseBoundAndSelectors`|If false (default behavior if `druid.generic.useDefaultValueForNull=false` in Druid 27 and later), the SQL planner will use [equality](./filters.md#equality-filter), [null](./filters.md#null-filter), and [range](./filters.md#range-filter) filters instead of [selector](./filters.md#selector-filter) and [bounds](./filters.md#bound-filter). This value must be set to `false` for correct behavior for filtering `ARRAY` typed values. | Defaults to same value as `druid.generic.useDefaultValueForNull`. |
Contributor

Suggested change
|`sqlUseBoundAndSelectors`|If false (default behavior if `druid.generic.useDefaultValueForNull=false` in Druid 27 and later), the SQL planner will use [equality](./filters.md#equality-filter), [null](./filters.md#null-filter), and [range](./filters.md#range-filter) filters instead of [selector](./filters.md#selector-filter) and [bounds](./filters.md#bound-filter). This value must be set to `false` for correct behavior for filtering `ARRAY` typed values. | Defaults to same value as `druid.generic.useDefaultValueForNull`. |
|`sqlUseBoundAndSelectors`|If false (default behavior if `druid.generic.useDefaultValueForNull=false` in Druid 27.0.0 and later), the SQL planner will use [equality](./filters.md#equality-filter), [null](./filters.md#null-filter), and [range](./filters.md#range-filter) filters instead of [selector](./filters.md#selector-filter) and [bounds](./filters.md#bound-filter). This value must be set to `false` for correct behavior for filtering `ARRAY` typed values. | Defaults to same value as `druid.generic.useDefaultValueForNull`. |

Comment thread docs/querying/filters.md Outdated
@@ -490,20 +806,31 @@ should be specified as if the timestamp values were strings.

If the user wishes to interpret the timestamp with a specific format, timezone, or locale, the [Time Format Extraction Function](./dimensionspecs.md#time-format-extraction-function) is useful.
Contributor

Suggested change
If the user wishes to interpret the timestamp with a specific format, timezone, or locale, the [Time Format Extraction Function](./dimensionspecs.md#time-format-extraction-function) is useful.
If you want to interpret the timestamp with a specific format, timezone, or locale, use the [Time Format Extraction Function](./dimensionspecs.md#time-format-extraction-function).

Comment thread docs/querying/filters.md Outdated
```
will successfully match the entire row, but not match a row with value `['a', 'c']`.

To express this filter in SQL, one would need to use [SQL multi-value string functions](./sql-multivalue-string-functions.md) such as `MV_CONTAINS`, which can be optimized by the planner to the same native filters.
Contributor

Suggested change
To express this filter in SQL, one would need to use [SQL multi-value string functions](./sql-multivalue-string-functions.md) such as `MV_CONTAINS`, which can be optimized by the planner to the same native filters.
To express this filter in SQL, use [SQL multi-value string functions](./sql-multivalue-string-functions.md) such as `MV_CONTAINS`, which can be optimized by the planner to the same native filters.
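
The multi-value matching rule quoted in this thread can be sketched in a few lines of Python. This is only a model of the semantics, not Druid code. It assumes the filter under discussion is an AND of two selector filters for `'a'` and `'b'`, and the matching row `['a', 'b', 'c']` is hypothetical; only the non-matching row `['a', 'c']` appears in the quoted text.

```python
def selector_matches(row_values, value):
    """A filter on a multi-value dimension matches if any one value satisfies it."""
    return any(v == value for v in row_values)


def and_of_selectors_matches(row_values, values):
    """AND of selectors: each selector must be satisfied by some value in the row."""
    return all(selector_matches(row_values, v) for v in values)


# Assumed filter: selector 'a' AND selector 'b'.
print(and_of_selectors_matches(["a", "b", "c"], ["a", "b"]))  # True
print(and_of_selectors_matches(["a", "c"], ["a", "b"]))       # False
```

The second row fails because no value in it satisfies the selector for `'b'`, even though the selector for `'a'` matches.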

Member

@vtlim vtlim left a comment

Some nits on style

Comment thread docs/querying/filters.md Outdated
| -------- | ----------- | -------- |
| `type` | Must be "equality".| Yes |
| `column` | Input column or virtual column name to filter. | Yes |
| `matchValueType` | String specifying the type of value to match, for example `STRING`, `LONG`, `DOUBLE`, `FLOAT`, `ARRAY<STRING>`, `ARRAY<LONG>`, or any other Druid type. The `matchValueType` determines how Druid interprets the `matchValue` to assist in converting to the type of the matched `column`. | Yes |
Member

Suggested change
| `matchValueType` | String specifying the type of value to match, for example `STRING`, `LONG`, `DOUBLE`, `FLOAT`, `ARRAY<STRING>`, `ARRAY<LONG>`, or any other Druid type. The `matchValueType` determines how Druid interprets the `matchValue` to assist in converting to the type of the matched `column`. | Yes |
| `matchValueType` | String specifying the type of value to match. For example, `STRING`, `LONG`, `DOUBLE`, `FLOAT`, `ARRAY<STRING>`, `ARRAY<LONG>`, or any other Druid type. The `matchValueType` determines how Druid interprets the `matchValue` to assist in converting to the type of the matched `column`. | Yes |

Comment thread docs/querying/filters.md Outdated
Note that the column comparison filter converts all values to strings prior to comparison. This allows differently-typed input columns to match without a cast operation.

Search filters can be used to filter on partial string matches.
### Example: equivalent of `WHERE someColumn = someLongColumn`.
Member

Suggested change
### Example: equivalent of `WHERE someColumn = someLongColumn`.
### Example: equivalent of `WHERE someColumn = someLongColumn`

Comment thread docs/querying/filters.md Outdated

Note that the bound filter matches null values if you don't specify a lower bound. Use the range filter if you need SQL-compatible behavior.

### Example: equivalent to `WHERE 21 <= age <= 31`:
Member

Suggested change
### Example: equivalent to `WHERE 21 <= age <= 31`:
### Example: equivalent to `WHERE 21 <= age <= 31`

Comment thread docs/querying/filters.md Outdated
```

This filter expresses the condition `foo <= name <= hoo`, using the default lexicographic sorting order.
### Example: equivalent to `WHERE 'foo' <= name <= 'hoo'`, using the default lexicographic sorting order.
Member

Suggested change
### Example: equivalent to `WHERE 'foo' <= name <= 'hoo'`, using the default lexicographic sorting order.
### Example: equivalent to `WHERE 'foo' <= name <= 'hoo'`, using the default lexicographic sorting order

Comment thread docs/querying/filters.md Outdated
```

The user can also specify a one-sided bound by omitting "upper" or "lower". This filter expresses `age < 31`.
### Example: equivalent to `WHERE age < 31`.
Member

Suggested change
### Example: equivalent to `WHERE age < 31`.
### Example: equivalent to `WHERE age < 31`
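
For readers following the headings above without the JSON bodies, here is a hedged Python sketch of what the two `age` bound filters could look like. The field names (`lower`, `upper`, `lowerStrict`, `upperStrict`, `ordering`) are taken from the bound filter documentation as I understand it and are not quoted in this thread; the helper name `bound_filter` is hypothetical.

```python
import json


def bound_filter(dimension, lower=None, upper=None,
                 lower_strict=False, upper_strict=False,
                 ordering="numeric"):
    """Build a bound filter; bounds are sent as strings, omitted sides are unbounded."""
    f = {"type": "bound", "dimension": dimension, "ordering": ordering}
    if lower is not None:
        f["lower"] = str(lower)
        if lower_strict:
            f["lowerStrict"] = True  # ">" instead of ">="
    if upper is not None:
        f["upper"] = str(upper)
        if upper_strict:
            f["upperStrict"] = True  # "<" instead of "<="
    return f


# WHERE 21 <= age <= 31
print(json.dumps(bound_filter("age", lower=21, upper=31)))
# WHERE age < 31 (one-sided: no lower bound)
print(json.dumps(bound_filter("age", upper=31, upper_strict=True)))
```

Per the note in the docs text, the one-sided form with no lower bound also matches null values, which is the behavior the range filter avoids.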

Comment thread docs/querying/filters.md Outdated

All filters return true if any one of the dimension values is satisfies the filter.

#### Example: multi-value match behavior.
Member

Suggested change
#### Example: multi-value match behavior.
#### Example: multi-value match behavior

Comment thread docs/querying/filters.md Outdated
the "regex" filter) the numeric column values will be converted to strings during the scan.

For example, filtering on a specific value, `myFloatColumn = 10.1`:
#### Example: filtering on a specific value, `myFloatColumn = 10.1`:
Member

Suggested change
#### Example: filtering on a specific value, `myFloatColumn = 10.1`:
#### Example: filtering on a specific value, `myFloatColumn = 10.1`

Comment thread docs/querying/filters.md Outdated
```

Filtering on a range of values, `10 <= myFloatColumn < 20`:
#### Example: filtering on a range of values, `10 <= myFloatColumn < 20`:
Member

Suggested change
#### Example: filtering on a range of values, `10 <= myFloatColumn < 20`:
#### Example: filtering on a range of values, `10 <= myFloatColumn < 20`
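
As a rough illustration of the half-open range `10 <= myFloatColumn < 20` expressed with the newer range filter, the sketch below builds the filter JSON in Python. `column` and `matchValueType` follow the grammar used for the new filters in this PR; `lowerOpen` and `upperOpen` are my recollection of the range filter fields and should be treated as assumptions, as should the helper name and the choice of `DOUBLE`.

```python
import json


def range_filter(column, match_value_type, lower=None, upper=None,
                 lower_open=False, upper_open=False):
    """Build a range filter; an open bound excludes the bound value itself."""
    f = {"type": "range", "column": column, "matchValueType": match_value_type}
    if lower is not None:
        f["lower"] = lower
        f["lowerOpen"] = lower_open
    if upper is not None:
        f["upper"] = upper
        f["upperOpen"] = upper_open
    return f


# 10 <= myFloatColumn < 20: closed lower bound, open upper bound.
print(json.dumps(range_filter("myFloatColumn", "DOUBLE",
                              lower=10, upper=20, upper_open=True)))
```

Unlike the bound filter, the bounds here are typed values rather than strings, with `matchValueType` telling Druid how to interpret them.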

Comment thread docs/querying/filters.md Outdated
```

Filtering on day of week:
#### Example: filtering on day of week using an extractionFn
Member

Suggested change
#### Example: filtering on day of week using an extractionFn
#### Example: filtering on day of week using an extraction function

Comment thread docs/querying/sql-query-context.md Outdated
|`enableTimeBoundaryPlanning`|If true, SQL queries will get converted to TimeBoundary queries wherever possible. TimeBoundary queries are very efficient for min-max calculation on `__time` column in a datasource |`druid.query.default.context.enableTimeBoundaryPlanning` on the Broker (default: false)|
|`useNativeQueryExplain`|If true, `EXPLAIN PLAN FOR` will return the explain plan as a JSON representation of equivalent native query(s), else it will return the original version of explain plan generated by Calcite.<br /><br />This property is provided for backwards compatibility. It is not recommended to use this parameter unless you were depending on the older behavior.|`druid.sql.planner.useNativeQueryExplain` on the Broker (default: true)|
|`sqlFinalizeOuterSketches`|If false (default behavior in Druid 25.0.0 and later), `DS_HLL`, `DS_THETA`, and `DS_QUANTILES_SKETCH` return sketches in query results, as documented. If true (default behavior in Druid 24.0.1 and earlier), sketches from these functions are finalized when they appear in query results.<br /><br />This property is provided for backwards compatibility with behavior in Druid 24.0.1 and earlier. It is not recommended to use this parameter unless you were depending on the older behavior. Instead, use a function that does not return a sketch, such as `APPROX_COUNT_DISTINCT_DS_HLL`, `APPROX_COUNT_DISTINCT_DS_THETA`, `APPROX_QUANTILE_DS`, `DS_THETA_ESTIMATE`, or `DS_GET_QUANTILE`.|`druid.query.default.context.sqlFinalizeOuterSketches` on the Broker (default: false)|
|`sqlUseBoundAndSelectors`|If false (default behavior if `druid.generic.useDefaultValueForNull=false` in Druid 27.0.0 and later), the SQL planner will use [equality](./filters.md#equality-filter), [null](./filters.md#null-filter), and [range](./filters.md#range-filter) filters instead of [selector](./filters.md#selector-filter) and [bounds](./filters.md#bound-filter). This value must be set to `false` for correct behavior for filtering `ARRAY` typed values. | Defaults to same value as `druid.generic.useDefaultValueForNull`. |
Member

Suggested change
|`sqlUseBoundAndSelectors`|If false (default behavior if `druid.generic.useDefaultValueForNull=false` in Druid 27.0.0 and later), the SQL planner will use [equality](./filters.md#equality-filter), [null](./filters.md#null-filter), and [range](./filters.md#range-filter) filters instead of [selector](./filters.md#selector-filter) and [bounds](./filters.md#bound-filter). This value must be set to `false` for correct behavior for filtering `ARRAY` typed values. | Defaults to same value as `druid.generic.useDefaultValueForNull`. |
|`sqlUseBoundAndSelectors`|If false (default behavior if `druid.generic.useDefaultValueForNull=false` in Druid 27.0.0 and later), the SQL planner will use [equality](./filters.md#equality-filter), [null](./filters.md#null-filter), and [range](./filters.md#range-filter) filters instead of [selector](./filters.md#selector-filter) and [bounds](./filters.md#bound-filter). This value must be set to `false` for correct behavior for filtering `ARRAY` typed values. | Defaults to same value as `druid.generic.useDefaultValueForNull` |
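
To show where `sqlUseBoundAndSelectors` goes in practice, here is a small Python sketch that builds a request body for the Druid SQL API (`/druid/v2/sql`) with the parameter set in the query `context`. The table name, column name, and helper name are hypothetical, and the snippet only constructs the JSON; it does not send anything.

```python
import json


def sql_request(query, use_bound_and_selectors=False):
    """Build a SQL API request body with sqlUseBoundAndSelectors in the context."""
    return {
        "query": query,
        "context": {"sqlUseBoundAndSelectors": use_bound_and_selectors},
    }


# Hypothetical table and ARRAY column; per the table row above, ARRAY
# filtering needs sqlUseBoundAndSelectors set to false.
payload = sql_request(
    'SELECT COUNT(*) FROM "my_table" WHERE "arrayLongCol" = ARRAY[1, 2, 3]'
)
print(json.dumps(payload, indent=2))
```

Setting the value per query like this overrides the default, which the table says follows `druid.generic.useDefaultValueForNull`.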

@vtlim vtlim merged commit e57f880 into apache:master Aug 8, 2023
@clintropolis clintropolis deleted the new-filter-docs branch August 8, 2023 23:09
clintropolis added a commit to clintropolis/druid that referenced this pull request Aug 8, 2023