diff --git a/docs/en/extending-doris/doris-on-es.md b/docs/en/extending-doris/doris-on-es.md
index 04fdb84cb5f31e..f61b458e96364b 100644
--- a/docs/en/extending-doris/doris-on-es.md
+++ b/docs/en/extending-doris/doris-on-es.md
@@ -452,6 +452,67 @@ select * from es_table where esquery(k4, ' {
 
 4. After calculating the result, return it to client
 
+## Best Practices
+
+### Suggestions for using Date type fields
+
+Date type fields are very flexible in ES, but in Doris On ES, if the type of a Date type field is not set properly, filter conditions on it cannot be pushed down.
+
+When creating an index, set the `format` of Date type fields for maximum compatibility:
+
+```
+    "dt": {
+        "type": "date",
+        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
+    }
+```
+
+When creating this field in Doris, it is recommended to set it to `date` or `datetime`; it can also be set to `varchar`. Each of the following SQL statements pushes its filter condition directly down to ES:
+
+
+```
+select * from doe where k2 > '2020-06-21';
+
+select * from doe where k2 < '2020-06-21 12:00:00';
+
+select * from doe where k2 < 1593497011;
+
+select * from doe where k2 < now();
+
+select * from doe where k2 < date_format(now(), '%Y-%m-%d');
+```
+
+`Notice`:
+
+* If you do not set a format for a Date type field in ES, the default format is
+
+```
+strict_date_optional_time||epoch_millis
+```
+* If the date field indexed into ES is a Unix timestamp, it must be converted to milliseconds (`ms`), because ES processes timestamps internally in `ms`; otherwise Doris On ES will display wrong column data
+
+### Fetch ES metadata field `_id`
+
+When documents are indexed without specifying `_id`, ES assigns a globally unique `_id` to each document. Users can also specify an `_id` that carries business meaning when indexing a document. If needed, Doris On ES can read the value of this field: add an `_id` column of type `varchar` when creating the ES external table
+
+```
+CREATE EXTERNAL TABLE `doe` (
+  `_id` varchar COMMENT "",
+  `city` varchar COMMENT ""
+) ENGINE=ELASTICSEARCH
+PROPERTIES (
+"hosts" = "http://127.0.0.1:8200",
+"user" = "root",
+"password" = "root",
+"index" = "doe",
+"type" = "doc"
+)
+```
+`Notice`:
+
+1. Filter conditions on the `_id` field support only two operators: `=` and `in`
+2. The `_id` field can only be of type `varchar`
+
 ## Q&A
 
 1. ES Version Requirements
@@ -469,6 +530,3 @@ select * from es_table where esquery(k4, ' {
 4. Whether the aggregation operation can be pushed down
 
    At present, Doris On ES does not support push-down operations such as sum, avg, min/max, etc., all documents satisfying the conditions are obtained from the ES in batch flow, and then calculated in Doris
-
-5. Filters for date type fields cannot be pushed down
-   Due to the time format problem, the date type field will not be pushed down in most cases; the date type filtering can be in the form of a string, and the date format needs to be completely consistent with ES
diff --git a/docs/zh-CN/extending-doris/doris-on-es.md b/docs/zh-CN/extending-doris/doris-on-es.md
index 9b5824399e9bec..22a7367171ca32 100644
--- a/docs/zh-CN/extending-doris/doris-on-es.md
+++ b/docs/zh-CN/extending-doris/doris-on-es.md
@@ -449,6 +449,67 @@ select * from es_table where esquery(k4, ' {
 
 4. 
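After Doris finishes computing the result, it returns it to the user
 
+## Best Practices
+
+### Suggestions for using time type fields
+
+Time type fields are very flexible in ES, but in Doris On ES, if the type of a time type field is set improperly, filter conditions on it cannot be pushed down
+
+When creating an index, set the format of time type fields for maximum compatibility:
+
+```
+    "dt": {
+        "type": "date",
+        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
+    }
+```
+
+When creating this field in Doris, it is recommended to set it to `date` or `datetime`; it can also be set to `varchar`. Each of the following SQL statements pushes its filter condition directly down to ES:
+
+```
+select * from doe where k2 > '2020-06-21';
+
+select * from doe where k2 < '2020-06-21 12:00:00';
+
+select * from doe where k2 < 1593497011;
+
+select * from doe where k2 < now();
+
+select * from doe where k2 < date_format(now(), '%Y-%m-%d');
+```
+
+Note:
+
+* If no `format` is set for a time type field in ES, the default format of the field is
+
+```
+strict_date_optional_time||epoch_millis
+```
+
+* If a date field imported into ES is a timestamp, it must be converted to `ms`, because ES processes timestamps internally in `ms`; otherwise Doris On ES will display incorrect values
+
+### Fetch ES metadata field `_id`
+
+When documents are imported without specifying `_id`, ES assigns each document a globally unique `_id`, which serves as the primary key. Users can also specify an `_id` with business meaning at import time. To read this field's value in Doris On ES, add an `_id` field of type `varchar` when creating the table:
+
+```
+CREATE EXTERNAL TABLE `doe` (
+  `_id` varchar COMMENT "",
+  `city` varchar COMMENT ""
+) ENGINE=ELASTICSEARCH
+PROPERTIES (
+"hosts" = "http://127.0.0.1:8200",
+"user" = "root",
+"password" = "root",
+"index" = "doe",
+"type" = "doc"
+)
+```
+
+Note:
+
+1. Filter conditions on the `_id` field support only `=` and `in`
+2. The `_id` field can only be of type `varchar`
 
 ## Q&A
 
@@ -466,7 +527,4 @@ select * from es_table where esquery(k4, ' {
 4. Whether aggregation operations can be pushed down
 
    At present, Doris On ES does not support pushing down aggregation operations such as sum, avg, and min/max; all documents satisfying the conditions are fetched from ES in batched streams and then computed in Doris
-
-5. Filter conditions on date type fields cannot be pushed down
-
-   Because of time format issues, date type fields are not pushed down in most cases; date type filters can be written as strings, and the date format must be exactly consistent with the one in ES
\ No newline at end of file
+   
\ No newline at end of file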
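
The patch's note that Unix timestamps must be converted to `ms` before they are indexed into ES can be illustrated with a minimal sketch. It assumes the source data carries timestamps in seconds and that the target field uses the `epoch_millis` format from the mapping example above; the helper name `seconds_to_epoch_millis` and the sample document are hypothetical and not part of Doris or ES.

```python
def seconds_to_epoch_millis(ts_seconds):
    """Convert a Unix timestamp in seconds to integer epoch milliseconds."""
    # Multiply first, then round, so fractional seconds keep millisecond precision.
    return int(round(ts_seconds * 1000))


# Hypothetical document body; "dt" matches the field name in the mapping example.
doc = {"dt": seconds_to_epoch_millis(1593497011)}
print(doc)
```

Doing the conversion in the ingestion client keeps the values stored in ES consistent with its millisecond-based handling of timestamps, which is what the note asks for.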