Merged
64 changes: 61 additions & 3 deletions docs/en/extending-doris/doris-on-es.md
@@ -452,6 +452,67 @@ select * from es_table where esquery(k4, ' {

4. After computing the result, Doris returns it to the client

## Best Practices

### Suggestions for using Date type fields

Date fields in ES are very flexible, but in Doris On ES, if the type of a Date field is not set properly, filter conditions on that field cannot be pushed down.

When creating an index, set the `format` of the Date field for maximum compatibility:

```
"dt": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
}
```
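To make the three alternatives in this `format` concrete, here is a minimal Python sketch that reports which alternative a sample value would fall under. It only approximates ES parsing with `datetime.strptime` (which is more lenient than ES, for example about zero padding), and the helper name `matching_format` is hypothetical, not part of ES or Doris.

```python
from datetime import datetime

# strptime patterns that approximate the two textual alternatives
# in the mapping above (an approximation, not the ES parser itself).
_PATTERNS = {
    "yyyy-MM-dd HH:mm:ss": "%Y-%m-%d %H:%M:%S",
    "yyyy-MM-dd": "%Y-%m-%d",
}


def matching_format(value):
    """Return the first format alternative that accepts value, or None."""
    text = str(value)
    for name, pattern in _PATTERNS.items():
        try:
            datetime.strptime(text, pattern)
            return name
        except ValueError:
            continue
    # epoch_millis: a plain (optionally negative) integer number of milliseconds
    if text.lstrip("-").isdigit():
        return "epoch_millis"
    return None


for sample in ["2020-06-21 12:00:00", "2020-06-21", 1593497011000, "2020-06-21T12:00:00"]:
    print(sample, "->", matching_format(sample))
```

The last sample uses the ISO `T` separator, which none of the three alternatives covers; a value like that would need an additional alternative in the `format` setting.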

When creating this field in Doris, it is recommended to declare it as `date` or `datetime`; `varchar` also works. With any of these types, the following SQL statements push their filter conditions directly down to ES:


```
select * from doe where k2 > '2020-06-21';

select * from doe where k2 < '2020-06-21 12:00:00';

select * from doe where k2 < 1593497011;

select * from doe where k2 < now();

select * from doe where k2 < date_format(now(), '%Y-%m-%d');
```

`Notice`:

* If you do not set a `format` for a Date field in ES, the default format is

```
strict_date_optional_time||epoch_millis
```
* If the date field indexed into ES is a Unix timestamp, it must be converted to milliseconds (`ms`), because ES processes timestamps internally in `ms`; otherwise Doris On ES displays incorrect column data
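The millisecond requirement can be sketched on the ingest side as follows. This is an illustrative Python snippet, not part of Doris or ES: the helper names `to_epoch_millis` and `build_doc` are hypothetical, and the `dt` and `city` field names are borrowed from the examples in this document.

```python
from datetime import datetime, timezone


def to_epoch_millis(ts_seconds):
    """Convert a Unix timestamp in seconds (int or float) to integer milliseconds."""
    return int(round(ts_seconds * 1000))


def build_doc(city, ts_seconds):
    """Build a document body whose `dt` field is already in milliseconds."""
    return {"city": city, "dt": to_epoch_millis(ts_seconds)}


def as_utc(epoch_millis):
    """Interpret a value as epoch milliseconds and return the UTC datetime."""
    return datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)


seconds = 1593497011
doc = build_doc("Beijing", seconds)

# Correctly converted value: lands in 2020.
print(as_utc(doc["dt"]))
# Unconverted seconds read as milliseconds: lands in January 1970.
print(as_utc(seconds))
```

The second `print` shows why unconverted values look wrong: a seconds value read as milliseconds is only about 18 days after the epoch.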

### Fetch ES metadata field `_id`

When documents are indexed without an explicit `_id`, ES assigns a globally unique `_id` to each document. Users can also specify an `_id` that carries business meaning when indexing. If needed, Doris On ES can read the value of this field: add an `_id` column of type `varchar` when creating the ES external table:

```
CREATE EXTERNAL TABLE `doe` (
`_id` varchar COMMENT "",
`city` varchar COMMENT ""
) ENGINE=ELASTICSEARCH
PROPERTIES (
"hosts" = "http://127.0.0.1:8200",
"user" = "root",
"password" = "root",
"index" = "doe",
"type" = "doc"
);
```
`Notice`:

1. Filter conditions on the `_id` field support only two operators: `=` and `in`
2. The `_id` field can only be of type `varchar`

## Q&A

1. ES Version Requirements
@@ -469,6 +530,3 @@ select * from es_table where esquery(k4, ' {
4. Whether the aggregation operation can be pushed down

At present, Doris On ES does not support pushing down aggregations such as sum, avg, and min/max. All documents satisfying the conditions are fetched from ES in a batched stream, and the aggregation is then computed in Doris

5. Filters for date type fields cannot be pushed down
Due to the time format problem, the date type field will not be pushed down in most cases; the date type filtering can be in the form of a string, and the date format needs to be completely consistent with ES
66 changes: 62 additions & 4 deletions docs/zh-CN/extending-doris/doris-on-es.md
@@ -449,6 +449,67 @@ select * from es_table where esquery(k4, ' {

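4. After computing the result, Doris returns it to the user

## Best Practices

### Suggestions for using Date type fields

Date fields in ES are very flexible, but in Doris On ES, if the type of a Date field is not set properly, filter conditions on that field cannot be pushed down.

When creating an index, set the `format` of the Date field for maximum compatibility: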

```
"dt": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
}
```

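When creating this field in Doris, it is recommended to declare it as `date` or `datetime`; `varchar` also works. With any of these types, the following SQL statements push their filter conditions directly down to ES: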
Contributor

Can it be set to `char`?

Member Author

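Using `char` is not recommended: users who want to fetch `_id` have usually assigned it themselves, and self-assigned IDs will in most cases not all have the same length.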


```
select * from doe where k2 > '2020-06-21';

select * from doe where k2 < '2020-06-21 12:00:00';

select * from doe where k2 < 1593497011;

select * from doe where k2 < now();

select * from doe where k2 < date_format(now(), '%Y-%m-%d');
```

Notice:

* If you do not set a `format` for a Date field in ES, the default format is

```
strict_date_optional_time||epoch_millis
```

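* If the date field indexed into ES is a timestamp, it must be converted to milliseconds (`ms`), because ES processes timestamps internally in `ms`; otherwise Doris On ES displays incorrect data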

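### Fetch ES metadata field `_id`

When documents are indexed without an explicit `_id`, ES assigns each document a globally unique `_id`, which serves as its primary key. Users can also specify an `_id` that carries business meaning when indexing. To read this field's value in Doris On ES, add an `_id` column of type `varchar` when creating the table: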

```
CREATE EXTERNAL TABLE `doe` (
`_id` varchar COMMENT "",
`city` varchar COMMENT ""
) ENGINE=ELASTICSEARCH
PROPERTIES (
"hosts" = "http://127.0.0.1:8200",
"user" = "root",
"password" = "root",
"index" = "doe",
"type" = "doc"
);
```

Notice:

1. Filter conditions on the `_id` field support only two operators: `=` and `in`
2. The `_id` field can only be of type `varchar`

## Q&A

@@ -466,7 +527,4 @@ select * from es_table where esquery(k4, ' {
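4. Whether the aggregation operation can be pushed down

At present, Doris On ES does not support pushing down aggregations such as sum, avg, and min/max. All documents satisfying the conditions are fetched from ES in a batched stream, and the aggregation is then computed in Doris

5. Filters for date type fields cannot be pushed down

Because of date format issues, filters on date type fields are not pushed down in most cases; a date filter can be written as a string, and its format must match the format in ES exactly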