4 changes: 4 additions & 0 deletions docs/en/administrator-guide/load-data/broker-load-manual.md
@@ -174,6 +174,10 @@ The following is a detailed explanation of some parameters of the data description.

The where statement in ```data_desc``` filters the data that has already been transformed. Rows filtered out by the where predicate are not counted toward ```max_filter_ratio```. If the same table has where predicates in more than one ```data_desc```, the predicates from the different ```data_desc``` are merged with AND semantics.

+ merge\_type
The merge type of the data. Three types are supported: APPEND, DELETE, and MERGE. APPEND is the default and means this batch of data is simply appended to the existing data. DELETE means that all rows whose keys match the keys in this batch are deleted. MERGE must be used together with a delete condition: rows that satisfy the delete condition are handled with DELETE semantics, and the remaining rows with APPEND semantics.
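
  For illustration, a minimal sketch of a Broker load that uses MERGE with a DELETE ON condition (the label, path, table, and column names are assumptions for the example), mirroring the syntax in the Broker load reference of this change:

  ```
  -- Rows in this batch with v2 > 100 delete the existing rows with the same keys;
  -- all other rows are appended.
  LOAD LABEL example_db.label_merge_example
  (
      MERGE DATA INFILE("hdfs://hdfs_host:hdfs_port/user/palo/data/input/file")
      INTO TABLE `my_table`
      COLUMNS TERMINATED BY "\t"
      (k1, k2, k3, v2, v1)
  )
  DELETE ON v2 > 100
  WITH BROKER my_hdfs_broker
  (
      "username" = "hdfs_user",
      "password" = "hdfs_passwd"
  );
  ```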


#### Import job parameters

Import job parameters mainly refer to the parameters of the Broker load creation statement that belong to ``opt_properties``. These parameters apply to the whole import job.
4 changes: 4 additions & 0 deletions docs/en/administrator-guide/load-data/routine-load-manual.md
@@ -167,6 +167,10 @@ The detailed syntax for creating a routine load task can be connected to Doris a

3. For a loaded column whose type has a range limit, if the original data can pass type conversion but cannot pass the range limit, strict mode does not affect it. For example, if the type is decimal(1,0) and the original data is 10, it can pass type conversion but is outside the declared range. Strict mode has no effect on such data.

* merge\_type
The merge type of the data. Three types are supported: APPEND, DELETE, and MERGE. APPEND is the default and means this batch of data is simply appended to the existing data. DELETE means that all rows whose keys match the keys in this batch are deleted. MERGE must be used together with a delete condition: rows that satisfy the delete condition are handled with DELETE semantics, and the remaining rows with APPEND semantics.
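
  As a sketch only (the table, columns, and Kafka settings are assumptions for the example), a routine load that applies MERGE with a DELETE ON condition can be declared as follows, mirroring the CREATE ROUTINE LOAD syntax extended by this change:

  ```
  -- Consumed rows with v3 > 100 delete existing rows with the same keys;
  -- all other rows are appended.
  CREATE ROUTINE LOAD example_db.merge_example_job ON example_tbl
  WITH MERGE
  COLUMNS(k1, k2, k3, v1, v2, v3),
  WHERE k1 > 100 and k2 like "%doris%"
  DELETE ON v3 > 100
  PROPERTIES
  (
      "desired_concurrent_number" = "3",
      "max_batch_interval" = "20"
  )
  FROM KAFKA
  (
      "kafka_broker_list" = "broker1:9092,broker2:9092,broker3:9092",
      "kafka_topic" = "my_topic",
      "kafka_partitions" = "0,1,2,3",
      "kafka_offsets" = "101,0,0,200"
  );
  ```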


#### Relationship between strict mode and source data

Here is an example of a column type of TinyInt.
4 changes: 4 additions & 0 deletions docs/en/administrator-guide/load-data/stream-load-manual.md
@@ -143,6 +143,10 @@ The number of rows in the original file = `dpp.abnorm.ALL + dpp.norm.ALL`

Memory limit. The default is 2GB; the unit is bytes.

+ merge\_type
The merge type of the data. Three types are supported: APPEND, DELETE, and MERGE. APPEND is the default and means this batch of data is simply appended to the existing data. DELETE means that all rows whose keys match the keys in this batch are deleted. MERGE must be used together with a delete condition: rows that satisfy the delete condition are handled with DELETE semantics, and the remaining rows with APPEND semantics.
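
  As a sketch only (the column names, the flag condition, and the target table are assumptions for the example), a Stream load request that applies MERGE passes the merge type and the delete condition as HTTP headers, as in the examples added to the Stream load reference by this change:

  ```
  # Rows in testData with flag = 1 delete existing rows with the same keys;
  # all other rows are appended.
  curl --location-trusted -u user:passwd \
      -H "column_separator:," \
      -H "columns: siteid, citycode, username, pv, flag" \
      -H "merge_type: MERGE" \
      -H "delete: flag=1" \
      -T testData http://host:port/api/testDb/testTbl/_stream_load
  ```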


### Return results

Since Stream load is a synchronous import method, the result of the load is returned to the user directly in the response of the load request.
@@ -178,6 +178,13 @@ under the License.
PROPERTIES ("key"="value")
note:
This can also be merged into the schema change operation above; see the example below

7. Enable batch delete support
syntax:
ENABLE FEATURE "BATCH_DELETE"
note:
Only supported on Unique tables. This is used to add batch delete support to existing tables; newly created tables already support it.
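
A minimal sketch, assuming an existing Unique table example_db.my_table:

```
ALTER TABLE example_db.my_table ENABLE FEATURE "BATCH_DELETE";
```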



Rename supports modification of the following names:
@@ -57,6 +57,7 @@ under the License.

To describe the data source.
syntax:
[MERGE|APPEND|DELETE]
DATA INFILE
(
"file_path1"[, file_path2, ...]
@@ -68,7 +69,8 @@ under the License.
[FORMAT AS "file_type"]
[(column_list)]
[SET (k1 = func(k2))]
[WHERE predicate]
[WHERE predicate]
[DELETE ON label=true]

Explanation:
file_path:
@@ -116,6 +118,14 @@ under the License.
WHERE:

Filters the transformed data; only rows that satisfy the WHERE predicate are loaded. Only column names of the table can be referenced in the WHERE clause.

merge_type:

The merge type of the data. Three types are supported: APPEND, DELETE, and MERGE. APPEND is the default and means this batch of data is simply appended to the existing data. DELETE means that all rows whose keys match the keys in this batch are deleted. MERGE must be used together with a DELETE ON condition: rows that satisfy the DELETE ON condition are handled with DELETE semantics, and the remaining rows with APPEND semantics.

delete_on_predicates:

The delete condition. It is only meaningful when the merge type is MERGE, and its syntax is the same as WHERE.

3. broker_name

@@ -190,7 +200,7 @@ under the License.

## example

1. Load a batch of data from HDFS, specify timeout and filtering ratio. Use the broker with the inscription my_hdfs_broker. Simple authentication.
1. Load a batch of data from HDFS, specifying the timeout and filter ratio. Use the broker my_hdfs_broker with plaintext credentials (simple authentication).

LOAD LABEL example_db.label1
(
@@ -422,6 +432,27 @@ under the License.
SET (data_time=str_to_date(data_time, '%Y-%m-%d %H%%3A%i%%3A%s'))
)
WITH BROKER "hdfs" ("username"="user", "password"="pass");

13. Load a batch of data from HDFS, specifying the timeout and filter ratio. Use the broker my_hdfs_broker with plaintext credentials (simple authentication). Rows in the batch with v2 > 100 delete the existing rows with the same keys; the rest are appended.

LOAD LABEL example_db.label1
(
MERGE DATA INFILE("hdfs://hdfs_host:hdfs_port/user/palo/data/input/file")
INTO TABLE `my_table`
COLUMNS TERMINATED BY "\t"
(k1, k2, k3, v2, v1)
)
DELETE ON v2 >100
WITH BROKER my_hdfs_broker
(
"username" = "hdfs_user",
"password" = "hdfs_passwd"
)
PROPERTIES
(
"timeout" = "3600",
"max_filter_ratio" = "0.1"
);

## keyword

@@ -52,9 +52,11 @@ FROM data_source
Used to describe the load data. Syntax:

```
[merge_type],
[column_separator],
[columns_mapping],
[where_predicates],
[delete_on_predicates],
[partitions]
```

@@ -106,6 +108,14 @@ FROM data_source

`PARTITION(p1, p2, p3)`

5. merge_type:

The merge type of the data. Three types are supported: APPEND, DELETE, and MERGE. APPEND is the default and means this batch of data is simply appended to the existing data. DELETE means that all rows whose keys match the keys in this batch are deleted. MERGE must be used together with a DELETE ON condition: rows that satisfy the DELETE ON condition are handled with DELETE semantics, and the remaining rows with APPEND semantics. The syntax is [WITH MERGE|APPEND|DELETE].

6. delete_on_predicates:

The delete condition. It is only meaningful when the merge type is MERGE, and its syntax is the same as WHERE.

4. job_properties

A generic parameter that specifies a routine load job.
@@ -494,6 +504,29 @@ FROM data_source
{"category":"33","author":"3avc","title":"SayingsoftheCentury","timestamp":1589191387}
]
}

7. Create a Kafka routine load task named test1 for example_tbl of example_db. Loaded rows with v3 > 100 delete the existing rows whose key columns match; the remaining rows are appended.

CREATE ROUTINE LOAD example_db.test1 ON example_tbl
WITH MERGE
COLUMNS(k1, k2, k3, v1, v2, v3),
WHERE k1 > 100 and k2 like "%doris%"
DELETE ON v3 >100
PROPERTIES
(
"desired_concurrent_number"="3",
"max_batch_interval" = "20",
"max_batch_rows" = "300000",
"max_batch_size" = "209715200",
"strict_mode" = "false"
)
FROM KAFKA
(
"kafka_broker_list" = "broker1:9092,broker2:9092,broker3:9092",
"kafka_topic" = "my_topic",
"kafka_partitions" = "0,1,2,3",
"kafka_offsets" = "101,0,0,200"
);
## keyword

CREATE, ROUTINE, LOAD
@@ -123,6 +123,10 @@ Boolean type, true to indicate that json data starts with an array object and fl
`json_root`
json_root is a valid JSONPATH string that specifies the root node of the JSON Document. The default value is "".

`merge_type`

The merge type of the data. Three types are supported: APPEND, DELETE, and MERGE. APPEND is the default and means this batch of data is simply appended to the existing data. DELETE means that all rows whose keys match the keys in this batch are deleted. MERGE must be used together with the delete condition: rows that satisfy the delete condition are handled with DELETE semantics, and the remaining rows with APPEND semantics.

RETURN VALUES

After the load completes, the results of this load are returned in JSON format. The fields currently included are
@@ -240,6 +244,11 @@ Where url is the url given by ErrorURL.
Columns are matched by specifying the jsonpaths parameter, such as `category`, `author`, and `price`. For example:
curl --location-trusted -u root -H "columns: category, price, author" -H "label:123" -H "format: json" -H "jsonpaths: [\"$.category\",\"$.price\",\"$.author\"]" -H "strip_outer_array: true" -H "json_root: $.RECORDS" -T testData http://host:port/api/testDb/testTbl/_stream_load

13. Delete all existing rows whose key columns match the loaded data
curl --location-trusted -u root -H "merge_type: DELETE" -T testData http://host:port/api/testDb/testTbl/_stream_load
14. Delete the existing rows whose keys match loaded rows where flag is true; append the other rows
curl --location-trusted -u root: -H "column_separator:," -H "columns: siteid, citycode, username, pv, flag" -H "merge_type: MERGE" -H "delete: flag=1" -T testData http://host:port/api/testDb/testTbl/_stream_load

## keyword

STREAM, LOAD
@@ -231,6 +231,8 @@ Another purpose of the Label is to prevent users from repeatedly loading the same data.
2. When a loaded column is generated by a function transformation, strict mode has no effect on it.

3. For a loaded column whose type has a range limit, if the original data can pass type conversion but cannot pass the range limit, strict mode does not affect it either. For example, if the type is decimal(1,0) and the original data is 10, it can pass type conversion but is outside the declared range. Strict mode has no effect on such data.
+ merge\_type
The merge type of the data. Three types are supported: APPEND, DELETE, and MERGE. APPEND is the default and means this batch of data is simply appended to the existing data. DELETE means that all rows whose keys match the keys in this batch are deleted. MERGE must be used together with a delete condition: rows that satisfy the delete condition are handled with DELETE semantics, and the remaining rows with APPEND semantics.

#### Relationship between strict mode and source data

@@ -166,6 +166,8 @@ Based on the reported results, the JobScheduler in the FE continues to generate subsequent new Tasks, or
2. When a loaded column is generated by a function transformation, strict mode has no effect on it.

3. For a loaded column whose type has a range limit, if the original data can pass type conversion but cannot pass the range limit, strict mode does not affect it either. For example, if the type is decimal(1,0) and the original data is 10, it can pass type conversion but is outside the declared range. Strict mode has no effect on such data.
* merge\_type
The merge type of the data. Three types are supported: APPEND, DELETE, and MERGE. APPEND is the default and means this batch of data is simply appended to the existing data. DELETE means that all rows whose keys match the keys in this batch are deleted. MERGE must be used together with a delete condition: rows that satisfy the delete condition are handled with DELETE semantics, and the remaining rows with APPEND semantics.

#### Relationship between strict mode and source data

@@ -154,6 +154,8 @@ Since Stream load uses the HTTP protocol, everything related to the load task
2. When a loaded column is generated by a function transformation, strict mode has no effect on it.

3. For a loaded column whose type has a range limit, if the original data can pass type conversion but cannot pass the range limit, strict mode does not affect it either. For example, if the type is decimal(1,0) and the original data is 10, it can pass type conversion but is outside the declared range. Strict mode has no effect on such data.
+ merge\_type
The merge type of the data. Three types are supported: APPEND, DELETE, and MERGE. APPEND is the default and means this batch of data is simply appended to the existing data. DELETE means that all rows whose keys match the keys in this batch are deleted. MERGE must be used together with a delete condition: rows that satisfy the delete condition are handled with DELETE semantics, and the remaining rows with APPEND semantics.

#### Relationship between strict mode and source data

@@ -170,6 +170,13 @@ under the License.
note:
1) All columns of the index must be listed
2) Value columns come after key columns

6. Enable batch delete support
syntax:
ENABLE FEATURE "BATCH_DELETE"
note:
1) Can only be used on Unique tables
2) Used to add batch delete support to existing tables; newly created tables already support it

6. Modify table properties. Currently supports modifying the bloom filter columns, the colocate_with property, the dynamic_partition property, and the replication_num and default.replication_num properties
syntax:
@@ -343,6 +350,8 @@ under the License.
15. Modify the table's in_memory property

ALTER TABLE example_db.my_table set ("in_memory" = "true");
16. Enable the batch delete feature
ALTER TABLE example_db.my_table ENABLE FEATURE "BATCH_DELETE"


[rename]
@@ -57,6 +57,7 @@ under the License.

Used to describe a batch of load data.
syntax:
[MERGE|APPEND|DELETE]
DATA INFILE
(
"file_path1"[, file_path2, ...]
@@ -68,7 +69,8 @@ under the License.
[FORMAT AS "file_type"]
[(column_list)]
[SET (k1 = func(k2))]
[WHERE predicate]
[WHERE predicate]
[DELETE ON label=true]

Explanation:
file_path:
@@ -111,6 +113,14 @@ under the License.
WHERE:

Filters the transformed data; only rows that satisfy the WHERE condition are loaded. Only column names of the table can be referenced in the WHERE clause.

merge_type:

The merge type of the data. Three types are supported: APPEND, DELETE, and MERGE. APPEND is the default and means this batch of data is simply appended to the existing data. DELETE means that all rows whose keys match the keys in this batch are deleted. MERGE must be used together with a DELETE ON condition: rows that satisfy the DELETE ON condition are handled with DELETE semantics, and the remaining rows with APPEND semantics.

delete_on_predicates:

The delete condition. It is only meaningful when the merge type is MERGE, and its syntax is the same as WHERE.
3. broker_name

The name of the broker to use; it can be viewed with the show broker command.
@@ -184,7 +194,7 @@ under the License.

## example

1. 从 HDFS 导入一批数据,指定超时时间和过滤比例。使用铭文 my_hdfs_broker 的 broker。简单认证。
1. Load a batch of data from HDFS, specifying the timeout and filter ratio. Use the broker my_hdfs_broker with plaintext credentials (simple authentication).

LOAD LABEL example_db.label1
(
@@ -429,7 +439,28 @@ under the License.
SET (data_time=str_to_date(data_time, '%Y-%m-%d %H%%3A%i%%3A%s'))
)
WITH BROKER "hdfs" ("username"="user", "password"="pass");


13. Load a batch of data from HDFS, specifying the timeout and filter ratio. Use the broker my_hdfs_broker with plaintext credentials (simple authentication). Existing rows whose keys match loaded rows with v2 > 100 are deleted; the other rows are loaded normally.

LOAD LABEL example_db.label1
(
MERGE DATA INFILE("hdfs://hdfs_host:hdfs_port/user/palo/data/input/file")
INTO TABLE `my_table`
COLUMNS TERMINATED BY "\t"
(k1, k2, k3, v2, v1)
)
DELETE ON v2 >100
WITH BROKER my_hdfs_broker
(
"username" = "hdfs_user",
"password" = "hdfs_passwd"
)
PROPERTIES
(
"timeout" = "3600",
"max_filter_ratio" = "0.1"
);

## keyword

BROKER,LOAD
@@ -49,10 +49,11 @@ under the License.
3. load_properties

Used to describe the load data. Syntax:

[merge_type],
[column_separator],
[columns_mapping],
[where_predicates],
[delete_on_predicates],
[partitions]

1. column_separator:
@@ -97,6 +98,10 @@ under the License.
Example:

PARTITION(p1, p2, p3)
5. merge_type
The merge type of the data. Three types are supported: APPEND, DELETE, and MERGE. APPEND is the default and means this batch of data is simply appended to the existing data. DELETE means that all rows whose keys match the keys in this batch are deleted. MERGE must be used together with a DELETE ON condition: rows that satisfy the DELETE ON condition are handled with DELETE semantics, and the remaining rows with APPEND semantics. The syntax is [WITH MERGE|APPEND|DELETE].
6. delete_on_predicates
The delete condition. It is only meaningful when the merge type is MERGE, and its syntax is the same as WHERE.

4. job_properties

@@ -432,6 +437,28 @@ under the License.
{"category":"33","author":"3avc","title":"SayingsoftheCentury","timestamp":1589191387}
]
}
7. Create a Kafka routine load task named test1 for example_tbl of example_db. Loaded rows with v3 > 100 delete the existing rows whose key columns match; the remaining rows are appended.

CREATE ROUTINE LOAD example_db.test1 ON example_tbl
WITH MERGE
COLUMNS(k1, k2, k3, v1, v2, v3),
WHERE k1 > 100 and k2 like "%doris%"
DELETE ON v3 >100
PROPERTIES
(
"desired_concurrent_number"="3",
"max_batch_interval" = "20",
"max_batch_rows" = "300000",
"max_batch_size" = "209715200",
"strict_mode" = "false"
)
FROM KAFKA
(
"kafka_broker_list" = "broker1:9092,broker2:9092,broker3:9092",
"kafka_topic" = "my_topic",
"kafka_partitions" = "0,1,2,3",
"kafka_offsets" = "101,0,0,200"
);
## keyword

CREATE,ROUTINE,LOAD