Merged
3 changes: 3 additions & 0 deletions be/src/exprs/hll_function.cpp
@@ -48,6 +48,9 @@ void HllFunctions::hll_init(FunctionContext *, StringVal* dst) {
    dst->len = sizeof(HyperLogLog);
    dst->ptr = (uint8_t*)new HyperLogLog();
}

StringVal HllFunctions::empty_hll(FunctionContext* ctx) {
    return AnyValUtil::from_string_temp(ctx, HyperLogLog::empty());
}

template <typename T>
void HllFunctions::hll_update(FunctionContext *, const T &src, StringVal* dst) {
1 change: 1 addition & 0 deletions be/src/exprs/hll_function.h
@@ -26,6 +26,7 @@ class HllFunctions {
public:
    static void init();
    static StringVal hll_hash(FunctionContext* ctx, const StringVal& dest_base);
    static StringVal empty_hll(FunctionContext* ctx);
    static void hll_init(FunctionContext*, StringVal* dst);

template <typename T>
8 changes: 8 additions & 0 deletions be/src/olap/hll.h
@@ -103,6 +103,14 @@ class HyperLogLog {

    int64_t estimate_cardinality();

    static std::string empty() {
        static HyperLogLog hll;
        std::string buf;
        buf.resize(HLL_EMPTY_SIZE);
        hll.serialize((uint8_t*)buf.c_str());
        return buf;
    }

    // only for debug
    std::string to_string() {
        switch (_type) {
@@ -18,6 +18,9 @@

HLL_HASH(column_name)
Generates an HLL column type, used for insert or load; see the related load documentation for usage.

EMPTY_HLL()
Generates an empty HLL column, used to supply a default value during insert or load; see the related load documentation for usage.

## example
1. First create a table with an HLL column
Expand Down
@@ -286,8 +286,8 @@

7. Load data into a table containing HLL columns; the HLL values can be generated from columns in the table or columns in the data

-If the table has three columns (id, v1, v2), where v1 and v2 are HLL columns, and the source file has 3 columns, then (column_list) declares the first column as id and the second and third columns as temporarily named k1 and k2.
-In SET, the HLL columns in the table must be specially declared with hll_hash; the table's v1 column equals the hll_hash(k1) column of the source data.
+If the table has four columns (id, v1, v2, v3), where v1, v2, and v3 are HLL columns, and the source file has 3 columns, then (column_list) declares the first column as id and the second and third columns as temporarily named k1 and k2.
+In SET, the HLL columns in the table must be specially declared with hll_hash; the table's v1 column equals the hll_hash(k1) column of the source data, while the v3 column has no corresponding value in the source data, so empty_hll() supplies its default value.
LOAD LABEL example_db.label7
(
DATA INFILE("hdfs://hdfs_host:hdfs_port/user/palo/data/input/file")
@@ -297,7 +297,8 @@
(id, k1, k2)
SET (
v1 = hll_hash(k1),
-                v2 = hll_hash(k2)
+                v2 = hll_hash(k2),
+                v3 = empty_hll()
)
)
WITH BROKER hdfs ("username"="hdfs_user", "password"="hdfs_password");
@@ -90,8 +90,8 @@
6. Load using streaming (the user is in default_cluster)
seq 1 10 | awk '{OFS="\t"}{print $1, $1 * 10}' | curl --location-trusted -u root -T - http://host:port/api/testDb/testTbl/_stream_load

-7. Load into a table containing HLL columns; columns in the table or columns in the data can be used to generate the HLL columns
-curl --location-trusted -u root -H "columns: k1, k2, v1=hll_hash(k1)" -T testData http://host:port/api/testDb/testTbl/_stream_load
+7. Load into a table containing HLL columns; columns in the table or columns in the data can be used to generate the HLL columns, and empty_hll can supply columns missing from the data
+curl --location-trusted -u root -H "columns: k1, k2, v1=hll_hash(k1), v2=empty_hll()" -T testData http://host:port/api/testDb/testTbl/_stream_load

8. Load data with strict-mode filtering and set the timezone to Africa/Abidjan
curl --location-trusted -u root -H "strict_mode: true" -H "timezone: Africa/Abidjan" -T testData http://host:port/api/testDb/testTbl/_stream_load
Expand Down
@@ -19,6 +19,9 @@ This function is used to estimate the cardinality of a single HLL sequence
HLL_HASH(column_name)
Generates an HLL column type for insert or load; see the load documentation for usage.

EMPTY_HLL()
Generates an empty HLL column, used to supply a default value for insert or load; see the load documentation for usage.

## example
1. First create a table with HLL columns
create table test(
Expand Down
@@ -302,9 +302,9 @@

7. Load data into a table containing HLL columns; the HLL values can be generated from columns in the table or columns in the data

-If the table has three columns (id, v1, v2), where v1 and v2 are HLL columns, and the imported source file has three columns, then (column_list) declares the first column as id and the second and third columns as temporarily named k1 and k2.
+If the table has four columns (id, v1, v2, v3), where v1, v2, and v3 are HLL columns, and the imported source file has three columns, then (column_list) declares the first column as id and the second and third columns as temporarily named k1 and k2.

-In SET, each HLL column in the table must be specifically declared with hll_hash; the v1 column in the table equals the hll_hash(k1) column of the original data.
+In SET, each HLL column in the table must be specifically declared with hll_hash; the v1 column in the table equals the hll_hash(k1) column of the original data, while the v3 column has no corresponding value in the original data, so empty_hll() supplies its default value.

LOAD LABEL example_db.label7
(
@@ -315,7 +315,8 @@
(id, k1, k2)
SET (
v1 = hll_hash(k1),
-                v2 = hll_hash(k2)
+                v2 = hll_hash(k2),
+                v3 = empty_hll()
)
)
WITH BROKER hdfs ("username"="hdfs_user", "password"="hdfs_password");
@@ -145,9 +145,9 @@ Where url is the url given by ErrorURL.

```seq 1 10 | awk '{OFS="\t"}{print $1, $1 * 10}' | curl --location-trusted -u root -T - http://host:port/api/testDb/testTbl/_stream_load```

-7. load a table with HLL columns, which can be columns in the table or columns in the data used to generate HLL columns
+7. load a table with HLL columns, which can be columns in the table or columns in the data used to generate HLL columns; you can also use empty_hll to supply columns that are not in the data

-```Curl --location-trusted -u root -H "columns: k1, k2, v1=hll_hash(k1)" -T testData http://host:port/api/testDb/testTbl/_stream_load```
+```curl --location-trusted -u root -H "columns: k1, k2, v1=hll_hash(k1), v2=empty_hll()" -T testData http://host:port/api/testDb/testTbl/_stream_load```

8. load data for strict mode filtering and set the time zone to Africa/Abidjan

4 changes: 2 additions & 2 deletions fe/src/main/java/org/apache/doris/planner/BrokerScanNode.java
@@ -277,9 +277,9 @@ private void finalizeParams(ParamCreateContext context) throws UserException, An
                        + destSlotDesc.getColumn().getName() + "=hll_hash(xxx)");
            }
            FunctionCallExpr fn = (FunctionCallExpr) expr;
-           if (!fn.getFnName().getFunction().equalsIgnoreCase("hll_hash")) {
+           if (!fn.getFnName().getFunction().equalsIgnoreCase("hll_hash") && !fn.getFnName().getFunction().equalsIgnoreCase("empty_hll")) {
                throw new AnalysisException("HLL column must use hll_hash function, like "
-                       + destSlotDesc.getColumn().getName() + "=hll_hash(xxx)");
+                       + destSlotDesc.getColumn().getName() + "=hll_hash(xxx) or " + destSlotDesc.getColumn().getName() + "=empty_hll()");
            }
            expr.setType(Type.HLL);
        }
@@ -179,16 +179,17 @@ private void finalizeParams() throws UserException {
                }
            }
        }

        // check hll_hash
        if (dstSlotDesc.getType().getPrimitiveType() == PrimitiveType.HLL) {
            if (!(expr instanceof FunctionCallExpr)) {
                throw new AnalysisException("HLL column must use hll_hash function, like "
                        + dstSlotDesc.getColumn().getName() + "=hll_hash(xxx)");
            }
            FunctionCallExpr fn = (FunctionCallExpr) expr;
-           if (!fn.getFnName().getFunction().equalsIgnoreCase("hll_hash")) {
+           if (!fn.getFnName().getFunction().equalsIgnoreCase("hll_hash") && !fn.getFnName().getFunction().equalsIgnoreCase("empty_hll")) {
                throw new AnalysisException("HLL column must use hll_hash function, like "
-                       + dstSlotDesc.getColumn().getName() + "=hll_hash(xxx)");
+                       + dstSlotDesc.getColumn().getName() + "=hll_hash(xxx) or " + dstSlotDesc.getColumn().getName() + "=empty_hll()");
            }
            expr.setType(Type.HLL);
        }
2 changes: 2 additions & 0 deletions gensrc/script/doris_builtins_functions.py
@@ -587,6 +587,8 @@
        '_ZN5doris12HllFunctions15hll_cardinalityEPN9doris_udf15FunctionContextERKNS1_9StringValE'],
    [['hll_hash'], 'VARCHAR', ['VARCHAR'],
        '_ZN5doris12HllFunctions8hll_hashEPN9doris_udf15FunctionContextERKNS1_9StringValE'],
    [['empty_hll'], 'VARCHAR', [],
        '_ZN5doris12HllFunctions9empty_hllEPN9doris_udf15FunctionContextE'],

#bitmap function
