From fd536101f8ed0779eccdfd1b8df4578a165cb4dc Mon Sep 17 00:00:00 2001
From: Wei-Ting Kuo
Date: Wed, 17 Aug 2022 15:17:28 +0800
Subject: [PATCH 1/3] add a readme for clickbench intro

---
 clickbench/README.md | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 clickbench/README.md

diff --git a/clickbench/README.md b/clickbench/README.md
new file mode 100644
index 0000000000000..e69de29bb2d1d

From 628905485b0ae3bc9e4c5d3e90179099406b731e Mon Sep 17 00:00:00 2001
From: Wei-Ting Kuo
Date: Wed, 17 Aug 2022 15:22:20 +0800
Subject: [PATCH 2/3] update readme

---
 clickbench/README.md | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/clickbench/README.md b/clickbench/README.md
index e69de29bb2d1d..ea7f7c93dc7b4 100644
--- a/clickbench/README.md
+++ b/clickbench/README.md
@@ -0,0 +1,33 @@
+# ClickBench Benchmark
+
+## Introduction
+
+ClickBench is a new benchmark suite provided by ClickHouse. The current DataFusion result was measured on an Azure `v16s_v2` VM.
+
+
+## How to reproduce
+
+1. `git clone git@github.com:ClickHouse/ClickBench.git`
+2. `cd ClickBench`
+3. `bash run.sh`
+
+Note that this prints the timings in the form needed for the `result` field in . For now, `system`, `date`, ... have to be filled in manually, and the result saved as a new JSON file in the `results` folder. Once ClickBench merges it and regenerates the HTML files, it will be shown in .
+
+There are 43 queries, and each is executed 3 times. Each row contains the execution times of the 3 runs of one query, so the overall output is a 43 by 3 matrix.
+
+## To generate human-readable results
+
+You can run:
+
+```bash
+bash run2.sh
+```
+
+Each query is executed only once; for each query, the SQL expression is printed first, followed by the result. (Note that the index here starts at 1, whereas ClickBench starts at 0.)
+
+## Known Issues
+
+
+1. Importing parquet with `datafusion-cli` makes column names case-sensitive (i.e. the columns have to be referenced as double-quoted identifiers). The original `queries.sql` contains no double quotes around column names, so I manually added double quotes to all column names for now.
+
+2. Since our parquet importer doesn't support specifying a schema, I use casts in the queries to convert data types (e.g. "EventDate" becomes "EventDate::INT::DATE"). Note that other platforms support this, so it could be done while creating the table, i.e.
\ No newline at end of file

From 7dae4c61710566dc900d47d11b31159b4f16e080 Mon Sep 17 00:00:00 2001
From: Wei-Ting Kuo
Date: Wed, 17 Aug 2022 15:57:25 +0800
Subject: [PATCH 3/3] add license

---
 clickbench/README.md | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/clickbench/README.md b/clickbench/README.md
index ea7f7c93dc7b4..786652920a52f 100644
--- a/clickbench/README.md
+++ b/clickbench/README.md
@@ -1,3 +1,22 @@
+
+
 # ClickBench Benchmark
 
 ## Introduction
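
The manual step described in the README (taking the 43 by 3 timing matrix and adding `system`, `date`, ... by hand) could be sketched roughly as below. This is a hypothetical helper, not part of the patch or of ClickBench: the function names `parse_timings`, `build_result` and `render_result_json` are made up for illustration, and the assumed input format (one line per query holding three plain decimal numbers, separated by commas or whitespace) is a guess about what `run.sh` prints. Only the field names `system`, `date` and `result` come from the README; any other fields the results file needs are not covered here.

```python
import json
import re


def parse_timings(raw_output, expected_queries=43, runs_per_query=3):
    """Parse one row of timings per query from the benchmark output.

    Assumption: each non-empty line holds the timings of one query as
    plain decimal numbers, separated by commas and/or whitespace and
    optionally wrapped in brackets.
    """
    rows = []
    for line in raw_output.splitlines():
        numbers = re.findall(r"\d+(?:\.\d+)?", line)
        if not numbers:
            continue  # skip blank or non-numeric lines
        if len(numbers) != runs_per_query:
            raise ValueError(
                f"expected {runs_per_query} timings, got {len(numbers)}: {line!r}"
            )
        rows.append([float(n) for n in numbers])
    if len(rows) != expected_queries:
        raise ValueError(f"expected {expected_queries} rows, got {len(rows)}")
    return rows


def build_result(system, date, timings):
    """Wrap the timing matrix with the manually supplied metadata.

    Only `system`, `date` and `result` are named in the README; other
    fields would still have to be added by hand.
    """
    return {"system": system, "date": date, "result": timings}


def render_result_json(system, date, raw_output):
    """Return the JSON text for a new file in the `results` folder."""
    timings = parse_timings(raw_output)
    return json.dumps(build_result(system, date, timings), indent=4)
```

A caller would pass the captured output of `bash run.sh` as `raw_output` and write the returned string to a new file under `results`. The row and column checks are there so that a failed or truncated run is caught before a malformed file is submitted.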