-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[opt](serde)Optimize the filling of fixed values into block columns without repeated deserialization. (#37377) #37530
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[opt](serde)Optimize the filling of fixed values into block columns without repeated deserialization. (#37377) #37530
Conversation
|
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website. |
|
run buildall |
|
clang-tidy review says "All clean, LGTM! 👍" |
cffae82 to
6a4ae32
Compare
|
run buildall |
|
clang-tidy review says "All clean, LGTM! 👍" |
6a4ae32 to
a5a6ffe
Compare
|
run buildall |
|
clang-tidy review says "All clean, LGTM! 👍" |
a5a6ffe to
0787c02
Compare
|
run buildall |
|
clang-tidy review says "All clean, LGTM! 👍" |
… without repeated deserialization. (apache#37377) ## Proposed changes Since the value of the partition column is fixed when querying the partition table, we can deserialize the value only once and then repeatedly insert the value into the block. ```sql in Hive: CREATE TABLE parquet_partition_tb ( col1 STRING, col2 INT, col3 DOUBLE ) PARTITIONED BY ( partition_col1 STRING, partition_col2 INT ) STORED AS PARQUET; insert into parquet_partition_tb partition (partition_col1="hello",partition_col2=1) values("word",2,2.3); insert into parquet_partition_tb partition(partition_col1="hello",partition_col2=1 ) select col1,col2,col3 from parquet_partition_tb where partition_col1="hello" and partition_col2=1; Repeat the `insert into xxx select xxx`operation several times. Doris : before: mysql> select count(partition_col1) from parquet_partition_tb; +-----------------------+ | count(partition_col1) | +-----------------------+ | 33554432 | +-----------------------+ 1 row in set (3.24 sec) mysql> select count(partition_col2) from parquet_partition_tb; +-----------------------+ | count(partition_col2) | +-----------------------+ | 33554432 | +-----------------------+ 1 row in set (3.34 sec) after: mysql> select count(partition_col1) from parquet_partition_tb ; +-----------------------+ | count(partition_col1) | +-----------------------+ | 33554432 | +-----------------------+ 1 row in set (0.79 sec) mysql> select count(partition_col2) from parquet_partition_tb; +-----------------------+ | count(partition_col2) | +-----------------------+ | 33554432 | +-----------------------+ 1 row in set (0.51 sec) ``` ## Summary: test sql `select count(partition_col) from tbl;` Number of lines : 33554432 | |before | after| |---|---|--| |boolean | 3.96|0.47 | |tinyint | 3.39|0.47 | |smallint | 3.14|0.50 | |int |3.34|0.51 | |bigint | 3.61|0.51 | |float | 4.59 |0.51 | |double |4.60| 0.55 | |decimal(5,2)| 3.96 |0.61 | |date | 5.80|0.52 | |timestamp | 7.68 | 0.52 | |string | 3.24 |0.79 | Issue Number: close #xxx <!--Describe your changes.-->
0787c02 to
b15f6df
Compare
|
run buildall |
|
clang-tidy review says "All clean, LGTM! 👍" |
|
TeamCity be ut coverage result: |
bp #37377
Proposed changes
Since the value of the partition column is fixed when querying the partition table, we can deserialize the value only once and then repeatedly insert the value into the block.
Summary:
test sql
select count(partition_col) from tbl;Number of lines : 33554432
Issue Number: close #xxx
Proposed changes
Issue Number: close #xxx