-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[fix](hive) do not split compress data file and support lz4/snappy block codec #23245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[fix](hive) do not split compress data file and support lz4/snappy block codec #23245
Conversation
Jibing-Li
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
PR approved by anyone and no changes requested. |
|
run buildall |
|
clang-tidy review says "All clean, LGTM! 👍" |
2 similar comments
|
clang-tidy review says "All clean, LGTM! 👍" |
|
clang-tidy review says "All clean, LGTM! 👍" |
3cd8a91 to
f838049
Compare
|
clang-tidy review says "All clean, LGTM! 👍" |
1 similar comment
|
clang-tidy review says "All clean, LGTM! 👍" |
|
run buildall |
Jibing-Li
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
kaka11chen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
clang-tidy review says "All clean, LGTM! 👍" |
|
run buildall |
|
clang-tidy review says "All clean, LGTM! 👍" |
1 similar comment
|
clang-tidy review says "All clean, LGTM! 👍" |
|
run buildall |
12efedc to
a140a97
Compare
|
run buildall |
|
clang-tidy review says "All clean, LGTM! 👍" |
a140a97 to
e71ca52
Compare
|
run buildall |
|
clang-tidy review says "All clean, LGTM! 👍" |
|
run buildall |
e71ca52 to
d61ce86
Compare
|
run buildall |
d61ce86 to
d526979
Compare
|
run buildall |
|
clang-tidy review says "All clean, LGTM! 👍" |
|
run buildall |
|
clang-tidy review says "All clean, LGTM! 👍" |
|
run buildall |
|
clang-tidy review says "All clean, LGTM! 👍" |
|
(From new machine)TeamCity pipeline, clickbench performance test result: |
yiguolei
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
PR approved by at least one committer and no changes requested. |
…ock codec (apache#23245) 1. do not split compress data file Some data file in hive is compressed with gzip, deflate, etc. These kinds of file can not be splitted. 2. Support lz4 block codec for hive scan node, use lz4 block codec instead of lz4 frame codec 4. Support snappy block codec For hadoop snappy 5. Optimize the `count(*)` query of csv file For query like `select count(*) from tbl`, only need to split the line, no need to split the column. Need to pick to branch-2.0 after this PR: apache#22304
…ock codec (#23245) (#23526) * [Refactor](load) Extract load public code (#22304) * [fix](hive) do not split compress data file and support lz4/snappy block codec (#23245) 1. do not split compress data file Some data file in hive is compressed with gzip, deflate, etc. These kinds of file can not be splitted. 2. Support lz4 block codec for hive scan node, use lz4 block codec instead of lz4 frame codec 4. Support snappy block codec For hadoop snappy 5. Optimize the `count(*)` query of csv file For query like `select count(*) from tbl`, only need to split the line, no need to split the column. Need to pick to branch-2.0 after this PR: #22304 --------- Co-authored-by: zzzzzzzs <1443539042@qq.com>
…ock codec (apache#23245) 1. do not split compress data file Some data file in hive is compressed with gzip, deflate, etc. These kinds of file can not be splitted. 2. Support lz4 block codec for hive scan node, use lz4 block codec instead of lz4 frame codec 4. Support snappy block codec For hadoop snappy 5. Optimize the `count(*)` query of csv file For query like `select count(*) from tbl`, only need to split the line, no need to split the column. Need to pick to branch-2.0 after this PR: apache#22304
Proposed changes
do not split compress data file
Some data file in hive is compressed with gzip, deflate, etc.
These kinds of file can not be splitted.
Support lz4 block codec
for hive scan node, use lz4 block codec instead of lz4 frame codec
Support snappy block codec
For hadoop snappy
Optimize the
count(*)query of csv fileFor query like
select count(*) from tbl, only need to split the line, no need to split the column.Need to pick to branch-2.0 after this PR: #22304
Further comments
If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...