[enhance](parquet) Support BYTE_STREAM_SPLIT encoding for parquet reader #41683
Thank you for your contribution to Apache Doris. Since 2024-03-18, the Document has been moved to doris-website.
clang-tidy review says "All clean, LGTM! 👍"
clang-tidy made some suggestions
run buildall
TPC-H: Total hot run time: 40659 ms
TPC-DS: Total hot run time: 191612 ms
ClickBench: Total hot run time: 32.87 s
TeamCity be ut coverage result:
run buildall
TPC-H: Total hot run time: 40984 ms
TeamCity be ut coverage result:
TPC-DS: Total hot run time: 191651 ms
ClickBench: Total hot run time: 32.18 s
run buildall
TPC-H: Total hot run time: 40815 ms
TPC-DS: Total hot run time: 191118 ms
ClickBench: Total hot run time: 32.81 s
TeamCity be ut coverage result:
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
…der (apache#41683)

## Proposed changes

Implement ByteStreamSplitDecoder to decode Parquet data encoded with BYTE_STREAM_SPLIT.
Related PR: apache/arrow#42372

> Apache Parquet does not have any encodings suitable for FP data and the available text compressors (zstd, gzip, etc) do not handle FP data very well. It is possible to apply a simple data transformation named "stream splitting". Such could be "byte stream splitting" which creates K streams of length N where K is the number of bytes in the data type (4 for floats, 8 for doubles) and N is the number of elements in the sequence.

Co-authored-by: morningman <morningman@163.com>
Proposed changes
Implement ByteStreamSplitDecoder to decode Parquet data encoded with BYTE_STREAM_SPLIT.
Related PR: apache/arrow#42372
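
For readers unfamiliar with the encoding, here is a minimal sketch of the byte stream split transformation as described in the quoted text: K streams of N bytes each, where stream k holds byte k of every value. The function names `byte_stream_split_encode` and `byte_stream_split_decode` are hypothetical and chosen for illustration. This is not the `ByteStreamSplitDecoder` added in this PR, and it assumes the encoded buffer holds exactly K * N bytes with the streams stored back to back.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical helper (not the Doris API): scatter byte k of every value
// into stream k. Streams are stored back to back, each N bytes long.
template <typename T>
std::vector<uint8_t> byte_stream_split_encode(const std::vector<T>& values) {
    static_assert(std::is_trivially_copyable<T>::value, "T must be a plain value type");
    constexpr std::size_t K = sizeof(T);      // number of streams
    const std::size_t n = values.size();      // length of each stream
    std::vector<uint8_t> encoded(K * n);
    const uint8_t* src = reinterpret_cast<const uint8_t*>(values.data());
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t k = 0; k < K; ++k) {
            encoded[k * n + i] = src[i * K + k];
        }
    }
    return encoded;
}

// Hypothetical helper (not the Doris API): gather byte i of each stream
// back into value i, reversing the transformation above.
template <typename T>
std::vector<T> byte_stream_split_decode(const std::vector<uint8_t>& encoded,
                                        std::size_t num_values) {
    static_assert(std::is_trivially_copyable<T>::value, "T must be a plain value type");
    constexpr std::size_t K = sizeof(T);
    // Assumption: the caller passes a buffer of exactly K * N bytes.
    assert(encoded.size() == K * num_values);
    std::vector<T> out(num_values);
    uint8_t* dst = reinterpret_cast<uint8_t*>(out.data());
    for (std::size_t k = 0; k < K; ++k) {
        const uint8_t* stream = encoded.data() + k * num_values;
        for (std::size_t i = 0; i < num_values; ++i) {
            dst[i * K + k] = stream[i];
        }
    }
    return out;
}
```

Calling `byte_stream_split_decode<float>(byte_stream_split_encode(values), values.size())` should return the original values, since the transformation only reorders bytes. Grouping bytes by position tends to place similar bytes (for example sign and exponent bytes) next to each other, which is what lets a general-purpose compressor do better on floating-point data.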