@suxiaogang223
Contributor

@suxiaogang223 suxiaogang223 commented Sep 14, 2024

Proposed changes

Hadoop SnappyCodec source:
https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/codec/SnappyCodec.cc
Example:
OriginData (the original data is divided into several large data blocks):
large data block1 | large data block2 | large data block3 | ....
Each large data block is divided into several small data blocks.
Suppose a large data block is divided into three small blocks:
large data block1: | small block1 | small block2 | small block3 |
CompressData: <A [B1 compress(small block1)] [B2 compress(small block2)] [B3 compress(small block3)]>

A : original (uncompressed) length of the current large data block.
sizeof(A) = 4 bytes.
A = length(small block1) + length(small block2) + length(small block3)
Bx : compressed length of small data block x.
sizeof(Bx) = 4 bytes.
Bx = length(compress(small block x))
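
To make the layout above concrete, here is a minimal Python sketch of how such a stream could be framed and walked. It is an illustration only, not the Doris or Hadoop implementation: the function names `encode_block` and `decode_stream` are hypothetical, the 4-byte length fields are assumed to be big-endian, and the codec is passed in as a callable so that no particular Snappy binding is assumed.

```python
import struct
from typing import Callable, List

Codec = Callable[[bytes], bytes]


def encode_block(small_blocks: List[bytes], compress: Codec) -> bytes:
    """Frame one large block as <A [B1 c1] [B2 c2] ...> (hypothetical helper)."""
    # A: total uncompressed length of all small blocks (assumed big-endian).
    out = bytearray(struct.pack(">I", sum(len(b) for b in small_blocks)))
    for block in small_blocks:
        payload = compress(block)
        out += struct.pack(">I", len(payload))  # Bx: compressed length
        out += payload
    return bytes(out)


def decode_stream(data: bytes, decompress: Codec) -> List[bytes]:
    """Return the uncompressed content of every large block in `data`."""
    large_blocks = []
    pos = 0
    while pos < len(data):
        if len(data) - pos < 4:
            raise ValueError("truncated length field A")
        (remaining,) = struct.unpack_from(">I", data, pos)
        pos += 4
        out = bytearray()
        # Keep reading small blocks until A uncompressed bytes are produced.
        while remaining > 0:
            if len(data) - pos < 4:
                raise ValueError("truncated length field Bx")
            (clen,) = struct.unpack_from(">I", data, pos)
            pos += 4
            if len(data) - pos < clen:
                raise ValueError("truncated small block payload")
            chunk = decompress(data[pos:pos + clen])
            pos += clen
            if len(chunk) > remaining:
                raise ValueError("small blocks exceed declared length A")
            out += chunk
            remaining -= len(chunk)
        large_blocks.append(bytes(out))
    return large_blocks
```

The point of the sketch is that a decoder cannot treat `A` as the header of a single compressed payload: it has to loop over the `Bx`-prefixed small blocks until `A` uncompressed bytes have been recovered, then start over with the next large block.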

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@suxiaogang223 suxiaogang223 changed the title from "[]fix snappy decompressor bug" to "[branch-2.1](fix) fix snappy decompressor bug" Sep 14, 2024
@suxiaogang223
Contributor Author

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.14% (9319/25784)
Line Coverage: 27.70% (76549/276325)
Region Coverage: 26.48% (39267/148312)
Branch Coverage: 23.30% (20024/85944)
Coverage Report: http://coverage.selectdb-in.cc/coverage/530158e2f18d897c8852078547f4c1f70c37840a_530158e2f18d897c8852078547f4c1f70c37840a/report/index.html

@morningman
Contributor

please add test case

Contributor

@yiguolei yiguolei left a comment

Please modify BE UT to test the modification

@suxiaogang223
Contributor Author

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.16% (9328/25794)
Line Coverage: 27.73% (76656/276453)
Region Coverage: 26.51% (39363/148483)
Branch Coverage: 23.31% (20047/86000)
Coverage Report: http://coverage.selectdb-in.cc/coverage/98087effcf0fb41f8bf108022a15e2de3a196839_98087effcf0fb41f8bf108022a15e2de3a196839/report/index.html

Contributor

@morningman morningman left a comment

LGTM

@github-actions github-actions bot added the "approved" label (indicates a PR has been approved by one committer) Sep 20, 2024
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@morningman morningman merged commit e0fac66 into apache:branch-2.1 Sep 20, 2024
@suxiaogang223 suxiaogang223 deleted the fix_decompressor branch September 26, 2024 17:12
@yiguolei yiguolei mentioned this pull request Nov 6, 2024
hubgeter pushed a commit to hubgeter/doris that referenced this pull request Jan 15, 2025
)

## Proposed changes
Hadoop SnappyCodec source:

https://github.com/apache/hadoop/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/src/codec/SnappyCodec.cc
Example:
OriginData (the original data is divided into several large data blocks):
     large data block1 | large data block2 | large data block3 | ....
Each large data block is divided into several small data blocks.
Suppose a large data block is divided into three small blocks:
large data block1: | small block1 | small block2 | small block3 |
CompressData: <A [B1 compress(small block1)] [B2 compress(small block2)] [B3 compress(small block3)]>

A : original (uncompressed) length of the current large data block.
sizeof(A) = 4 bytes.
A = length(small block1) + length(small block2) + length(small block3)
Bx : compressed length of small data block x.
sizeof(Bx) = 4 bytes.
Bx = length(compress(small block x))

Co-authored-by: Socrates <suxiaogang223@icloud.com>
morningman pushed a commit that referenced this pull request Jan 16, 2025
…n` (#46982)

related pr: #40862

Doris `branch-2.1` modified this regression case without modifying the
`master` branch. So this pr fixes the regression case
`test_local_tvf_compression`
github-actions bot pushed a commit that referenced this pull request Jan 16, 2025
…n` (#46982)

related pr: #40862

Doris `branch-2.1` modified this regression case without modifying the
`master` branch. So this pr fixes the regression case
`test_local_tvf_compression`
@yiguolei yiguolei mentioned this pull request Jan 19, 2025
lzyy2024 pushed a commit to lzyy2024/doris that referenced this pull request Feb 21, 2025
…n` (apache#46982)

related pr: apache#40862

Doris `branch-2.1` modified this regression case without modifying the
`master` branch. So this pr fixes the regression case
`test_local_tvf_compression`
Labels

approved Indicates a PR has been approved by one committer. reviewed
