@kaka11chen
Contributor

@kaka11chen kaka11chen commented Jun 20, 2025

Cherry-pick main PR: #45966, Fix bugs PR: #50185 #51102

@kaka11chen kaka11chen requested a review from morrySnow as a code owner June 20, 2025 12:24
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@kaka11chen kaka11chen marked this pull request as draft June 20, 2025 12:25
@kaka11chen kaka11chen force-pushed the cherry-pick-45966_3.1 branch 2 times, most recently from 2c2efee to 1881c74 Compare June 20, 2025 12:53
…apache#45966)

related: apache/doris-thirdparty#270

Problem Summary:

The original merge IO mechanism, `MergeRangeFileReader`, requires ranges
to be read in order. Requests can arrive out of order, and in that case
a range cannot be read back once the reader has moved past it.
If late materialization of ORC complex types is turned on, a read-back
of the present stream is required, for example in `select
struct_element(info, 'age'), id from test_orc_struct where
struct_element(info, 'name') = 'Alice'`.
With late materialization turned on, the present stream of the parent
node `info` is read first, when `name` is read. When `age` is read
later, the present stream of `info` has to be read back. For this
reason, late materialization of ORC complex types cannot be turned on at present.
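
The read-back problem described above can be illustrated with a small sketch. It is not the Doris implementation: `ForwardOnlyReader`, `Stream`, and the stream offsets are hypothetical and only model a reader that refuses requests behind its current position.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of a reader that can only move forward through the file.
class ForwardOnlyReader {
public:
    // Returns false when the request starts before the current position,
    // i.e. when the caller tries to read a range back.
    bool read(uint64_t offset, uint64_t length) {
        if (offset < _position) {
            return false;
        }
        _position = offset + length;
        return true;
    }

private:
    uint64_t _position = 0;
};

// Hypothetical description of one stream inside a stripe.
struct Stream {
    uint64_t offset;
    uint64_t length;
};

// Replays the access pattern of the example query under late materialization:
// present(info), data(name), then present(info) again before data(age).
// Returns one success flag per read. The offsets are made up.
std::vector<bool> replay_late_materialization(ForwardOnlyReader& reader) {
    const Stream info_present{0, 10};
    const Stream name_data{10, 100};
    const Stream age_data{110, 50};
    const Stream order[] = {info_present, name_data, info_present, age_data};

    std::vector<bool> results;
    for (const Stream& s : order) {
        results.push_back(reader.read(s.offset, s.length));
    }
    return results;
}
```

In this model the third read, the second access to the present stream of `info`, is the one that fails, which is the case the new merge IO facility has to support.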
@kaka11chen kaka11chen force-pushed the cherry-pick-45966_3.1 branch from 1881c74 to 4474baa Compare June 24, 2025 08:45
@kaka11chen
Contributor Author

run buildall

… of orc-reader. (apache#51102)

### What problem does this PR solve?

Related PR: apache#45966

Fix unsorted merge ranges in the new merge IO facility of the ORC reader.
The ranges taken from `std::unordered_map<orc::StreamId, io::PrefetchRange>& ranges` are not sorted, so merging adjacent ranges is largely ineffective.
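
A minimal sketch of the idea, assuming a simplified `ByteRange` in place of `io::PrefetchRange` and a plain integer key in place of `orc::StreamId`. The function name and the `max_gap` parameter are hypothetical and are not taken from the Doris code.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for io::PrefetchRange: a half-open byte range [start, end).
struct ByteRange {
    uint64_t start;
    uint64_t end;
};

// Merges ranges whose gap is at most max_gap bytes. An unordered_map has no
// defined iteration order, so the ranges are sorted by start offset first.
// Without the sort, neighbouring ranges are rarely adjacent in the sequence
// and almost nothing gets merged.
std::vector<ByteRange> merge_sorted_ranges(
        const std::unordered_map<uint64_t, ByteRange>& ranges, uint64_t max_gap) {
    std::vector<ByteRange> sorted;
    sorted.reserve(ranges.size());
    for (const auto& entry : ranges) {
        sorted.push_back(entry.second);
    }
    std::sort(sorted.begin(), sorted.end(),
              [](const ByteRange& a, const ByteRange& b) { return a.start < b.start; });

    std::vector<ByteRange> merged;
    for (const ByteRange& r : sorted) {
        if (!merged.empty() && r.start <= merged.back().end + max_gap) {
            // Close enough to the previous range: extend it.
            merged.back().end = std::max(merged.back().end, r.end);
        } else {
            merged.push_back(r);
        }
    }
    return merged;
}
```

Because the input is copied into a vector and sorted, the result does not depend on the iteration order of the map.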
@kaka11chen
Contributor Author

run buildall

@kaka11chen kaka11chen marked this pull request as ready for review June 24, 2025 09:18
…filtered by row group stats, despite stripe stats remaining unfiltered. (apache#50185)

Related PR: apache/doris-thirdparty#306

Problem Summary:
When all row groups of a stripe are filtered out by row group stats,
even though the stripe itself was not filtered by stripe stats, the
stream map is not cleared, which caused erroneous data to be read.

```
ERROR 1105 (HY000): errCode = 2, detailMessage = (172.20.32.136)[INTERNAL_ERROR]Orc row reader nextBatch failed. reason = Buffer error in ZlibDecompressionStream::NextDecompress.
```
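
The failure mode can be sketched as follows. This is an illustration under assumptions, not the actual fix in the ORC library: `StreamMap`, `ByteRange`, and `finish_row_group_filtering` are hypothetical names, and the sketch assumes the stream map holds byte ranges prepared per stripe.

```cpp
#include <algorithm>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical byte range prepared for one stream of the current stripe.
struct ByteRange {
    uint64_t start;
    uint64_t end;
};

// Hypothetical per-stripe map from stream id to its prepared byte range.
using StreamMap = std::unordered_map<uint64_t, ByteRange>;

// Returns true if the stripe still has at least one row group to read.
// If every row group was filtered out, the entries prepared for this stripe
// are dropped so that a later stripe cannot pick up stale ranges.
bool finish_row_group_filtering(const std::vector<bool>& row_group_selected,
                                StreamMap& stream_map) {
    const bool any_selected =
            std::any_of(row_group_selected.begin(), row_group_selected.end(),
                        [](bool selected) { return selected; });
    if (!any_selected) {
        stream_map.clear();
    }
    return any_selected;
}
```

In this model, a stripe that passes the stripe-level check but has no surviving row groups leaves the map empty, so no entries from it remain when the next stripe is processed.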
@kaka11chen kaka11chen force-pushed the cherry-pick-45966_3.1 branch from 2d73404 to 2855cb7 Compare June 24, 2025 11:57
@kaka11chen
Contributor Author

run buildall

@doris-robot

BE UT Coverage Report

Increment line coverage 20.85% (44/211) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 41.39% (11020/26628) |
| Line Coverage | 32.23% (94577/293477) |
| Region Coverage | 31.30% (48733/155709) |
| Branch Coverage | 27.96% (25137/89912) |

@morrySnow morrySnow merged commit 2684cc0 into apache:branch-3.1 Jun 25, 2025
20 of 22 checks passed
4 participants