Topn lazy materialize poc rebase #51329
Merged: HappenLee merged 2 commits into apache:master from hubgeter:topn-lazy-materialize-poc_rebase on May 29, 2025.
Contributor: Thank you for your contribution to Apache Doris. Please clearly describe your PR.
Contributor (Author): run buildall
Cloud UT Coverage Report: Increment line coverage, Increment coverage report
TPC-H: Total hot run time: 33729 ms
TPC-DS: Total hot run time: 192290 ms
ClickBench: Total hot run time: 29.43 s
englefly approved these changes on May 29, 2025.
Contributor: PR approved by at least one committer and no changes requested.
Contributor: PR approved by anyone and no changes requested.
morningman approved these changes on May 29, 2025.
morningman pushed a commit that referenced this pull request on Jun 5, 2025:
### What problem does this PR solve?

Related PR: #51329

Problem Summary: The `LogicalHudiScan` class should override the method `withOperativeSlots` and return the `LogicalHudiScan` type. Otherwise, the `LogicalFileScanToPhysicalFileScan` rule will be incorrectly applied when querying a Hudi table, generating a `PhysicalFileScan`. Because `plan` is then a plain `LogicalFileScan`, `plan -> !(plan instanceof LogicalHudiScan)` incorrectly returns true.

```java
public class LogicalFileScanToPhysicalFileScan extends OneImplementationRuleFactory {
    @Override
    public Rule build() {
        return logicalFileScan().when(plan -> !(plan instanceof LogicalHudiScan)).then(fileScan ->
            new PhysicalFileScan(
                fileScan.getRelationId(),
                fileScan.getTable(),
                fileScan.getQualifier(),
                DistributionSpecAny.INSTANCE,
                Optional.empty(),
                fileScan.getLogicalProperties(),
                fileScan.getSelectedPartitions(),
                fileScan.getTableSample(),
                fileScan.getTableSnapshot(),
                fileScan.getOperativeSlots())
        ).toRule(RuleType.LOGICAL_FILE_SCAN_TO_PHYSICAL_FILE_SCAN_RULE);
    }
}
```
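To make the failure mode concrete, here is a minimal sketch of why an inherited "copy with new slots" method loses the subclass type, and how a covariant override keeps it. The classes below are hypothetical, heavily simplified stand-ins (a table name and a list of slot names), not the real Nereids constructors; only the `instanceof` guard mirrors the rule quoted above.

```java
import java.util.List;

// Driver first so the file also runs with the single-file source launcher.
class OperativeSlotsDemo {
    // Same guard as the LogicalFileScanToPhysicalFileScan rule.
    static boolean matchesPlainFileScanRule(LogicalFileScan plan) {
        return !(plan instanceof LogicalHudiScan);
    }

    public static void main(String[] args) {
        LogicalFileScan hudi = new LogicalHudiScan("hudi_tbl", List.of());
        LogicalFileScan copy = hudi.withOperativeSlots(List.of("c1"));
        System.out.println(copy.getClass().getSimpleName()); // LogicalHudiScan
        System.out.println(matchesPlainFileScanRule(copy));  // false
    }
}

// Hypothetical, simplified stand-in for the Nereids LogicalFileScan.
class LogicalFileScan {
    final String table;
    final List<String> operativeSlots;

    LogicalFileScan(String table, List<String> operativeSlots) {
        this.table = table;
        this.operativeSlots = operativeSlots;
    }

    // Base implementation: always builds a plain LogicalFileScan, so a subclass
    // that does not override this method silently loses its concrete type here.
    LogicalFileScan withOperativeSlots(List<String> slots) {
        return new LogicalFileScan(table, slots);
    }
}

// Hypothetical, simplified stand-in for LogicalHudiScan.
class LogicalHudiScan extends LogicalFileScan {
    LogicalHudiScan(String table, List<String> operativeSlots) {
        super(table, operativeSlots);
    }

    // The fix: override with a covariant return type so the copy stays a
    // LogicalHudiScan and the plain file scan rule no longer matches it.
    @Override
    LogicalHudiScan withOperativeSlots(List<String> slots) {
        return new LogicalHudiScan(table, slots);
    }
}
```

Removing the override from `LogicalHudiScan` makes `copy` a plain `LogicalFileScan`, and the guard then returns true, which is the bug described in the commit message.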
zy-kkk pushed a commit that referenced this pull request on Jun 10, 2025:
…51442)

In the previous PR #51329, global lazy materialization was implemented. However, during the materialization phase RPC requests are sent to other BE nodes, and the receiving node relies on `query ctx` to read the corresponding file after it receives the RPC request. By then, the `query ctx` on that BE may already have been released, causing the BE to core dump.

Solution: by caching some of the information held in `query ctx`, the RPC no longer needs to rely on `query ctx` itself.
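As a rough illustration of two-phase TopN lazy materialization and of why a self-contained fetch request avoids the lifetime problem above, here is a minimal Java sketch. It is not the Doris BE code (which is C++); `RowId`, `KeyedRow`, `FetchRequest`, and the in-memory `storage` map are hypothetical stand-ins. Phase 1 keeps only the sort key plus a row id; phase 2 fetches the remaining columns for the surviving rows using nothing but the request and locally readable data.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class LazyTopNDemo {
    // Hypothetical row id: the file holding the row and its position in that file.
    record RowId(String filePath, int rowIndex) {}
    // Phase-1 output: only the sort key and the row id are materialized.
    record KeyedRow(long sortKey, RowId rowId) {}
    // Fetch request that carries everything the reader needs, so the receiving
    // side does not have to look up a query context that may be gone.
    record FetchRequest(List<RowId> rowIds, List<String> columns) {}

    // Phase 1: keep the n rows with the smallest sort key.
    static List<KeyedRow> topN(List<KeyedRow> rows, int n) {
        return rows.stream()
                .sorted(Comparator.comparingLong(KeyedRow::sortKey))
                .limit(n)
                .collect(Collectors.toList());
    }

    // Phase 2: materialize the requested columns for the surviving row ids.
    // 'storage' stands in for the files a BE node can read on its own.
    static List<Map<String, String>> fetch(FetchRequest req,
                                           Map<String, List<Map<String, String>>> storage) {
        List<Map<String, String>> out = new ArrayList<>();
        for (RowId id : req.rowIds()) {
            Map<String, String> fullRow = storage.get(id.filePath()).get(id.rowIndex());
            Map<String, String> projected = new LinkedHashMap<>();
            for (String column : req.columns()) {
                projected.put(column, fullRow.get(column));
            }
            out.add(projected);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<Map<String, String>>> storage = Map.of(
                "f1", List.of(Map.of("k", "5", "v", "five"), Map.of("k", "1", "v", "one")),
                "f2", List.of(Map.of("k", "3", "v", "three")));
        List<KeyedRow> keys = List.of(
                new KeyedRow(5, new RowId("f1", 0)),
                new KeyedRow(1, new RowId("f1", 1)),
                new KeyedRow(3, new RowId("f2", 0)));
        List<RowId> winners = topN(keys, 2).stream()
                .map(KeyedRow::rowId)
                .collect(Collectors.toList());
        System.out.println(fetch(new FetchRequest(winners, List.of("v")), storage)); // [{v=one}, {v=three}]
    }
}
```

The design point is that `fetch` depends only on its arguments: whatever the reader needs is copied into the request up front rather than resolved through shared per-query state at fetch time.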
zy-kkk added a commit that referenced this pull request on Jun 10, 2025:
Based on #51329: avoid type-matching problems caused by using `0L` for comparison.
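The commit message does not show the code involved. One common way a `0L` comparison causes a type-matching problem in Java is boxed equality: `Long.equals` returns true only for another `Long`, so a boxed `Integer` zero never matches. The sketch below (hypothetical class and method names) contrasts that with a comparison on the numeric value; it illustrates the general pitfall, not the actual Doris change.

```java
class ZeroComparisonDemo {
    // Fragile: Long.equals only returns true for another Long, so a boxed
    // Integer 0 (or Short, BigDecimal, ...) never compares equal to 0L.
    static boolean isZeroByLongEquals(Object value) {
        return Long.valueOf(0L).equals(value);
    }

    // Type-tolerant: compare the numeric value instead of the boxed type.
    static boolean isZeroByValue(Object value) {
        return value instanceof Number number && number.doubleValue() == 0.0;
    }

    public static void main(String[] args) {
        Object intZero = 0; // autoboxed to Integer, not Long
        System.out.println(isZeroByLongEquals(intZero)); // false
        System.out.println(isZeroByValue(intZero));      // true
    }
}
```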
hubgeter pushed a commit to hubgeter/doris that referenced this pull request on Jun 16, 2025:
Cherry-pick of the `LogicalHudiScan` `withOperativeSlots` fix described above (Related PR: apache#51329; cherry picked from commit df06507).
morningman pushed a commit that referenced this pull request on Jul 14, 2025:
… external tables (#52114)

### What problem does this PR solve?

Related PR: #51329

Problem Summary: TopN lazy materialization was introduced in PR #51329, but the implementation had performance issues when reading external tables. This PR optimizes it:

1. Previously, the materialization phase read one row of data from the file at a time. This PR groups rows by scan range and reads multiple rows from the file in one pass.
2. Previously, the materialization phase read files on a single thread. This PR creates scan tasks and submits them to the workload group to improve read speed.
3. Previously, the runtime profile was transmitted through Thrift. This PR introduces a protobuf implementation and adds the profile information of `RowIDFetcher` to `MATERIALIZATION_OPERATOR`.

Example (1 FE, 2 BEs), SQL: `select * from ali_hive.tpch100_orc.lineitem order by l_partkey limit 10;`

```
MATERIALIZATION_OPERATOR (id=3):(ExecTime: 2.645ms)
  - BlocksProduced: 5
  - CloseTime: 0ns
  - ExecTime: 2.645ms
  - InitTime: 0ns
  - MemoryUsage: 0.00
  - MemoryUsagePeak: 0.00
  - OpenTime: 0ns
  - ProjectionTime: 528.913us
  - RowsProduced: 10
  - WaitForDependency[MATERIALIZATION_COUNTER_DEPENDENCY]Time: 12sec874ms
  RowIDFetcher:
    BackendId:1750838859134:
      - FileReadBytes: {[2.89 MB, ], [9.51 MB, ], [6.81 MB, ], [4.74 MB, ], [22.33 MB, ], }
      - FileReadLines: {[1, ], [1, ], [1, ], [1, ], [1, ], }
      - FileReadTime: {[102.960ms,], [104.028ms,], [99.817ms,], [98.260ms,], [120.129ms,], }
      - GetBlockAvgTime: {14ms, 2ms, 2ms, 1ms, 3ms, }
      - InitReaderAvgTime: {14ms, 2ms, 2ms, 1ms, 3ms, }
      - ScannersRunningTime: {130ms, 124ms, 116ms, 113ms, 151ms, }
  RowIDFetcher:
    BackendId:1750936290862:
      - FileReadBytes: {[13.80 MB, ], [21.28 MB, ], [8.18 MB, ], [16.69 MB, ], [19.16 MB, ], }
      - FileReadLines: {[1, ], [1, ], [1, ], [1, ], [1, ], }
      - FileReadTime: {[113.031ms,], [132.087ms,], [105.361ms,], [117.245ms,], [125.535ms,], }
      - GetBlockAvgTime: {2ms, 2ms, 2ms, 1ms, 3ms, }
      - InitReaderAvgTime: {2ms, 2ms, 2ms, 1ms, 3ms, }
      - ScannersRunningTime: {144ms, 160ms, 127ms, 142ms, 159ms, }
```
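Items 1 and 2 above describe batching row reads per scan range and running them as parallel scan tasks. The following is a minimal Java sketch of that idea under stated assumptions, not the Doris BE code (which is C++): `RowRef`, `readRange`, and the fixed thread pool are hypothetical stand-ins for row ids, the file reader, and the workload group's scan executor.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class BatchedFetchDemo {
    // Hypothetical row reference: the scan range (file split) holding the row and its offset.
    record RowRef(String scanRange, long rowIndex) {}

    // Groups requested rows by scan range so each range is opened once and all
    // of its rows are read in one pass, instead of one read per row.
    static Map<String, List<Long>> groupByScanRange(List<RowRef> refs) {
        Map<String, List<Long>> groups = new TreeMap<>();
        for (RowRef ref : refs) {
            groups.computeIfAbsent(ref.scanRange(), k -> new ArrayList<>()).add(ref.rowIndex());
        }
        for (List<Long> rows : groups.values()) {
            Collections.sort(rows); // sorted offsets allow a forward-only read
        }
        return groups;
    }

    // Hypothetical reader: "reads" all requested rows of one scan range in one call.
    static List<String> readRange(String scanRange, List<Long> rowIndexes) {
        List<String> rows = new ArrayList<>();
        for (long rowIndex : rowIndexes) {
            rows.add(scanRange + "#" + rowIndex);
        }
        return rows;
    }

    // Submits one task per scan range to the pool and collects the results.
    static Map<String, List<String>> fetchAll(List<RowRef> refs, ExecutorService pool)
            throws InterruptedException, ExecutionException {
        Map<String, Future<List<String>>> pending = new TreeMap<>();
        for (Map.Entry<String, List<Long>> group : groupByScanRange(refs).entrySet()) {
            pending.put(group.getKey(), pool.submit(() -> readRange(group.getKey(), group.getValue())));
        }
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, Future<List<String>>> entry : pending.entrySet()) {
            result.put(entry.getKey(), entry.getValue().get());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<RowRef> refs = List.of(new RowRef("split-b", 7), new RowRef("split-a", 3),
                    new RowRef("split-b", 2), new RowRef("split-a", 1));
            // {split-a=[split-a#1, split-a#3], split-b=[split-b#2, split-b#7]}
            System.out.println(fetchAll(refs, pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

With four requested rows spread over two scan ranges, this issues two range reads instead of four single-row reads, and the two reads can run concurrently.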
wenzhenghu pushed a commit to wenzhenghu/doris that referenced this pull request on Sep 8, 2025:
### What problem does this PR solve?

Related PR: apache#51329

Problem Summary: PR apache#51329 introduced global lazy materialization for internal tables and Hive/Iceberg catalogs. This PR adds support for this feature to TVFs, specifically for reading the Parquet and ORC formats.
dwdwqfwe pushed a commit to dwdwqfwe/doris that referenced this pull request on Sep 22, 2025:
…ation (apache#56137)

### What problem does this PR solve?

The previous PR (TopN lazy materialization, apache#51329, commit a4b5008) introduced a bug in runtime filter target translation. The runtime filter target should be based on `PhysicalLazyMaterializeOlapScan`, not the inner `PhysicalOlapScan`.
hubgeter added a commit to hubgeter/doris that referenced this pull request on Dec 17, 2025:
…ache#58785)

Related PR: apache#51329

Problem Summary: This PR primarily enables the Parquet reader to use page indexes when reading complex columns, and also fixes a data reading error in the TopN lazy materialization PR.
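For readers unfamiliar with Parquet page indexes, the sketch below shows the basic idea for a range predicate on a flat numeric column: the ColumnIndex stores per-page min/max values, the OffsetIndex stores each page's first row index, and pages whose [min, max] cannot overlap the predicate are skipped. `PageStats` and `candidateRowRanges` are hypothetical names; null-only pages and the extra handling needed for complex (nested) columns, which is what the PR addresses, are not modeled.

```java
import java.util.ArrayList;
import java.util.List;

class PageIndexPruningDemo {
    // Hypothetical per-page summary: min/max as kept in a Parquet ColumnIndex,
    // first row index as kept in the OffsetIndex.
    record PageStats(long min, long max, long firstRowIndex) {}

    // Returns [start, end) row ranges of pages that may contain values in [lo, hi].
    static List<long[]> candidateRowRanges(List<PageStats> pages, long totalRows, long lo, long hi) {
        List<long[]> ranges = new ArrayList<>();
        for (int i = 0; i < pages.size(); i++) {
            PageStats page = pages.get(i);
            if (page.max() < lo || page.min() > hi) {
                continue; // no value in this page can satisfy the predicate
            }
            // A page ends where the next page starts, or at the end of the row group.
            long end = (i + 1 < pages.size()) ? pages.get(i + 1).firstRowIndex() : totalRows;
            ranges.add(new long[] {page.firstRowIndex(), end});
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<PageStats> pages = List.of(
                new PageStats(0, 9, 0),
                new PageStats(10, 19, 100),
                new PageStats(20, 29, 200));
        for (long[] range : candidateRowRanges(pages, 300, 12, 15)) {
            System.out.println(range[0] + ".." + range[1]); // 100..200
        }
    }
}
```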
yiguolei pushed a commit that referenced this pull request on Dec 22, 2025:
… result (#58785) (#59129)

Backport of #58785.

Related PR: #51329

Problem Summary: This PR primarily enables the Parquet reader to use page indexes when reading complex columns, and also fixes a data reading error in the TopN lazy materialization PR.

### Release note

None
kaka11chen pushed a commit to kaka11chen/doris that referenced this pull request on Jan 7, 2026:
…ache#58785)

Related PR: apache#51329

Problem Summary: This PR primarily enables the Parquet reader to use page indexes when reading complex columns, and also fixes a data reading error in the TopN lazy materialization PR.
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merges this PR)