

@hubgeter
Contributor

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented May 28, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@hubgeter
Contributor Author

run buildall

@doris-robot

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 83.25% (1113/1337)
Line Coverage 66.14% (18666/28224)
Region Coverage 65.75% (9257/14079)
Branch Coverage 55.55% (4981/8966)

@doris-robot

TPC-H: Total hot run time: 33729 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 74f8d2eccf85c989621e345807dbb511a0661e15, data reload: false

------ Round 1 ----------------------------------
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	26171	5015	5049	5015
q2	1934	275	182	182
q3	10315	1248	675	675
q4	10221	1013	527	527
q5	7564	2382	2314	2314
q6	179	159	134	134
q7	903	743	595	595
q8	9325	1260	1084	1084
q9	6833	5136	5052	5052
q10	6829	2331	1905	1905
q11	496	286	279	279
q12	349	348	220	220
q13	17770	3679	3076	3076
q14	221	236	216	216
q15	557	490	486	486
q16	427	432	373	373
q17	594	862	360	360
q18	7754	7315	7143	7143
q19	1207	953	575	575
q20	343	358	216	216
q21	3691	3180	2352	2352
q22	1044	972	950	950
Total cold run time: 114727 ms
Total hot run time: 33729 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	5121	5071	5074	5071
q2	235	320	227	227
q3	2163	2632	2306	2306
q4	1367	1784	1349	1349
q5	4424	4351	4382	4351
q6	215	172	128	128
q7	2001	1910	1785	1785
q8	2567	2534	2560	2534
q9	7193	7250	7020	7020
q10	3043	3192	2763	2763
q11	575	523	499	499
q12	675	776	600	600
q13	3530	3915	3264	3264
q14	282	300	280	280
q15	534	490	482	482
q16	464	499	478	478
q17	1158	1538	1404	1404
q18	7725	7450	7382	7382
q19	808	820	834	820
q20	2038	2074	1899	1899
q21	4746	4431	4379	4379
q22	1055	1071	1003	1003
Total cold run time: 51919 ms
Total hot run time: 50024 ms

@doris-robot

TPC-DS: Total hot run time: 192290 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 74f8d2eccf85c989621e345807dbb511a0661e15, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query1	1415	1079	1087	1079
query2	6209	1786	1818	1786
query3	11003	4591	4496	4496
query4	53004	24734	22981	22981
query5	5120	572	455	455
query6	349	214	204	204
query7	4873	532	282	282
query8	284	236	220	220
query9	5542	2623	2650	2623
query10	457	336	267	267
query11	15036	15026	14859	14859
query12	160	106	108	106
query13	1069	532	406	406
query14	10224	6575	6587	6575
query15	225	210	191	191
query16	7098	653	460	460
query17	1079	766	599	599
query18	1560	391	305	305
query19	202	190	176	176
query20	127	127	140	127
query21	211	127	106	106
query22	4198	4365	4318	4318
query23	34161	33532	33647	33532
query24	6633	2411	2412	2411
query25	451	454	403	403
query26	716	271	153	153
query27	3298	518	355	355
query28	3808	2150	2150	2150
query29	590	590	437	437
query30	274	221	191	191
query31	871	866	820	820
query32	78	66	61	61
query33	477	375	306	306
query34	804	867	547	547
query35	804	851	745	745
query36	958	991	874	874
query37	124	105	80	80
query38	4175	4304	4224	4224
query39	1521	1493	1474	1474
query40	205	119	106	106
query41	62	58	56	56
query42	129	112	109	109
query43	524	506	473	473
query44	1393	860	866	860
query45	182	175	169	169
query46	857	1036	647	647
query47	1809	1840	1812	1812
query48	404	461	324	324
query49	630	470	391	391
query50	683	693	417	417
query51	4293	4235	4259	4235
query52	115	115	105	105
query53	238	279	190	190
query54	588	607	543	543
query55	88	89	82	82
query56	305	303	290	290
query57	1085	1115	1063	1063
query58	257	253	274	253
query59	2554	2597	2562	2562
query60	330	316	309	309
query61	125	139	123	123
query62	699	699	674	674
query63	226	196	194	194
query64	1685	1008	730	730
query65	4236	4197	4150	4150
query66	712	427	320	320
query67	15984	15502	15689	15502
query68	6055	900	527	527
query69	544	314	277	277
query70	1227	1111	1051	1051
query71	434	317	311	311
query72	5956	4883	5076	4883
query73	1316	720	362	362
query74	8995	9131	8725	8725
query75	3194	3177	2692	2692
query76	3854	1180	744	744
query77	541	380	302	302
query78	10220	10005	9291	9291
query79	2780	773	576	576
query80	697	506	454	454
query81	496	258	220	220
query82	616	125	95	95
query83	258	262	228	228
query84	293	108	88	88
query85	786	412	335	335
query86	372	289	294	289
query87	4394	4425	4291	4291
query88	3564	2302	2272	2272
query89	403	338	288	288
query90	1643	216	217	216
query91	137	148	116	116
query92	70	64	60	60
query93	2411	976	569	569
query94	735	414	285	285
query95	369	297	288	288
query96	489	566	279	279
query97	2733	2762	2630	2630
query98	241	211	211	211
query99	1712	1392	1262	1262
Total cold run time: 297121 ms
Total hot run time: 192290 ms

@doris-robot

ClickBench: Total hot run time: 29.43 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 74f8d2eccf85c989621e345807dbb511a0661e15, data reload: false

query	cold run (s)	hot run 1 (s)	hot run 2 (s)
query1	0.03	0.04	0.03
query2	0.14	0.12	0.11
query3	0.35	0.20	0.20
query4	1.59	0.21	0.09
query5	0.42	0.40	0.40
query6	1.13	0.66	0.66
query7	0.02	0.01	0.02
query8	0.05	0.05	0.04
query9	0.61	0.53	0.52
query10	0.58	0.57	0.57
query11	0.26	0.12	0.12
query12	0.25	0.14	0.14
query13	0.64	0.62	0.62
query14	0.80	0.81	0.84
query15	0.97	0.87	0.89
query16	0.36	0.37	0.38
query17	1.09	1.07	1.02
query18	0.24	0.23	0.24
query19	1.94	1.76	1.85
query20	0.01	0.01	0.02
query21	15.40	0.98	0.66
query22	0.92	1.04	0.83
query23	14.67	1.56	0.77
query24	5.17	0.62	0.29
query25	0.16	0.09	0.08
query26	0.55	0.22	0.18
query27	0.09	0.08	0.09
query28	11.05	1.20	0.57
query29	12.52	4.11	3.37
query30	0.28	0.07	0.05
query31	2.83	0.62	0.43
query32	3.24	0.60	0.49
query33	3.06	3.05	3.07
query34	16.97	5.12	4.43
query35	4.48	4.42	4.45
query36	0.63	0.50	0.49
query37	0.20	0.17	0.17
query38	0.17	0.17	0.15
query39	0.05	0.05	0.04
query40	0.20	0.17	0.16
query41	0.10	0.05	0.05
query42	0.06	0.05	0.05
query43	0.05	0.04	0.04
Total cold run time: 104.33 s
Total hot run time: 29.43 s

englefly marked this pull request as ready for review May 29, 2025 01:25
github-actions bot added the approved label (Indicates a PR has been approved by one committer.) May 29, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

HappenLee merged commit a4b5008 into apache:master May 29, 2025
24 of 27 checks passed
morningman pushed a commit that referenced this pull request Jun 5, 2025
### What problem does this PR solve?

Related PR: #51329 

Problem Summary:
The `LogicalHudiScan` class should override the `withOperativeSlots`
method so that it returns a `LogicalHudiScan`. Otherwise, the inherited
method returns a plain `LogicalFileScan`, so the
`LogicalFileScanToPhysicalFileScan` rule is incorrectly applied when
querying a Hudi table and a `PhysicalFileScan` is generated.
Because `plan` is then a `LogicalFileScan`, the predicate
`plan -> !(plan instanceof LogicalHudiScan)` incorrectly returns true.
```java
public class LogicalFileScanToPhysicalFileScan extends OneImplementationRuleFactory {
    @Override
    public Rule build() {
        // Skip LogicalHudiScan, which is meant to be implemented by a separate rule.
        return logicalFileScan().when(plan -> !(plan instanceof LogicalHudiScan)).then(fileScan ->
            new PhysicalFileScan(
                    fileScan.getRelationId(),
                    fileScan.getTable(),
                    fileScan.getQualifier(),
                    DistributionSpecAny.INSTANCE,
                    Optional.empty(),
                    fileScan.getLogicalProperties(),
                    fileScan.getSelectedPartitions(),
                    fileScan.getTableSample(),
                    fileScan.getTableSnapshot(),
                    fileScan.getOperativeSlots())
        ).toRule(RuleType.LOGICAL_FILE_SCAN_TO_PHYSICAL_FILE_SCAN_RULE);
    }
}
```
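The failure mode can be reproduced in isolation. The sketch below is a simplified illustration, not Doris code: `FileScan`, `BuggyHudiScan` and `FixedHudiScan` are hypothetical stand-ins for `LogicalFileScan` and `LogicalHudiScan`, and `genericRuleMatches` mimics the `when(...)` guard of the rule above. Without an override, the inherited copy method builds a base-type object, so the `instanceof` check no longer recognizes the Hudi scan; a covariant-return override keeps the subtype.

```java
import java.util.List;

public class CovariantOverrideDemo {
    // Hypothetical stand-in for LogicalFileScan.
    static class FileScan {
        final List<String> operativeSlots;

        FileScan(List<String> operativeSlots) {
            this.operativeSlots = operativeSlots;
        }

        // Copy-style method: returns a new FileScan carrying the given slots.
        FileScan withOperativeSlots(List<String> slots) {
            return new FileScan(slots);
        }
    }

    // Hudi scan that does NOT override withOperativeSlots (the buggy case).
    static class BuggyHudiScan extends FileScan {
        BuggyHudiScan(List<String> slots) {
            super(slots);
        }
    }

    // Hudi scan that overrides withOperativeSlots with a covariant return type (the fix).
    static class FixedHudiScan extends FileScan {
        FixedHudiScan(List<String> slots) {
            super(slots);
        }

        @Override
        FixedHudiScan withOperativeSlots(List<String> slots) {
            return new FixedHudiScan(slots);
        }
    }

    // Mirrors the guard `plan -> !(plan instanceof LogicalHudiScan)`.
    static boolean genericRuleMatches(FileScan plan) {
        return !(plan instanceof BuggyHudiScan || plan instanceof FixedHudiScan);
    }

    public static void main(String[] args) {
        List<String> slots = List.of("a");
        FileScan buggy = new BuggyHudiScan(List.of()).withOperativeSlots(slots);
        FileScan fixed = new FixedHudiScan(List.of()).withOperativeSlots(slots);
        System.out.println(genericRuleMatches(buggy)); // true: the generic file-scan rule wrongly applies
        System.out.println(genericRuleMatches(fixed)); // false: the Hudi scan type is preserved
    }
}
```

Returning the subtype from the override is what lets later rules keep dispatching on the concrete scan class.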
zy-kkk pushed a commit that referenced this pull request Jun 10, 2025
…51442)

In the previous PR #51329, global lazy materialization was implemented.
However, during the materialization phase, RPC requests are sent to other
BE nodes. After receiving such a request, the remote node relies on
`query ctx` to read the corresponding file, but the `query ctx` on that BE
may already have been released, causing the BE to crash (core dump).
Solution:
By caching the necessary information from `query ctx`, the RPC no longer
needs to rely on `query ctx`.
zy-kkk added a commit that referenced this pull request Jun 10, 2025
based on #51329

Avoid type matching problems caused by using 0L for comparison
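The commit does not say which comparison was affected. As a generic Java illustration of this class of pitfall (an assumption, not the actual change), `Object.equals` on boxed numbers also compares the runtime type, so an `Integer` zero never equals the `Long` literal `0L`; the hypothetical helper `isZeroTypeSafe` compares numeric values instead.

```java
public class ZeroLiteralDemo {
    // Returns true if the boxed value is zero; intended for integral box types
    // (Integer, Long, Short, Byte). Fractional values would be truncated by longValue().
    static boolean isZeroTypeSafe(Number value) {
        return value != null && value.longValue() == 0L;
    }

    public static void main(String[] args) {
        Object boxedInt = 0; // autoboxed to Integer
        // Integer.equals only accepts Integer arguments, so Integer(0) is not equal to Long(0L).
        System.out.println(boxedInt.equals(0L)); // false
        System.out.println(isZeroTypeSafe((Number) boxedInt)); // true
    }
}
```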
hubgeter pushed a commit to hubgeter/doris that referenced this pull request Jun 16, 2025
### What problem does this PR solve?

Related PR: apache#51329

Problem Summary:
The `LogicalHudiScan` class should override the `withOperativeSlots`
method so that it returns a `LogicalHudiScan`. Otherwise, the inherited
method returns a plain `LogicalFileScan`, so the
`LogicalFileScanToPhysicalFileScan` rule is incorrectly applied when
querying a Hudi table and a `PhysicalFileScan` is generated.
Because `plan` is then a `LogicalFileScan`, the predicate
`plan -> !(plan instanceof LogicalHudiScan)` incorrectly returns true.
```java
public class LogicalFileScanToPhysicalFileScan extends OneImplementationRuleFactory {
    @Override
    public Rule build() {
        // Skip LogicalHudiScan, which is meant to be implemented by a separate rule.
        return logicalFileScan().when(plan -> !(plan instanceof LogicalHudiScan)).then(fileScan ->
            new PhysicalFileScan(
                    fileScan.getRelationId(),
                    fileScan.getTable(),
                    fileScan.getQualifier(),
                    DistributionSpecAny.INSTANCE,
                    Optional.empty(),
                    fileScan.getLogicalProperties(),
                    fileScan.getSelectedPartitions(),
                    fileScan.getTableSample(),
                    fileScan.getTableSnapshot(),
                    fileScan.getOperativeSlots())
        ).toRule(RuleType.LOGICAL_FILE_SCAN_TO_PHYSICAL_FILE_SCAN_RULE);
    }
}
```

(cherry picked from commit df06507)
morningman pushed a commit that referenced this pull request Jul 14, 2025
… external tables (#52114)

### What problem does this PR solve?
Related PR: #51329

Problem Summary:
TopN lazy materialization was introduced in PR #51329, but the
implementation had performance issues when reading external tables. This
PR optimizes it:
1. Previously, the materialization phase read one row at a time from the
file. This PR groups the requested rows by scan_range and reads multiple
rows from the file at once.
2. Previously, file reading in the materialization phase was
single-threaded. This PR creates a scan task and submits it to the
workload group to improve reading speed.
3. Previously, the runtime profile was transmitted through Thrift. This
PR introduces a protobuf implementation and adds the profile
information of `RowIDFetcher` to `MATERIALIZATION_OPERATOR`.
Example (1 FE, 2 BEs):
SQL: `select * from ali_hive.tpch100_orc.lineitem order by l_partkey limit 10;`
```
MATERIALIZATION_OPERATOR  (id=3):(ExecTime:  2.645ms)
        -  BlocksProduced:  5
        -  CloseTime:  0ns
        -  ExecTime:  2.645ms
        -  InitTime:  0ns
        -  MemoryUsage:  0.00  
        -  MemoryUsagePeak:  0.00  
        -  OpenTime:  0ns
        -  ProjectionTime:  528.913us
        -  RowsProduced:  10
        -  WaitForDependency[MATERIALIZATION_COUNTER_DEPENDENCY]Time:  12sec874ms
    RowIDFetcher:  BackendId:1750838859134:
            -  FileReadBytes:  {[2.89  MB,  ],  [9.51  MB,  ],  [6.81  MB,  ],  [4.74  MB,  ],  [22.33  MB,  ],  }
            -  FileReadLines:  {[1,  ],  [1,  ],  [1,  ],  [1,  ],  [1,  ],  }
            -  FileReadTime:  {[102.960ms,],  [104.028ms,],  [99.817ms,],  [98.260ms,],  [120.129ms,],  }
            -  GetBlockAvgTime:  {14ms,  2ms,  2ms,  1ms,  3ms,  }
            -  InitReaderAvgTime:  {14ms,  2ms,  2ms,  1ms,  3ms,  }
            -  ScannersRunningTime:  {130ms,  124ms,  116ms,  113ms,  151ms,  }
    RowIDFetcher:  BackendId:1750936290862:
            -  FileReadBytes:  {[13.80  MB,  ],  [21.28  MB,  ],  [8.18  MB,  ],  [16.69  MB,  ],  [19.16  MB,  ],  }
            -  FileReadLines:  {[1,  ],  [1,  ],  [1,  ],  [1,  ],  [1,  ],  }
            -  FileReadTime:  {[113.031ms,],  [132.087ms,],  [105.361ms,],  [117.245ms,],  [125.535ms,],  }
            -  GetBlockAvgTime:  {2ms,  2ms,  2ms,  1ms,  3ms,  }
            -  InitReaderAvgTime:  {2ms,  2ms,  2ms,  1ms,  3ms,  }
            -  ScannersRunningTime:  {144ms,  160ms,  127ms,  142ms,  159ms,  }
```
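Items 1 and 2 above (batching rows by `scan_range` and reading them through scan tasks on a pool) can be sketched as follows. This is a hypothetical, simplified Java illustration rather than the BE implementation: `RowLocator` and `readRows` are made-up names, and a plain `ExecutorService` stands in for the workload group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BatchedRowFetchDemo {
    // Hypothetical row locator: the scan range (file split) a row lives in, and its row number there.
    record RowLocator(int scanRangeId, long rowNumber) {}

    // Group requested rows by scan range so each range is opened once per batch.
    static Map<Integer, List<Long>> groupByScanRange(List<RowLocator> rows) {
        Map<Integer, List<Long>> groups = new TreeMap<>();
        for (RowLocator row : rows) {
            groups.computeIfAbsent(row.scanRangeId(), k -> new ArrayList<>()).add(row.rowNumber());
        }
        return groups;
    }

    // Hypothetical reader: stands in for opening one split and reading many rows in one pass.
    static String readRows(int scanRangeId, List<Long> rowNumbers) {
        return "range " + scanRangeId + " -> rows " + rowNumbers;
    }

    // Submit one task per scan range to a thread pool (a stand-in for the workload group).
    static List<String> fetch(List<RowLocator> rows, ExecutorService pool) throws Exception {
        List<Future<String>> futures = new ArrayList<>();
        for (Map.Entry<Integer, List<Long>> e : groupByScanRange(rows).entrySet()) {
            futures.add(pool.submit(() -> readRows(e.getKey(), e.getValue())));
        }
        List<String> results = new ArrayList<>();
        for (Future<String> f : futures) {
            results.add(f.get()); // wait for each range's task in submission order
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<RowLocator> rows = List.of(
                    new RowLocator(2, 40), new RowLocator(1, 7), new RowLocator(2, 3));
            fetch(rows, pool).forEach(System.out::println);
        } finally {
            pool.shutdown();
        }
    }
}
```

Grouping first means a scan range is opened once for all of its requested rows instead of once per row, and the per-range tasks can then run in parallel.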
morningman pushed a commit that referenced this pull request Sep 6, 2025
### What problem does this PR solve?
Related PR: #51329

Problem Summary:
PR #51329 introduced global lazy materialization for internal tables and
Hive/Iceberg catalogs. This PR extends the feature to TVFs, specifically
for reading Parquet and ORC files.
wenzhenghu pushed a commit to wenzhenghu/doris that referenced this pull request Sep 8, 2025
### What problem does this PR solve?
Related PR: apache#51329

Problem Summary:
PR apache#51329 introduced global lazy materialization for internal tables and
Hive/Iceberg catalogs. This PR extends the feature to TVFs, specifically
for reading Parquet and ORC files.
morrySnow pushed a commit that referenced this pull request Sep 19, 2025
…ation (#56137)

### What problem does this PR solve?

The previous PR (TopN lazy materialization, #51329, commit a4b5008)
introduced a bug in runtime filter target translation: the runtime filter
target should be based on `PhysicalLazyMaterializeOlapScan`, not the inner
`PhysicalOlapScan`.
dwdwqfwe pushed a commit to dwdwqfwe/doris that referenced this pull request Sep 22, 2025
…ation (apache#56137)

### What problem does this PR solve?

The previous PR (TopN lazy materialization, apache#51329, commit a4b5008)
introduced a bug in runtime filter target translation: the runtime filter
target should be based on `PhysicalLazyMaterializeOlapScan`, not the inner
`PhysicalOlapScan`.
morningman pushed a commit that referenced this pull request Dec 16, 2025
…8785)

### What problem does this PR solve?
Related PR: #51329
Problem Summary:
This PR primarily enables the Parquet reader to use page indexes when
reading complex columns, and also fixes a data reading error in PR
#51329 when materializing data based on row number in the second stage
of topn.
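A generic sketch of how a Parquet offset index lets a reader locate only the pages that contain the requested row numbers: each page records the `first_row_index` of its first row, so the page holding a given row can be found by binary search. This is an illustration under that assumption, not the Doris Parquet reader; the class and method names are hypothetical, and it does not model the additional handling that complex (nested) columns require.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class PageIndexRowSelectionDemo {
    // Returns the index of the page containing rowNumber, given each page's first row index
    // (sorted ascending, firstRowIndex[0] == 0). Assumes 0 <= rowNumber < total rows.
    static int pageForRow(long[] firstRowIndex, long rowNumber) {
        int pos = Arrays.binarySearch(firstRowIndex, rowNumber);
        // Exact hit: the row starts a page. Otherwise binarySearch returns
        // -(insertionPoint) - 1, and the containing page is the one before the insertion point.
        return pos >= 0 ? pos : -pos - 2;
    }

    // Collects the distinct pages that must be read to materialize the requested rows.
    static List<Integer> pagesToRead(long[] firstRowIndex, long[] rowNumbers) {
        TreeSet<Integer> pages = new TreeSet<>();
        for (long row : rowNumbers) {
            pages.add(pageForRow(firstRowIndex, row));
        }
        return new ArrayList<>(pages);
    }

    public static void main(String[] args) {
        long[] firstRowIndex = {0, 100, 250, 400}; // four pages
        long[] wanted = {5, 260, 399};
        System.out.println(pagesToRead(firstRowIndex, wanted)); // [0, 2]
    }
}
```

Pages 1 and 3 are never touched, which is the saving a page index provides when only a few rows are materialized.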
hubgeter added a commit to hubgeter/doris that referenced this pull request Dec 17, 2025
…ache#58785)

Related PR: apache#51329
Problem Summary:
This PR primarily enables the Parquet reader to use page indexes when
reading complex columns, and also fixes a data reading error in the
topn PR.
yiguolei pushed a commit that referenced this pull request Dec 22, 2025
… result (#58785) (#59129)

bp #58785
Related PR: #51329
Problem Summary:
This PR primarily enables the Parquet reader to use page indexes when
reading complex columns, and also fixes a data reading error in the
topn PR.

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

### Release note

None

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
    - [ ] Yes. <!-- Add document PR link here. eg: apache/doris-website#1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->
kaka11chen pushed a commit to kaka11chen/doris that referenced this pull request Jan 7, 2026
…ache#58785)

Related PR: apache#51329
Problem Summary:
This PR primarily enables the Parquet reader to use page indexes when
reading complex columns, and also fixes a data reading error in the
topn PR.
kaka11chen pushed a commit that referenced this pull request Jan 8, 2026
…8785)

Related PR: #51329
Problem Summary:
This PR primarily enables the Parquet reader to use page indexes when
reading complex columns, and also fixes a data reading error in the
topn PR.
morningman pushed a commit that referenced this pull request Jan 8, 2026
…8785)

Related PR: #51329
Problem Summary:
This PR primarily enables the Parquet reader to use page indexes when
reading complex columns, and also fixes a data reading error in the
topn PR.
morningman pushed a commit that referenced this pull request Jan 10, 2026
…8785)

Related PR: #51329
Problem Summary:
This PR primarily enables the Parquet reader to use page indexes when
reading complex columns, and also fixes a data reading error in the
topn PR.
morningman pushed a commit that referenced this pull request Jan 14, 2026
…8785)

Related PR: #51329
Problem Summary:
This PR primarily enables the Parquet reader to use page indexes when
reading complex columns, and also fixes a data reading error in the
topn PR.

Labels

approved (Indicates a PR has been approved by one committer.), reviewed

6 participants