@suxiaogang223 commented Jun 4, 2025

What problem does this PR solve?

Related PR: #51329

Problem Summary:
The LogicalHudiScan class should override the withOperativeSlots method so that it returns a LogicalHudiScan. Otherwise, the inherited implementation returns a plain LogicalFileScan, so when a Hudi table is queried the LogicalFileScanToPhysicalFileScan rule is incorrectly applied and a PhysicalFileScan is generated.
Because plan is then a LogicalFileScan rather than a LogicalHudiScan, the guard plan -> !(plan instanceof LogicalHudiScan) incorrectly returns true.

public class LogicalFileScanToPhysicalFileScan extends OneImplementationRuleFactory {
    @Override
    public Rule build() {
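        // Hudi scans are meant to be skipped by this guard. If a LogicalHudiScan
        // has been turned into a plain LogicalFileScan (for example by an inherited
        // withOperativeSlots), the guard passes and a PhysicalFileScan is built.
        return logicalFileScan().when(plan -> !(plan instanceof LogicalHudiScan)).then(fileScan ->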
            new PhysicalFileScan(
                    fileScan.getRelationId(),
                    fileScan.getTable(),
                    fileScan.getQualifier(),
                    DistributionSpecAny.INSTANCE,
                    Optional.empty(),
                    fileScan.getLogicalProperties(),
                    fileScan.getSelectedPartitions(),
                    fileScan.getTableSample(),
                    fileScan.getTableSnapshot(),
                    fileScan.getOperativeSlots())
        ).toRule(RuleType.LOGICAL_FILE_SCAN_TO_PHYSICAL_FILE_SCAN_RULE);
    }
}
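A minimal, self-contained sketch of the type loss described above. The classes are hypothetical, heavily simplified stand-ins for the Doris plan nodes (only the operative slots are modeled), UnfixedHudiScan is an illustrative name for a Hudi scan that does not override withOperativeSlots, and the guard predicate only mirrors the shape of the one in the rule above.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical, heavily simplified stand-ins for the Doris plan nodes;
// only the operative slots are modeled here.
class LogicalFileScan {
    final List<String> operativeSlots;

    LogicalFileScan(List<String> operativeSlots) {
        this.operativeSlots = operativeSlots;
    }

    // Copy-style "with" method: returns a new node carrying the given slots.
    LogicalFileScan withOperativeSlots(List<String> slots) {
        return new LogicalFileScan(slots);
    }
}

// Fixed shape: the covariant override keeps the copy a LogicalHudiScan.
class LogicalHudiScan extends LogicalFileScan {
    LogicalHudiScan(List<String> operativeSlots) {
        super(operativeSlots);
    }

    @Override
    LogicalHudiScan withOperativeSlots(List<String> slots) {
        return new LogicalHudiScan(slots);
    }
}

// Hypothetical pre-fix shape: inherits withOperativeSlots unchanged, so
// the copy it returns is a plain LogicalFileScan.
class UnfixedHudiScan extends LogicalFileScan {
    UnfixedHudiScan(List<String> operativeSlots) {
        super(operativeSlots);
    }
}

class ScanRuleSketch {
    // Same shape as the guard in LogicalFileScanToPhysicalFileScan:
    // true means the generic PhysicalFileScan rule may fire.
    static final Predicate<LogicalFileScan> FILE_SCAN_GUARD =
            plan -> !(plan instanceof LogicalHudiScan);
}
```

With the override, a copied LogicalHudiScan still fails the guard, so the generic rule skips it. Without the override, the copy's runtime class is plain LogicalFileScan, which is exactly why the instanceof check returns true. The covariant return type also lets callers holding a LogicalHudiScan get one back without a cast.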

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen (Contributor)

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (ideally with the specific error message), and how it was fixed.
  2. Which behaviors were modified: what the previous behavior was, what it is now, why it was modified, and what impact the change might have.
  3. What features were added, and why they were added.
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@suxiaogang223 (Contributor, Author)

run buildall

@doris-robot

TPC-H: Total hot run time: 33968 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit 0ed78bf96b885fa4f7d7978ca2036f8c1c85db53, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	26279	5081	5066	5066
q2	1971	301	200	200
q3	10366	1267	723	723
q4	10221	1025	521	521
q5	7639	2448	2350	2350
q6	191	173	133	133
q7	1002	760	621	621
q8	9315	1325	1141	1141
q9	6845	5111	5109	5109
q10	6884	2334	1883	1883
q11	510	299	273	273
q12	341	358	214	214
q13	17788	3707	3067	3067
q14	232	228	220	220
q15	564	487	479	479
q16	432	437	387	387
q17	629	891	384	384
q18	7807	7254	7030	7030
q19	1771	985	567	567
q20	358	353	242	242
q21	3988	3231	2413	2413
q22	1041	1000	945	945
Total cold run time: 116174 ms
Total hot run time: 33968 ms
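The Round 1 totals can be cross-checked from the rows. A small sketch of that arithmetic, reading the four columns as cold run, hot run 1, hot run 2 and best hot run (an interpretation inferred from the numbers, not stated by the robot), all in ms:

```java
// Round 1 rows copied from the TPC-H table above, in ms, read as
// {cold run, hot run 1, hot run 2, best hot run} for q1..q22.
class TpchRound1Totals {
    static final long[][] ROWS = {
        {26279, 5081, 5066, 5066}, // q1
        {1971, 301, 200, 200},     // q2
        {10366, 1267, 723, 723},   // q3
        {10221, 1025, 521, 521},   // q4
        {7639, 2448, 2350, 2350},  // q5
        {191, 173, 133, 133},      // q6
        {1002, 760, 621, 621},     // q7
        {9315, 1325, 1141, 1141},  // q8
        {6845, 5111, 5109, 5109},  // q9
        {6884, 2334, 1883, 1883},  // q10
        {510, 299, 273, 273},      // q11
        {341, 358, 214, 214},      // q12
        {17788, 3707, 3067, 3067}, // q13
        {232, 228, 220, 220},      // q14
        {564, 487, 479, 479},      // q15
        {432, 437, 387, 387},      // q16
        {629, 891, 384, 384},      // q17
        {7807, 7254, 7030, 7030},  // q18
        {1771, 985, 567, 567},     // q19
        {358, 353, 242, 242},      // q20
        {3988, 3231, 2413, 2413},  // q21
        {1041, 1000, 945, 945},    // q22
    };

    // Sum of the first (cold) column.
    static long coldTotal() {
        long total = 0;
        for (long[] row : ROWS) {
            total += row[0];
        }
        return total;
    }

    // Sum of the faster of the two hot runs for each query.
    static long hotTotal() {
        long total = 0;
        for (long[] row : ROWS) {
            total += Math.min(row[1], row[2]);
        }
        return total;
    }
}
```

coldTotal() yields 116174 and hotTotal() yields 33968, matching the two totals reported for Round 1; the last column of every row equals the smaller of the two hot runs.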

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	5237	5235	5457	5235
q2	252	324	223	223
q3	2180	2657	2271	2271
q4	1369	1799	1443	1443
q5	4512	4378	4474	4378
q6	229	173	131	131
q7	2019	1948	1803	1803
q8	2604	2503	2599	2503
q9	7297	7141	7046	7046
q10	3041	3186	2769	2769
q11	597	514	488	488
q12	687	771	617	617
q13	3489	3842	3303	3303
q14	277	302	279	279
q15	529	493	473	473
q16	471	480	434	434
q17	1163	1526	1445	1445
q18	7874	7718	7547	7547
q19	817	861	905	861
q20	1965	2150	1857	1857
q21	4891	4706	4521	4521
q22	1090	1064	1033	1033
Total cold run time: 52590 ms
Total hot run time: 50660 ms

@doris-robot

TPC-DS: Total hot run time: 192468 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 0ed78bf96b885fa4f7d7978ca2036f8c1c85db53, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	1408	1092	1054	1054
query2	6348	1725	1758	1725
query3	11078	4612	4551	4551
query4	54839	26079	23052	23052
query5	5275	544	455	455
query6	396	217	199	199
query7	4951	525	286	286
query8	304	227	218	218
query9	6282	2645	2618	2618
query10	440	321	274	274
query11	15083	15227	14758	14758
query12	162	108	110	108
query13	1087	526	407	407
query14	10116	6355	6227	6227
query15	198	201	186	186
query16	7068	659	484	484
query17	1063	736	569	569
query18	1572	422	302	302
query19	192	185	182	182
query20	131	121	116	116
query21	212	125	109	109
query22	4525	4549	4539	4539
query23	34230	33519	33544	33519
query24	6768	2409	2458	2409
query25	471	488	449	449
query26	666	281	148	148
query27	2309	513	338	338
query28	3056	2183	2175	2175
query29	580	564	431	431
query30	284	220	194	194
query31	864	882	796	796
query32	68	69	62	62
query33	442	370	307	307
query34	769	896	548	548
query35	781	831	757	757
query36	938	988	898	898
query37	116	102	86	86
query38	4180	4305	4224	4224
query39	1502	1457	1467	1457
query40	211	120	110	110
query41	62	55	59	55
query42	130	118	117	117
query43	513	497	469	469
query44	1349	852	834	834
query45	187	175	164	164
query46	840	1040	651	651
query47	1828	1907	1833	1833
query48	402	428	327	327
query49	656	499	391	391
query50	662	694	397	397
query51	4251	4262	4281	4262
query52	122	110	101	101
query53	232	263	189	189
query54	571	593	522	522
query55	84	86	83	83
query56	315	305	300	300
query57	1155	1190	1123	1123
query58	265	268	277	268
query59	2661	2704	2604	2604
query60	339	343	320	320
query61	135	132	126	126
query62	741	730	692	692
query63	254	192	185	185
query64	1481	1076	752	752
query65	4304	4140	4127	4127
query66	721	391	299	299
query67	16002	15711	15435	15435
query68	5418	881	535	535
query69	523	301	265	265
query70	1193	1085	1023	1023
query71	434	309	292	292
query72	5969	4974	5196	4974
query73	1188	708	356	356
query74	9072	9010	8902	8902
query75	3329	3217	2712	2712
query76	3800	1183	749	749
query77	560	361	289	289
query78	10124	10126	9285	9285
query79	2634	824	658	658
query80	753	508	447	447
query81	508	251	223	223
query82	700	130	97	97
query83	259	253	239	239
query84	287	102	92	92
query85	816	374	311	311
query86	455	285	265	265
query87	4423	4465	4300	4300
query88	3372	2294	2300	2294
query89	403	320	276	276
query90	1760	208	217	208
query91	145	148	117	117
query92	71	65	62	62
query93	2646	918	574	574
query94	749	420	307	307
query95	373	304	280	280
query96	489	584	280	280
query97	2716	2750	2615	2615
query98	230	212	212	212
query99	1396	1458	1276	1276
Total cold run time: 298022 ms
Total hot run time: 192468 ms

@doris-robot

ClickBench: Total hot run time: 28.82 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 0ed78bf96b885fa4f7d7978ca2036f8c1c85db53, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.04	0.04	0.03
query2	0.13	0.10	0.12
query3	0.25	0.19	0.19
query4	1.59	0.19	0.11
query5	0.45	0.41	0.42
query6	1.17	0.66	0.66
query7	0.03	0.03	0.01
query8	0.04	0.04	0.04
query9	0.58	0.51	0.51
query10	0.57	0.57	0.58
query11	0.16	0.11	0.10
query12	0.15	0.12	0.12
query13	0.63	0.61	0.60
query14	0.80	0.80	0.80
query15	0.90	0.87	0.85
query16	0.37	0.39	0.39
query17	1.01	1.02	1.08
query18	0.22	0.22	0.21
query19	1.89	1.78	1.87
query20	0.01	0.01	0.01
query21	15.39	0.91	0.55
query22	0.74	1.20	0.60
query23	15.00	1.41	0.66
query24	6.89	1.65	0.59
query25	0.51	0.17	0.16
query26	0.73	0.16	0.15
query27	0.06	0.05	0.05
query28	9.02	0.93	0.46
query29	12.62	3.98	3.34
query30	0.25	0.09	0.06
query31	2.83	0.61	0.40
query32	3.23	0.56	0.47
query33	3.11	3.11	3.08
query34	15.70	5.13	4.46
query35	4.54	4.47	4.48
query36	0.65	0.51	0.48
query37	0.08	0.06	0.06
query38	0.04	0.04	0.04
query39	0.03	0.02	0.02
query40	0.17	0.14	0.14
query41	0.09	0.03	0.03
query42	0.04	0.02	0.02
query43	0.04	0.02	0.02
Total cold run time: 102.75 s
Total hot run time: 28.82 s

@github-actions bot added the approved label Jun 4, 2025
github-actions bot commented Jun 4, 2025

PR approved by at least one committer and no changes requested.

github-actions bot commented Jun 4, 2025

PR approved by anyone and no changes requested.

@morningman morningman left a comment

LGTM

@morningman morningman merged commit df06507 into apache:master Jun 5, 2025
31 checks passed
hubgeter pushed a commit to hubgeter/doris that referenced this pull request Jun 16, 2025
(cherry picked from commit df06507)
@suxiaogang223 suxiaogang223 deleted the fix_hudi_scan branch June 27, 2025 07:58

Labels

approved, reviewed

5 participants