@github-actions
Contributor

Cherry-picked from #46121

…ate materialization case of parquet reader (#46121)

### What problem does this PR solve?

Related PR: #40641

Problem Summary:

[Fix](parquet-reader) Fixed excessive data scanning in the late
materialization path of the parquet reader, a regression introduced by
#40641 that shows up in scenarios with particularly high filtering rates.
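
For readers unfamiliar with the term, here is a minimal sketch of what late materialization means in general. It is not the Doris parquet reader and does not show the actual fix. The function `late_materialize` and its parameters are hypothetical names used only for illustration. The idea is to decode the predicate column first, then decode the remaining (lazy) columns only at the row positions that survived the filter, so a highly selective filter should lead to very little extra data being read.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def late_materialize(
    columns: Dict[str, Sequence],
    predicate_col: str,
    predicate: Callable[[object], bool],
    lazy_cols: List[str],
) -> Tuple[Dict[str, list], int]:
    """Filter on one column, then fetch the other columns only for matching rows.

    Returns the filtered columns and the number of values that were decoded.
    All names here are illustrative assumptions, not Doris APIs.
    """
    # Phase 1: decode the whole predicate column and record surviving row ids.
    pred_values = columns[predicate_col]
    selected = [i for i, v in enumerate(pred_values) if predicate(v)]
    values_read = len(pred_values)

    # Phase 2: decode lazy columns only at the surviving positions.
    result = {predicate_col: [pred_values[i] for i in selected]}
    for name in lazy_cols:
        result[name] = [columns[name][i] for i in selected]
        values_read += len(selected)

    return result, values_read


# With a filter that keeps 1 row out of 1000, only 1001 values are decoded
# instead of the 2000 an eager two-column scan would touch.
data = {"id": list(range(1000)), "payload": [str(i) for i in range(1000)]}
rows, values_read = late_materialize(data, "id", lambda v: v == 7, ["payload"])
print(rows, values_read)
```

In this toy model the cost of the lazy column scales with the number of surviving rows, which is the property that a regression in the high-filter-rate case would break.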
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring reopened this Dec 30, 2024
@hello-stephen
Contributor

run buildall

@doris-robot

TPC-H: Total hot run time: 41121 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ff7b9db29959bcc3fd4f46eb2adc20c1dbd752d6, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	min hot (ms)
q1	18008	7503	7338	7338
q2	2438	189	185	185
q3	11285	1150	1170	1150
q4	10372	780	734	734
q5	7772	2885	2897	2885
q6	244	150	148	148
q7	980	620	614	614
q8	9390	2002	2039	2002
q9	6779	6539	6386	6386
q10	7049	2275	2315	2275
q11	470	267	290	267
q12	407	214	215	214
q13	17792	3000	2964	2964
q14	234	209	202	202
q15	575	526	512	512
q16	702	600	614	600
q17	976	583	557	557
q18	7278	6648	6650	6648
q19	1438	998	1008	998
q20	468	206	207	206
q21	3999	3323	3221	3221
q22	1133	1015	1016	1015
Total cold run time: 109789 ms
Total hot run time: 41121 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	min hot (ms)
q1	7336	7279	7251	7251
q2	329	243	229	229
q3	3126	3044	2951	2951
q4	2065	1811	1800	1800
q5	5715	5665	5654	5654
q6	225	138	138	138
q7	2235	1841	1826	1826
q8	3325	3505	3480	3480
q9	8892	8918	8872	8872
q10	3607	3572	3484	3484
q11	602	514	538	514
q12	799	599	561	561
q13	7311	3155	2996	2996
q14	283	262	251	251
q15	565	517	521	517
q16	692	653	642	642
q17	1762	1564	1576	1564
q18	7819	7540	7458	7458
q19	1664	1628	1587	1587
q20	2032	1797	1823	1797
q21	5398	5251	5263	5251
q22	1133	1031	1024	1024
Total cold run time: 66915 ms
Total hot run time: 59847 ms
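
The reported totals can be reproduced from the per-query rows. The sketch below assumes the columns are: query name, cold run, two hot runs, and the faster of the two hot runs. That reading is inferred from the numbers (the sums match the totals), not from the tpch-tools scripts themselves. The helper name `summarize` is hypothetical.

```python
# Round 1 rows copied from the TPC-H report above (times in ms).
ROUND_1 = """\
q1 18008 7503 7338 7338
q2 2438 189 185 185
q3 11285 1150 1170 1150
q4 10372 780 734 734
q5 7772 2885 2897 2885
q6 244 150 148 148
q7 980 620 614 614
q8 9390 2002 2039 2002
q9 6779 6539 6386 6386
q10 7049 2275 2315 2275
q11 470 267 290 267
q12 407 214 215 214
q13 17792 3000 2964 2964
q14 234 209 202 202
q15 575 526 512 512
q16 702 600 614 600
q17 976 583 557 557
q18 7278 6648 6650 6648
q19 1438 998 1008 998
q20 468 206 207 206
q21 3999 3323 3221 3221
q22 1133 1015 1016 1015
"""


def summarize(report: str) -> tuple:
    """Return (total cold ms, total hot ms) for a block of benchmark rows.

    Assumes each row is: name, cold, hot1, hot2[, min hot]. The hot total
    sums the faster of the two hot runs per query.
    """
    cold_total = 0
    hot_total = 0
    for line in report.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        cold, hot1, hot2 = (int(x) for x in fields[1:4])
        cold_total += cold
        hot_total += min(hot1, hot2)
    return cold_total, hot_total


print(summarize(ROUND_1))  # (109789, 41121)
```

The result matches the "Total cold run time: 109789 ms" and "Total hot run time: 41121 ms" lines for Round 1.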

@doris-robot

TPC-DS: Total hot run time: 192031 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ff7b9db29959bcc3fd4f46eb2adc20c1dbd752d6, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	min hot (ms)
query1	985	370	367	367
query2	6513	2102	2075	2075
query3	6749	213	213	213
query4	33855	23377	23343	23343
query5	4391	460	439	439
query6	269	175	175	175
query7	4626	316	314	314
query8	287	229	214	214
query9	9395	2674	2677	2674
query10	499	260	272	260
query11	18132	15179	15079	15079
query12	148	100	99	99
query13	1636	417	423	417
query14	9659	7449	7438	7438
query15	330	175	178	175
query16	8200	461	504	461
query17	1775	584	561	561
query18	2151	319	313	313
query19	362	160	149	149
query20	115	112	113	112
query21	61	46	46	46
query22	4573	4457	4437	4437
query23	35080	33969	34095	33969
query24	11144	2771	2860	2771
query25	649	387	374	374
query26	1351	168	164	164
query27	2821	296	308	296
query28	8050	2461	2437	2437
query29	876	424	420	420
query30	322	171	161	161
query31	1029	804	814	804
query32	92	58	55	55
query33	786	275	275	275
query34	971	489	507	489
query35	868	715	721	715
query36	1094	960	943	943
query37	139	77	71	71
query38	4032	3828	3839	3828
query39	1453	1414	1447	1414
query40	208	81	80	80
query41	57	46	51	46
query42	112	94	98	94
query43	518	495	484	484
query44	1274	822	810	810
query45	184	169	164	164
query46	1143	701	729	701
query47	1953	1847	1880	1847
query48	448	369	369	369
query49	1142	376	375	375
query50	821	399	396	396
query51	7300	7029	7109	7029
query52	101	86	84	84
query53	255	181	182	181
query54	1242	444	446	444
query55	75	78	76	76
query56	253	251	233	233
query57	1243	1149	1064	1064
query58	226	200	204	200
query59	3178	2993	3074	2993
query60	274	260	247	247
query61	114	112	111	111
query62	861	665	658	658
query63	212	188	195	188
query64	5517	649	622	622
query65	3271	3180	3202	3180
query66	1451	310	307	307
query67	16186	15845	15598	15598
query68	3555	587	582	582
query69	416	263	265	263
query70	1186	1152	1029	1029
query71	320	247	247	247
query72	6244	4040	4185	4040
query73	770	350	353	350
query74	9899	8988	8915	8915
query75	3422	2647	2667	2647
query76	2539	1082	1152	1082
query77	477	280	292	280
query78	10452	9783	9656	9656
query79	1056	616	598	598
query80	697	458	452	452
query81	514	241	239	239
query82	218	123	122	122
query83	250	153	156	153
query84	243	88	81	81
query85	1158	365	358	358
query86	341	292	304	292
query87	4517	4413	4336	4336
query88	3691	2408	2371	2371
query89	387	296	299	296
query90	2092	187	185	185
query91	191	159	159	159
query92	62	54	54	54
query93	1041	554	555	554
query94	795	286	295	286
query95	371	274	268	268
query96	608	286	280	280
query97	3301	3199	3264	3199
query98	215	214	201	201
query99	1539	1328	1348	1328
Total cold run time: 297654 ms
Total hot run time: 192031 ms

@doris-robot

ClickBench: Total hot run time: 32.66 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit ff7b9db29959bcc3fd4f46eb2adc20c1dbd752d6, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.03	0.02	0.03
query2	0.06	0.03	0.03
query3	0.23	0.06	0.07
query4	1.63	0.11	0.10
query5	0.54	0.52	0.52
query6	1.13	0.72	0.72
query7	0.02	0.02	0.01
query8	0.04	0.03	0.03
query9	0.56	0.51	0.50
query10	0.54	0.55	0.56
query11	0.14	0.10	0.10
query12	0.14	0.11	0.11
query13	0.60	0.60	0.60
query14	2.93	3.00	3.02
query15	0.89	0.84	0.82
query16	0.38	0.40	0.38
query17	1.07	1.05	1.06
query18	0.24	0.22	0.22
query19	1.94	1.92	1.95
query20	0.01	0.01	0.01
query21	15.38	0.57	0.58
query22	2.76	2.70	1.68
query23	17.03	1.18	0.81
query24	3.72	1.20	0.67
query25	0.30	0.19	0.11
query26	0.43	0.13	0.13
query27	0.04	0.05	0.04
query28	10.58	1.12	1.08
query29	12.60	3.15	3.18
query30	0.25	0.06	0.06
query31	2.84	0.39	0.37
query32	3.25	0.45	0.46
query33	2.98	3.04	3.02
query34	16.86	4.50	4.50
query35	4.59	4.49	4.56
query36	0.66	0.48	0.48
query37	0.10	0.06	0.06
query38	0.05	0.03	0.03
query39	0.03	0.02	0.02
query40	0.15	0.12	0.13
query41	0.08	0.03	0.02
query42	0.03	0.03	0.02
query43	0.03	0.03	0.03
Total cold run time: 107.86 s
Total hot run time: 32.66 s

@morningman morningman merged commit 6645035 into branch-3.0 Dec 31, 2024
20 of 21 checks passed
@github-actions github-actions bot deleted the auto-pick-46121-branch-3.0 branch December 31, 2024 13:56