@kaka11chen
Contributor

Backport #38277

…ache#38277)

This change follows Trino's implementation:

- Historical versions of parquet-mr have bugs that produce incorrect
statistics. Use `CorruptStatistics::should_ignore_statistics()` to detect
and ignore them.

- Older versions of Parquet write `min` and `max` stats; later versions
introduced `min_value` and `max_value`. The `min`/`max` stats cannot be
used for some types and in some cases, because their correctness depends
on how values are compared and sorted.

- For double or float columns, special cases such as NaN, -0, and 0 must
be handled.

- If a string column has only `min` and `max` stats, with no `min_value`
or `max_value`, use `ParquetPredicate::_try_read_old_utf8_stats()` to
derive a widened range from them so that the range-based read
optimization can still be applied.
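
The floating-point special cases can be illustrated with a minimal sketch. It assumes the usual Parquet convention that a NaN bound makes the stats unusable, that a `+0` minimum may hide `-0` values, and that a `-0` maximum may hide `+0` values. The names `DoubleRange` and `sanitize_double_stats` are hypothetical and are not taken from this PR:

```cpp
#include <cmath>
#include <optional>

// Hypothetical holder for the min/max statistics of a double column.
struct DoubleRange {
    double min;
    double max;
};

// Returns a range that is safe to use for predicate filtering, or
// std::nullopt when the statistics should be ignored altogether.
inline std::optional<DoubleRange> sanitize_double_stats(double min, double max) {
    // NaN has no well-defined position in the sort order, so a NaN bound
    // cannot be trusted to exclude any value.
    if (std::isnan(min) || std::isnan(max)) {
        return std::nullopt;
    }
    // A minimum of +0.0 may hide -0.0 values: widen the lower bound to -0.0.
    if (min == 0.0 && !std::signbit(min)) {
        min = -0.0;
    }
    // A maximum of -0.0 may hide +0.0 values: widen the upper bound to +0.0.
    if (max == 0.0 && std::signbit(max)) {
        max = 0.0;
    }
    return DoubleRange{min, max};
}
```

A caller would skip min/max filtering when the result is empty and otherwise compare the predicate against the widened bounds.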
@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@kaka11chen
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 40530 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 482d64aa1ecb6dc1f0a9a6b094bbe4fcd9b7dfef, data reload: false

------ Round 1 ----------------------------------
q1	17998	7645	7371	7371
q2	2504	152	147	147
q3	11320	1202	1174	1174
q4	10735	774	725	725
q5	8082	2931	2879	2879
q6	232	157	153	153
q7	1047	648	654	648
q8	9849	1918	1912	1912
q9	6609	6385	6375	6375
q10	6989	2266	2279	2266
q11	442	255	249	249
q12	403	218	219	218
q13	17786	2966	2940	2940
q14	250	209	218	209
q15	558	521	506	506
q16	494	412	422	412
q17	1001	547	560	547
q18	7290	6667	6503	6503
q19	2568	970	937	937
q20	577	262	272	262
q21	3965	3151	3137	3137
q22	1072	960	970	960
Total cold run time: 111771 ms
Total hot run time: 40530 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7358	7276	7252	7252
q2	315	230	239	230
q3	2895	2722	2689	2689
q4	1999	1644	1658	1644
q5	5376	5456	5417	5417
q6	215	138	139	138
q7	2129	1670	1661	1661
q8	3204	3371	3409	3371
q9	8532	8450	8585	8450
q10	3459	3383	3411	3383
q11	583	480	471	471
q12	745	578	572	572
q13	16840	2933	2974	2933
q14	276	251	255	251
q15	546	510	504	504
q16	504	439	476	439
q17	1793	1548	1540	1540
q18	7691	7378	7197	7197
q19	2587	1418	1516	1418
q20	1982	1775	1823	1775
q21	5207	5043	5015	5015
q22	1083	986	978	978
Total cold run time: 75319 ms
Total hot run time: 57328 ms

@doris-robot

TPC-DS: Total hot run time: 189781 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 482d64aa1ecb6dc1f0a9a6b094bbe4fcd9b7dfef, data reload: false

query1	955	380	364	364
query2	6549	1932	1886	1886
query3	6725	214	225	214
query4	33898	23481	23415	23415
query5	4288	458	434	434
query6	256	167	195	167
query7	4639	312	313	312
query8	258	204	208	204
query9	9417	2645	2597	2597
query10	480	296	281	281
query11	18077	15039	15091	15039
query12	148	97	95	95
query13	1644	456	410	410
query14	9504	6953	6983	6953
query15	214	175	176	175
query16	7811	482	439	439
query17	1616	570	555	555
query18	1932	316	311	311
query19	212	149	150	149
query20	115	107	111	107
query21	212	102	100	100
query22	4346	4112	4238	4112
query23	34471	35512	34576	34576
query24	12093	2830	2831	2830
query25	577	394	401	394
query26	1139	164	165	164
query27	2740	302	302	302
query28	7786	2164	2144	2144
query29	660	443	439	439
query30	338	159	151	151
query31	1036	793	791	791
query32	99	59	55	55
query33	771	305	303	303
query34	926	514	494	494
query35	880	717	742	717
query36	1097	920	919	919
query37	217	78	82	78
query38	4028	3894	3786	3786
query39	1460	1424	1414	1414
query40	288	105	98	98
query41	48	45	44	44
query42	117	99	104	99
query43	516	484	469	469
query44	1228	793	787	787
query45	204	169	167	167
query46	1133	724	723	723
query47	1909	1839	1809	1809
query48	424	343	346	343
query49	1251	388	388	388
query50	814	409	404	404
query51	6986	6793	6807	6793
query52	101	95	97	95
query53	263	198	187	187
query54	1154	476	477	476
query55	80	76	80	76
query56	285	266	271	266
query57	1233	1150	1136	1136
query58	247	252	263	252
query59	3061	2974	2882	2882
query60	299	280	279	279
query61	124	118	121	118
query62	860	673	662	662
query63	214	181	183	181
query64	5238	614	597	597
query65	3225	3167	3204	3167
query66	1265	307	301	301
query67	15761	15318	15278	15278
query68	4718	555	553	553
query69	575	291	285	285
query70	1153	1116	1123	1116
query71	413	266	270	266
query72	7282	4139	4207	4139
query73	788	352	343	343
query74	10519	8954	8848	8848
query75	3692	2650	2649	2649
query76	3596	819	849	819
query77	541	304	314	304
query78	9918	9835	9166	9166
query79	6263	589	588	588
query80	1642	445	453	445
query81	561	221	227	221
query82	782	127	126	126
query83	303	134	135	134
query84	294	91	77	77
query85	2113	305	280	280
query86	336	296	288	288
query87	4571	4247	4233	4233
query88	4531	2441	2440	2440
query89	517	285	288	285
query90	1900	190	182	182
query91	137	107	106	106
query92	61	47	47	47
query93	5755	554	538	538
query94	723	282	290	282
query95	345	246	243	243
query96	615	293	279	279
query97	3298	3054	3108	3054
query98	214	201	204	201
query99	1878	1291	1347	1291
Total cold run time: 311976 ms
Total hot run time: 189781 ms

@morningman morningman merged commit bbaa056 into apache:branch-3.0 Oct 10, 2024