@kaka11chen kaka11chen commented Jul 24, 2024

Proposed changes

Refer to Trino's implementation.

  • Historical versions of parquet-mr contain bugs that produce corrupt statistics. Use CorruptStatistics::should_ignore_statistics() to detect and ignore them.

  • Older versions of Parquet write the min and max stats fields; later versions added min_value and max_value. The old min/max stats cannot be used for some types and in some cases, because they depend on how values are compared and sorted.

  • For double and float, special cases such as NaN, -0, and 0 must be handled.

  • If a string column has only min and max stats, but no min_value or max_value, use ParquetPredicate::_try_read_old_utf8_stats() to widen the range so that it can still be used for read optimization.
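
As a rough illustration of the float/double bullet above, the sketch below shows one way a reader might sanitize min/max stats before using them for pruning. `DoubleRange` and `sanitize_double_stats` are hypothetical names, not functions from this PR. The zero handling follows the Parquet format's guidance for floating-point statistics (a +0 min may hide -0 values, a -0 max may hide +0 values); whether the PR does exactly this is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

// Hypothetical holder for a usable [min, max] range; not a type from the PR.
struct DoubleRange {
    double min;
    double max;
};

// Returns a range that is safe to use for pruning, or std::nullopt when the
// statistics should be ignored.
inline std::optional<DoubleRange> sanitize_double_stats(double min, double max) {
    // NaN compares false against everything, so a NaN bound carries no information.
    if (std::isnan(min) || std::isnan(max)) {
        return std::nullopt;
    }
    // Inconsistent bounds are treated as unusable.
    if (min > max) {
        return std::nullopt;
    }
    // -0.0 and +0.0 compare equal, but writers may have ordered them either way.
    // Widen the range: a zero min becomes -0.0 and a zero max becomes +0.0.
    if (min == 0.0) {
        min = -0.0;
    }
    if (max == 0.0) {
        max = 0.0;
    }
    return DoubleRange{min, max};
}
```

A caller would skip min/max pruning for the column chunk whenever the function returns `std::nullopt`, and otherwise compare the predicate against the widened range.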

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@kaka11chen

run buildall

@kaka11chen kaka11chen force-pushed the fix_and_opt_parquet_min_max-master branch from 96e0ee0 to bd5d879 Compare July 24, 2024 02:20
@kaka11chen

run buildall

@github-actions github-actions bot left a comment

clang-tidy made some suggestions


#pragma once

#include <gen_cpp/parquet_types.h>

warning: 'gen_cpp/parquet_types.h' file not found [clang-diagnostic-error]

#include <gen_cpp/parquet_types.h>
         ^

", semver=" + (version ? *version : "null") +
", appBuildHash=" + (appBuildHash ? *appBuildHash : "null") + ")";
}


warning: redundant access specifier has the same accessibility as the previous access specifier [readability-redundant-access-specifiers]

Suggested change
- public:
Additional context

be/src/vec/exec/format/parquet/parquet_common.h:166: previously declared here

public:
^

return Status::OK();
}

int compareTo(const SemanticVersion& other) const {

warning: statement should be inside braces [readability-braces-around-statements]

Suggested change
- if (int cmp = compareIntegers(_major, other._major); cmp != 0) return cmp;
+ if (int cmp = compareIntegers(_major, other._major); cmp != 0) { return cmp;
+ }

}

int compareTo(const SemanticVersion& other) const {
if (int cmp = compareIntegers(_major, other._major); cmp != 0) return cmp;

warning: statement should be inside braces [readability-braces-around-statements]

Suggested change
- if (int cmp = compareIntegers(_minor, other._minor); cmp != 0) return cmp;
+ if (int cmp = compareIntegers(_minor, other._minor); cmp != 0) { return cmp;
+ }


int compareTo(const SemanticVersion& other) const {
if (int cmp = compareIntegers(_major, other._major); cmp != 0) return cmp;
if (int cmp = compareIntegers(_minor, other._minor); cmp != 0) return cmp;

warning: statement should be inside braces [readability-braces-around-statements]

Suggested change
- if (int cmp = compareIntegers(_patch, other._patch); cmp != 0) return cmp;
+ if (int cmp = compareIntegers(_patch, other._patch); cmp != 0) { return cmp;
+ }

ParquetStatisticsTest() {}
};

TEST_F(ParquetStatisticsTest, test_try_read_old_utf8_stats) {

warning: function 'TEST_F' exceeds recommended size/complexity thresholds [readability-function-size]

TEST_F(ParquetStatisticsTest, test_try_read_old_utf8_stats) {
^
Additional context

be/test/vec/exec/parquet/parquet_statistics_test.cpp:30: 121 lines including whitespace and comments (threshold 80)

TEST_F(ParquetStatisticsTest, test_try_read_old_utf8_stats) {
^
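
For context on what `test_try_read_old_utf8_stats` is exercising: the PR description says old string min/max stats get their range expanded. The sketch below is one plausible reading of that idea, modeled on the approach I recall from Trino and not taken from this PR's diff. Legacy stats may have been ordered by signed byte comparison, so only the ASCII parts of the bounds are trusted, and the upper bound is rounded up. `widen_old_utf8_stats` and `is_ascii` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

inline bool is_ascii(char c) {
    return (static_cast<unsigned char>(c) & 0x80) == 0;
}

// Hypothetical helper: returns widened (lower, upper) bounds that hold under
// unsigned byte order, or std::nullopt when no safe bounds can be derived.
inline std::optional<std::pair<std::string, std::string>> widen_old_utf8_stats(
        const std::string& min, const std::string& max) {
    if (min == max) {
        return std::make_pair(min, max); // a single value is safe in any ordering
    }
    std::size_t common = 0;
    while (common < min.size() && common < max.size() && min[common] == max[common]) {
        ++common;
    }
    // Lower bound: keep the common prefix plus the ASCII run that follows it.
    // Truncating a string can only make it smaller, so this stays a lower bound.
    std::size_t min_len = common;
    while (min_len < min.size() && is_ascii(min[min_len])) {
        ++min_len;
    }
    // Upper bound: common prefix plus at most one differing byte, and only if
    // that byte is ASCII in both min and max.
    std::size_t max_len = common;
    if (max_len < min.size() && max_len < max.size() && is_ascii(min[max_len]) &&
        is_ascii(max[max_len])) {
        ++max_len;
    }
    // The last kept byte is incremented below, so it must be ASCII and below 0x7F.
    while (max_len > 0 && (max[max_len - 1] == 0x7F || !is_ascii(max[max_len - 1]))) {
        --max_len;
    }
    if (max_len == 0) {
        return std::nullopt;
    }
    std::string lower = min.substr(0, min_len);
    std::string upper = max.substr(0, max_len);
    ++upper[max_len - 1]; // round the truncated upper bound up
    return std::make_pair(lower, upper);
}
```

For example, old stats of `apple` and `apricot` would become the range `apple` to `aps` under this sketch, while bounds that differ only in non-ASCII bytes yield no usable range.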

// specific language governing permissions and limitations
// under the License.

#include <gtest/gtest.h>

warning: 'gtest/gtest.h' file not found [clang-diagnostic-error]

#include <gtest/gtest.h>
         ^

Comment on lines +24 to +25
namespace doris {
namespace vectorized {

warning: nested namespaces can be concatenated [modernize-concat-nested-namespaces]

Suggested change
- namespace doris {
- namespace vectorized {
+ namespace doris::vectorized {

be/test/vec/exec/parquet/parquet_version_test.cpp:219:

- } // namespace vectorized
- } // namespace doris
+ } // namespace doris

namespace vectorized {
class ParquetVersionTest : public testing::Test {
public:
ParquetVersionTest() {}

warning: use '= default' to define a trivial default constructor [modernize-use-equals-default]

Suggested change
- ParquetVersionTest() {}
+ ParquetVersionTest() = default;

ParquetVersionTest() {}
};

TEST_F(ParquetVersionTest, test_version_parser) {

warning: function 'TEST_F' exceeds recommended size/complexity thresholds [readability-function-size]

TEST_F(ParquetVersionTest, test_version_parser) {
^
Additional context

be/test/vec/exec/parquet/parquet_version_test.cpp:30: 91 lines including whitespace and comments (threshold 80)

TEST_F(ParquetVersionTest, test_version_parser) {
^
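
The version parser under test presumably handles the Parquet `created_by` string, which parquet-mr writes in the form `parquet-mr version 1.8.0 (build <hash>)`. The sketch below shows a minimal hand-written parse of that shape. `ParsedCreatedBy`, `trim_spaces`, and `parse_created_by` are hypothetical names, and the field layout (application, version, build hash) is inferred from the `semver=` and `appBuildHash=` fragment quoted earlier in this review, not from the PR's actual parser.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical result type; the PR's own class may differ.
struct ParsedCreatedBy {
    std::string application;
    std::optional<std::string> version;
    std::optional<std::string> build_hash;
};

inline std::string trim_spaces(const std::string& s) {
    const char* ws = " \t";
    std::size_t begin = s.find_first_not_of(ws);
    if (begin == std::string::npos) {
        return "";
    }
    std::size_t end = s.find_last_not_of(ws);
    return s.substr(begin, end - begin + 1);
}

// Splits "<application> version <version> (build <hash>)".
// Returns std::nullopt when the " version " marker or the application is missing.
inline std::optional<ParsedCreatedBy> parse_created_by(const std::string& created_by) {
    const std::string version_marker = " version ";
    const std::string build_marker = "(build ";
    std::size_t v = created_by.find(version_marker);
    if (v == std::string::npos) {
        return std::nullopt;
    }
    ParsedCreatedBy out;
    out.application = trim_spaces(created_by.substr(0, v));
    if (out.application.empty()) {
        return std::nullopt;
    }
    std::string rest = created_by.substr(v + version_marker.size());
    std::size_t open = rest.find(build_marker);
    if (open != std::string::npos) {
        std::size_t close = rest.find(')', open);
        if (close != std::string::npos) {
            std::size_t start = open + build_marker.size();
            std::string hash = trim_spaces(rest.substr(start, close - start));
            if (!hash.empty()) {
                out.build_hash = hash;
            }
        }
        rest = rest.substr(0, open); // the version is whatever precedes "(build"
    }
    std::string version = trim_spaces(rest);
    if (!version.empty()) {
        out.version = version;
    }
    return out;
}
```

A caller could then feed the extracted version string into a semantic-version comparison to decide whether the writer predates a known statistics fix.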

@doris-robot

TPC-H: Total hot run time: 39895 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit bd5d8796c2682f750772f5f95fe0f417647220ad, data reload: false

------ Round 1 ----------------------------------
q1	18005	4479	4341	4341
q2	2636	207	192	192
q3	11580	1164	1099	1099
q4	10329	765	764	764
q5	7583	2712	2676	2676
q6	227	143	137	137
q7	962	624	614	614
q8	9243	2060	2080	2060
q9	8901	6554	6551	6551
q10	8714	3751	3765	3751
q11	471	242	240	240
q12	392	219	217	217
q13	17891	2971	2991	2971
q14	292	234	235	234
q15	529	476	476	476
q16	495	378	382	378
q17	972	686	772	686
q18	7956	7522	7388	7388
q19	5256	1320	1331	1320
q20	714	331	319	319
q21	4954	3195	3236	3195
q22	353	291	286	286
Total cold run time: 118455 ms
Total hot run time: 39895 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4319	4220	4242	4220
q2	377	251	266	251
q3	3004	2787	2738	2738
q4	1850	1599	1582	1582
q5	5267	5307	5301	5301
q6	223	130	130	130
q7	2102	1752	1734	1734
q8	3194	3360	3300	3300
q9	8438	8357	8394	8357
q10	3838	3704	3695	3695
q11	580	494	496	494
q12	801	630	614	614
q13	17382	2951	3000	2951
q14	312	271	280	271
q15	531	484	472	472
q16	480	410	419	410
q17	1752	1481	1446	1446
q18	7766	7580	7477	7477
q19	1661	1577	1610	1577
q20	2009	1762	1779	1762
q21	4788	4715	4704	4704
q22	600	494	509	494
Total cold run time: 71274 ms
Total hot run time: 53980 ms

@doris-robot

TPC-DS: Total hot run time: 174183 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit bd5d8796c2682f750772f5f95fe0f417647220ad, data reload: false

query1	907	380	373	373
query2	6457	1885	1823	1823
query3	6656	207	219	207
query4	27756	17597	17656	17597
query5	4194	481	494	481
query6	270	183	166	166
query7	4591	294	291	291
query8	246	198	193	193
query9	8459	2421	2395	2395
query10	459	284	264	264
query11	10945	10067	10218	10067
query12	129	80	85	80
query13	1629	369	362	362
query14	10178	7685	7524	7524
query15	214	173	171	171
query16	7806	467	471	467
query17	1353	573	558	558
query18	1969	286	272	272
query19	189	147	149	147
query20	91	84	84	84
query21	215	137	132	132
query22	4403	4142	3981	3981
query23	33678	33201	33301	33201
query24	12077	2855	2870	2855
query25	656	377	394	377
query26	1784	146	146	146
query27	2941	268	269	268
query28	7726	2008	1998	1998
query29	1142	621	618	618
query30	285	152	152	152
query31	947	743	747	743
query32	93	55	53	53
query33	770	320	325	320
query34	900	476	477	476
query35	889	758	732	732
query36	1093	929	912	912
query37	209	77	79	77
query38	2870	2767	2790	2767
query39	874	824	817	817
query40	281	122	121	121
query41	51	48	46	46
query42	124	101	102	101
query43	498	460	475	460
query44	1174	723	756	723
query45	194	164	162	162
query46	1103	717	738	717
query47	1895	1800	1776	1776
query48	361	290	288	288
query49	1203	426	406	406
query50	772	392	380	380
query51	6766	6688	6666	6666
query52	106	98	93	93
query53	358	288	296	288
query54	899	449	450	449
query55	77	72	75	72
query56	291	275	267	267
query57	1171	1043	1046	1043
query58	265	263	256	256
query59	2964	2650	2594	2594
query60	303	275	341	275
query61	99	93	96	93
query62	832	643	657	643
query63	323	293	296	293
query64	10463	2210	6740	2210
query65	3160	3134	3132	3132
query66	1394	335	331	331
query67	15659	15111	15158	15111
query68	4581	547	547	547
query69	490	352	331	331
query70	1131	1184	1169	1169
query71	395	277	279	277
query72	7119	5483	5728	5483
query73	750	324	322	322
query74	6124	5737	5736	5736
query75	3388	2758	2716	2716
query76	2803	988	924	924
query77	500	320	310	310
query78	10103	8970	8909	8909
query79	2338	531	522	522
query80	1297	476	477	476
query81	573	220	220	220
query82	764	132	129	129
query83	248	165	168	165
query84	241	89	86	86
query85	2052	327	372	327
query86	486	303	329	303
query87	3247	3143	3110	3110
query88	4059	2365	2402	2365
query89	453	389	387	387
query90	1851	190	188	188
query91	130	105	103	103
query92	65	48	50	48
query93	2405	528	514	514
query94	1273	284	300	284
query95	407	316	313	313
query96	598	267	267	267
query97	3165	3045	3080	3045
query98	230	199	196	196
query99	1552	1268	1279	1268
Total cold run time: 285654 ms
Total hot run time: 174183 ms

@doris-robot

ClickBench: Total hot run time: 31.61 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit bd5d8796c2682f750772f5f95fe0f417647220ad, data reload: false

query1	0.04	0.04	0.04
query2	0.08	0.04	0.04
query3	0.22	0.04	0.06
query4	1.68	0.09	0.10
query5	0.50	0.50	0.49
query6	1.13	0.73	0.73
query7	0.02	0.01	0.01
query8	0.05	0.04	0.04
query9	0.53	0.49	0.50
query10	0.55	0.54	0.56
query11	0.15	0.12	0.12
query12	0.14	0.13	0.13
query13	0.59	0.58	0.57
query14	0.75	0.78	0.78
query15	0.84	0.82	0.81
query16	0.38	0.37	0.37
query17	1.01	1.02	1.07
query18	0.22	0.21	0.22
query19	1.82	1.73	1.83
query20	0.01	0.01	0.00
query21	15.41	0.78	0.68
query22	4.28	6.54	2.73
query23	18.29	1.31	1.38
query24	2.20	0.23	0.22
query25	0.14	0.10	0.09
query26	0.29	0.21	0.21
query27	0.45	0.23	0.24
query28	13.19	1.02	1.02
query29	13.08	3.28	3.26
query30	0.26	0.06	0.05
query31	2.94	0.39	0.39
query32	3.26	0.48	0.47
query33	2.88	2.93	2.93
query34	17.08	4.36	4.34
query35	4.45	4.40	4.45
query36	0.66	0.45	0.49
query37	0.18	0.16	0.16
query38	0.15	0.15	0.14
query39	0.04	0.04	0.04
query40	0.15	0.12	0.13
query41	0.09	0.04	0.05
query42	0.05	0.06	0.04
query43	0.05	0.04	0.04
Total cold run time: 110.28 s
Total hot run time: 31.61 s

@kaka11chen kaka11chen force-pushed the fix_and_opt_parquet_min_max-master branch from bd5d879 to e1d5ba2 Compare July 24, 2024 05:00
@kaka11chen

run buildall

@kaka11chen kaka11chen force-pushed the fix_and_opt_parquet_min_max-master branch from e1d5ba2 to 1ce8ced Compare July 24, 2024 06:27
@kaka11chen

run buildall

@kaka11chen kaka11chen force-pushed the fix_and_opt_parquet_min_max-master branch from 1ce8ced to b42b73a Compare July 24, 2024 06:40
@kaka11chen

run buildall

@doris-robot

TPC-H: Total hot run time: 39838 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b42b73aef9328bb05957f36f485d488889557d2f, data reload: false

------ Round 1 ----------------------------------
q1	17617	4362	4261	4261
q2	2006	188	188	188
q3	10450	1148	1091	1091
q4	10183	819	790	790
q5	7537	2682	2639	2639
q6	220	138	134	134
q7	951	592	602	592
q8	9227	2069	2053	2053
q9	8794	6532	6521	6521
q10	8743	3739	3734	3734
q11	455	240	235	235
q12	405	218	223	218
q13	17757	2970	2972	2970
q14	276	238	249	238
q15	519	483	491	483
q16	507	375	374	374
q17	960	655	705	655
q18	8040	7464	7406	7406
q19	8040	1459	1460	1459
q20	660	312	328	312
q21	4928	3201	3234	3201
q22	351	295	284	284
Total cold run time: 118626 ms
Total hot run time: 39838 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4397	4205	4241	4205
q2	370	263	259	259
q3	3018	2919	2893	2893
q4	1992	1677	1640	1640
q5	5619	5497	5465	5465
q6	228	141	129	129
q7	2188	1896	1804	1804
q8	3285	3408	3405	3405
q9	8667	8789	8786	8786
q10	4103	3817	3778	3778
q11	573	504	513	504
q12	802	639	642	639
q13	15887	3175	3185	3175
q14	315	277	273	273
q15	523	488	496	488
q16	489	441	447	441
q17	1790	1510	1508	1508
q18	8034	7864	7748	7748
q19	1718	1614	1466	1466
q20	2195	1895	1844	1844
q21	9580	4794	4654	4654
q22	609	508	519	508
Total cold run time: 76382 ms
Total hot run time: 55612 ms

@doris-robot

TPC-DS: Total hot run time: 173837 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b42b73aef9328bb05957f36f485d488889557d2f, data reload: false

query1	911	377	366	366
query2	6437	1926	1907	1907
query3	6645	201	216	201
query4	28565	17394	17305	17305
query5	3597	475	476	475
query6	254	187	167	167
query7	4586	291	282	282
query8	235	200	188	188
query9	8539	2392	2370	2370
query10	427	322	283	283
query11	10723	10102	10124	10102
query12	115	83	81	81
query13	1647	383	394	383
query14	10095	7684	7625	7625
query15	221	167	164	164
query16	7633	512	481	481
query17	1553	566	513	513
query18	1796	270	267	267
query19	195	153	167	153
query20	84	80	81	80
query21	212	133	123	123
query22	4464	4256	4031	4031
query23	34070	33602	33655	33602
query24	10818	2877	2904	2877
query25	613	409	403	403
query26	698	153	149	149
query27	2318	283	280	280
query28	5802	2105	2087	2087
query29	878	638	616	616
query30	251	149	160	149
query31	985	748	776	748
query32	95	51	52	51
query33	659	328	345	328
query34	913	498	506	498
query35	900	792	793	792
query36	1194	955	965	955
query37	136	80	81	80
query38	2947	2868	2840	2840
query39	866	806	816	806
query40	201	123	119	119
query41	48	43	43	43
query42	110	97	96	96
query43	510	487	481	481
query44	1049	717	722	717
query45	199	167	172	167
query46	1079	757	722	722
query47	1905	1803	1779	1779
query48	377	289	281	281
query49	843	400	406	400
query50	762	384	380	380
query51	6850	6692	6606	6606
query52	104	95	87	87
query53	353	295	289	289
query54	877	458	439	439
query55	74	75	70	70
query56	298	258	274	258
query57	1142	1054	1041	1041
query58	245	253	274	253
query59	2820	2585	2834	2585
query60	299	285	272	272
query61	97	93	96	93
query62	802	655	650	650
query63	316	280	286	280
query64	9121	2194	1657	1657
query65	3152	3096	3101	3096
query66	756	337	390	337
query67	15349	15095	15050	15050
query68	4532	549	542	542
query69	481	342	343	342
query70	1186	1123	1124	1123
query71	416	274	280	274
query72	7153	5515	5880	5515
query73	739	319	321	319
query74	6124	5734	5660	5660
query75	4199	2684	2677	2677
query76	2762	929	883	883
query77	613	309	290	290
query78	9667	9590	8940	8940
query79	3207	520	522	520
query80	1625	465	465	465
query81	598	220	216	216
query82	832	133	142	133
query83	347	164	170	164
query84	272	89	85	85
query85	1914	316	305	305
query86	485	311	316	311
query87	3259	3116	3132	3116
query88	4563	2336	2359	2336
query89	479	377	374	374
query90	1850	192	188	188
query91	126	170	103	103
query92	56	49	47	47
query93	4743	515	505	505
query94	1225	295	275	275
query95	407	316	317	316
query96	613	273	271	271
query97	3153	3029	3066	3029
query98	232	205	197	197
query99	1562	1270	1260	1260
Total cold run time: 282235 ms
Total hot run time: 173837 ms

@doris-robot

ClickBench: Total hot run time: 30.21 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit b42b73aef9328bb05957f36f485d488889557d2f, data reload: false

query1	0.04	0.04	0.04
query2	0.09	0.04	0.03
query3	0.22	0.06	0.06
query4	1.66	0.09	0.08
query5	0.50	0.47	0.48
query6	1.13	0.73	0.72
query7	0.02	0.02	0.02
query8	0.06	0.04	0.05
query9	0.55	0.50	0.47
query10	0.54	0.53	0.54
query11	0.15	0.11	0.12
query12	0.15	0.12	0.13
query13	0.60	0.58	0.58
query14	0.77	0.77	0.76
query15	0.85	0.80	0.81
query16	0.35	0.37	0.37
query17	1.01	0.98	0.96
query18	0.24	0.22	0.22
query19	1.79	1.73	1.70
query20	0.01	0.01	0.01
query21	15.40	0.77	0.66
query22	4.34	7.79	1.53
query23	18.18	1.36	1.25
query24	2.08	0.21	0.24
query25	0.16	0.08	0.09
query26	0.30	0.21	0.20
query27	0.45	0.23	0.22
query28	13.26	1.01	1.00
query29	12.61	3.28	3.30
query30	0.25	0.06	0.06
query31	2.88	0.39	0.38
query32	3.27	0.49	0.47
query33	2.84	2.92	2.87
query34	17.10	4.40	4.40
query35	4.43	4.51	4.46
query36	0.65	0.47	0.47
query37	0.18	0.15	0.16
query38	0.15	0.15	0.14
query39	0.04	0.03	0.03
query40	0.15	0.13	0.13
query41	0.10	0.05	0.05
query42	0.06	0.04	0.05
query43	0.04	0.04	0.04
Total cold run time: 109.65 s
Total hot run time: 30.21 s

@github-actions github-actions bot added the doing label Aug 2, 2024
@kaka11chen kaka11chen marked this pull request as ready for review August 2, 2024 08:40
@kaka11chen

run buildall

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

}

int SemanticVersion::compare_to(const SemanticVersion& other) const {
if (int cmp = _compare_integers(_major, other._major); cmp != 0) return cmp;

warning: statement should be inside braces [readability-braces-around-statements]

Suggested change
- if (int cmp = _compare_integers(_major, other._major); cmp != 0) return cmp;
+ if (int cmp = _compare_integers(_major, other._major); cmp != 0) { return cmp;
+ }


int SemanticVersion::compare_to(const SemanticVersion& other) const {
if (int cmp = _compare_integers(_major, other._major); cmp != 0) return cmp;
if (int cmp = _compare_integers(_minor, other._minor); cmp != 0) return cmp;

warning: statement should be inside braces [readability-braces-around-statements]

Suggested change
- if (int cmp = _compare_integers(_minor, other._minor); cmp != 0) return cmp;
+ if (int cmp = _compare_integers(_minor, other._minor); cmp != 0) { return cmp;
+ }

int SemanticVersion::compare_to(const SemanticVersion& other) const {
if (int cmp = _compare_integers(_major, other._major); cmp != 0) return cmp;
if (int cmp = _compare_integers(_minor, other._minor); cmp != 0) return cmp;
if (int cmp = _compare_integers(_patch, other._patch); cmp != 0) return cmp;

warning: statement should be inside braces [readability-braces-around-statements]

Suggested change
- if (int cmp = _compare_integers(_patch, other._patch); cmp != 0) return cmp;
+ if (int cmp = _compare_integers(_patch, other._patch); cmp != 0) { return cmp;
+ }

if (int cmp = _compare_integers(_major, other._major); cmp != 0) return cmp;
if (int cmp = _compare_integers(_minor, other._minor); cmp != 0) return cmp;
if (int cmp = _compare_integers(_patch, other._patch); cmp != 0) return cmp;
if (int cmp = _compare_booleans(other._prerelease, _prerelease); cmp != 0) return cmp;

warning: statement should be inside braces [readability-braces-around-statements]

Suggested change
- if (int cmp = _compare_booleans(other._prerelease, _prerelease); cmp != 0) return cmp;
+ if (int cmp = _compare_booleans(other._prerelease, _prerelease); cmp != 0) { return cmp;
+ }
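
To make the comparison order in these snippets easier to follow, here is a minimal, self-contained sketch of the same cascade with braces applied. `Version`, `compare_integers`, `compare_booleans`, and `compare_versions` are simplified hypothetical stand-ins for the PR's `SemanticVersion` members. The sketch stops at the pre-release flag, whereas the real `compare_to` appears to continue with further fields.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the PR's SemanticVersion class.
struct Version {
    int major_num;
    int minor_num;
    int patch_num;
    bool prerelease;
};

// Three-way comparisons returning a negative, zero, or positive value.
inline int compare_integers(int a, int b) {
    return (a > b) - (a < b);
}

inline int compare_booleans(bool a, bool b) {
    return static_cast<int>(a) - static_cast<int>(b);
}

inline int compare_versions(const Version& lhs, const Version& rhs) {
    if (int cmp = compare_integers(lhs.major_num, rhs.major_num); cmp != 0) {
        return cmp;
    }
    if (int cmp = compare_integers(lhs.minor_num, rhs.minor_num); cmp != 0) {
        return cmp;
    }
    if (int cmp = compare_integers(lhs.patch_num, rhs.patch_num); cmp != 0) {
        return cmp;
    }
    // The arguments are swapped on purpose: with equal numbers, a pre-release
    // sorts before the corresponding release.
    return compare_booleans(rhs.prerelease, lhs.prerelease);
}
```

The swapped arguments in the last step mirror `_compare_booleans(other._prerelease, _prerelease)` in the reviewed code, so that, for instance, a 1.8.0 pre-release compares lower than the 1.8.0 release.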

std::vector<NumberOrString> _identifiers;
};

private:

warning: redundant access specifier has the same accessibility as the previous access specifier [readability-redundant-access-specifiers]

Suggested change
- private:
Additional context

be/src/vec/exec/format/parquet/parquet_common.h:218: previously declared here

private:
^

@kaka11chen

run buildall

@doris-robot

TPC-H: Total hot run time: 41555 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 7cc53a37032d1f16b2cf0d164e73c17a1d76e94d, data reload: false

------ Round 1 ----------------------------------
q1	17608	4198	4086	4086
q2	2022	205	201	201
q3	10589	1323	1365	1323
q4	10267	810	934	810
q5	7594	2871	2994	2871
q6	221	137	138	137
q7	1083	610	613	610
q8	9438	1815	1967	1815
q9	8512	6599	6633	6599
q10	8728	3895	3884	3884
q11	439	248	246	246
q12	409	227	222	222
q13	17533	2939	2946	2939
q14	283	247	241	241
q15	520	482	505	482
q16	524	395	396	395
q17	985	934	893	893
q18	8025	7243	7422	7243
q19	1414	1237	1236	1236
q20	570	331	348	331
q21	5358	4715	4807	4715
q22	356	276	285	276
Total cold run time: 112478 ms
Total hot run time: 41555 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4078	4034	4013	4013
q2	332	226	218	218
q3	3029	3003	3003	3003
q4	1903	1892	1914	1892
q5	5269	5238	5242	5238
q6	217	128	127	127
q7	2028	1686	1711	1686
q8	3254	3295	3247	3247
q9	8298	8263	8256	8256
q10	3762	3855	3850	3850
q11	542	439	444	439
q12	741	575	552	552
q13	14023	2932	2936	2932
q14	286	266	256	256
q15	513	476	479	476
q16	444	411	391	391
q17	1751	1708	1720	1708
q18	7717	7408	7279	7279
q19	1713	1714	1680	1680
q20	1986	1773	1738	1738
q21	5432	5248	5285	5248
q22	500	446	445	445
Total cold run time: 67818 ms
Total hot run time: 54674 ms

@doris-robot

TPC-DS: Total hot run time: 168434 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 7cc53a37032d1f16b2cf0d164e73c17a1d76e94d, data reload: false

query1	923	387	376	376
query2	6493	1691	1674	1674
query3	6670	215	227	215
query4	20317	17465	17471	17465
query5	4284	521	516	516
query6	309	165	162	162
query7	4627	288	301	288
query8	257	212	193	193
query9	8535	2359	2346	2346
query10	469	280	269	269
query11	10578	10074	10035	10035
query12	142	88	87	87
query13	1662	397	374	374
query14	9738	7465	6660	6660
query15	207	157	169	157
query16	7075	463	458	458
query17	933	558	571	558
query18	1905	276	274	274
query19	189	140	142	140
query20	93	83	86	83
query21	201	100	98	98
query22	4477	4021	3981	3981
query23	33607	32853	32833	32833
query24	10323	3018	3058	3018
query25	677	377	392	377
query26	1757	148	148	148
query27	2990	275	279	275
query28	6873	1975	1950	1950
query29	1302	412	438	412
query30	285	153	150	150
query31	948	768	756	756
query32	101	54	56	54
query33	697	316	325	316
query34	909	472	496	472
query35	853	746	700	700
query36	1011	877	858	858
query37	286	80	79	79
query38	2874	2753	2746	2746
query39	859	793	831	793
query40	275	111	110	110
query41	51	44	45	44
query42	122	95	103	95
query43	454	409	402	402
query44	1190	745	740	740
query45	211	176	179	176
query46	1094	806	773	773
query47	1805	1717	1762	1717
query48	367	297	302	297
query49	1217	428	420	420
query50	890	427	428	427
query51	6852	6726	6767	6726
query52	102	88	91	88
query53	251	179	176	176
query54	634	456	461	456
query55	76	77	76	76
query56	288	259	262	259
query57	1138	1015	1035	1015
query58	288	279	279	279
query59	2570	2305	2232	2232
query60	299	270	278	270
query61	97	92	97	92
query62	907	654	670	654
query63	214	179	181	179
query64	5945	1893	1847	1847
query65	3176	3075	3083	3075
query66	1435	353	358	353
query67	15303	14950	14897	14897
query68	4378	569	581	569
query69	436	304	291	291
query70	1106	1079	1108	1079
query71	357	279	285	279
query72	7200	2657	2525	2525
query73	762	330	330	330
query74	5995	5533	5609	5533
query75	3355	2708	2883	2708
query76	2236	1230	1248	1230
query77	428	330	322	322
query78	9427	9130	8930	8930
query79	1312	535	529	529
query80	1014	511	505	505
query81	546	223	221	221
query82	1053	134	126	126
query83	237	168	169	168
query84	265	82	83	82
query85	1282	316	298	298
query86	392	284	305	284
query87	3276	3040	3104	3040
query88	3005	2550	2507	2507
query89	384	300	293	293
query90	1755	198	193	193
query91	138	110	112	110
query92	63	51	56	51
query93	1380	616	624	616
query94	893	319	309	309
query95	391	355	261	261
query96	604	283	288	283
query97	3201	3069	3044	3044
query98	208	199	197	197
query99	1671	1329	1300	1300
Total cold run time: 262437 ms
Total hot run time: 168434 ms

@doris-robot

ClickBench: Total hot run time: 29.91 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 7cc53a37032d1f16b2cf0d164e73c17a1d76e94d, data reload: false

query1	0.05	0.04	0.04
query2	0.07	0.04	0.04
query3	0.22	0.05	0.06
query4	1.66	0.06	0.07
query5	0.47	0.49	0.47
query6	1.14	0.72	0.71
query7	0.02	0.02	0.01
query8	0.05	0.04	0.04
query9	0.58	0.52	0.50
query10	0.57	0.58	0.57
query11	0.15	0.12	0.12
query12	0.15	0.12	0.12
query13	0.63	0.61	0.60
query14	0.77	0.81	0.78
query15	0.89	0.87	0.86
query16	0.35	0.36	0.35
query17	1.02	1.02	1.00
query18	0.23	0.21	0.22
query19	1.84	1.74	1.75
query20	0.01	0.01	0.00
query21	15.40	0.75	0.66
query22	3.75	7.88	1.15
query23	18.03	1.32	1.34
query24	2.27	0.22	0.21
query25	0.18	0.08	0.07
query26	0.31	0.21	0.21
query27	0.45	0.22	0.22
query28	13.15	0.99	0.97
query29	12.52	3.31	3.28
query30	0.25	0.06	0.05
query31	2.89	0.40	0.40
query32	3.25	0.49	0.50
query33	2.94	2.93	2.96
query34	15.44	4.30	4.28
query35	4.27	4.32	4.36
query36	0.67	0.50	0.49
query37	0.18	0.17	0.16
query38	0.16	0.15	0.15
query39	0.04	0.03	0.03
query40	0.16	0.13	0.14
query41	0.10	0.04	0.05
query42	0.05	0.04	0.04
query43	0.04	0.04	0.04
Total cold run time: 107.37 s
Total hot run time: 29.91 s

@kaka11chen kaka11chen force-pushed the fix_and_opt_parquet_min_max-master branch from 7cc53a3 to e8f82a9 Compare August 5, 2024 14:50
@kaka11chen

run buildall

@doris-robot

TPC-H: Total hot run time: 42085 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit e8f82a92abdbb73037b852598ea6be2c3cb8a92c, data reload: false

------ Round 1 ----------------------------------
q1	18220	4226	4185	4185
q2	2420	214	211	211
q3	11080	1448	1403	1403
q4	11493	873	952	873
q5	8108	2990	3010	2990
q6	225	141	146	141
q7	1072	631	619	619
q8	9432	1847	1947	1847
q9	8450	6675	6664	6664
q10	8709	3835	3862	3835
q11	427	247	253	247
q12	420	231	235	231
q13	17757	2937	2923	2923
q14	272	248	245	245
q15	521	491	496	491
q16	495	407	385	385
q17	956	926	944	926
q18	8011	7363	7214	7214
q19	1393	1216	1215	1215
q20	578	326	351	326
q21	5361	4826	4869	4826
q22	350	301	288	288
Total cold run time: 115750 ms
Total hot run time: 42085 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4119	4054	4060	4054
q2	323	220	220	220
q3	3005	3025	3019	3019
q4	1877	1838	1863	1838
q5	5276	5221	5225	5221
q6	222	131	133	131
q7	2053	1675	1697	1675
q8	3148	3272	3255	3255
q9	8253	8294	8228	8228
q10	3764	3861	3830	3830
q11	572	455	443	443
q12	742	545	536	536
q13	11750	2980	2939	2939
q14	277	260	256	256
q15	515	483	479	479
q16	435	401	388	388
q17	1724	1737	1700	1700
q18	7667	7339	7257	7257
q19	1664	1670	1662	1662
q20	1969	1746	1741	1741
q21	5470	5256	5231	5231
q22	512	445	477	445
Total cold run time: 65337 ms
Total hot run time: 54548 ms

@doris-robot

TPC-DS: Total hot run time: 169444 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit e8f82a92abdbb73037b852598ea6be2c3cb8a92c, data reload: false

query1	919	375	375	375
query2	6472	1743	1712	1712
query3	6682	211	220	211
query4	18927	17608	17374	17374
query5	4325	520	523	520
query6	275	167	180	167
query7	4598	293	288	288
query8	261	188	199	188
query9	8477	2372	2368	2368
query10	428	276	271	271
query11	10497	10050	10064	10050
query12	146	91	106	91
query13	1631	379	376	376
query14	9224	7536	7590	7536
query15	209	160	163	160
query16	7103	438	506	438
query17	929	590	539	539
query18	1928	276	280	276
query19	192	145	148	145
query20	92	85	90	85
query21	203	105	100	100
query22	4221	4109	3843	3843
query23	33750	33009	32808	32808
query24	10328	3050	3042	3042
query25	680	386	384	384
query26	1760	149	151	149
query27	2925	281	279	279
query28	6939	1988	1980	1980
query29	1308	415	413	413
query30	284	149	155	149
query31	957	746	748	746
query32	100	53	57	53
query33	688	310	323	310
query34	914	486	491	486
query35	839	734	738	734
query36	991	867	861	861
query37	298	79	80	79
query38	2829	2802	2772	2772
query39	851	791	801	791
query40	285	114	111	111
query41	46	50	45	45
query42	122	100	102	100
query43	459	437	411	411
query44	1221	732	732	732
query45	204	179	176	176
query46	1094	823	790	790
query47	1783	1703	1744	1703
query48	369	293	289	289
query49	1194	454	448	448
query50	903	450	445	445
query51	6812	6748	6622	6622
query52	101	91	95	91
query53	255	191	190	190
query54	669	461	478	461
query55	82	78	77	77
query56	278	267	275	267
query57	1148	1061	1071	1061
query58	283	266	286	266
query59	2571	2463	2432	2432
query60	306	286	286	286
query61	163	95	92	92
query62	914	657	679	657
query63	227	201	189	189
query64	5882	1924	1936	1924
query65	3156	3098	3110	3098
query66	1450	342	341	341
query67	15278	14850	14995	14850
query68	4331	587	598	587
query69	447	316	291	291
query70	1104	1067	1083	1067
query71	361	277	271	271
query72	7068	2699	2480	2480
query73	772	333	326	326
query74	5971	5704	5623	5623
query75	3364	2742	2728	2728
query76	2312	1223	1307	1223
query77	440	324	313	313
query78	9426	8804	8914	8804
query79	1681	537	545	537
query80	1258	516	505	505
query81	537	231	232	231
query82	1136	134	129	129
query83	243	171	169	169
query84	267	80	79	79
query85	1282	308	298	298
query86	374	297	289	289
query87	3314	3139	3130	3130
query88	2955	2432	2421	2421
query89	380	294	296	294
query90	1782	200	198	198
query91	129	103	102	102
query92	62	50	53	50
query93	1442	624	622	622
query94	878	308	304	304
query95	389	275	268	268
query96	609	280	281	280
query97	3229	3105	3056	3056
query98	219	261	195	195
query99	1628	1291	1271	1271
Total cold run time: 260825 ms
Total hot run time: 169444 ms

@doris-robot

ClickBench: Total hot run time: 29.73 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit e8f82a92abdbb73037b852598ea6be2c3cb8a92c, data reload: false

query1	0.05	0.04	0.04
query2	0.07	0.04	0.04
query3	0.22	0.05	0.05
query4	1.68	0.07	0.07
query5	0.49	0.48	0.48
query6	1.13	0.72	0.71
query7	0.02	0.01	0.01
query8	0.05	0.04	0.04
query9	0.58	0.51	0.52
query10	0.56	0.55	0.57
query11	0.16	0.12	0.12
query12	0.14	0.12	0.12
query13	0.62	0.60	0.60
query14	0.77	0.81	0.83
query15	0.90	0.86	0.90
query16	0.36	0.35	0.35
query17	0.96	1.01	0.98
query18	0.22	0.22	0.22
query19	1.89	1.76	1.74
query20	0.02	0.01	0.01
query21	15.42	0.76	0.65
query22	4.18	8.67	0.98
query23	17.87	1.34	1.32
query24	2.26	0.22	0.22
query25	0.18	0.09	0.08
query26	0.32	0.21	0.22
query27	0.46	0.23	0.23
query28	13.17	1.00	0.97
query29	12.64	3.29	3.32
query30	0.25	0.06	0.06
query31	2.87	0.42	0.40
query32	3.23	0.49	0.48
query33	2.93	2.92	2.96
query34	15.78	4.30	4.24
query35	4.30	4.29	4.31
query36	0.68	0.48	0.48
query37	0.19	0.17	0.16
query38	0.16	0.15	0.15
query39	0.04	0.03	0.04
query40	0.15	0.13	0.14
query41	0.09	0.05	0.05
query42	0.06	0.04	0.05
query43	0.05	0.04	0.04
Total cold run time: 108.17 s
Total hot run time: 29.73 s

@kaka11chen kaka11chen force-pushed the fix_and_opt_parquet_min_max-master branch from af7382c to 7ebb634 Compare August 7, 2024 09:14
@kaka11chen
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 41876 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 7ebb63433431ecf1668b172727a3833e0f19a8d9, data reload: false

------ Round 1 ----------------------------------
q1	18095	4206	4149	4149
q2	2643	212	198	198
q3	11876	1379	1407	1379
q4	11008	906	939	906
q5	8017	3045	2978	2978
q6	224	135	135	135
q7	1055	618	616	616
q8	9436	1824	1911	1824
q9	8464	6562	6612	6562
q10	8776	3853	3852	3852
q11	423	252	251	251
q12	410	237	236	236
q13	17772	2960	2978	2960
q14	275	241	246	241
q15	516	494	477	477
q16	503	407	390	390
q17	977	911	914	911
q18	7977	7262	7323	7262
q19	1399	1235	1212	1212
q20	558	328	333	328
q21	5434	4802	4728	4728
q22	357	291	281	281
Total cold run time: 116195 ms
Total hot run time: 41876 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4114	4067	4073	4067
q2	333	225	222	222
q3	3012	3040	2987	2987
q4	1909	1878	1861	1861
q5	5235	5204	5241	5204
q6	220	131	129	129
q7	2048	1712	1711	1711
q8	3177	3241	3224	3224
q9	8259	8312	8245	8245
q10	3784	3852	3859	3852
q11	542	458	450	450
q12	720	591	527	527
q13	13832	2939	2985	2939
q14	293	260	252	252
q15	513	483	479	479
q16	440	409	392	392
q17	1709	1700	1685	1685
q18	7726	7328	7187	7187
q19	1691	1672	1663	1663
q20	1957	1766	1749	1749
q21	5490	5276	5381	5276
q22	512	466	456	456
Total cold run time: 67516 ms
Total hot run time: 54557 ms

@doris-robot

TPC-DS: Total hot run time: 168872 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 7ebb63433431ecf1668b172727a3833e0f19a8d9, data reload: false

query1	929	381	375	375
query2	6473	1764	1711	1711
query3	6655	210	220	210
query4	19073	17469	17317	17317
query5	4324	520	527	520
query6	294	181	173	173
query7	4614	300	293	293
query8	269	211	188	188
query9	8521	2321	2314	2314
query10	440	275	267	267
query11	10633	10052	9968	9968
query12	144	85	87	85
query13	1611	387	367	367
query14	9889	6786	7466	6786
query15	208	166	163	163
query16	7093	479	469	469
query17	946	566	545	545
query18	1912	294	277	277
query19	192	143	141	141
query20	90	88	86	86
query21	209	103	99	99
query22	4454	4199	4056	4056
query23	34054	33008	33118	33008
query24	10265	3098	3128	3098
query25	676	380	382	380
query26	1743	168	152	152
query27	2812	282	281	281
query28	6843	1947	1935	1935
query29	1360	402	410	402
query30	283	148	153	148
query31	933	767	751	751
query32	100	56	56	56
query33	693	312	338	312
query34	899	480	479	479
query35	846	716	732	716
query36	1022	871	878	871
query37	287	79	79	79
query38	2883	2788	2871	2788
query39	867	815	815	815
query40	291	111	112	111
query41	50	46	47	46
query42	115	97	100	97
query43	463	437	438	437
query44	1184	714	721	714
query45	210	182	176	176
query46	1083	801	770	770
query47	1821	1734	1730	1730
query48	368	295	290	290
query49	1199	420	446	420
query50	884	436	435	435
query51	6808	6801	6642	6642
query52	104	93	91	91
query53	260	183	180	180
query54	642	465	453	453
query55	75	74	73	73
query56	294	259	258	258
query57	1172	1053	1045	1045
query58	280	271	267	267
query59	2654	2390	2455	2390
query60	304	273	272	272
query61	94	96	95	95
query62	936	667	664	664
query63	221	188	182	182
query64	5957	1900	1886	1886
query65	3183	3141	3126	3126
query66	1436	334	332	332
query67	15137	14758	14617	14617
query68	4411	560	579	560
query69	444	302	313	302
query70	1121	1053	1077	1053
query71	409	288	283	283
query72	7132	2675	2564	2564
query73	771	330	328	328
query74	6160	5669	5693	5669
query75	3351	2733	2737	2733
query76	2322	1219	1281	1219
query77	430	318	324	318
query78	9335	8976	8909	8909
query79	2011	531	526	526
query80	1238	542	501	501
query81	565	232	230	230
query82	1045	134	138	134
query83	242	172	171	171
query84	287	82	87	82
query85	1325	323	324	323
query86	462	326	308	308
query87	3344	3114	3152	3114
query88	3000	2413	2383	2383
query89	380	292	288	288
query90	1829	198	190	190
query91	131	102	103	102
query92	67	49	51	49
query93	1993	617	618	617
query94	913	311	308	308
query95	378	271	268	268
query96	594	279	286	279
query97	3192	3109	3117	3109
query98	216	198	196	196
query99	1646	1276	1279	1276
Total cold run time: 263508 ms
Total hot run time: 168872 ms

@doris-robot

ClickBench: Total hot run time: 30.16 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 7ebb63433431ecf1668b172727a3833e0f19a8d9, data reload: false

query1	0.05	0.04	0.04
query2	0.08	0.04	0.04
query3	0.22	0.05	0.04
query4	1.68	0.07	0.07
query5	0.49	0.48	0.47
query6	1.14	0.71	0.72
query7	0.02	0.01	0.02
query8	0.05	0.04	0.05
query9	0.58	0.51	0.52
query10	0.57	0.57	0.57
query11	0.16	0.12	0.11
query12	0.15	0.12	0.13
query13	0.60	0.61	0.60
query14	0.78	0.81	0.87
query15	0.94	0.88	0.87
query16	0.35	0.35	0.35
query17	1.03	0.99	1.02
query18	0.23	0.21	0.21
query19	1.89	1.78	1.75
query20	0.01	0.00	0.01
query21	15.39	0.74	0.66
query22	4.06	7.38	1.27
query23	17.83	1.37	1.37
query24	2.25	0.23	0.22
query25	0.19	0.08	0.08
query26	0.32	0.22	0.22
query27	0.46	0.23	0.23
query28	13.15	1.00	0.98
query29	12.63	3.35	3.30
query30	0.27	0.05	0.06
query31	2.87	0.42	0.41
query32	3.22	0.49	0.48
query33	2.96	2.97	2.95
query34	15.47	4.29	4.27
query35	4.32	4.31	4.33
query36	0.68	0.47	0.48
query37	0.19	0.17	0.16
query38	0.17	0.15	0.14
query39	0.04	0.03	0.04
query40	0.15	0.14	0.13
query41	0.10	0.04	0.05
query42	0.06	0.04	0.04
query43	0.05	0.04	0.04
Total cold run time: 107.85 s
Total hot run time: 30.16 s

Contributor

@morningman morningman left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Aug 11, 2024
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

Contributor

@wuwenchi wuwenchi left a comment

LGTM

@morningman morningman merged commit 433b84a into apache:master Aug 12, 2024
wyxxxcat pushed a commit to wyxxxcat/doris that referenced this pull request Aug 14, 2024
…ache#38277)

## Proposed changes

Refer to Trino's implementation.

- Some historical versions of parquet-mr contain bugs that produce
corrupt statistics. Use `CorruptStatistics::should_ignore_statistics()`
to detect and skip them.

- Older versions of Parquet write `min` and `max` stats; later versions
added `min_value` and `max_value`. The `min`/`max` stats cannot be used
for some types and in some cases, because their correctness depends on
how values are compared and sorted.

- For double and float columns, special cases such as NaN, -0, and 0
must be handled.

- If a string column has only `min` and `max` stats, but no `min_value`
or `max_value`, use `ParquetPredicate::_try_read_old_utf8_stats()` to
derive a range that can still be used for the range-based read
optimization.
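
A minimal sketch of the double/float case from the list above, not the code merged in this PR. It assumes the rules the Parquet format specification gives for floating-point statistics: ignore the stats when either bound is NaN, treat a `+0` minimum as `-0`, and treat a `-0` maximum as `+0`. The names `DoubleRange` and `sanitize_double_stats` are hypothetical, and rejecting `min > max` is an extra assumption added here for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <optional>

// Hypothetical type for illustration; not part of Doris.
struct DoubleRange {
    double min;
    double max;
};

// Returns a range that is safe to use for row-group pruning, or std::nullopt
// when the statistics should be ignored and the row group read in full.
inline std::optional<DoubleRange> sanitize_double_stats(double min, double max) {
    // A NaN bound carries no ordering information, so the stats are unusable.
    if (std::isnan(min) || std::isnan(max)) {
        return std::nullopt;
    }
    // -0.0 == 0.0 compares true, so a writer may record either zero.
    // Widen the range so that both zeros are always covered.
    if (min == 0.0) {
        min = -0.0;
    }
    if (max == 0.0) {
        max = 0.0;
    }
    // Assumption: an inverted range means the stats are corrupt.
    if (min > max) {
        return std::nullopt;
    }
    return DoubleRange{min, max};
}

// Small usage example.
inline void example_usage() {
    // NaN in either bound disables pruning for this column chunk.
    assert(!sanitize_double_stats(std::nan(""), 1.0).has_value());

    // A +0 minimum is widened to -0 so that -0 values are not pruned away.
    auto range = sanitize_double_stats(0.0, 5.0);
    assert(range.has_value() && std::signbit(range->min));
}
```

Returning `std::nullopt` rather than a guessed range keeps the failure mode conservative: the reader scans the row group instead of risking wrong results from pruning.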
kaka11chen added a commit to kaka11chen/doris that referenced this pull request Aug 14, 2024
…ache#38277)

kaka11chen added a commit to kaka11chen/doris that referenced this pull request Oct 9, 2024
…ache#38277)


Labels

approved Indicates a PR has been approved by one committer. dev/2.1.6-merged dev/3.0.3-merged doing reviewed

6 participants