@kaka11chen
Contributor

@kaka11chen kaka11chen commented Apr 24, 2025

What problem does this PR solve?

Related PR: apache/doris-thirdparty#309 apache/doris-thirdparty#310

Problem Summary:
When an ORC file is written with an older version of pyorc (e.g., pyorc-0.3.0) and the data contains null values, a present stream is generated for the top-level struct column.
Newer versions of pyorc (e.g., pyorc-0.10.0) and tools such as Hive or Spark do not generate this stream.
As a result, the present stream generated by the older version causes the ORC file to be read twice during late materialization, and the second read fails with the error 'bad read in next buffer'. The current solution is to skip the present stream when it belongs to the top-level struct column.
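
As a rough illustration of the failure mode and the fix described above, the following toy Python model treats the present stream as a buffer that can be consumed only once. It is a sketch under stated assumptions, not the actual change: all names (`OneShotStream`, `make_streams`, `read_present`, `late_materialized_read`, and the `skip_top_level_present` flag) are hypothetical and are not taken from Doris or the ORC library. The only ORC fact relied on is that the root struct has column id 0.

```python
ROOT_COLUMN_ID = 0  # in ORC, the top-level struct is column 0


class OneShotStream:
    """Hypothetical stand-in for a stream that cannot be re-read."""

    def __init__(self, data):
        self._data = data
        self._consumed = False

    def read(self):
        if self._consumed:
            # Mirrors the error text quoted in the PR description.
            raise RuntimeError("bad read in next buffer")
        self._consumed = True
        return self._data


def make_streams():
    """Build a fresh file model: a PRESENT stream on the root struct."""
    return {("PRESENT", ROOT_COLUMN_ID): OneShotStream(b"\xff")}


def read_present(streams, column_id, skip_top_level_present):
    """Return the present bytes for a column, or None if absent or skipped."""
    if skip_top_level_present and column_id == ROOT_COLUMN_ID:
        # Assumption of this toy model: root rows are treated as non-null.
        return None
    stream = streams.get(("PRESENT", column_id))
    return None if stream is None else stream.read()


def late_materialized_read(streams, skip_top_level_present):
    """Visit the root struct once per pass, as late materialization would."""
    for _pass in ("filter columns", "lazy columns"):
        read_present(streams, ROOT_COLUMN_ID, skip_top_level_present)
    return "ok"
```

In this simplified model, `late_materialized_read(make_streams(), skip_top_level_present=False)` raises on the second pass, while passing `True` completes, which corresponds to the behavior change the PR describes.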

Release note

Fixed an issue where repeated access to the present stream within a top-level struct column would fail during late materialization. This was addressed by avoiding the unnecessary reading of the present stream when it is part of the top-level struct column.

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Apr 24, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@kaka11chen kaka11chen force-pushed the fix_top_struct_column_present_stream_when_late_mat branch from b867139 to 6099567 Compare April 24, 2025 01:30
@kaka11chen
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 33766 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 6099567126fb7d50eaa1300c5488590d5d2d452d, data reload: false

------ Round 1 ----------------------------------
q1	25682	5059	5105	5059
q2	2062	267	186	186
q3	10396	1230	711	711
q4	10240	1014	530	530
q5	7538	2287	2297	2287
q6	179	158	130	130
q7	925	753	590	590
q8	9302	1241	1068	1068
q9	6846	5095	5057	5057
q10	6830	2286	1878	1878
q11	464	277	268	268
q12	341	361	215	215
q13	17764	3629	3079	3079
q14	219	219	207	207
q15	527	491	470	470
q16	452	449	393	393
q17	580	832	346	346
q18	7420	7142	7056	7056
q19	1213	949	556	556
q20	327	313	222	222
q21	3935	3590	2431	2431
q22	1036	1027	1047	1027
Total cold run time: 114278 ms
Total hot run time: 33766 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5094	5060	5034	5034
q2	233	340	228	228
q3	2234	2695	2379	2379
q4	1493	1849	1401	1401
q5	4343	4351	4397	4351
q6	214	167	125	125
q7	1962	1912	1780	1780
q8	2562	2520	2481	2481
q9	7168	7081	6925	6925
q10	3011	3193	2726	2726
q11	572	513	470	470
q12	660	767	626	626
q13	3429	3912	3251	3251
q14	276	282	282	282
q15	526	496	482	482
q16	484	509	463	463
q17	1131	1514	1396	1396
q18	7603	7670	7528	7528
q19	803	837	808	808
q20	2077	1977	1803	1803
q21	5274	4851	4807	4807
q22	1058	1059	1029	1029
Total cold run time: 52207 ms
Total hot run time: 50375 ms
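
The layout of the benchmark rows is not documented in this thread. Reading the Round 1 numbers above, the columns appear to be one cold run, two hot runs, and the faster of the two hot runs, with each total being a column sum. The sketch below checks that reading against the Round 1 figures for commit 6099567; the function name `summarize` and the column interpretation are illustrative assumptions, not taken from the tpch-tools scripts.

```python
# Rows copied from Round 1 of the TPC-H report above.
# Assumed layout: (query, cold run, hot run 1, hot run 2, reported best hot run)
ROUND_1 = [
    ("q1", 25682, 5059, 5105, 5059),
    ("q2", 2062, 267, 186, 186),
    ("q3", 10396, 1230, 711, 711),
    ("q4", 10240, 1014, 530, 530),
    ("q5", 7538, 2287, 2297, 2287),
    ("q6", 179, 158, 130, 130),
    ("q7", 925, 753, 590, 590),
    ("q8", 9302, 1241, 1068, 1068),
    ("q9", 6846, 5095, 5057, 5057),
    ("q10", 6830, 2286, 1878, 1878),
    ("q11", 464, 277, 268, 268),
    ("q12", 341, 361, 215, 215),
    ("q13", 17764, 3629, 3079, 3079),
    ("q14", 219, 219, 207, 207),
    ("q15", 527, 491, 470, 470),
    ("q16", 452, 449, 393, 393),
    ("q17", 580, 832, 346, 346),
    ("q18", 7420, 7142, 7056, 7056),
    ("q19", 1213, 949, 556, 556),
    ("q20", 327, 313, 222, 222),
    ("q21", 3935, 3590, 2431, 2431),
    ("q22", 1036, 1027, 1047, 1027),
]


def summarize(rows):
    """Return (cold total, hot total) in ms under the assumed column layout."""
    cold_total = sum(cold for _, cold, _, _, _ in rows)
    # The hot total uses the faster of the two hot runs for each query.
    hot_total = sum(min(hot1, hot2) for _, _, hot1, hot2, _ in rows)
    return cold_total, hot_total


if __name__ == "__main__":
    print(summarize(ROUND_1))
```

For these rows the function returns `(114278, 33766)`, which matches the reported "Total cold run time" and "Total hot run time" for Round 1.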

@doris-robot

TPC-DS: Total hot run time: 191304 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 6099567126fb7d50eaa1300c5488590d5d2d452d, data reload: false

query1	1409	1067	1055	1055
query2	6103	1771	1794	1771
query3	11027	4343	4645	4343
query4	25218	23630	22925	22925
query5	4364	619	484	484
query6	298	194	194	194
query7	3993	492	272	272
query8	289	239	219	219
query9	8497	2569	2581	2569
query10	464	309	249	249
query11	15262	15047	14918	14918
query12	154	117	111	111
query13	1560	520	396	396
query14	8870	6125	6101	6101
query15	196	183	189	183
query16	7462	636	475	475
query17	1163	711	571	571
query18	1984	418	319	319
query19	204	190	164	164
query20	124	122	115	115
query21	206	123	111	111
query22	4555	4543	4354	4354
query23	34449	33290	33560	33290
query24	8608	2449	2368	2368
query25	502	451	398	398
query26	1176	283	160	160
query27	2969	501	339	339
query28	4881	2153	2148	2148
query29	700	571	467	467
query30	280	224	186	186
query31	916	905	777	777
query32	80	61	63	61
query33	560	377	323	323
query34	817	869	513	513
query35	793	830	752	752
query36	966	1027	900	900
query37	120	106	92	92
query38	4201	4261	4150	4150
query39	1484	1430	1438	1430
query40	222	138	114	114
query41	100	53	57	53
query42	126	104	119	104
query43	501	505	481	481
query44	1318	824	835	824
query45	182	180	165	165
query46	855	1032	621	621
query47	1888	1865	1794	1794
query48	416	419	312	312
query49	754	494	459	459
query50	652	722	392	392
query51	4210	4167	4179	4167
query52	106	108	90	90
query53	225	257	180	180
query54	573	570	511	511
query55	82	83	82	82
query56	306	323	300	300
query57	1160	1171	1160	1160
query58	277	261	268	261
query59	2607	2752	2722	2722
query60	324	319	304	304
query61	133	129	127	127
query62	775	753	676	676
query63	231	183	191	183
query64	4189	1057	698	698
query65	4440	4321	4349	4321
query66	1023	396	339	339
query67	15913	15875	15304	15304
query68	8815	886	530	530
query69	484	298	265	265
query70	1147	1124	1127	1124
query71	459	330	277	277
query72	5628	4766	4694	4694
query73	713	582	353	353
query74	8847	9196	8600	8600
query75	4060	3188	2666	2666
query76	3758	1201	775	775
query77	792	372	282	282
query78	9990	10244	9252	9252
query79	1905	805	559	559
query80	641	515	428	428
query81	473	254	218	218
query82	429	125	96	96
query83	257	249	232	232
query84	265	100	88	88
query85	825	360	301	301
query86	336	275	292	275
query87	4363	4432	4301	4301
query88	3262	2226	2214	2214
query89	401	312	274	274
query90	1921	207	218	207
query91	149	139	114	114
query92	73	60	57	57
query93	1332	939	583	583
query94	666	419	361	361
query95	366	307	277	277
query96	497	560	272	272
query97	3134	3208	3134	3134
query98	217	216	203	203
query99	1434	1370	1295	1295
Total cold run time: 278131 ms
Total hot run time: 191304 ms

@doris-robot

ClickBench: Total hot run time: 29.89 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 6099567126fb7d50eaa1300c5488590d5d2d452d, data reload: false

query1	0.04	0.04	0.03
query2	0.13	0.10	0.11
query3	0.26	0.20	0.20
query4	1.60	0.18	0.19
query5	0.61	0.60	0.58
query6	1.17	0.71	0.72
query7	0.02	0.02	0.01
query8	0.05	0.04	0.04
query9	0.58	0.54	0.52
query10	0.58	0.55	0.57
query11	0.16	0.12	0.11
query12	0.15	0.11	0.11
query13	0.62	0.59	0.60
query14	1.25	1.17	1.16
query15	0.87	0.84	0.84
query16	0.39	0.40	0.39
query17	1.05	1.03	1.04
query18	0.21	0.20	0.19
query19	1.94	1.83	1.84
query20	0.01	0.01	0.02
query21	15.40	0.92	0.56
query22	0.75	1.25	0.70
query23	14.81	1.39	0.62
query24	7.54	1.71	1.17
query25	0.56	0.07	0.13
query26	0.64	0.16	0.14
query27	0.05	0.05	0.05
query28	10.09	0.79	0.44
query29	12.56	3.96	3.27
query30	0.25	0.09	0.06
query31	2.83	0.58	0.38
query32	3.22	0.54	0.46
query33	3.08	3.06	3.07
query34	16.07	5.08	4.47
query35	4.54	4.52	4.46
query36	0.66	0.51	0.49
query37	0.08	0.06	0.06
query38	0.05	0.04	0.04
query39	0.03	0.02	0.03
query40	0.16	0.14	0.12
query41	0.08	0.03	0.02
query42	0.03	0.03	0.02
query43	0.04	0.03	0.03
Total cold run time: 105.21 s
Total hot run time: 29.89 s

@doris-robot

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.54% (14534/27146)
Line Coverage 42.36% (126064/297583)
Region Coverage 41.17% (64422/156471)
Branch Coverage 35.77% (32397/90582)

@hello-stephen
Contributor

BE Regression P0 && UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage
Line Coverage
Region Coverage
Branch Coverage

@kaka11chen kaka11chen marked this pull request as draft April 24, 2025 17:12
@kaka11chen kaka11chen force-pushed the fix_top_struct_column_present_stream_when_late_mat branch from 6099567 to 8a3f306 Compare April 24, 2025 17:43
@kaka11chen
Contributor Author

run buildall

@kaka11chen kaka11chen force-pushed the fix_top_struct_column_present_stream_when_late_mat branch from 8a3f306 to 327c7df Compare April 25, 2025 02:19
@kaka11chen
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 33969 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 327c7dfb3ec25565196faf5f84c692081cb3f6e1, data reload: false

------ Round 1 ----------------------------------
q1	26150	5080	4990	4990
q2	2060	267	171	171
q3	10412	1202	697	697
q4	10213	984	536	536
q5	7517	2331	2350	2331
q6	179	162	134	134
q7	897	741	632	632
q8	9309	1313	1153	1153
q9	6963	5142	5103	5103
q10	6873	2287	1899	1899
q11	478	283	274	274
q12	344	346	215	215
q13	17770	3692	3119	3119
q14	222	225	204	204
q15	526	503	484	484
q16	452	436	397	397
q17	600	859	364	364
q18	7519	7150	7077	7077
q19	1226	966	549	549
q20	329	334	220	220
q21	3987	2608	2453	2453
q22	1077	1022	967	967
Total cold run time: 115103 ms
Total hot run time: 33969 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5042	5091	5059	5059
q2	242	328	235	235
q3	2155	2612	2287	2287
q4	1416	1804	1428	1428
q5	4414	4348	4383	4348
q6	211	170	127	127
q7	2005	1928	1761	1761
q8	2580	2507	2512	2507
q9	7282	7258	6873	6873
q10	3044	3123	2742	2742
q11	562	514	486	486
q12	655	737	632	632
q13	3465	3840	3743	3743
q14	281	288	275	275
q15	525	489	488	488
q16	498	494	459	459
q17	1112	1594	1334	1334
q18	7729	7575	7492	7492
q19	779	784	821	784
q20	1972	2094	1821	1821
q21	5134	4820	4870	4820
q22	1116	1078	1022	1022
Total cold run time: 52219 ms
Total hot run time: 50723 ms

@doris-robot

TPC-DS: Total hot run time: 192460 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 327c7dfb3ec25565196faf5f84c692081cb3f6e1, data reload: false

query1	1422	1059	1066	1059
query2	6325	1807	1812	1807
query3	11109	4648	4547	4547
query4	57423	26130	23552	23552
query5	5021	484	458	458
query6	322	201	204	201
query7	4936	483	282	282
query8	314	245	239	239
query9	5943	2585	2594	2585
query10	437	325	268	268
query11	15025	15051	14715	14715
query12	159	108	107	107
query13	1065	509	392	392
query14	10247	6341	6247	6247
query15	196	200	185	185
query16	7282	644	516	516
query17	1147	757	628	628
query18	1827	410	302	302
query19	188	188	166	166
query20	127	136	115	115
query21	207	133	110	110
query22	4385	4521	4412	4412
query23	34164	33435	33391	33391
query24	6773	2407	2419	2407
query25	458	498	412	412
query26	694	273	158	158
query27	2329	491	340	340
query28	3207	2162	2127	2127
query29	552	556	418	418
query30	266	221	198	198
query31	876	887	790	790
query32	75	60	64	60
query33	447	367	308	308
query34	749	853	517	517
query35	805	848	772	772
query36	963	1000	920	920
query37	120	103	79	79
query38	4170	4205	4197	4197
query39	1516	1410	1429	1410
query40	216	128	108	108
query41	59	53	53	53
query42	122	113	108	108
query43	485	499	488	488
query44	1310	810	804	804
query45	191	175	171	171
query46	833	1008	623	623
query47	1878	1956	1811	1811
query48	398	412	317	317
query49	676	496	391	391
query50	659	698	411	411
query51	4192	4201	4218	4201
query52	110	109	100	100
query53	240	265	186	186
query54	591	573	509	509
query55	82	78	84	78
query56	300	299	277	277
query57	1147	1206	1149	1149
query58	265	247	256	247
query59	2692	2862	2672	2672
query60	327	314	325	314
query61	133	165	128	128
query62	773	773	675	675
query63	228	186	184	184
query64	1973	1051	701	701
query65	4440	4244	4211	4211
query66	714	402	298	298
query67	15877	15731	15273	15273
query68	6225	809	505	505
query69	538	302	265	265
query70	1141	1142	1111	1111
query71	477	328	286	286
query72	5924	4676	4637	4637
query73	1286	589	336	336
query74	8904	8971	8908	8908
query75	3523	3212	2707	2707
query76	3727	1177	735	735
query77	552	370	288	288
query78	9961	10021	9368	9368
query79	2695	799	566	566
query80	609	503	448	448
query81	480	258	232	232
query82	439	123	95	95
query83	373	244	224	224
query84	289	103	92	92
query85	772	359	358	358
query86	413	327	273	273
query87	4342	4381	4296	4296
query88	3341	2183	2204	2183
query89	416	327	282	282
query90	1854	209	212	209
query91	137	144	110	110
query92	79	61	54	54
query93	2067	930	581	581
query94	690	424	310	310
query95	365	305	288	288
query96	477	569	277	277
query97	3192	3220	3085	3085
query98	247	213	200	200
query99	1461	1446	1266	1266
Total cold run time: 301178 ms
Total hot run time: 192460 ms

@doris-robot

ClickBench: Total hot run time: 29.68 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 327c7dfb3ec25565196faf5f84c692081cb3f6e1, data reload: false

query1	0.04	0.03	0.03
query2	0.12	0.10	0.12
query3	0.26	0.19	0.19
query4	1.59	0.20	0.20
query5	0.59	0.59	0.60
query6	1.20	0.72	0.73
query7	0.02	0.02	0.02
query8	0.04	0.04	0.04
query9	0.57	0.51	0.52
query10	0.56	0.56	0.55
query11	0.16	0.10	0.11
query12	0.15	0.11	0.12
query13	0.62	0.61	0.59
query14	1.15	1.17	1.20
query15	0.88	0.84	0.86
query16	0.39	0.38	0.37
query17	1.03	1.05	1.04
query18	0.22	0.20	0.20
query19	1.93	1.80	1.82
query20	0.01	0.01	0.01
query21	15.39	0.94	0.56
query22	0.76	1.25	0.73
query23	14.83	1.39	0.65
query24	6.74	1.96	0.78
query25	0.50	0.18	0.07
query26	0.49	0.17	0.14
query27	0.05	0.04	0.05
query28	9.86	0.85	0.45
query29	12.58	4.02	3.36
query30	0.25	0.09	0.07
query31	2.81	0.58	0.37
query32	3.23	0.55	0.47
query33	3.00	3.06	3.02
query34	15.88	5.11	4.49
query35	4.57	4.51	4.49
query36	0.66	0.50	0.48
query37	0.09	0.06	0.06
query38	0.05	0.04	0.04
query39	0.03	0.03	0.02
query40	0.17	0.14	0.13
query41	0.08	0.03	0.03
query42	0.03	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 103.62 s
Total hot run time: 29.68 s

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.76% (14597/27151)
Line Coverage 42.59% (126715/297531)
Region Coverage 41.39% (64781/156508)
Branch Coverage 35.94% (32563/90610)

@hello-stephen
Contributor

BE Regression P0 && UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 55.22% (14723/26661)
Line Coverage 44.63% (132558/297042)
Region Coverage 41.74% (76358/182920)
Branch Coverage 35.77% (36904/103158)

@kaka11chen kaka11chen force-pushed the fix_top_struct_column_present_stream_when_late_mat branch from 327c7df to f4cb1d4 Compare April 25, 2025 13:24
@kaka11chen
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 34261 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f4cb1d4648e705301339bd2cdc6dd2c50d7a7a21, data reload: false

------ Round 1 ----------------------------------
q1	26111	5144	5273	5144
q2	2092	280	191	191
q3	10422	1251	728	728
q4	10247	1041	559	559
q5	8704	2369	2413	2369
q6	272	164	132	132
q7	921	745	613	613
q8	9318	1292	1070	1070
q9	6868	5056	5083	5056
q10	6832	2306	1885	1885
q11	480	279	286	279
q12	351	357	222	222
q13	17801	3748	3100	3100
q14	229	227	209	209
q15	527	501	491	491
q16	450	456	408	408
q17	597	886	372	372
q18	7500	7278	7123	7123
q19	1361	963	574	574
q20	351	336	219	219
q21	4371	3349	2540	2540
q22	1048	1001	977	977
Total cold run time: 116853 ms
Total hot run time: 34261 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5524	5117	5137	5117
q2	242	328	231	231
q3	2201	2695	2302	2302
q4	1423	1967	1540	1540
q5	4632	4426	4335	4335
q6	211	166	124	124
q7	1984	1937	1794	1794
q8	2604	2570	2456	2456
q9	7138	7141	7089	7089
q10	3048	3220	2743	2743
q11	567	498	500	498
q12	683	757	626	626
q13	3479	3993	3262	3262
q14	284	288	270	270
q15	511	479	477	477
q16	468	510	470	470
q17	1166	1572	1390	1390
q18	7641	7514	7482	7482
q19	802	835	877	835
q20	1898	1956	1803	1803
q21	5200	4705	4639	4639
q22	1083	1036	990	990
Total cold run time: 52789 ms
Total hot run time: 50473 ms

@doris-robot

TPC-DS: Total hot run time: 186247 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f4cb1d4648e705301339bd2cdc6dd2c50d7a7a21, data reload: false

query1	1009	467	506	467
query2	6562	1914	1890	1890
query3	6744	222	212	212
query4	25948	23394	23429	23394
query5	4301	618	472	472
query6	299	188	183	183
query7	4613	495	278	278
query8	273	230	215	215
query9	8596	2539	2545	2539
query10	483	318	280	280
query11	15275	15108	14856	14856
query12	154	113	116	113
query13	1657	515	405	405
query14	8766	6224	6201	6201
query15	212	183	174	174
query16	7222	646	495	495
query17	1217	747	595	595
query18	1976	399	308	308
query19	201	190	175	175
query20	119	119	118	118
query21	216	129	110	110
query22	4117	4119	4236	4119
query23	34137	33146	33054	33054
query24	8469	2335	2364	2335
query25	560	511	390	390
query26	1242	267	158	158
query27	2753	524	321	321
query28	4343	2108	2100	2100
query29	776	534	498	498
query30	282	216	191	191
query31	905	860	758	758
query32	72	65	71	65
query33	576	359	311	311
query34	784	833	515	515
query35	780	816	742	742
query36	958	995	883	883
query37	113	99	81	81
query38	4184	4223	4160	4160
query39	1466	1406	1431	1406
query40	211	119	106	106
query41	56	52	51	51
query42	114	104	105	104
query43	513	521	477	477
query44	1269	792	796	792
query45	179	175	169	169
query46	833	1021	608	608
query47	1764	1805	1707	1707
query48	373	400	286	286
query49	747	512	409	409
query50	646	686	392	392
query51	4288	4146	4053	4053
query52	111	103	95	95
query53	228	248	175	175
query54	574	565	498	498
query55	79	81	79	79
query56	350	305	275	275
query57	1122	1151	1086	1086
query58	262	253	252	252
query59	2712	2751	2674	2674
query60	321	315	300	300
query61	129	126	125	125
query62	804	719	666	666
query63	223	208	188	188
query64	4375	1015	663	663
query65	4312	4207	4281	4207
query66	1161	411	306	306
query67	15755	15554	15396	15396
query68	8052	883	509	509
query69	464	302	271	271
query70	1236	1148	1111	1111
query71	465	314	313	313
query72	5649	4806	4925	4806
query73	746	644	341	341
query74	9155	8849	8606	8606
query75	3744	3206	2800	2800
query76	3656	1187	746	746
query77	794	376	286	286
query78	9975	10190	9257	9257
query79	1899	807	560	560
query80	580	513	451	451
query81	472	258	216	216
query82	428	128	98	98
query83	285	248	240	240
query84	292	109	83	83
query85	801	360	314	314
query86	351	320	278	278
query87	4401	4353	4375	4353
query88	2778	2211	2192	2192
query89	373	317	285	285
query90	1984	211	208	208
query91	139	138	111	111
query92	80	60	55	55
query93	1112	942	576	576
query94	668	403	311	311
query95	374	289	285	285
query96	476	578	272	272
query97	3209	3223	3091	3091
query98	238	203	200	200
query99	1455	1433	1316	1316
Total cold run time: 272202 ms
Total hot run time: 186247 ms

@doris-robot

ClickBench: Total hot run time: 30.03 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f4cb1d4648e705301339bd2cdc6dd2c50d7a7a21, data reload: false

query1	0.04	0.04	0.03
query2	0.11	0.10	0.11
query3	0.25	0.19	0.19
query4	1.59	0.19	0.19
query5	0.58	0.57	0.58
query6	1.16	0.72	0.71
query7	0.03	0.02	0.02
query8	0.04	0.03	0.04
query9	0.56	0.53	0.52
query10	0.58	0.57	0.58
query11	0.15	0.11	0.11
query12	0.15	0.11	0.12
query13	0.62	0.59	0.59
query14	1.17	1.17	1.22
query15	0.87	0.83	0.85
query16	0.40	0.39	0.38
query17	1.05	1.07	1.04
query18	0.21	0.19	0.20
query19	2.00	1.83	1.83
query20	0.02	0.01	0.01
query21	15.45	0.92	0.53
query22	0.78	1.28	0.79
query23	14.75	1.41	0.62
query24	6.86	1.69	1.09
query25	0.53	0.17	0.14
query26	0.52	0.16	0.13
query27	0.05	0.05	0.04
query28	10.32	0.87	0.43
query29	12.52	3.98	3.33
query30	0.25	0.09	0.06
query31	2.83	0.60	0.37
query32	3.23	0.56	0.48
query33	2.98	3.01	3.10
query34	15.86	5.11	4.52
query35	4.47	4.51	4.48
query36	0.68	0.49	0.49
query37	0.09	0.06	0.06
query38	0.05	0.04	0.04
query39	0.04	0.02	0.03
query40	0.17	0.14	0.14
query41	0.07	0.02	0.02
query42	0.03	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 104.15 s
Total hot run time: 30.03 s

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 54.13% (14682/27124)
Line Coverage 43.00% (127855/297342)
Region Coverage 41.85% (65476/156437)
Branch Coverage 36.45% (33010/90556)

@hello-stephen
Contributor

BE Regression P0 && UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 55.34% (14737/26630)
Line Coverage 44.75% (132843/296850)
Region Coverage 41.83% (76496/182860)
Branch Coverage 35.91% (37024/103110)

@kaka11chen kaka11chen force-pushed the fix_top_struct_column_present_stream_when_late_mat branch from f4cb1d4 to 05e5511 Compare April 25, 2025 18:05
@kaka11chen
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 34301 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 05e551129542f8dfeef1ecf9149f4179265d45d5, data reload: false

------ Round 1 ----------------------------------
q1	26221	5140	5087	5087
q2	2074	288	182	182
q3	10437	1261	717	717
q4	10233	1015	540	540
q5	7573	2390	2348	2348
q6	179	162	132	132
q7	954	748	622	622
q8	9312	1290	1112	1112
q9	6806	5166	5099	5099
q10	6862	2313	1881	1881
q11	485	290	271	271
q12	360	365	238	238
q13	17796	3698	3113	3113
q14	232	222	213	213
q15	527	491	494	491
q16	452	449	405	405
q17	610	876	379	379
q18	7872	7294	7224	7224
q19	1611	984	563	563
q20	336	331	230	230
q21	4098	3454	2502	2502
q22	1053	1013	952	952
Total cold run time: 116083 ms
Total hot run time: 34301 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5244	5157	5172	5157
q2	257	336	235	235
q3	2171	2680	2315	2315
q4	1449	1859	1474	1474
q5	4516	4458	4393	4393
q6	221	181	138	138
q7	2044	1963	1765	1765
q8	2626	2590	2594	2590
q9	7291	7174	7189	7174
q10	2977	3159	2727	2727
q11	592	514	484	484
q12	684	794	654	654
q13	3542	3903	3404	3404
q14	275	291	272	272
q15	578	514	499	499
q16	482	540	466	466
q17	1178	1544	1406	1406
q18	7758	7544	7536	7536
q19	817	836	876	836
q20	1954	2034	1867	1867
q21	5500	4880	4959	4880
q22	1090	1034	1009	1009
Total cold run time: 53246 ms
Total hot run time: 51281 ms

@doris-robot

TPC-DS: Total hot run time: 192765 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 05e551129542f8dfeef1ecf9149f4179265d45d5, data reload: false

query1	1407	1096	1083	1083
query2	6464	1903	1875	1875
query3	10983	4658	4522	4522
query4	25550	24149	23173	23173
query5	4761	633	454	454
query6	294	208	208	208
query7	3978	479	281	281
query8	290	243	219	219
query9	8515	2574	2576	2574
query10	493	307	265	265
query11	15756	15098	14947	14947
query12	175	123	108	108
query13	1566	539	412	412
query14	9217	6103	6086	6086
query15	205	180	177	177
query16	7383	624	456	456
query17	1133	802	589	589
query18	1987	422	328	328
query19	221	189	179	179
query20	128	132	127	127
query21	214	125	104	104
query22	4305	4597	4331	4331
query23	34309	33464	33461	33461
query24	9041	2365	2409	2365
query25	511	456	373	373
query26	1231	276	152	152
query27	2783	518	337	337
query28	4694	2176	2156	2156
query29	701	583	434	434
query30	274	225	190	190
query31	924	874	785	785
query32	86	68	66	66
query33	546	376	308	308
query34	804	881	532	532
query35	840	839	809	809
query36	975	1001	894	894
query37	118	98	75	75
query38	4178	4253	4229	4229
query39	1536	1446	1457	1446
query40	233	125	111	111
query41	56	54	54	54
query42	113	106	117	106
query43	508	494	502	494
query44	1297	842	830	830
query45	180	174	166	166
query46	853	1017	631	631
query47	1813	1897	1821	1821
query48	385	421	309	309
query49	737	555	477	477
query50	662	715	419	419
query51	4172	4469	4268	4268
query52	109	113	107	107
query53	231	258	191	191
query54	597	594	536	536
query55	86	86	88	86
query56	335	323	298	298
query57	1208	1221	1122	1122
query58	271	261	260	260
query59	2896	2839	2802	2802
query60	364	349	330	330
query61	162	149	176	149
query62	771	743	684	684
query63	231	191	189	189
query64	4194	1032	691	691
query65	4395	4323	4341	4323
query66	1116	421	305	305
query67	16009	15597	15226	15226
query68	8972	891	512	512
query69	477	300	265	265
query70	1211	1163	1078	1078
query71	478	323	302	302
query72	5353	4744	4641	4641
query73	720	585	358	358
query74	8921	8967	8934	8934
query75	4205	3214	2725	2725
query76	3747	1205	758	758
query77	800	365	279	279
query78	10005	10145	9288	9288
query79	2354	833	571	571
query80	635	512	527	512
query81	481	263	217	217
query82	464	130	95	95
query83	272	257	238	238
query84	291	102	89	89
query85	792	361	313	313
query86	339	310	290	290
query87	4381	4412	4243	4243
query88	2889	2219	2230	2219
query89	446	341	283	283
query90	2001	215	220	215
query91	203	150	112	112
query92	76	59	56	56
query93	2163	972	581	581
query94	687	412	309	309
query95	377	297	291	291
query96	493	577	282	282
query97	3145	3228	3128	3128
query98	222	210	203	203
query99	1448	1387	1283	1283
Total cold run time: 281608 ms
Total hot run time: 192765 ms

@doris-robot

TPC-H: Total hot run time: 33706 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit bb6c81a84aefaafd36797f9a4d6b8b934c64e53e, data reload: false

------ Round 1 ----------------------------------
q1	25912	5082	5034	5034
q2	2065	281	188	188
q3	10384	1210	707	707
q4	10222	980	522	522
q5	7530	2352	2379	2352
q6	175	163	131	131
q7	909	732	604	604
q8	9295	1286	1033	1033
q9	6752	5091	5037	5037
q10	6864	2298	1863	1863
q11	471	277	264	264
q12	350	350	207	207
q13	17781	3687	3105	3105
q14	222	223	212	212
q15	548	494	492	492
q16	415	424	373	373
q17	596	846	360	360
q18	7618	7081	7120	7081
q19	1695	948	528	528
q20	324	343	213	213
q21	3867	3304	2423	2423
q22	1048	1008	977	977
Total cold run time: 115043 ms
Total hot run time: 33706 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5223	5125	5123	5123
q2	233	318	224	224
q3	2103	2664	2266	2266
q4	1361	1764	1381	1381
q5	4420	4413	4411	4411
q6	234	167	128	128
q7	2047	1881	1739	1739
q8	2614	2558	2511	2511
q9	7319	7263	6906	6906
q10	3030	3181	2770	2770
q11	586	500	490	490
q12	659	755	605	605
q13	3476	3930	3289	3289
q14	277	296	279	279
q15	519	488	489	488
q16	458	493	429	429
q17	1139	1586	1367	1367
q18	7727	7646	7496	7496
q19	814	820	819	819
q20	1981	2060	1838	1838
q21	5167	4853	4744	4744
q22	1057	1056	1017	1017
Total cold run time: 52444 ms
Total hot run time: 50320 ms

@doris-robot

TPC-DS: Total hot run time: 192638 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit bb6c81a84aefaafd36797f9a4d6b8b934c64e53e, data reload: false

query1	1390	1076	1029	1029
query2	6176	1775	1729	1729
query3	11177	4618	4621	4618
query4	53595	24874	22984	22984
query5	5082	563	474	474
query6	348	210	210	210
query7	4904	515	292	292
query8	317	262	237	237
query9	5398	2590	2564	2564
query10	439	309	268	268
query11	14955	14997	14843	14843
query12	170	119	110	110
query13	1086	515	425	425
query14	10140	6303	6245	6245
query15	191	201	175	175
query16	7123	671	525	525
query17	1069	716	552	552
query18	1541	415	305	305
query19	201	180	174	174
query20	131	123	118	118
query21	201	128	116	116
query22	4412	4496	4446	4446
query23	34122	33395	33552	33395
query24	6509	2386	2425	2386
query25	472	459	422	422
query26	680	280	154	154
query27	2214	513	335	335
query28	2926	2138	2145	2138
query29	559	564	433	433
query30	268	214	195	195
query31	859	869	779	779
query32	67	66	59	59
query33	444	371	305	305
query34	781	850	544	544
query35	783	838	777	777
query36	950	998	904	904
query37	119	103	78	78
query38	4135	4237	4235	4235
query39	1484	1469	1426	1426
query40	207	119	104	104
query41	56	56	54	54
query42	122	104	105	104
query43	490	491	471	471
query44	1296	802	800	800
query45	189	171	168	168
query46	832	1035	648	648
query47	1880	1876	1793	1793
query48	382	417	320	320
query49	696	523	424	424
query50	669	714	422	422
query51	4285	4178	4134	4134
query52	112	108	101	101
query53	236	271	199	199
query54	593	592	521	521
query55	87	83	80	80
query56	300	334	299	299
query57	1175	1205	1174	1174
query58	268	259	271	259
query59	2571	2589	2670	2589
query60	331	341	307	307
query61	159	132	161	132
query62	729	761	691	691
query63	229	204	191	191
query64	1631	1056	686	686
query65	4462	4244	4222	4222
query66	737	409	299	299
query67	15862	15651	15733	15651
query68	6531	882	511	511
query69	535	307	264	264
query70	1205	1164	1081	1081
query71	447	311	308	308
query72	5752	4883	4888	4883
query73	1322	656	351	351
query74	8907	9184	8872	8872
query75	3190	3207	2731	2731
query76	3846	1181	758	758
query77	550	397	278	278
query78	10040	9894	9331	9331
query79	2848	784	567	567
query80	888	596	441	441
query81	498	249	215	215
query82	724	129	101	101
query83	288	243	241	241
query84	292	106	87	87
query85	779	353	313	313
query86	437	325	274	274
query87	4408	4368	4401	4368
query88	3636	2187	2151	2151
query89	412	319	283	283
query90	1591	210	211	210
query91	138	139	110	110
query92	77	62	56	56
query93	2602	943	579	579
query94	828	405	305	305
query95	371	293	292	292
query96	489	554	273	273
query97	3230	3245	3072	3072
query98	231	224	203	203
query99	1428	1376	1282	1282
Total cold run time: 296524 ms
Total hot run time: 192638 ms

@doris-robot

ClickBench: Total hot run time: 29.83 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit bb6c81a84aefaafd36797f9a4d6b8b934c64e53e, data reload: false

query1	0.04	0.04	0.03
query2	0.13	0.11	0.10
query3	0.26	0.20	0.20
query4	1.60	0.19	0.20
query5	0.58	0.58	0.59
query6	1.18	0.71	0.72
query7	0.03	0.01	0.02
query8	0.04	0.03	0.03
query9	0.57	0.51	0.53
query10	0.57	0.58	0.57
query11	0.15	0.12	0.11
query12	0.15	0.11	0.11
query13	0.62	0.60	0.60
query14	0.78	0.80	0.80
query15	0.88	0.85	0.87
query16	0.38	0.39	0.38
query17	1.05	1.03	1.04
query18	0.21	0.19	0.19
query19	1.92	1.77	1.78
query20	0.01	0.02	0.01
query21	15.39	0.87	0.57
query22	0.77	1.22	0.67
query23	14.87	1.38	0.61
query24	6.55	1.71	1.28
query25	0.49	0.20	0.13
query26	0.57	0.18	0.15
query27	0.05	0.04	0.05
query28	9.68	0.89	0.44
query29	12.56	4.00	3.37
query30	0.26	0.10	0.07
query31	2.80	0.60	0.38
query32	3.24	0.55	0.45
query33	3.00	3.07	3.05
query34	15.87	5.13	4.56
query35	4.51	4.50	4.54
query36	0.66	0.50	0.48
query37	0.09	0.06	0.06
query38	0.06	0.04	0.04
query39	0.03	0.02	0.03
query40	0.16	0.13	0.12
query41	0.08	0.02	0.03
query42	0.03	0.02	0.02
query43	0.03	0.03	0.02
Total cold run time: 102.9 s
Total hot run time: 29.83 s

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 54.67% (14740/26961)
Line Coverage 43.74% (129077/295088)
Region Coverage 42.47% (65898/155154)
Branch Coverage 37.04% (33203/89648)

@hello-stephen
Contributor

BE Regression P0 && UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 78.27% (20714/26465)
Line Coverage 71.73% (211310/294604)
Region Coverage 69.91% (126938/181568)
Branch Coverage 63.15% (64543/102206)

@kaka11chen kaka11chen marked this pull request as ready for review April 29, 2025 02:27
@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Apr 29, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@morningman morningman merged commit 0a8b978 into apache:master May 7, 2025
27 of 29 checks passed
@morningman morningman added the usercase Important user case type label label May 7, 2025
yiguolei pushed a commit that referenced this pull request May 7, 2025
…sent stream failing to access repeatedly when late materialization occurs. (#50651)

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

### Release note
Cherry-pick #50358

None

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
    - [ ] Yes. <!-- Add document PR link here. eg: apache/doris-website#1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR should merge into -->
@yiguolei yiguolei mentioned this pull request May 13, 2025
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
…sent stream failing to access repeatedly when late materialization occurs. (apache#50358)

@gavinchou gavinchou mentioned this pull request Jun 11, 2025
Labels

approved Indicates a PR has been approved by one committer. dev/2.1.10-merged dev/3.0.6-merged reviewed usercase Important user case type label

8 participants