@morningman
Contributor

@morningman morningman commented Mar 26, 2025

What problem does this PR solve?

Problem Summary:

If an LZO file contains an empty block, decompression may fail:
Lzo decompression failed: MalformedInputException at :13

This happens because, when the next block's uncompressed size is 0,
the code assumed the last block had been reached, which is not correct.

This PR fixes it by removing the peek at the next block.

It also fixes a bug where the compressed size could be read incorrectly.
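
For illustration only, here is a minimal sketch of the empty-block case described above. It is not the code changed in this PR. The framing (a 4-byte big-endian uncompressed size, a 4-byte compressed size, then the payload) and the helper names `write_block` and `read_blocks` are assumptions made for the example, and `zlib` stands in for LZO because the Python standard library has no LZO codec. The point is that the loop ends when the input is exhausted, and a block with uncompressed size 0 is skipped instead of being treated as the end of the stream.

```python
import struct
import zlib


def write_block(payload: bytes) -> bytes:
    """Frame one block (hypothetical layout): two 4-byte big-endian sizes, then data."""
    if not payload:
        # Empty block: both sizes are 0 and no payload follows.
        return struct.pack(">II", 0, 0)
    compressed = zlib.compress(payload)  # zlib is a stand-in for LZO here
    return struct.pack(">II", len(payload), len(compressed)) + compressed


def read_blocks(buf: bytes) -> bytes:
    """Decode every block in buf, stopping only when the input is exhausted."""
    out = bytearray()
    pos = 0
    while pos < len(buf):
        if len(buf) - pos < 8:
            raise ValueError("truncated block header")
        uncompressed_size, compressed_size = struct.unpack_from(">II", buf, pos)
        pos += 8
        if uncompressed_size == 0:
            # An empty block is not an end marker: skip it and keep reading.
            pos += compressed_size
            continue
        chunk = buf[pos:pos + compressed_size]
        if len(chunk) != compressed_size:
            raise ValueError("truncated block payload")
        data = zlib.decompress(chunk)
        if len(data) != uncompressed_size:
            raise ValueError("uncompressed size mismatch")
        out += data
        pos += compressed_size
    return bytes(out)


# An empty block sits between two non-empty ones.
stream = write_block(b"hello ") + write_block(b"") + write_block(b"world")
print(read_blocks(stream))  # b'hello world'
```

In this sketch, a reader that peeked at the next header and stopped on a zero size would never reach the block after the empty one.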

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Mar 26, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@morningman
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 34238 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 352a21a39a349b9ce8c67309ecd734d6d897e483, data reload: false

------ Round 1 ----------------------------------
q1	25954	5062	5047	5047
q2	2071	286	163	163
q3	10400	1247	692	692
q4	10232	1005	517	517
q5	7532	2342	2330	2330
q6	191	161	134	134
q7	935	739	611	611
q8	9311	1262	1155	1155
q9	6841	5116	5057	5057
q10	6884	2307	1904	1904
q11	486	270	263	263
q12	348	352	218	218
q13	17772	3714	3077	3077
q14	231	223	211	211
q15	550	513	518	513
q16	631	614	582	582
q17	577	846	349	349
q18	7506	7379	7270	7270
q19	1568	950	553	553
q20	316	325	198	198
q21	4062	2581	2406	2406
q22	1074	1005	988	988
Total cold run time: 115472 ms
Total hot run time: 34238 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5200	5231	5177	5177
q2	232	321	226	226
q3	2155	2652	2281	2281
q4	1450	1849	1400	1400
q5	4553	4417	4415	4415
q6	220	164	126	126
q7	1996	1880	1774	1774
q8	2576	2646	2510	2510
q9	7211	7200	7119	7119
q10	2957	3214	2788	2788
q11	575	495	502	495
q12	697	773	611	611
q13	3462	3938	3303	3303
q14	277	293	278	278
q15	544	513	497	497
q16	638	698	643	643
q17	1165	1537	1351	1351
q18	7670	7583	7396	7396
q19	829	866	1116	866
q20	1998	2048	1881	1881
q21	5319	4906	4619	4619
q22	1084	1040	994	994
Total cold run time: 52808 ms
Total hot run time: 50750 ms

@doris-robot

TPC-DS: Total hot run time: 186600 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 352a21a39a349b9ce8c67309ecd734d6d897e483, data reload: false

query1	1022	487	459	459
query2	6534	1895	1908	1895
query3	6794	229	218	218
query4	26840	23244	23358	23244
query5	4315	659	459	459
query6	306	194	183	183
query7	4602	495	273	273
query8	306	231	237	231
query9	8613	2526	2530	2526
query10	480	327	248	248
query11	15602	15183	14877	14877
query12	175	110	105	105
query13	1653	530	409	409
query14	8686	6223	6184	6184
query15	202	190	170	170
query16	7162	640	494	494
query17	1041	702	555	555
query18	1955	393	300	300
query19	182	180	159	159
query20	116	115	118	115
query21	212	128	103	103
query22	4196	4185	4119	4119
query23	33805	32934	33035	32934
query24	8458	2386	2395	2386
query25	571	469	418	418
query26	1234	270	151	151
query27	2744	493	334	334
query28	4357	2399	2387	2387
query29	746	599	421	421
query30	282	211	188	188
query31	924	883	778	778
query32	79	62	70	62
query33	560	362	299	299
query34	785	847	510	510
query35	784	810	758	758
query36	939	990	876	876
query37	119	105	80	80
query38	4152	4145	4080	4080
query39	1455	1387	1395	1387
query40	210	114	104	104
query41	56	57	53	53
query42	127	103	104	103
query43	503	491	476	476
query44	1281	792	785	785
query45	177	173	171	171
query46	834	1036	642	642
query47	1748	1843	1724	1724
query48	377	413	313	313
query49	776	494	431	431
query50	716	730	399	399
query51	4131	4249	4169	4169
query52	106	105	96	96
query53	215	243	176	176
query54	486	485	407	407
query55	79	92	79	79
query56	264	268	273	268
query57	1134	1145	1109	1109
query58	253	248	240	240
query59	2537	2655	2658	2655
query60	290	272	263	263
query61	149	134	132	132
query62	782	764	654	654
query63	221	188	184	184
query64	4359	1016	689	689
query65	4369	4262	4284	4262
query66	1168	430	323	323
query67	16045	15632	15359	15359
query68	7791	897	508	508
query69	462	305	260	260
query70	1206	1113	1060	1060
query71	441	300	269	269
query72	5596	4791	4898	4791
query73	719	677	347	347
query74	8871	9124	8958	8958
query75	3752	3217	2734	2734
query76	3488	1191	773	773
query77	788	383	285	285
query78	9987	10314	9378	9378
query79	2030	828	570	570
query80	606	512	438	438
query81	478	258	223	223
query82	451	126	102	102
query83	182	172	160	160
query84	267	93	74	74
query85	860	367	312	312
query86	333	326	274	274
query87	4436	4503	4469	4469
query88	3597	2250	2259	2250
query89	381	313	283	283
query90	1931	212	210	210
query91	153	144	115	115
query92	77	58	59	58
query93	1246	1068	593	593
query94	675	427	302	302
query95	367	320	265	265
query96	485	564	284	284
query97	3157	3223	3132	3132
query98	236	235	204	204
query99	1480	1434	1278	1278
Total cold run time: 272910 ms
Total hot run time: 186600 ms

@doris-robot

ClickBench: Total hot run time: 31.65 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 352a21a39a349b9ce8c67309ecd734d6d897e483, data reload: false

query1	0.04	0.04	0.04
query2	0.12	0.10	0.11
query3	0.25	0.19	0.20
query4	1.59	0.20	0.19
query5	0.59	0.58	0.59
query6	1.19	0.72	0.72
query7	0.03	0.01	0.02
query8	0.04	0.04	0.04
query9	0.57	0.54	0.54
query10	0.57	0.59	0.57
query11	0.15	0.11	0.12
query12	0.15	0.11	0.11
query13	0.61	0.60	0.59
query14	2.68	2.69	2.81
query15	0.95	0.85	0.84
query16	0.38	0.39	0.39
query17	1.02	1.05	1.05
query18	0.22	0.21	0.20
query19	1.90	2.01	1.87
query20	0.01	0.01	0.01
query21	15.36	0.88	0.53
query22	0.76	1.11	0.73
query23	14.98	1.38	0.60
query24	7.09	1.23	1.08
query25	0.50	0.13	0.07
query26	0.70	0.16	0.14
query27	0.05	0.05	0.04
query28	8.76	0.86	0.43
query29	12.61	4.06	3.41
query30	0.25	0.09	0.07
query31	2.83	0.58	0.39
query32	3.22	0.54	0.47
query33	3.05	3.05	3.09
query34	15.81	5.12	4.50
query35	4.59	4.50	4.55
query36	0.68	0.50	0.49
query37	0.08	0.06	0.06
query38	0.04	0.04	0.04
query39	0.03	0.03	0.03
query40	0.17	0.14	0.12
query41	0.08	0.02	0.02
query42	0.04	0.03	0.02
query43	0.04	0.03	0.02
Total cold run time: 104.78 s
Total hot run time: 31.65 s

@wm1581066 added the usercase label (Important user case type label) Mar 26, 2025
@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 50.89% (13620/26763)
Line Coverage 40.27% (118216/293568)
Region Coverage 38.95% (60065/154209)
Branch Coverage 33.86% (30211/89224)

@morningman morningman marked this pull request as ready for review March 27, 2025 14:21
@morningman
Contributor Author

run buildall

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 0.00% (0/8) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 50.89% (13624/26769)
Line Coverage 40.26% (118245/293693)
Region Coverage 38.94% (60071/154258)
Branch Coverage 33.86% (30213/89242)

@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions bot added the approved (Indicates a PR has been approved by one committer.) and reviewed labels Mar 28, 2025
@github-actions
Contributor

PR approved by anyone and no changes requested.

@morningman
Contributor Author

run performance

@doris-robot

TPC-H: Total hot run time: 33959 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ba623a20a3bbe8f45248ccead5c8f18b77f49f6e, data reload: false

------ Round 1 ----------------------------------
q1	26081	5199	5057	5057
q2	2074	288	164	164
q3	10400	1206	680	680
q4	10226	994	542	542
q5	7594	2449	2302	2302
q6	185	164	130	130
q7	920	736	601	601
q8	9317	1330	1120	1120
q9	6801	5087	5041	5041
q10	6810	2289	1895	1895
q11	475	281	256	256
q12	348	351	227	227
q13	17755	3689	3051	3051
q14	220	215	220	215
q15	528	479	497	479
q16	617	604	579	579
q17	579	860	353	353
q18	7697	7137	7110	7110
q19	1215	955	575	575
q20	323	351	199	199
q21	4040	3375	2409	2409
q22	1068	1032	974	974
Total cold run time: 115273 ms
Total hot run time: 33959 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5211	5144	5184	5144
q2	240	340	229	229
q3	2165	2679	2221	2221
q4	1481	1800	1429	1429
q5	4551	4474	4359	4359
q6	208	164	123	123
q7	1988	1864	1719	1719
q8	2639	2543	2483	2483
q9	7133	7173	7099	7099
q10	2972	3176	2749	2749
q11	579	498	493	493
q12	731	767	579	579
q13	3628	3942	3268	3268
q14	270	281	271	271
q15	513	476	480	476
q16	648	674	636	636
q17	1178	1479	1417	1417
q18	7837	7441	7281	7281
q19	846	861	894	861
q20	1982	1999	1811	1811
q21	5162	4839	4571	4571
q22	1063	1021	973	973
Total cold run time: 53025 ms
Total hot run time: 50192 ms

@doris-robot

TPC-DS: Total hot run time: 186901 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ba623a20a3bbe8f45248ccead5c8f18b77f49f6e, data reload: false

query1	1027	490	483	483
query2	6554	1970	1949	1949
query3	6819	216	214	214
query4	26516	23090	23267	23090
query5	4326	675	466	466
query6	283	190	171	171
query7	4605	482	274	274
query8	278	230	224	224
query9	8604	2547	2551	2547
query10	461	316	254	254
query11	15755	15033	14822	14822
query12	163	109	108	108
query13	1666	534	405	405
query14	9205	6290	6461	6290
query15	221	186	176	176
query16	7182	647	456	456
query17	1256	727	587	587
query18	1968	407	314	314
query19	192	194	167	167
query20	122	116	117	116
query21	219	122	106	106
query22	4096	4196	4152	4152
query23	33957	32934	32896	32896
query24	8466	2386	2367	2367
query25	529	451	395	395
query26	1235	259	149	149
query27	2757	496	317	317
query28	4339	2422	2402	2402
query29	776	555	481	481
query30	286	222	188	188
query31	970	875	784	784
query32	74	65	63	63
query33	561	357	315	315
query34	783	854	510	510
query35	785	837	727	727
query36	973	981	912	912
query37	115	109	118	109
query38	4046	4096	4098	4096
query39	1452	1392	1397	1392
query40	213	116	102	102
query41	57	52	50	50
query42	119	105	109	105
query43	519	522	502	502
query44	1344	803	792	792
query45	183	172	167	167
query46	858	1042	655	655
query47	1755	1780	1709	1709
query48	370	412	305	305
query49	791	503	417	417
query50	739	735	421	421
query51	4186	4221	4175	4175
query52	107	99	97	97
query53	223	257	189	189
query54	495	508	406	406
query55	78	83	80	80
query56	278	287	279	279
query57	1119	1150	1070	1070
query58	247	254	239	239
query59	2678	2796	2664	2664
query60	298	276	261	261
query61	138	128	132	128
query62	781	744	667	667
query63	226	193	193	193
query64	4437	1035	735	735
query65	4357	4199	4322	4199
query66	1157	411	303	303
query67	15799	15672	15643	15643
query68	8164	895	515	515
query69	465	328	268	268
query70	1223	1123	1096	1096
query71	443	297	272	272
query72	5760	4690	4739	4690
query73	735	650	351	351
query74	8801	9067	9147	9067
query75	3815	3248	2700	2700
query76	3698	1176	742	742
query77	785	381	294	294
query78	9944	10214	9410	9410
query79	2019	806	575	575
query80	629	524	430	430
query81	470	269	223	223
query82	467	125	100	100
query83	208	184	164	164
query84	290	99	74	74
query85	868	373	317	317
query86	343	297	276	276
query87	4447	4376	4357	4357
query88	3483	2229	2252	2229
query89	379	314	278	278
query90	1929	215	220	215
query91	141	158	109	109
query92	76	61	55	55
query93	1465	1066	589	589
query94	665	404	312	312
query95	362	278	272	272
query96	486	569	277	277
query97	3157	3214	3126	3126
query98	234	232	203	203
query99	1464	1403	1288	1288
Total cold run time: 274589 ms
Total hot run time: 186901 ms

@doris-robot

ClickBench: Total hot run time: 30.77 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit ba623a20a3bbe8f45248ccead5c8f18b77f49f6e, data reload: false

query1	0.03	0.04	0.03
query2	0.12	0.10	0.11
query3	0.24	0.19	0.19
query4	1.59	0.20	0.20
query5	0.60	0.57	0.58
query6	1.17	0.71	0.71
query7	0.02	0.02	0.02
query8	0.04	0.03	0.04
query9	0.58	0.51	0.51
query10	0.59	0.58	0.57
query11	0.15	0.10	0.10
query12	0.14	0.12	0.11
query13	0.60	0.60	0.60
query14	2.70	2.74	2.68
query15	0.90	0.84	0.84
query16	0.38	0.39	0.40
query17	1.00	1.00	1.06
query18	0.21	0.20	0.20
query19	1.96	1.92	1.83
query20	0.02	0.01	0.01
query21	15.37	0.88	0.54
query22	0.75	1.25	0.63
query23	14.91	1.35	0.58
query24	6.71	1.85	0.65
query25	0.50	0.13	0.14
query26	0.66	0.16	0.14
query27	0.05	0.04	0.04
query28	9.28	0.88	0.42
query29	12.61	3.97	3.33
query30	0.24	0.08	0.06
query31	2.83	0.57	0.39
query32	3.23	0.55	0.47
query33	3.00	3.09	3.01
query34	15.73	5.05	4.45
query35	4.47	4.49	4.46
query36	0.65	0.48	0.47
query37	0.08	0.07	0.06
query38	0.05	0.04	0.03
query39	0.03	0.03	0.02
query40	0.17	0.13	0.13
query41	0.09	0.03	0.02
query42	0.04	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 104.53 s
Total hot run time: 30.77 s

@morningman morningman merged commit 25a9aea into apache:master Mar 28, 2025
31 of 33 checks passed
github-actions bot pushed a commit that referenced this pull request Mar 28, 2025
### What problem does this PR solve?

Problem Summary:

1. If an LZO file contains an empty block, decompression may fail:
`Lzo decompression failed: MalformedInputException at :13`

This happens because, when the next block's uncompressed size is 0,
the code assumed the last block had been reached, which is not correct.

This PR fixes it by removing the peek at the next block.

2. Fix a bug where the compressed size could be read incorrectly.
github-actions bot pushed a commit that referenced this pull request Mar 28, 2025
yiguolei pushed a commit that referenced this pull request Apr 17, 2025
Cherry-picked from #49538

---------

Co-authored-by: Mingyu Chen (Rayner) <morningman@163.com>
Co-authored-by: morningman <yunyou@selectdb.com>
morningman added a commit that referenced this pull request Apr 22, 2025
morningman added a commit that referenced this pull request Apr 23, 2025
morningman added a commit to morningman/doris that referenced this pull request Apr 23, 2025
@yiguolei yiguolei mentioned this pull request May 13, 2025
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
@gavinchou gavinchou mentioned this pull request Jun 11, 2025

Labels

  • approved (Indicates a PR has been approved by one committer.)
  • dev/2.1.10-merged
  • dev/3.0.6-merged
  • reviewed
  • usercase (Important user case type label)
