@Yukang-Lian Yukang-Lian commented May 23, 2024

Proposed changes

Issue Number: close #xxx

Problem: With group commit=async_mode, importing NULL data into a variant type column makes the memory statistics used for group commit backpressure incorrect, and the load gets stuck.
Cause: In group commit mode, blocks are added to a queue in batches with add block and later retrieved from the queue with get block. To track memory usage for backpressure, add block adds the block size to the memory statistics and get block subtracts it. For variant types, however, the WAL write performed during add block serializes the block, and serialization can merge types (e.g., int and bigint into bigint), which changes the block size. The size subtracted in get block then differs from the size added in add block, so the memory statistics drift and eventually overflow.
Solution: Record the block size at the time of add block and subtract that recorded size in get block instead of the block's current size, so that every addition is matched by an equal subtraction.
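
The accounting described above can be sketched as follows. This is a minimal illustration, not the code from this PR: `Block`, `QueuedBlock`, `BlockQueue`, `charged_bytes`, and `counter_after_resize_while_queued` are hypothetical names, and the block is modelled as a plain vector whose size changes while it sits in the queue, standing in for the type merge during WAL serialization.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a block whose byte size may change after queuing.
struct Block {
    std::vector<int64_t> values;
    size_t bytes() const { return values.size() * sizeof(int64_t); }
};

// Queue entry that remembers the size charged to the memory counter.
struct QueuedBlock {
    std::shared_ptr<Block> block;
    size_t charged_bytes;
};

class BlockQueue {
public:
    void add_block(std::shared_ptr<Block> block) {
        // Measure once, charge the counter, and keep the measured value.
        const size_t charged = block->bytes();
        queued_bytes_ += charged;
        queue_.push_back({std::move(block), charged});
    }

    std::shared_ptr<Block> get_block() {
        if (queue_.empty()) {
            return nullptr;
        }
        QueuedBlock entry = std::move(queue_.front());
        queue_.pop_front();
        // Subtract the recorded size, not entry.block->bytes(), which may
        // have changed since add_block ran.
        queued_bytes_ -= entry.charged_bytes;
        return entry.block;
    }

    size_t queued_bytes() const { return queued_bytes_; }

private:
    std::deque<QueuedBlock> queue_;
    size_t queued_bytes_ = 0;
};

// Returns the counter after a block grows while it is still queued.
size_t counter_after_resize_while_queued() {
    BlockQueue queue;
    auto block = std::make_shared<Block>();
    block->values = {1, 2, 3};
    queue.add_block(block);
    block->values.push_back(4);  // size change while the block is queued
    queue.get_block();
    return queue.queued_bytes();
}
```

In this sketch, subtracting `entry.block->bytes()` in `get_block` would remove 32 bytes after only 24 were added, and the unsigned counter would wrap around. With the recorded size, the counter returns to zero.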

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@Yukang-Lian
Collaborator Author

A test will be added soon.

@Yukang-Lian
Collaborator Author

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 40651 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit 91d8fcc5d5a9d0348f5e2582ebc7bd691c1e02a2, data reload: false
Columns: query, cold run (ms), first hot run (ms), second hot run (ms), faster hot run (ms)

------ Round 1 ----------------------------------
q1	18044	4488	4310	4310
q2	2027	190	201	190
q3	10476	1199	1189	1189
q4	10148	781	752	752
q5	7485	2624	2789	2624
q6	216	129	130	129
q7	950	607	589	589
q8	9209	2056	2069	2056
q9	9138	6487	6479	6479
q10	8933	3744	3726	3726
q11	454	240	227	227
q12	490	225	226	225
q13	18725	3060	3019	3019
q14	261	210	218	210
q15	506	470	476	470
q16	501	373	393	373
q17	972	677	759	677
q18	8174	7576	7369	7369
q19	7596	1551	1550	1550
q20	662	311	306	306
q21	5002	3908	3925	3908
q22	349	273	280	273
Total cold run time: 120318 ms
Total hot run time: 40651 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4382	4220	4231	4220
q2	377	278	266	266
q3	2986	2798	2739	2739
q4	1856	1592	1645	1592
q5	5250	5298	5317	5298
q6	210	125	125	125
q7	2144	1688	1697	1688
q8	3230	3330	3319	3319
q9	8306	8349	8401	8349
q10	3881	3669	3716	3669
q11	574	488	489	488
q12	767	608	600	600
q13	16600	3036	2988	2988
q14	287	261	273	261
q15	518	481	466	466
q16	499	419	429	419
q17	1788	1507	1590	1507
q18	7686	7578	7415	7415
q19	1684	1625	1526	1526
q20	1996	1781	1787	1781
q21	4896	4845	4702	4702
q22	580	456	492	456
Total cold run time: 70497 ms
Total hot run time: 53874 ms

@doris-robot

TPC-DS: Total hot run time: 167833 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 91d8fcc5d5a9d0348f5e2582ebc7bd691c1e02a2, data reload: false
Columns: query, cold run (ms), first hot run (ms), second hot run (ms), faster hot run (ms)

query1	902	375	367	367
query2	6439	2332	2382	2332
query3	6658	206	211	206
query4	19793	17391	17416	17391
query5	4132	411	416	411
query6	250	162	152	152
query7	4575	295	293	293
query8	241	189	186	186
query9	8517	2362	2346	2346
query10	443	280	268	268
query11	10488	10067	9914	9914
query12	140	91	85	85
query13	1637	361	358	358
query14	9361	6102	6787	6102
query15	224	167	171	167
query16	7182	267	256	256
query17	1365	521	514	514
query18	1918	266	265	265
query19	194	160	162	160
query20	89	86	87	86
query21	202	134	129	129
query22	4233	3976	3823	3823
query23	33471	32988	32992	32988
query24	12356	2828	2858	2828
query25	690	369	362	362
query26	1890	151	151	151
query27	3030	311	320	311
query28	7707	2009	2019	2009
query29	1359	600	591	591
query30	296	171	177	171
query31	945	760	763	760
query32	94	52	56	52
query33	759	269	264	264
query34	996	474	474	474
query35	764	588	616	588
query36	1056	937	887	887
query37	200	72	71	71
query38	2915	2778	2775	2775
query39	839	781	791	781
query40	282	125	122	122
query41	43	41	45	41
query42	100	97	95	95
query43	595	540	566	540
query44	1230	720	728	720
query45	184	163	162	162
query46	1076	707	719	707
query47	1840	1757	1763	1757
query48	354	289	304	289
query49	1189	382	385	382
query50	777	381	386	381
query51	6883	6846	6697	6697
query52	98	89	119	89
query53	345	283	286	283
query54	982	430	427	427
query55	79	74	71	71
query56	259	239	246	239
query57	1137	996	1059	996
query58	238	203	213	203
query59	3449	3030	3120	3030
query60	270	256	250	250
query61	90	86	88	86
query62	645	462	473	462
query63	315	283	288	283
query64	10045	2351	1857	1857
query65	3184	3122	3135	3122
query66	1444	329	327	327
query67	15292	14776	14867	14776
query68	4577	538	548	538
query69	435	268	264	264
query70	1142	1105	1087	1087
query71	414	270	264	264
query72	7478	2739	2552	2552
query73	707	322	318	318
query74	6044	5550	5706	5550
query75	3315	2610	2570	2570
query76	2444	949	924	924
query77	454	267	274	267
query78	10563	9805	9727	9727
query79	2295	514	516	514
query80	1909	428	435	428
query81	548	245	244	244
query82	654	96	92	92
query83	283	175	173	173
query84	271	84	84	84
query85	1945	265	261	261
query86	519	295	308	295
query87	3314	3106	3119	3106
query88	4225	2327	2344	2327
query89	475	384	390	384
query90	2000	185	186	185
query91	190	94	94	94
query92	66	48	48	48
query93	2481	509	496	496
query94	1233	190	182	182
query95	387	305	300	300
query96	587	267	264	264
query97	3229	3034	2975	2975
query98	242	224	208	208
query99	1211	853	844	844
Total cold run time: 276412 ms
Total hot run time: 167833 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.65% (9018/25295)
Line Coverage: 27.31% (74571/273096)
Region Coverage: 26.53% (38602/145494)
Branch Coverage: 23.40% (19690/84136)
Coverage Report: http://coverage.selectdb-in.cc/coverage/91d8fcc5d5a9d0348f5e2582ebc7bd691c1e02a2_91d8fcc5d5a9d0348f5e2582ebc7bd691c1e02a2/report/index.html

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@Yukang-Lian Yukang-Lian force-pushed the Fix-Group-Commit-Block-Queue-Mem-Estimate-Fault branch from 6e3fe51 to c8b3869 Compare May 27, 2024 10:05
@Yukang-Lian
Collaborator Author

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 40068 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit c8b38692c97c5f0128f8ad690310340868f79395, data reload: false
Columns: query, cold run (ms), first hot run (ms), second hot run (ms), faster hot run (ms)

------ Round 1 ----------------------------------
q1	17844	4945	4436	4436
q2	2731	195	196	195
q3	11463	1180	1227	1180
q4	10912	762	759	759
q5	7484	2805	2675	2675
q6	224	130	134	130
q7	954	609	610	609
q8	9355	2111	2076	2076
q9	9035	6541	6510	6510
q10	8924	3720	3721	3720
q11	446	250	229	229
q12	519	228	218	218
q13	18739	2960	2987	2960
q14	263	216	209	209
q15	513	464	472	464
q16	534	389	379	379
q17	963	737	751	737
q18	8177	7419	7411	7411
q19	4183	1570	1491	1491
q20	644	315	305	305
q21	5214	3103	3212	3103
q22	345	278	272	272
Total cold run time: 119466 ms
Total hot run time: 40068 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4368	4252	4206	4206
q2	360	277	268	268
q3	3000	2790	2732	2732
q4	1843	1586	1578	1578
q5	5286	5279	5315	5279
q6	210	122	122	122
q7	2119	1711	1750	1711
q8	3171	3317	3319	3317
q9	8391	8433	8351	8351
q10	3885	3737	3701	3701
q11	581	482	478	478
q12	743	576	585	576
q13	16481	2984	2992	2984
q14	292	264	281	264
q15	507	480	471	471
q16	485	419	418	418
q17	1790	1489	1472	1472
q18	7783	7522	7555	7522
q19	2324	1588	1573	1573
q20	1994	1755	1785	1755
q21	4859	4741	4828	4741
q22	554	476	484	476
Total cold run time: 71026 ms
Total hot run time: 53995 ms

@doris-robot

TPC-DS: Total hot run time: 169998 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c8b38692c97c5f0128f8ad690310340868f79395, data reload: false
Columns: query, cold run (ms), first hot run (ms), second hot run (ms), faster hot run (ms)

query1	915	387	382	382
query2	6467	2555	2313	2313
query3	6649	214	210	210
query4	19850	17639	17354	17354
query5	4170	414	417	414
query6	258	160	154	154
query7	4593	298	301	298
query8	244	188	184	184
query9	8512	2425	2422	2422
query10	448	276	262	262
query11	10542	10101	10182	10101
query12	136	91	87	87
query13	1649	363	364	363
query14	9706	7429	7369	7369
query15	226	170	170	170
query16	7803	256	256	256
query17	1861	524	512	512
query18	1973	280	267	267
query19	202	159	152	152
query20	95	88	86	86
query21	198	130	128	128
query22	4306	3875	3895	3875
query23	33561	32968	33119	32968
query24	12095	2913	2893	2893
query25	662	351	356	351
query26	1788	152	154	152
query27	2990	315	321	315
query28	7270	2073	2063	2063
query29	1114	596	592	592
query30	292	152	156	152
query31	946	758	758	758
query32	92	52	52	52
query33	764	264	259	259
query34	999	475	490	475
query35	737	596	591	591
query36	1067	883	909	883
query37	219	66	67	66
query38	2939	2813	2772	2772
query39	844	789	787	787
query40	275	123	128	123
query41	46	46	46	46
query42	105	99	95	95
query43	584	548	522	522
query44	1221	747	760	747
query45	180	165	160	160
query46	1077	727	727	727
query47	1875	1793	1788	1788
query48	366	297	295	295
query49	1237	386	389	386
query50	769	384	386	384
query51	6775	6789	6746	6746
query52	113	88	95	88
query53	354	290	287	287
query54	956	435	423	423
query55	76	73	73	73
query56	264	241	244	241
query57	1156	1019	1014	1014
query58	241	211	221	211
query59	3312	3204	3080	3080
query60	277	258	256	256
query61	99	133	91	91
query62	645	454	441	441
query63	308	283	295	283
query64	9792	2211	1720	1720
query65	3246	3078	3120	3078
query66	1387	331	332	331
query67	15287	14782	15126	14782
query68	4788	542	535	535
query69	481	267	276	267
query70	1113	1117	1110	1110
query71	423	266	264	264
query72	7238	2757	2550	2550
query73	729	316	320	316
query74	6045	5664	5677	5664
query75	3780	2645	2643	2643
query76	3195	1068	1006	1006
query77	612	267	271	267
query78	10280	10099	9782	9782
query79	2818	516	521	516
query80	2021	443	430	430
query81	530	221	225	221
query82	1245	102	100	100
query83	343	178	202	178
query84	270	90	90	90
query85	1635	330	323	323
query86	470	312	301	301
query87	3317	3122	3125	3122
query88	3866	2342	2323	2323
query89	484	394	381	381
query90	1948	189	184	184
query91	134	110	108	108
query92	58	49	50	49
query93	2698	520	512	512
query94	1257	199	195	195
query95	407	321	316	316
query96	594	268	270	268
query97	3265	3004	3018	3004
query98	252	223	221	221
query99	1262	846	857	846
Total cold run time: 278763 ms
Total hot run time: 169998 ms

@doris-robot

ClickBench: Total hot run time: 30.65 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit c8b38692c97c5f0128f8ad690310340868f79395, data reload: false
Columns: query, cold run (s), first hot run (s), second hot run (s)

query1	0.04	0.04	0.03
query2	0.09	0.04	0.04
query3	0.23	0.05	0.05
query4	1.68	0.08	0.08
query5	0.51	0.51	0.49
query6	1.12	0.73	0.73
query7	0.02	0.02	0.01
query8	0.05	0.04	0.05
query9	0.55	0.50	0.48
query10	0.55	0.55	0.54
query11	0.16	0.12	0.12
query12	0.15	0.12	0.12
query13	0.60	0.59	0.60
query14	0.80	0.77	0.77
query15	0.82	0.82	0.80
query16	0.37	0.36	0.37
query17	1.02	1.01	1.02
query18	0.24	0.25	0.22
query19	1.81	1.81	1.72
query20	0.01	0.02	0.01
query21	15.51	0.69	0.66
query22	4.15	7.35	2.00
query23	18.29	1.35	1.28
query24	1.59	0.34	0.20
query25	0.13	0.08	0.08
query26	0.26	0.16	0.17
query27	0.08	0.08	0.07
query28	13.34	1.01	0.99
query29	12.72	3.31	3.28
query30	0.25	0.05	0.06
query31	2.88	0.39	0.38
query32	3.27	0.46	0.47
query33	2.91	2.88	2.89
query34	17.21	4.42	4.39
query35	4.52	4.55	4.49
query36	0.65	0.47	0.47
query37	0.18	0.16	0.16
query38	0.15	0.15	0.14
query39	0.04	0.04	0.04
query40	0.16	0.13	0.14
query41	0.09	0.05	0.04
query42	0.05	0.05	0.05
query43	0.04	0.04	0.04
Total cold run time: 109.29 s
Total hot run time: 30.65 s

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.75% (9006/25193)
Line Coverage: 27.35% (74543/272531)
Region Coverage: 26.55% (38554/145189)
Branch Coverage: 23.44% (19662/83894)
Coverage Report: http://coverage.selectdb-in.cc/coverage/c8b38692c97c5f0128f8ad690310340868f79395_c8b38692c97c5f0128f8ad690310340868f79395/report/index.html

@Yukang-Lian Yukang-Lian force-pushed the Fix-Group-Commit-Block-Queue-Mem-Estimate-Fault branch from c8b3869 to 1a44791 Compare May 28, 2024 07:14
@Yukang-Lian
Collaborator Author

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 41449 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit 1a44791c8c0103d16c45e6f5c72bebb1e62e40d9, data reload: false
Columns: query, cold run (ms), first hot run (ms), second hot run (ms), faster hot run (ms)

------ Round 1 ----------------------------------
q1	17588	4851	4275	4275
q2	2020	199	201	199
q3	10456	1226	1220	1220
q4	10188	851	859	851
q5	7467	2697	2678	2678
q6	236	132	129	129
q7	959	614	616	614
q8	9230	2128	2105	2105
q9	8995	6705	6714	6705
q10	9271	3907	3951	3907
q11	495	244	245	244
q12	410	237	231	231
q13	17223	3217	3180	3180
q14	274	211	215	211
q15	526	472	473	472
q16	519	387	393	387
q17	991	647	724	647
q18	8515	7952	7856	7856
q19	5560	1546	1546	1546
q20	638	311	315	311
q21	5200	3397	3411	3397
q22	365	285	284	284
Total cold run time: 117126 ms
Total hot run time: 41449 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4526	4353	4319	4319
q2	390	273	277	273
q3	3174	2914	2847	2847
q4	1972	1686	1745	1686
q5	5556	5317	5489	5317
q6	212	121	134	121
q7	2211	1803	1779	1779
q8	3229	3428	3391	3391
q9	8679	8586	8626	8586
q10	4085	3879	3885	3879
q11	585	504	481	481
q12	773	608	610	608
q13	17343	3105	3126	3105
q14	312	276	280	276
q15	516	474	482	474
q16	489	439	436	436
q17	1810	1523	1490	1490
q18	8189	7611	7301	7301
q19	5446	1540	1522	1522
q20	2049	1793	1761	1761
q21	9080	4762	4700	4700
q22	577	497	508	497
Total cold run time: 81203 ms
Total hot run time: 54849 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 35.77% (9005/25174)
Line Coverage: 27.39% (74568/272208)
Region Coverage: 26.61% (38576/144991)
Branch Coverage: 23.49% (19682/83778)
Coverage Report: http://coverage.selectdb-in.cc/coverage/1a44791c8c0103d16c45e6f5c72bebb1e62e40d9_1a44791c8c0103d16c45e6f5c72bebb1e62e40d9/report/index.html

@doris-robot

TPC-DS: Total hot run time: 169895 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 1a44791c8c0103d16c45e6f5c72bebb1e62e40d9, data reload: false
Columns: query, cold run (ms), first hot run (ms), second hot run (ms), faster hot run (ms)

query1	922	373	392	373
query2	6489	2483	2393	2393
query3	6641	208	210	208
query4	19411	17260	17349	17260
query5	4105	423	434	423
query6	246	161	154	154
query7	4587	313	293	293
query8	237	184	184	184
query9	8591	2470	2450	2450
query10	460	285	279	279
query11	10487	10049	9982	9982
query12	135	95	87	87
query13	1634	361	367	361
query14	10144	7552	7645	7552
query15	229	168	162	162
query16	7668	263	260	260
query17	1305	525	512	512
query18	1891	270	270	270
query19	190	156	164	156
query20	89	98	83	83
query21	197	129	134	129
query22	4238	3836	3873	3836
query23	33643	32952	33082	32952
query24	11238	2786	2815	2786
query25	578	375	365	365
query26	713	159	160	159
query27	2233	321	327	321
query28	5655	2082	2079	2079
query29	862	611	590	590
query30	242	148	152	148
query31	973	766	753	753
query32	84	53	55	53
query33	750	275	265	265
query34	972	492	496	492
query35	727	631	620	620
query36	1068	894	908	894
query37	105	67	74	67
query38	2922	2766	2788	2766
query39	853	794	791	791
query40	197	128	129	128
query41	47	44	46	44
query42	101	95	96	95
query43	568	544	534	534
query44	1280	740	751	740
query45	177	162	165	162
query46	1078	732	713	713
query47	1816	1745	1735	1735
query48	379	293	307	293
query49	844	382	394	382
query50	778	399	397	397
query51	6807	6759	6710	6710
query52	100	92	93	92
query53	353	290	303	290
query54	856	433	442	433
query55	74	74	75	74
query56	292	256	257	256
query57	1093	1047	1039	1039
query58	232	216	229	216
query59	3350	3315	3003	3003
query60	273	266	259	259
query61	88	86	88	86
query62	613	443	475	443
query63	313	295	295	295
query64	8517	2271	1732	1732
query65	3211	3118	3092	3092
query66	768	335	334	334
query67	15254	15096	14836	14836
query68	4504	557	559	557
query69	428	278	278	278
query70	1145	1137	1119	1119
query71	423	279	285	279
query72	7854	5962	2742	2742
query73	712	328	326	326
query74	6032	5638	5574	5574
query75	3316	2605	2629	2605
query76	2709	1054	987	987
query77	404	269	276	269
query78	10230	9834	9585	9585
query79	2074	528	514	514
query80	1183	444	457	444
query81	536	219	226	219
query82	641	95	94	94
query83	250	174	183	174
query84	242	93	86	86
query85	1981	259	287	259
query86	500	280	302	280
query87	3314	3117	3130	3117
query88	4073	2432	2494	2432
query89	479	391	380	380
query90	2050	188	191	188
query91	126	99	97	97
query92	60	50	49	49
query93	2422	538	524	524
query94	1272	197	191	191
query95	405	310	315	310
query96	589	268	265	265
query97	3181	3013	2958	2958
query98	253	225	218	218
query99	1145	854	855	854
Total cold run time: 267506 ms
Total hot run time: 169895 ms

@doris-robot

ClickBench: Total hot run time: 30.35 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 1a44791c8c0103d16c45e6f5c72bebb1e62e40d9, data reload: false
Columns: query, cold run (s), first hot run (s), second hot run (s)

query1	0.04	0.03	0.03
query2	0.08	0.04	0.04
query3	0.23	0.05	0.05
query4	1.68	0.07	0.08
query5	0.50	0.50	0.51
query6	1.12	0.72	0.72
query7	0.02	0.02	0.01
query8	0.05	0.04	0.04
query9	0.54	0.49	0.50
query10	0.55	0.55	0.55
query11	0.15	0.11	0.10
query12	0.14	0.12	0.11
query13	0.59	0.59	0.60
query14	0.78	0.78	0.77
query15	0.83	0.79	0.81
query16	0.36	0.37	0.38
query17	1.01	1.02	1.02
query18	0.22	0.25	0.24
query19	1.79	1.77	1.69
query20	0.01	0.01	0.02
query21	15.73	0.64	0.65
query22	4.01	7.52	1.82
query23	18.32	1.32	1.27
query24	1.92	0.25	0.19
query25	0.13	0.08	0.08
query26	0.27	0.17	0.16
query27	0.08	0.07	0.08
query28	13.29	1.02	1.00
query29	13.76	3.30	3.23
query30	0.24	0.06	0.05
query31	2.90	0.38	0.37
query32	3.27	0.46	0.46
query33	2.91	2.94	2.87
query34	17.08	4.41	4.58
query35	4.52	4.51	4.71
query36	0.66	0.47	0.46
query37	0.18	0.16	0.15
query38	0.16	0.14	0.14
query39	0.05	0.03	0.03
query40	0.16	0.13	0.14
query41	0.09	0.05	0.05
query42	0.06	0.05	0.04
query43	0.04	0.03	0.03
Total cold run time: 110.52 s
Total hot run time: 30.35 s

@github-actions
Contributor

PR approved by anyone and no changes requested.

Contributor

@dataroaring dataroaring left a comment

LGTM

@github-actions github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) May 28, 2024
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@dataroaring dataroaring merged commit a2fa3d8 into apache:master May 28, 2024
dataroaring pushed a commit that referenced this pull request May 31, 2024
…35314)
Yukang-Lian added a commit to Yukang-Lian/doris that referenced this pull request Jul 7, 2024
…pache#35314)
dataroaring pushed a commit that referenced this pull request Jul 7, 2024
…lock queue mem estimate fault" (#37379)

Pick [Fix](group commit) Fix group commit block queue mem estimate fault #35314

Labels

approved (Indicates a PR has been approved by one committer.), dev/2.1.5-merged, dev/3.0.0-merged, reviewed
