@suxiaogang223
Contributor

What problem does this PR solve?

Problem Summary:
As described in apache/hudi#12918:
While debugging a memory leak, I noticed in the jmap logs that the number of BitCaskDiskMap instances was steadily increasing. Using VisualVM, I found that BitCaskDiskMap may leak memory because of a circular reference.
image

  • The cleanup method is an instance method, so it implicitly depends on the this reference
  • Because shutdownThread calls cleanup, the thread holds a reference to this
  • Meanwhile, the DiskMap instance holds a reference to shutdownThread
    image

This creates a circular reference:

  DiskMap (this) -> shutdownThread -> cleanup -> DiskMap (this)
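
One way to break this cycle is to have the shutdown hook call a static helper that captures only the directory to delete, so the hook thread never references the map instance. This matters because a thread registered with Runtime.addShutdownHook stays strongly reachable from the JVM until it is removed or the JVM exits. The following is only a minimal sketch under assumptions: the class name DiskMapSketch and the helper deleteQuietly are hypothetical, and this is not the code changed in this PR.

```java
import java.io.File;

/** Hypothetical sketch: a disk-backed map whose shutdown hook does not capture `this`. */
final class DiskMapSketch implements AutoCloseable {
    private final File dir;
    private final Thread shutdownThread;

    DiskMapSketch(File dir) {
        this.dir = dir;
        // Copy to a local so the lambda captures only the File, never `this`.
        final File target = dir;
        this.shutdownThread = new Thread(() -> deleteQuietly(target));
        Runtime.getRuntime().addShutdownHook(shutdownThread);
    }

    /** Static on purpose: a static method has no implicit `this`. */
    private static void deleteQuietly(File f) {
        File[] children = f.listFiles(); // null if f is not a directory or does not exist
        if (children != null) {
            for (File child : children) {
                deleteQuietly(child);
            }
        }
        f.delete();
    }

    @Override
    public void close() {
        deleteQuietly(dir);
        try {
            // Unregister the hook so the JVM no longer holds the thread.
            Runtime.getRuntime().removeShutdownHook(shutdownThread);
        } catch (IllegalStateException e) {
            // The JVM is already shutting down; the hook will run on its own.
        }
    }
}
```

Calling close() deletes the directory and unregisters the hook. If close() is never called, the hook still holds only the File, so the map instance itself can be garbage collected.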

Release note

This PR uses a new DiskMap.java to shadow the original DiskMap.java and prevent the problem.

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@suxiaogang223
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 33249 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 109ec3560ba5f21a0ede9f0935f15fbc8d3d011e, data reload: false

------ Round 1 ----------------------------------
q1	17583	5250	5111	5111
q2	2041	290	166	166
q3	11821	1291	790	790
q4	10212	1055	539	539
q5	7567	2349	2433	2349
q6	193	167	134	134
q7	926	779	621	621
q8	9687	1385	1097	1097
q9	5108	4613	4659	4613
q10	6831	2337	1901	1901
q11	474	280	262	262
q12	349	356	224	224
q13	17954	3686	3148	3148
q14	222	233	211	211
q15	535	482	485	482
q16	622	621	605	605
q17	592	885	345	345
q18	7241	6930	6906	6906
q19	3449	1000	552	552
q20	312	339	197	197
q21	2847	2332	1997	1997
q22	1040	1057	999	999
Total cold run time: 107606 ms
Total hot run time: 33249 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5243	5182	5274	5182
q2	235	332	235	235
q3	2239	2727	2296	2296
q4	1439	1780	1394	1394
q5	4217	4137	4372	4137
q6	224	178	138	138
q7	2002	1906	1774	1774
q8	2644	2624	2592	2592
q9	7249	7190	7303	7190
q10	3014	3108	2723	2723
q11	593	522	486	486
q12	689	781	626	626
q13	3557	3895	3303	3303
q14	291	308	271	271
q15	531	490	496	490
q16	667	668	647	647
q17	1138	1654	1355	1355
q18	7768	7650	7555	7555
q19	834	877	822	822
q20	1967	2073	1919	1919
q21	5371	4801	4953	4801
q22	1100	1076	1072	1072
Total cold run time: 53012 ms
Total hot run time: 51008 ms

@doris-robot

TPC-DS: Total hot run time: 192224 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 109ec3560ba5f21a0ede9f0935f15fbc8d3d011e, data reload: false

query1	1369	1020	986	986
query2	6200	1889	1865	1865
query3	10984	4435	4382	4382
query4	53692	24519	23525	23525
query5	5253	498	479	479
query6	432	205	190	190
query7	5374	513	300	300
query8	325	258	239	239
query9	7384	2629	2656	2629
query10	431	309	266	266
query11	15343	15096	14902	14902
query12	169	117	104	104
query13	1297	521	408	408
query14	10286	6892	6952	6892
query15	200	202	190	190
query16	7050	685	467	467
query17	1079	714	580	580
query18	1506	417	316	316
query19	196	186	159	159
query20	120	120	119	119
query21	209	123	112	112
query22	4436	4576	4289	4289
query23	33885	33493	33218	33218
query24	5715	2431	2405	2405
query25	482	468	399	399
query26	708	292	160	160
query27	1748	513	336	336
query28	2771	2490	2444	2444
query29	620	600	448	448
query30	276	233	199	199
query31	867	868	809	809
query32	70	68	66	66
query33	470	381	318	318
query34	756	881	502	502
query35	833	851	752	752
query36	988	1009	894	894
query37	125	111	79	79
query38	4215	4131	4073	4073
query39	1573	1479	1421	1421
query40	212	118	104	104
query41	54	52	51	51
query42	125	110	112	110
query43	513	512	493	493
query44	1389	818	806	806
query45	186	175	165	165
query46	848	1025	661	661
query47	1852	1852	1810	1810
query48	396	434	308	308
query49	731	512	415	415
query50	716	748	431	431
query51	4277	4323	4239	4239
query52	112	124	99	99
query53	235	266	191	191
query54	492	548	418	418
query55	88	78	80	78
query56	264	254	258	254
query57	1166	1175	1125	1125
query58	251	259	241	241
query59	2628	2761	2533	2533
query60	275	300	264	264
query61	129	122	121	121
query62	735	752	689	689
query63	234	189	192	189
query64	1526	1052	681	681
query65	4559	4466	4347	4347
query66	723	391	306	306
query67	15952	15633	15436	15436
query68	7337	887	496	496
query69	561	317	266	266
query70	1198	1122	1135	1122
query71	499	299	291	291
query72	5866	3611	3759	3611
query73	1122	738	348	348
query74	9156	9162	9021	9021
query75	3702	3175	2707	2707
query76	4356	1186	783	783
query77	577	372	277	277
query78	9979	10303	9290	9290
query79	2387	825	573	573
query80	638	532	459	459
query81	490	258	220	220
query82	637	129	95	95
query83	345	171	154	154
query84	284	103	78	78
query85	797	374	310	310
query86	425	307	291	291
query87	4449	4453	4333	4333
query88	3469	2223	2251	2223
query89	399	320	279	279
query90	1932	209	205	205
query91	135	147	105	105
query92	73	59	58	58
query93	1346	1108	573	573
query94	659	419	295	295
query95	352	281	255	255
query96	480	568	271	271
query97	3296	3395	3263	3263
query98	225	204	207	204
query99	1433	1398	1281	1281
Total cold run time: 298709 ms
Total hot run time: 192224 ms

@doris-robot

ClickBench: Total hot run time: 31.14 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 109ec3560ba5f21a0ede9f0935f15fbc8d3d011e, data reload: false

query1	0.04	0.04	0.03
query2	0.06	0.04	0.03
query3	0.23	0.06	0.06
query4	1.63	0.10	0.11
query5	0.56	0.55	0.56
query6	1.20	0.72	0.72
query7	0.02	0.02	0.01
query8	0.06	0.04	0.04
query9	0.59	0.51	0.55
query10	0.58	0.60	0.61
query11	0.17	0.10	0.10
query12	0.15	0.11	0.11
query13	0.62	0.60	0.60
query14	2.83	2.68	2.69
query15	0.92	0.87	0.87
query16	0.38	0.37	0.38
query17	1.01	1.00	1.05
query18	0.21	0.19	0.19
query19	1.86	1.83	1.96
query20	0.02	0.01	0.01
query21	15.38	0.87	0.57
query22	0.75	1.15	0.67
query23	14.96	1.42	0.60
query24	6.96	2.09	1.09
query25	0.51	0.19	0.19
query26	0.61	0.16	0.14
query27	0.05	0.04	0.05
query28	9.71	0.85	0.43
query29	12.55	3.98	3.25
query30	0.26	0.09	0.06
query31	2.86	0.59	0.38
query32	3.22	0.55	0.46
query33	3.05	3.17	3.03
query34	15.68	5.14	4.53
query35	4.53	4.52	4.49
query36	0.69	0.50	0.48
query37	0.09	0.06	0.06
query38	0.05	0.04	0.04
query39	0.03	0.02	0.03
query40	0.16	0.13	0.13
query41	0.08	0.02	0.02
query42	0.03	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 105.39 s
Total hot run time: 31.14 s

@suxiaogang223
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 32242 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1a91eda23b0c3de96085557eb1fabb93cdf6949f, data reload: false

------ Round 1 ----------------------------------
q1	17648	5106	5021	5021
q2	2041	285	161	161
q3	10429	1215	731	731
q4	10210	1016	505	505
q5	7523	2375	2285	2285
q6	184	162	130	130
q7	883	734	612	612
q8	9315	1314	1071	1071
q9	4854	4701	4642	4642
q10	6807	2308	1879	1879
q11	467	268	256	256
q12	343	373	207	207
q13	17794	3682	3102	3102
q14	220	238	211	211
q15	535	493	480	480
q16	634	619	568	568
q17	576	845	339	339
q18	6967	6515	6381	6381
q19	1394	961	561	561
q20	308	341	188	188
q21	2823	2191	1934	1934
q22	1055	1046	978	978
Total cold run time: 103010 ms
Total hot run time: 32242 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5533	5117	5371	5117
q2	242	326	228	228
q3	2275	2741	2418	2418
q4	1518	1917	1446	1446
q5	4333	4245	4224	4224
q6	204	160	121	121
q7	1904	1922	1751	1751
q8	2606	2672	2536	2536
q9	7506	7588	7495	7495
q10	3106	3278	2880	2880
q11	600	522	515	515
q12	724	783	685	685
q13	3795	4177	3381	3381
q14	286	294	272	272
q15	520	509	476	476
q16	658	671	632	632
q17	1135	1565	1419	1419
q18	7725	7657	7429	7429
q19	821	819	918	819
q20	1982	1987	1844	1844
q21	5506	4969	4910	4910
q22	1073	1047	1021	1021
Total cold run time: 54052 ms
Total hot run time: 51619 ms

@doris-robot

TPC-DS: Total hot run time: 192210 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 1a91eda23b0c3de96085557eb1fabb93cdf6949f, data reload: false

query1	1388	1026	979	979
query2	6536	1860	1826	1826
query3	11090	4614	4652	4614
query4	26199	23570	23346	23346
query5	5038	664	481	481
query6	304	200	199	199
query7	3984	506	296	296
query8	292	252	238	238
query9	8546	2604	2592	2592
query10	493	324	255	255
query11	15486	15200	14859	14859
query12	167	119	103	103
query13	1566	535	398	398
query14	9643	6704	6666	6666
query15	202	191	176	176
query16	7592	736	506	506
query17	1485	768	559	559
query18	2003	405	306	306
query19	189	218	156	156
query20	122	123	125	123
query21	205	122	107	107
query22	4481	4644	4311	4311
query23	34268	33471	33409	33409
query24	7183	2410	2387	2387
query25	492	455	385	385
query26	812	280	158	158
query27	2055	501	337	337
query28	4310	2463	2412	2412
query29	615	557	443	443
query30	273	222	201	201
query31	920	871	799	799
query32	76	63	61	61
query33	528	360	305	305
query34	795	874	506	506
query35	834	848	745	745
query36	958	1030	904	904
query37	121	101	78	78
query38	4173	4367	4207	4207
query39	1495	1458	1446	1446
query40	213	120	106	106
query41	54	56	50	50
query42	123	99	103	99
query43	494	501	483	483
query44	1325	801	788	788
query45	184	175	169	169
query46	846	1044	650	650
query47	1870	1884	1862	1862
query48	377	424	298	298
query49	768	511	423	423
query50	695	752	420	420
query51	4311	4423	4263	4263
query52	110	108	102	102
query53	250	276	199	199
query54	522	530	435	435
query55	87	83	85	83
query56	277	275	268	268
query57	1159	1207	1124	1124
query58	250	245	234	234
query59	2732	2775	2671	2671
query60	297	293	276	276
query61	156	127	118	118
query62	780	731	676	676
query63	233	194	183	183
query64	3275	1046	676	676
query65	4512	4442	4473	4442
query66	793	413	301	301
query67	16391	15581	15613	15581
query68	8684	889	490	490
query69	498	304	262	262
query70	1221	1145	1072	1072
query71	453	289	263	263
query72	5208	3552	3724	3552
query73	776	740	354	354
query74	8993	9360	8695	8695
query75	3975	3130	2667	2667
query76	3758	1198	757	757
query77	791	365	272	272
query78	9896	10155	9264	9264
query79	1186	814	583	583
query80	604	520	454	454
query81	464	259	231	231
query82	194	129	96	96
query83	168	164	161	161
query84	246	96	76	76
query85	744	388	374	374
query86	322	314	308	308
query87	4407	4731	4302	4302
query88	2976	2276	2253	2253
query89	406	313	280	280
query90	1977	219	217	217
query91	137	138	112	112
query92	75	59	55	55
query93	1126	1065	574	574
query94	656	393	318	318
query95	345	273	266	266
query96	488	573	281	281
query97	3295	3430	3267	3267
query98	227	208	203	203
query99	1409	1405	1292	1292
Total cold run time: 275708 ms
Total hot run time: 192210 ms

@doris-robot

ClickBench: Total hot run time: 31.2 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 1a91eda23b0c3de96085557eb1fabb93cdf6949f, data reload: false

query1	0.04	0.04	0.03
query2	0.07	0.03	0.04
query3	0.23	0.07	0.07
query4	1.63	0.10	0.10
query5	0.57	0.53	0.55
query6	1.20	0.71	0.72
query7	0.02	0.02	0.01
query8	0.04	0.04	0.04
query9	0.58	0.53	0.51
query10	0.58	0.63	0.59
query11	0.16	0.11	0.11
query12	0.15	0.12	0.12
query13	0.62	0.60	0.60
query14	2.66	2.82	2.82
query15	0.93	0.85	0.87
query16	0.38	0.38	0.41
query17	1.04	1.00	1.02
query18	0.21	0.19	0.20
query19	1.95	1.81	2.01
query20	0.01	0.02	0.01
query21	15.35	0.91	0.55
query22	0.75	1.32	0.67
query23	14.82	1.41	0.58
query24	6.79	1.97	0.95
query25	0.48	0.33	0.07
query26	0.59	0.17	0.14
query27	0.05	0.05	0.05
query28	8.97	0.82	0.43
query29	12.65	4.04	3.33
query30	0.26	0.09	0.07
query31	2.84	0.59	0.39
query32	3.22	0.55	0.46
query33	2.98	3.09	3.11
query34	15.88	5.13	4.55
query35	4.57	4.54	4.60
query36	0.71	0.51	0.49
query37	0.09	0.06	0.07
query38	0.05	0.05	0.04
query39	0.03	0.03	0.03
query40	0.18	0.14	0.13
query41	0.09	0.04	0.02
query42	0.04	0.03	0.02
query43	0.04	0.03	0.03
Total cold run time: 104.5 s
Total hot run time: 31.2 s

@github-actions github-actions bot added the approved label Mar 16, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@morningman morningman merged commit 5f47b89 into apache:master Mar 16, 2025
28 checks passed
github-actions bot pushed a commit that referenced this pull request Mar 16, 2025
…ce (#48955)

### What problem does this PR solve?

Problem Summary:
As this issue describes apache/hudi#12918
While debugging a memory leak issue, I noticed a steady increase in the
number of BitCaskDiskMap instances through jmap logs. Using VisualVM, I
found that a memory leak may exist in BitCaskDiskMap due to a circular
reference.

![image](https://github.com/user-attachments/assets/0af6fa66-4c61-46dc-9c67-df1ae18c6607)
- The cleanup method is an instance method that implicitly depends on
the this pointer
- When cleanup is called from shutdownThread, it holds a reference to
this
- Meanwhile, the DiskMap class holds a reference to shutdownThread

![image](https://github.com/user-attachments/assets/3a762e6d-4ebe-4177-a6ff-c01afa5b8180)

This creates a circular reference:
```text
  DiskMap (this) -> shutdownThread -> cleanup -> DiskMap (this)
```
github-actions bot pushed a commit that referenced this pull request Mar 16, 2025
…ce (#48955)
dataroaring pushed a commit that referenced this pull request Mar 17, 2025
…ce (#48955)
dataroaring pushed a commit that referenced this pull request Mar 19, 2025
…ular Reference #48955 (#49114)

Cherry-picked from #48955

Co-authored-by: Socrates <suyiteng@selectdb.com>
yiguolei pushed a commit that referenced this pull request Mar 21, 2025
…ular Reference #48955 (#49115)

Cherry-picked from #48955

Co-authored-by: Socrates <suyiteng@selectdb.com>
@gavinchou gavinchou mentioned this pull request Apr 23, 2025
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
…ce (apache#48955)
@suxiaogang223 suxiaogang223 deleted the fix_hudi_memory_leak_master branch July 10, 2025 09:02
Labels

approved, dev/2.1.9-merged, dev/3.0.5-merged, reviewed