@hubgeter
Contributor

@hubgeter hubgeter commented Nov 20, 2024

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #42507

Problem Summary:
Fix a memory leak in JVM metrics collection.

When enable_jvm_monitor=true is set in be.conf, the memory used by the BE's JVM grows slowly over time.
Analyzing the hprof heap dump shows a large number of java.lang.management.ThreadInfo objects.
The cause of the leak: the JNI code does not delete the local reference after getting each object from the array, so those objects are never garbage collected.
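
The pattern behind this kind of leak can be sketched as follows. This is a minimal illustration under stated assumptions, not the code changed in this PR: `for_each_array_element` and `FakeEnv` are hypothetical names introduced here. JNI releases local references automatically when a native method returns to Java, but native code that keeps running on an attached thread never reaches that point, so each element fetched from an array stays reachable until it is deleted explicitly. The helper is written as a template so that it compiles without a JDK; with a real `JNIEnv*` and `jobjectArray` it would call the JNI functions `GetArrayLength`, `GetObjectArrayElement`, and `DeleteLocalRef`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Visit every element of a JNI-style object array and release each local
// reference as soon as the callback has finished with it.
// Env is expected to look like JNIEnv (GetArrayLength, GetObjectArrayElement,
// DeleteLocalRef); Array would be jobjectArray in real JNI code.
template <typename Env, typename Array, typename Fn>
void for_each_array_element(Env* env, const Array& array, Fn&& fn) {
    auto length = env->GetArrayLength(array);
    for (decltype(length) i = 0; i < length; ++i) {
        // Each call creates a new local reference to the element.
        auto element = env->GetObjectArrayElement(array, i);
        fn(element);
        // Without this call the reference (and the object behind it) stays
        // alive for as long as the native code keeps running.
        env->DeleteLocalRef(element);
    }
}

// Hypothetical stand-in for JNIEnv, used only so the sketch can run without
// a JVM. It counts how many "local references" are currently alive.
struct FakeEnv {
    int live_local_refs = 0;
    int peak_local_refs = 0;

    int GetArrayLength(const std::vector<int>& array) {
        return static_cast<int>(array.size());
    }

    int GetObjectArrayElement(const std::vector<int>& array, int index) {
        ++live_local_refs;
        if (live_local_refs > peak_local_refs) {
            peak_local_refs = live_local_refs;
        }
        return array[static_cast<std::size_t>(index)];
    }

    void DeleteLocalRef(int /*ref*/) { --live_local_refs; }
};
```

With `FakeEnv`, a pass over an array ends with `live_local_refs` back at 0 and never holds more than one reference at a time. If the `DeleteLocalRef` call were dropped, the count would grow by the array length on every pass, which matches the slow growth described above.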

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
        This only fixes a memory leak; no logic has been changed.
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@hubgeter
Contributor Author

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 45171 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2cbc1124bb708318ba8a5714386ab7fe143f481f, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17566	7494	7321	7321
q2	2259	1166	1179	1166
q3	9962	1162	1161	1161
q4	10234	769	700	700
q5	7606	2733	2735	2733
q6	238	151	146	146
q7	980	643	604	604
q8	9348	2358	2368	2358
q9	6596	6413	6463	6413
q10	7069	2287	2325	2287
q11	489	265	252	252
q12	459	221	219	219
q13	17777	3036	3082	3036
q14	241	209	213	209
q15	588	521	519	519
q16	653	591	582	582
q17	994	570	508	508
q18	7197	6782	6654	6654
q19	1339	1027	983	983
q20	2907	2703	2660	2660
q21	3930	3431	3314	3314
q22	1399	1355	1346	1346
Total cold run time: 109831 ms
Total hot run time: 45171 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	7342	7299	7421	7299
q2	333	231	241	231
q3	3063	3033	2977	2977
q4	2082	1811	1788	1788
q5	5608	5664	5729	5664
q6	219	143	138	138
q7	2194	1798	1880	1798
q8	3314	3532	3518	3518
q9	8930	8880	8889	8880
q10	3613	3607	3586	3586
q11	591	515	520	515
q12	831	632	635	632
q13	10045	3282	3278	3278
q14	300	284	270	270
q15	570	533	503	503
q16	684	657	650	650
q17	1883	1640	1617	1617
q18	8329	7654	7794	7654
q19	1696	1556	1498	1498
q20	2091	1896	1964	1896
q21	5678	5429	5414	5414
q22	674	566	551	551
Total cold run time: 70070 ms
Total hot run time: 60357 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.02% (9899/26035)
Line Coverage: 29.22% (82841/283540)
Region Coverage: 28.34% (42521/150058)
Branch Coverage: 24.90% (21563/86584)
Coverage Report: http://coverage.selectdb-in.cc/coverage/2cbc1124bb708318ba8a5714386ab7fe143f481f_2cbc1124bb708318ba8a5714386ab7fe143f481f/report/index.html

@doris-robot

ClickBench: Total hot run time: 32.15 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 2cbc1124bb708318ba8a5714386ab7fe143f481f, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.03	0.03	0.03
query2	0.07	0.03	0.03
query3	0.25	0.07	0.07
query4	1.62	0.10	0.10
query5	0.41	0.43	0.40
query6	1.19	0.65	0.66
query7	0.02	0.01	0.02
query8	0.04	0.04	0.03
query9	0.60	0.49	0.51
query10	0.57	0.54	0.56
query11	0.15	0.11	0.12
query12	0.14	0.11	0.11
query13	0.61	0.60	0.61
query14	2.84	2.86	2.86
query15	0.90	0.84	0.83
query16	0.39	0.40	0.38
query17	1.04	0.99	1.02
query18	0.20	0.21	0.21
query19	1.98	1.86	1.99
query20	0.01	0.02	0.01
query21	15.37	0.59	0.58
query22	2.59	2.47	1.80
query23	17.02	1.07	0.69
query24	2.84	0.59	1.63
query25	0.15	0.26	0.05
query26	0.49	0.14	0.14
query27	0.05	0.05	0.03
query28	10.70	1.10	1.08
query29	12.55	3.27	3.22
query30	0.26	0.06	0.06
query31	2.85	0.37	0.38
query32	3.27	0.46	0.49
query33	3.02	2.97	3.05
query34	17.13	4.50	4.49
query35	4.58	4.57	4.52
query36	0.66	0.50	0.47
query37	0.09	0.07	0.06
query38	0.05	0.04	0.04
query39	0.04	0.02	0.03
query40	0.16	0.13	0.13
query41	0.08	0.04	0.02
query42	0.04	0.03	0.02
query43	0.03	0.04	0.03
Total cold run time: 107.08 s
Total hot run time: 32.15 s

Contributor

@morningman morningman left a comment

LGTM

@github-actions github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Nov 21, 2024
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@morningman morningman merged commit 0ef2c37 into apache:master Nov 21, 2024
github-actions bot pushed a commit that referenced this pull request Nov 21, 2024
yiguolei pushed a commit that referenced this pull request Nov 22, 2024
Cherry-picked from #44311

Co-authored-by: daidai <changyuwei@selectdb.com>
@yiguolei yiguolei mentioned this pull request Jan 19, 2025

Labels

approved, dev/2.1.8-merged, dev/3.0.3-merged, reviewed
