@924060929 commented Mar 24, 2025

What problem does this PR solve?

The http://<fe_ip>:<fe_http_port>/metrics endpoint can cause the frontends to hang when the number of threads is very large. This PR replaces ThreadMXBean.getThreadInfo with ThreadMXBean.dumpAllThreads to improve performance.

see also:

  1. https://issues.apache.org/jira/browse/HADOOP-16850
  2. https://bugs.openjdk.org/browse/JDK-8185005
  3. apache/skywalking#9190 ([Bug] Too many threads will cause blocked)
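
As a rough illustration of the change described above, the sketch below counts live threads per state from a single `ThreadMXBean.dumpAllThreads(false, false)` call instead of looking threads up by ID with `getThreadInfo(long[], int)`. The class and method names (`ThreadStateCounter`, `countThreadStates`) are hypothetical and are not taken from the Doris code base; the actual patch may be structured differently.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

// Hypothetical helper, not the Doris implementation.
public class ThreadStateCounter {

    /** Counts live threads per Thread.State using one dumpAllThreads call. */
    public static Map<Thread.State, Integer> countThreadStates() {
        ThreadMXBean threadMXBean = ManagementFactory.getThreadMXBean();

        // false, false: do not collect locked monitors or ownable synchronizers.
        // The alternative this replaces would be something like
        // threadMXBean.getThreadInfo(threadMXBean.getAllThreadIds(), 0),
        // which resolves each thread by its ID.
        ThreadInfo[] infos = threadMXBean.dumpAllThreads(false, false);

        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (Thread.State state : Thread.State.values()) {
            counts.put(state, 0);
        }
        for (ThreadInfo info : infos) {
            if (info == null) {
                continue; // defensive: skip entries for threads that no longer exist
            }
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        countThreadStates().forEach((state, n) -> System.out.println(state + " " + n));
    }
}
```

Note that the two-argument `dumpAllThreads` also captures stack traces, which this sketch does not use; on Java 10 and later there is an overload with a `maxDepth` parameter that could limit that cost, assuming the target JDK provides it.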

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@924060929
Contributor Author

run buildall

@924060929 requested a review from morningman March 24, 2025 03:52
@doris-robot

TPC-H: Total hot run time: 34040 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b5aea5fc2a885ec762d650b422790b19040350a2, data reload: false

------ Round 1 ----------------------------------
q1	24395	5038	4987	4987
q2	2040	296	193	193
q3	10382	1230	686	686
q4	10233	999	527	527
q5	7533	2427	2304	2304
q6	181	168	130	130
q7	901	741	597	597
q8	9311	1305	1058	1058
q9	6900	5123	5145	5123
q10	6793	2321	1911	1911
q11	487	284	276	276
q12	341	350	213	213
q13	17769	3667	3082	3082
q14	238	240	215	215
q15	544	485	474	474
q16	609	648	605	605
q17	579	849	343	343
q18	7576	7154	7106	7106
q19	1549	957	576	576
q20	320	318	185	185
q21	3853	3438	2477	2477
q22	1030	993	972	972
Total cold run time: 113564 ms
Total hot run time: 34040 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5154	5128	5130	5128
q2	233	324	230	230
q3	2187	2675	2288	2288
q4	1400	1802	1377	1377
q5	4387	4388	4401	4388
q6	210	166	128	128
q7	1995	1927	1793	1793
q8	2586	2611	2519	2519
q9	7353	7296	7061	7061
q10	3027	3198	2762	2762
q11	570	506	507	506
q12	668	792	641	641
q13	3510	3926	3270	3270
q14	290	298	270	270
q15	543	483	470	470
q16	647	709	660	660
q17	1125	1584	1325	1325
q18	7759	7560	7528	7528
q19	804	795	875	795
q20	1984	1957	1831	1831
q21	5227	4977	4896	4896
q22	1071	1026	1020	1020
Total cold run time: 52730 ms
Total hot run time: 50886 ms

@doris-robot

TPC-DS: Total hot run time: 193686 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b5aea5fc2a885ec762d650b422790b19040350a2, data reload: false

query1	1436	1069	1026	1026
query2	6144	1943	1953	1943
query3	11039	4622	4569	4569
query4	54398	25494	23035	23035
query5	5058	567	445	445
query6	337	191	183	183
query7	4925	486	279	279
query8	326	258	239	239
query9	6073	2618	2631	2618
query10	424	331	247	247
query11	15206	15129	14813	14813
query12	164	117	106	106
query13	1078	523	402	402
query14	10140	6451	8045	6451
query15	206	206	188	188
query16	7113	660	501	501
query17	1114	751	592	592
query18	1537	428	326	326
query19	206	203	170	170
query20	131	133	128	128
query21	218	126	102	102
query22	4315	4402	4503	4402
query23	34115	33466	33502	33466
query24	5736	2434	2413	2413
query25	460	450	423	423
query26	728	281	140	140
query27	1923	498	327	327
query28	2816	2456	2446	2446
query29	607	583	467	467
query30	278	224	196	196
query31	886	870	767	767
query32	70	64	65	64
query33	476	363	317	317
query34	878	869	494	494
query35	828	824	782	782
query36	947	1015	926	926
query37	123	96	72	72
query38	4275	4207	4121	4121
query39	1499	1449	1424	1424
query40	210	119	111	111
query41	53	51	51	51
query42	128	106	106	106
query43	525	511	508	508
query44	1380	835	829	829
query45	180	178	169	169
query46	871	1037	645	645
query47	1869	1843	1829	1829
query48	395	428	309	309
query49	708	519	461	461
query50	695	747	414	414
query51	4297	4386	4209	4209
query52	113	110	95	95
query53	231	265	184	184
query54	491	512	420	420
query55	81	83	82	82
query56	281	279	249	249
query57	1176	1183	1196	1183
query58	256	266	251	251
query59	2895	2988	2882	2882
query60	287	286	285	285
query61	132	132	130	130
query62	750	733	676	676
query63	230	189	194	189
query64	2118	1068	678	678
query65	4459	4354	4322	4322
query66	744	387	294	294
query67	15816	15614	15299	15299
query68	6979	890	506	506
query69	539	296	268	268
query70	1223	1126	1135	1126
query71	498	292	266	266
query72	5920	5027	5073	5027
query73	1337	601	346	346
query74	9142	9078	8722	8722
query75	3923	3245	2746	2746
query76	4228	1199	747	747
query77	616	363	292	292
query78	10226	10176	9240	9240
query79	2790	812	579	579
query80	763	523	471	471
query81	488	263	229	229
query82	642	124	97	97
query83	273	176	151	151
query84	294	96	78	78
query85	783	360	308	308
query86	372	290	282	282
query87	4478	4624	4662	4624
query88	3510	2273	2271	2271
query89	406	312	278	278
query90	1835	215	207	207
query91	151	148	114	114
query92	76	108	57	57
query93	2202	1045	580	580
query94	670	410	316	316
query95	358	279	266	266
query96	496	564	277	277
query97	3306	3399	3271	3271
query98	234	202	202	202
query99	1411	1406	1272	1272
Total cold run time: 299773 ms
Total hot run time: 193686 ms

@doris-robot

ClickBench: Total hot run time: 31.57 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit b5aea5fc2a885ec762d650b422790b19040350a2, data reload: false

query1	0.04	0.04	0.03
query2	0.12	0.11	0.10
query3	0.24	0.19	0.20
query4	1.59	0.20	0.19
query5	0.60	0.60	0.60
query6	1.19	0.73	0.72
query7	0.02	0.02	0.02
query8	0.04	0.03	0.03
query9	0.57	0.53	0.53
query10	0.58	0.58	0.56
query11	0.16	0.12	0.11
query12	0.16	0.11	0.12
query13	0.62	0.60	0.60
query14	2.78	2.81	2.68
query15	0.93	0.84	0.85
query16	0.38	0.39	0.38
query17	1.06	1.01	1.00
query18	0.20	0.19	0.20
query19	1.91	1.95	1.86
query20	0.01	0.01	0.02
query21	15.36	0.90	0.55
query22	0.77	1.25	0.98
query23	14.70	1.39	0.64
query24	7.51	1.61	0.70
query25	0.51	0.23	0.24
query26	0.68	0.16	0.14
query27	0.05	0.05	0.05
query28	9.13	0.89	0.43
query29	12.53	3.97	3.32
query30	0.24	0.09	0.06
query31	2.81	0.60	0.38
query32	3.23	0.54	0.47
query33	3.00	3.06	3.11
query34	15.73	5.12	4.54
query35	4.52	4.50	4.47
query36	0.66	0.50	0.48
query37	0.09	0.07	0.06
query38	0.05	0.03	0.04
query39	0.03	0.03	0.03
query40	0.18	0.13	0.13
query41	0.08	0.03	0.02
query42	0.04	0.02	0.03
query43	0.03	0.03	0.03
Total cold run time: 105.13 s
Total hot run time: 31.57 s

@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) Mar 24, 2025
@github-actions
Contributor

PR approved by anyone and no changes requested.

@924060929
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 34480 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b5aea5fc2a885ec762d650b422790b19040350a2, data reload: false

------ Round 1 ----------------------------------
q1	25825	5347	5070	5070
q2	2067	287	174	174
q3	10398	1237	690	690
q4	10227	998	571	571
q5	7671	2389	2362	2362
q6	188	164	132	132
q7	913	770	618	618
q8	9345	1218	1209	1209
q9	6796	5151	5135	5135
q10	6822	2328	1893	1893
q11	478	268	248	248
q12	344	351	221	221
q13	17762	3735	3066	3066
q14	230	227	208	208
q15	539	481	485	481
q16	639	626	601	601
q17	546	849	351	351
q18	7633	7216	7189	7189
q19	1214	961	547	547
q20	318	337	186	186
q21	4465	2735	2529	2529
q22	1039	999	1019	999
Total cold run time: 115459 ms
Total hot run time: 34480 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5430	5455	5494	5455
q2	237	334	230	230
q3	2188	2652	2335	2335
q4	1412	1820	1491	1491
q5	4533	4429	4382	4382
q6	207	167	126	126
q7	2013	1908	1733	1733
q8	2569	2662	2513	2513
q9	7146	7119	7177	7119
q10	2989	3204	2738	2738
q11	589	495	499	495
q12	676	740	652	652
q13	3636	3959	3257	3257
q14	280	317	270	270
q15	530	474	461	461
q16	628	680	610	610
q17	1133	1543	1372	1372
q18	7709	7530	7547	7530
q19	792	810	823	810
q20	1959	2028	1941	1941
q21	5253	4820	4752	4752
q22	1096	1051	976	976
Total cold run time: 53005 ms
Total hot run time: 51248 ms

@doris-robot

TPC-DS: Total hot run time: 184348 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b5aea5fc2a885ec762d650b422790b19040350a2, data reload: false

query1	1006	483	485	483
query2	6552	2018	1969	1969
query3	6795	221	217	217
query4	26237	23770	22910	22910
query5	4338	621	486	486
query6	292	204	195	195
query7	4605	490	280	280
query8	296	256	242	242
query9	8663	2596	2579	2579
query10	489	314	260	260
query11	15959	15104	14884	14884
query12	168	110	105	105
query13	1658	543	404	404
query14	9935	6061	6081	6061
query15	206	195	171	171
query16	7642	639	452	452
query17	1178	699	545	545
query18	2010	400	306	306
query19	188	176	164	164
query20	117	117	118	117
query21	211	159	100	100
query22	4275	4317	4074	4074
query23	33927	32933	33008	32933
query24	8533	2385	2330	2330
query25	520	458	392	392
query26	1233	265	144	144
query27	2755	501	320	320
query28	4363	2417	2381	2381
query29	722	559	437	437
query30	283	219	189	189
query31	952	875	741	741
query32	70	64	64	64
query33	571	386	309	309
query34	772	842	484	484
query35	817	815	767	767
query36	977	995	899	899
query37	120	98	83	83
query38	4186	4144	4012	4012
query39	1459	1393	1394	1393
query40	214	122	110	110
query41	59	58	57	57
query42	119	105	109	105
query43	504	503	476	476
query44	1308	807	800	800
query45	178	176	172	172
query46	853	1032	646	646
query47	1788	1812	1781	1781
query48	381	417	297	297
query49	811	536	443	443
query50	687	726	411	411
query51	4125	4181	4201	4181
query52	107	107	101	101
query53	214	249	178	178
query54	482	496	435	435
query55	84	81	83	81
query56	267	283	277	277
query57	1142	1160	1085	1085
query58	256	235	235	235
query59	2869	2804	2738	2738
query60	289	281	254	254
query61	133	132	133	132
query62	793	756	660	660
query63	226	181	181	181
query64	4276	1077	699	699
query65	4434	4349	4353	4349
query66	1051	417	330	330
query67	15732	15744	15375	15375
query68	8672	876	508	508
query69	463	310	258	258
query70	1197	1130	1080	1080
query71	468	306	264	264
query72	5275	4940	2640	2640
query73	692	562	404	404
query74	8893	9043	8947	8947
query75	4115	3198	2736	2736
query76	3697	1151	744	744
query77	792	354	281	281
query78	9985	10153	9309	9309
query79	2529	827	564	564
query80	667	528	460	460
query81	473	257	224	224
query82	473	127	98	98
query83	222	185	162	162
query84	287	106	83	83
query85	801	363	319	319
query86	334	292	297	292
query87	4544	4550	4424	4424
query88	3387	2263	2263	2263
query89	395	317	274	274
query90	1950	218	207	207
query91	147	149	113	113
query92	81	64	62	62
query93	1378	1097	590	590
query94	677	412	306	306
query95	362	278	270	270
query96	494	582	277	277
query97	3434	3386	3243	3243
query98	235	214	206	206
query99	1487	1418	1269	1269
Total cold run time: 277068 ms
Total hot run time: 184348 ms

@doris-robot

ClickBench: Total hot run time: 31.04 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit b5aea5fc2a885ec762d650b422790b19040350a2, data reload: false

query1	0.04	0.04	0.03
query2	0.12	0.11	0.10
query3	0.25	0.19	0.19
query4	1.60	0.20	0.19
query5	0.59	0.61	0.61
query6	1.20	0.71	0.71
query7	0.02	0.02	0.02
query8	0.04	0.04	0.04
query9	0.59	0.53	0.53
query10	0.57	0.58	0.56
query11	0.15	0.11	0.11
query12	0.14	0.11	0.11
query13	0.62	0.60	0.60
query14	2.73	2.70	2.72
query15	0.95	0.84	0.85
query16	0.39	0.38	0.38
query17	1.03	1.07	1.03
query18	0.21	0.19	0.19
query19	1.95	1.97	1.87
query20	0.01	0.01	0.02
query21	15.36	0.91	0.54
query22	0.78	1.22	0.68
query23	14.88	1.37	0.60
query24	7.41	1.45	0.52
query25	0.52	0.20	0.11
query26	0.61	0.16	0.13
query27	0.05	0.05	0.04
query28	9.82	0.87	0.45
query29	12.55	3.96	3.27
query30	0.25	0.08	0.06
query31	2.84	0.60	0.38
query32	3.23	0.54	0.48
query33	3.04	3.05	3.11
query34	15.80	5.14	4.60
query35	4.57	4.57	4.50
query36	0.69	0.50	0.48
query37	0.09	0.06	0.06
query38	0.06	0.05	0.04
query39	0.03	0.03	0.03
query40	0.18	0.13	0.13
query41	0.08	0.02	0.02
query42	0.04	0.02	0.02
query43	0.05	0.04	0.03
Total cold run time: 106.13 s
Total hot run time: 31.04 s

@924060929 merged commit a65e080 into apache:master Mar 25, 2025
31 of 33 checks passed
@924060929 deleted the opt-jvm-metrics branch March 25, 2025 06:46
github-actions bot pushed a commit that referenced this pull request Mar 25, 2025
github-actions bot pushed a commit that referenced this pull request Mar 25, 2025
@yiguolei added the "usercase" label (Important user case type label) Mar 25, 2025
yiguolei pushed a commit that referenced this pull request Mar 26, 2025
…49380 (#49455)

Cherry-picked from #49380

Co-authored-by: 924060929 <lanhuajian@selectdb.com>
dataroaring pushed a commit that referenced this pull request Mar 27, 2025
…49380 (#49454)

Cherry-picked from #49380

Co-authored-by: 924060929 <lanhuajian@selectdb.com>
@gavinchou mentioned this pull request Apr 23, 2025
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
deardeng pushed a commit to deardeng/incubator-doris that referenced this pull request Dec 19, 2025
(cherry picked from commit a65e080)
Labels

approved, dev/2.1.9-merged, dev/3.0.5-merged, reviewed, usercase
