
@BePPPower
Contributor

@BePPPower BePPPower commented Nov 1, 2024

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:
There are two locks involved: the ExportMgr lock and the ExportJob lock (a synchronized lock).
Previously, the two locks were acquired in an inconsistent order:

  1. When cancelling a job, the job lock was acquired first, then the mgr lock.
  2. When removing an old job, the mgr lock was acquired first, then the job lock.

This PR fixes the problem by always acquiring the job lock after the mgr lock, which avoids the deadlock.
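
The ordering rule described above can be illustrated with a minimal, self-contained sketch. This is not the code from this PR: the class and method names (`LockOrderSketch`, `Mgr`, `Job`, `cancelJob`, `removeCancelledJobs`) are hypothetical, and using a `ReentrantReadWriteLock` for the manager lock is an assumption made only for illustration. The point is that both the cancel path and the remove path take the manager lock first and the job's synchronized lock second, so no thread ever waits for the manager lock while holding a job lock.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of a fixed lock order: manager lock first, job lock second.
class LockOrderSketch {

    // Stand-in for an export job; its monitor plays the role of the "job lock".
    static class Job {
        private final long id;
        private boolean cancelled = false;

        Job(long id) {
            this.id = id;
        }
    }

    // Stand-in for the job manager; the read/write lock is the "mgr lock" (assumed type).
    static class Mgr {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        private final Map<Long, Job> jobs = new HashMap<>();

        void addJob(Job job) {
            lock.writeLock().lock();
            try {
                jobs.put(job.id, job);
            } finally {
                lock.writeLock().unlock();
            }
        }

        // Cancel path: mgr lock is taken BEFORE the job lock.
        boolean cancelJob(long id) {
            lock.writeLock().lock();
            try {
                Job job = jobs.get(id);
                if (job == null) {
                    return false;
                }
                synchronized (job) {
                    job.cancelled = true;
                }
                return true;
            } finally {
                lock.writeLock().unlock();
            }
        }

        // Remove path: same order, mgr lock first, then each job lock in turn.
        int removeCancelledJobs() {
            lock.writeLock().lock();
            try {
                int before = jobs.size();
                jobs.values().removeIf(job -> {
                    synchronized (job) {
                        return job.cancelled;
                    }
                });
                return before - jobs.size();
            } finally {
                lock.writeLock().unlock();
            }
        }

        int size() {
            lock.readLock().lock();
            try {
                return jobs.size();
            } finally {
                lock.readLock().unlock();
            }
        }
    }

    public static void main(String[] args) {
        Mgr mgr = new Mgr();
        mgr.addJob(new Job(1L));
        mgr.addJob(new Job(2L));
        mgr.cancelJob(1L);
        mgr.removeCancelledJobs();
        System.out.println("jobs left: " + mgr.size());
    }
}
```

Because every path enters through the manager lock, a cancelling thread and a cleanup thread serialize on that lock instead of each holding one lock while waiting for the other.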

Check List (For Committer)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.
  • Release note

    None

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@BePPPower
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 41236 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 6c99ab9e5c17ba5e6a62a59b45e3298d7638c8bc, data reload: false

------ Round 1 ----------------------------------
q1	17957	8161	7282	7282
q2	2063	167	160	160
q3	10684	1069	1200	1069
q4	10377	851	917	851
q5	7783	3101	3049	3049
q6	230	142	142	142
q7	996	603	595	595
q8	9347	1988	2026	1988
q9	6683	6471	6500	6471
q10	7018	2454	2459	2454
q11	458	247	260	247
q12	414	211	210	210
q13	17776	2991	3022	2991
q14	243	223	209	209
q15	563	522	528	522
q16	631	590	578	578
q17	992	524	588	524
q18	7339	6616	6700	6616
q19	1323	1070	995	995
q20	474	177	179	177
q21	3951	3120	3185	3120
q22	1077	1029	986	986
Total cold run time: 108379 ms
Total hot run time: 41236 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7309	7235	7204	7204
q2	339	224	222	222
q3	3070	2918	2956	2918
q4	2098	1811	1803	1803
q5	5693	5807	5760	5760
q6	239	148	145	145
q7	2287	1830	1822	1822
q8	3427	3539	3441	3441
q9	9030	9040	8937	8937
q10	3584	3570	3590	3570
q11	615	499	497	497
q12	825	632	646	632
q13	10561	3155	3183	3155
q14	311	270	268	268
q15	582	522	511	511
q16	684	643	660	643
q17	1893	1639	1629	1629
q18	8371	7844	7676	7676
q19	1750	1512	1593	1512
q20	2119	1860	1848	1848
q21	5667	5411	5384	5384
q22	1232	1041	1019	1019
Total cold run time: 71686 ms
Total hot run time: 60596 ms

@doris-robot

TPC-DS: Total hot run time: 195437 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 6c99ab9e5c17ba5e6a62a59b45e3298d7638c8bc, data reload: false

query1	1200	913	877	877
query2	6230	2086	2031	2031
query3	10796	4006	3942	3942
query4	67076	27023	23611	23611
query5	4916	449	440	440
query6	398	172	184	172
query7	5598	296	289	289
query8	309	229	218	218
query9	8959	2734	2695	2695
query10	464	269	253	253
query11	17430	15118	15889	15118
query12	165	100	103	100
query13	1554	431	423	423
query14	9828	6747	7519	6747
query15	207	188	202	188
query16	7193	493	462	462
query17	999	571	556	556
query18	1815	295	290	290
query19	187	148	145	145
query20	118	109	108	108
query21	200	110	104	104
query22	4807	4390	4528	4390
query23	34283	34060	34030	34030
query24	6055	2786	2759	2759
query25	504	391	392	391
query26	645	157	154	154
query27	1693	285	293	285
query28	4214	2473	2430	2430
query29	684	419	424	419
query30	228	155	155	155
query31	994	799	819	799
query32	66	53	67	53
query33	410	277	294	277
query34	924	513	529	513
query35	864	735	727	727
query36	1063	949	951	949
query37	120	70	76	70
query38	4509	4198	4287	4198
query39	1488	1427	1433	1427
query40	196	97	98	97
query41	49	44	44	44
query42	114	101	97	97
query43	536	481	490	481
query44	1146	807	809	807
query45	177	162	163	162
query46	1139	682	692	682
query47	1961	1866	1907	1866
query48	431	317	318	317
query49	744	405	397	397
query50	798	382	397	382
query51	7282	7096	7148	7096
query52	99	91	87	87
query53	255	179	182	179
query54	519	400	404	400
query55	76	74	76	74
query56	263	246	254	246
query57	1305	1186	1155	1155
query58	213	210	202	202
query59	3130	2910	3034	2910
query60	265	239	235	235
query61	98	97	97	97
query62	791	669	693	669
query63	211	186	186	186
query64	1338	615	624	615
query65	3290	3211	3227	3211
query66	728	302	318	302
query67	16081	15970	15786	15786
query68	3485	594	610	594
query69	403	249	244	244
query70	1174	1136	1163	1136
query71	348	254	257	254
query72	6170	3912	3951	3912
query73	752	363	365	363
query74	10081	9038	9000	9000
query75	3404	2666	2655	2655
query76	1775	1050	1198	1050
query77	478	273	264	264
query78	10409	9384	9353	9353
query79	1428	596	611	596
query80	873	417	444	417
query81	508	239	239	239
query82	1297	116	115	115
query83	173	133	154	133
query84	278	71	70	70
query85	873	294	275	275
query86	329	291	285	285
query87	4919	4804	4637	4637
query88	3463	2211	2158	2158
query89	416	303	288	288
query90	2023	187	184	184
query91	130	97	98	97
query92	62	48	47	47
query93	1909	546	551	546
query94	778	289	289	289
query95	340	251	235	235
query96	616	272	293	272
query97	2917	2671	2709	2671
query98	209	200	198	198
query99	2056	1283	1334	1283
Total cold run time: 316504 ms
Total hot run time: 195437 ms

@doris-robot

ClickBench: Total hot run time: 32.94 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 6c99ab9e5c17ba5e6a62a59b45e3298d7638c8bc, data reload: false

query1	0.03	0.03	0.03
query2	0.08	0.04	0.04
query3	0.23	0.06	0.06
query4	1.66	0.07	0.08
query5	0.40	0.39	0.41
query6	1.17	0.68	0.66
query7	0.02	0.02	0.02
query8	0.06	0.05	0.04
query9	0.55	0.50	0.48
query10	0.56	0.56	0.56
query11	0.16	0.12	0.12
query12	0.16	0.13	0.13
query13	0.61	0.59	0.60
query14	2.72	2.76	2.72
query15	0.90	0.83	0.84
query16	0.38	0.38	0.38
query17	1.06	1.04	1.06
query18	0.18	0.18	0.19
query19	1.97	1.82	1.91
query20	0.01	0.02	0.01
query21	15.36	0.67	0.66
query22	4.49	7.11	1.77
query23	18.28	1.36	1.22
query24	2.05	0.24	0.22
query25	0.14	0.09	0.08
query26	0.28	0.18	0.18
query27	0.08	0.08	0.08
query28	13.29	1.17	1.13
query29	12.62	3.43	3.40
query30	0.24	0.06	0.06
query31	2.86	0.41	0.40
query32	3.24	0.49	0.47
query33	2.95	3.08	3.05
query34	16.95	4.53	4.48
query35	4.56	4.53	4.49
query36	0.66	0.50	0.47
query37	0.18	0.15	0.15
query38	0.16	0.16	0.16
query39	0.05	0.04	0.04
query40	0.16	0.13	0.13
query41	0.09	0.04	0.04
query42	0.06	0.05	0.05
query43	0.04	0.04	0.04
Total cold run time: 111.7 s
Total hot run time: 32.94 s

@github-actions
Contributor

github-actions bot commented Nov 1, 2024

PR approved by anyone and no changes requested.

@github-actions github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) Nov 1, 2024
@github-actions
Contributor

github-actions bot commented Nov 1, 2024

PR approved by at least one committer and no changes requested.

morningman pushed a commit that referenced this pull request Nov 1, 2024
### What problem does this PR solve?
bp: #43083
@morningman morningman merged commit 399b437 into apache:master Nov 2, 2024
morningman pushed a commit to morningman/doris that referenced this pull request Nov 7, 2024
Problem Summary:
There are two locks involved: the ExportMgr lock and the ExportJob lock (a synchronized lock).
Previously, the two locks were acquired in an inconsistent order:
1. When cancelling a job, the job lock was acquired first, then the mgr lock.
2. When removing an old job, the mgr lock was acquired first, then the job lock.

This PR fixes the problem by always acquiring the job lock after the mgr lock, which avoids the deadlock.
morningman added a commit that referenced this pull request Nov 7, 2024
cherry-pick #43083

Co-authored-by: Tiewei Fang <43782773+BePPPower@users.noreply.github.com>