@whutpencil whutpencil commented Jun 11, 2024

Proposed changes

Issue Number: close #xxx

After I add compute nodes to the cluster, a large number of warning logs appear in the FE log, as shown below:

2024-06-11 17:50:04,360 WARN (InternalSchemaInitializer|137) [InternalSchemaInitializer.modifyTblReplicaCount():146] Failed to scale replica of stats tbl:column_statistics to 3
org.apache.doris.common.AnalysisException: errCode = 2, detailMessage = errCode = 2, detailMessage = Failed to find enough backend, please check the replication num,replication tag and storage medium and avail capacity of backends.
Create failed replications:
replication tag: {"location" : "default"}, replication num: 3, storage medium: null
    at org.apache.doris.common.util.PropertyAnalyzer.analyzeReplicaAllocationImpl(PropertyAnalyzer.java:1217) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.common.util.PropertyAnalyzer.analyzeReplicaAllocation(PropertyAnalyzer.java:1136) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.catalog.Env.modifyTableReplicaAllocation(Env.java:4868) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.catalog.InternalSchemaInitializer.modifyTblReplicaCount(InternalSchemaInitializer.java:123) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.catalog.InternalSchemaInitializer.run(InternalSchemaInitializer.java:93) ~[doris-fe.jar:1.2-SNAPSHOT]

image

This happens because the code also counts compute nodes when placing replicas. A compute node has no storage resources, so the replica change fails with the error above.
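
To illustrate the idea, here is a minimal Java sketch. It is not the actual Doris code: `ReplicaSketch`, `Node`, `storageNodeCount` and `targetReplicaCount` are hypothetical names made up for this example. It assumes each node carries a flag marking it as compute-only, and it only scales the replica count to 3 when at least three alive, storage-capable nodes exist.

```java
import java.util.List;

public class ReplicaSketch {
    // Hypothetical stand-in for a backend node; not the Doris Backend class.
    static final class Node {
        final String host;
        final boolean alive;
        final boolean computeOnly; // true for a compute node with no storage

        Node(String host, boolean alive, boolean computeOnly) {
            this.host = host;
            this.alive = alive;
            this.computeOnly = computeOnly;
        }
    }

    // Count only alive nodes that can actually hold replicas.
    static long storageNodeCount(List<Node> nodes) {
        return nodes.stream()
                .filter(n -> n.alive && !n.computeOnly)
                .count();
    }

    // Scale to 3 replicas only when 3 storage-capable nodes exist;
    // otherwise keep the current count so no allocation is attempted.
    static int targetReplicaCount(List<Node> nodes, int currentReplicas) {
        return storageNodeCount(nodes) >= 3 ? 3 : currentReplicas;
    }

    public static void main(String[] args) {
        List<Node> cluster = List.of(
                new Node("be1", true, false),
                new Node("cn1", true, true),
                new Node("cn2", true, true));
        // Three nodes in total, but only one of them can store data.
        System.out.println(targetReplicaCount(cluster, 1)); // prints 1
    }
}
```

With one storage node and two compute nodes the sketch keeps the replica count at 1 instead of requesting 3, which is the situation that produced the warning in the log.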

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@morningman morningman left a comment

LGTM

@morningman

run buildall

@github-actions github-actions bot added the approved label (indicates a PR has been approved by one committer) Jun 11, 2024
@github-actions

PR approved by at least one committer and no changes requested.

@github-actions

PR approved by anyone and no changes requested.

@morrySnow

please replace screenshot with text

@Jibing-Li Jibing-Li left a comment

LGTM

@doris-robot

TPC-H: Total hot run time: 40050 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 7b75c7015db16e09761a29b96f4d80a3d40756b1, data reload: false

------ Round 1 ----------------------------------
q1	17720	4705	4287	4287
q2	2041	203	196	196
q3	10552	1128	1049	1049
q4	10217	848	791	791
q5	7450	2706	2673	2673
q6	223	140	137	137
q7	947	611	609	609
q8	9222	2086	2096	2086
q9	9060	6499	6509	6499
q10	9054	3730	3704	3704
q11	442	235	235	235
q12	442	230	223	223
q13	17955	2984	3028	2984
q14	269	215	218	215
q15	515	464	485	464
q16	533	387	380	380
q17	989	591	690	591
q18	8123	7650	7400	7400
q19	7912	1573	1563	1563
q20	674	317	327	317
q21	4898	3893	3313	3313
q22	391	334	345	334
Total cold run time: 119629 ms
Total hot run time: 40050 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4443	4210	4238	4210
q2	364	261	271	261
q3	2996	2773	2728	2728
q4	1858	1587	1613	1587
q5	5272	5303	5300	5300
q6	219	127	128	127
q7	2156	1765	1771	1765
q8	3193	3347	3336	3336
q9	8393	8412	8361	8361
q10	3861	3742	3668	3668
q11	561	490	466	466
q12	757	595	611	595
q13	16409	3003	3014	3003
q14	287	258	260	258
q15	520	489	469	469
q16	471	404	422	404
q17	1787	1483	1458	1458
q18	7690	7543	7386	7386
q19	1708	1501	1651	1501
q20	1952	1759	1776	1759
q21	9501	4620	4654	4620
q22	644	532	590	532
Total cold run time: 75042 ms
Total hot run time: 53794 ms

@doris-robot

TPC-DS: Total hot run time: 172180 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 7b75c7015db16e09761a29b96f4d80a3d40756b1, data reload: false

query1	918	390	374	374
query2	6457	2390	2239	2239
query3	6679	209	212	209
query4	20768	17340	17373	17340
query5	4139	462	449	449
query6	257	160	155	155
query7	4594	300	295	295
query8	328	297	293	293
query9	8499	2347	2336	2336
query10	604	332	285	285
query11	10761	10108	9986	9986
query12	130	94	88	88
query13	1649	385	366	366
query14	9340	7546	6692	6692
query15	236	190	193	190
query16	7803	273	261	261
query17	1835	544	532	532
query18	1966	276	277	276
query19	201	151	147	147
query20	92	82	86	82
query21	207	129	130	129
query22	4355	4026	4055	4026
query23	33719	32952	33054	32952
query24	11935	2805	2846	2805
query25	687	355	360	355
query26	1782	150	154	150
query27	2986	317	326	317
query28	7652	2047	2006	2006
query29	1100	624	610	610
query30	264	152	149	149
query31	954	712	766	712
query32	89	52	55	52
query33	767	284	273	273
query34	986	479	475	475
query35	759	620	634	620
query36	1080	954	942	942
query37	188	72	71	71
query38	2935	2757	2744	2744
query39	853	793	785	785
query40	283	125	126	125
query41	49	47	52	47
query42	112	96	98	96
query43	591	530	553	530
query44	1213	732	738	732
query45	193	169	170	169
query46	1092	727	713	713
query47	1861	1751	1809	1751
query48	372	303	306	303
query49	1169	409	398	398
query50	766	386	390	386
query51	6834	6691	6704	6691
query52	103	92	100	92
query53	362	283	281	281
query54	1010	448	436	436
query55	75	74	75	74
query56	290	257	249	249
query57	1160	1099	1047	1047
query58	261	234	254	234
query59	3331	3041	3087	3041
query60	291	273	271	271
query61	91	88	89	88
query62	654	439	449	439
query63	313	286	291	286
query64	9908	2272	1701	1701
query65	3127	3249	3117	3117
query66	1354	323	338	323
query67	15459	15111	15007	15007
query68	4551	549	542	542
query69	449	292	297	292
query70	1193	1141	1094	1094
query71	386	270	268	268
query72	6969	5924	5485	5485
query73	739	321	328	321
query74	5985	5509	5553	5509
query75	3403	2640	2671	2640
query76	2418	940	893	893
query77	435	297	298	297
query78	10540	9892	9752	9752
query79	2667	517	518	517
query80	1059	468	466	466
query81	577	216	220	216
query82	694	96	100	96
query83	225	173	169	169
query84	241	89	86	86
query85	1924	285	265	265
query86	500	289	342	289
query87	3287	3136	3065	3065
query88	3949	2444	2439	2439
query89	462	398	379	379
query90	1822	190	198	190
query91	127	106	107	106
query92	64	49	55	49
query93	2019	525	505	505
query94	1247	192	193	192
query95	403	387	315	315
query96	584	269	268	268
query97	3225	3022	3024	3022
query98	210	205	195	195
query99	1079	863	865	863
Total cold run time: 276311 ms
Total hot run time: 172180 ms

@doris-robot

ClickBench: Total hot run time: 30.34 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 7b75c7015db16e09761a29b96f4d80a3d40756b1, data reload: false

query1	0.04	0.04	0.04
query2	0.08	0.04	0.04
query3	0.23	0.05	0.05
query4	1.68	0.07	0.08
query5	0.50	0.50	0.49
query6	1.13	0.73	0.72
query7	0.01	0.01	0.01
query8	0.05	0.04	0.04
query9	0.54	0.49	0.51
query10	0.55	0.56	0.53
query11	0.15	0.11	0.12
query12	0.14	0.12	0.12
query13	0.59	0.58	0.60
query14	0.78	0.78	0.77
query15	0.82	0.82	0.81
query16	0.36	0.40	0.37
query17	0.96	0.95	1.00
query18	0.24	0.21	0.29
query19	1.76	1.67	1.68
query20	0.02	0.01	0.01
query21	15.41	0.65	0.66
query22	4.81	6.85	1.82
query23	18.32	1.39	1.28
query24	2.14	0.25	0.22
query25	0.15	0.10	0.09
query26	0.27	0.17	0.17
query27	0.08	0.08	0.09
query28	13.17	1.02	1.00
query29	12.60	3.31	3.30
query30	0.25	0.06	0.08
query31	2.83	0.39	0.38
query32	3.29	0.47	0.48
query33	2.91	2.88	2.86
query34	17.16	4.38	4.46
query35	4.45	4.45	4.45
query36	0.66	0.47	0.47
query37	0.18	0.16	0.16
query38	0.14	0.14	0.15
query39	0.04	0.03	0.04
query40	0.17	0.14	0.15
query41	0.09	0.05	0.04
query42	0.06	0.04	0.05
query43	0.04	0.03	0.04
Total cold run time: 109.85 s
Total hot run time: 30.34 s

@yiguolei

run buildall

@doris-robot

TPC-H: Total hot run time: 40188 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit eb9514de11a62e7d21a3efa9b5b808cc653e0863, data reload: false

------ Round 1 ----------------------------------
q1	17633	4518	4277	4277
q2	2024	192	185	185
q3	10451	1164	1148	1148
q4	10204	882	813	813
q5	7545	2716	2723	2716
q6	229	141	142	141
q7	981	602	606	602
q8	9223	2114	2078	2078
q9	8786	6582	6570	6570
q10	8609	3731	3784	3731
q11	452	236	228	228
q12	386	228	229	228
q13	17765	2963	2999	2963
q14	276	239	240	239
q15	529	477	495	477
q16	490	395	375	375
q17	985	626	711	626
q18	8251	7556	7624	7556
q19	4368	1472	1412	1412
q20	710	325	326	325
q21	4916	3217	3295	3217
q22	352	288	281	281
Total cold run time: 115165 ms
Total hot run time: 40188 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4356	4269	4256	4256
q2	374	269	269	269
q3	3024	2810	2886	2810
q4	1973	1730	1735	1730
q5	5672	5629	5590	5590
q6	225	136	136	136
q7	2190	1804	1863	1804
q8	3327	3438	3425	3425
q9	8849	8813	8889	8813
q10	4104	3949	3869	3869
q11	589	482	501	482
q12	796	693	653	653
q13	15885	3240	3228	3228
q14	349	300	281	281
q15	562	484	500	484
q16	498	442	446	442
q17	1827	1547	1503	1503
q18	8129	8049	7905	7905
q19	3812	1705	1617	1617
q20	2113	1925	1876	1876
q21	8651	4906	4833	4833
q22	590	513	506	506
Total cold run time: 77895 ms
Total hot run time: 56512 ms

@doris-robot

TPC-DS: Total hot run time: 173627 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit eb9514de11a62e7d21a3efa9b5b808cc653e0863, data reload: false

query1	942	371	371	371
query2	6457	1889	1826	1826
query3	6643	207	223	207
query4	28008	17464	17265	17265
query5	3699	509	484	484
query6	259	195	171	171
query7	4595	290	289	289
query8	249	199	190	190
query9	8645	2384	2382	2382
query10	430	269	283	269
query11	11514	9993	9933	9933
query12	117	90	80	80
query13	1645	379	357	357
query14	10240	7726	8083	7726
query15	223	165	169	165
query16	7417	314	314	314
query17	1795	558	540	540
query18	1417	289	284	284
query19	204	176	148	148
query20	90	81	79	79
query21	207	134	124	124
query22	4381	4059	3938	3938
query23	34164	33745	33611	33611
query24	11984	2913	2899	2899
query25	661	419	396	396
query26	1535	152	150	150
query27	2861	289	286	286
query28	7344	2048	2032	2032
query29	1001	666	624	624
query30	256	153	159	153
query31	975	806	742	742
query32	103	58	60	58
query33	758	327	336	327
query34	918	491	519	491
query35	713	577	590	577
query36	1123	944	964	944
query37	159	91	85	85
query38	2899	2905	2875	2875
query39	923	888	822	822
query40	257	126	121	121
query41	47	47	44	44
query42	111	99	98	98
query43	500	468	457	457
query44	1254	734	732	732
query45	197	162	164	162
query46	1093	745	724	724
query47	1879	1775	1767	1767
query48	375	293	299	293
query49	902	408	419	408
query50	793	393	399	393
query51	6950	6818	6800	6800
query52	105	97	89	89
query53	364	288	295	288
query54	874	446	454	446
query55	74	75	75	75
query56	289	266	276	266
query57	1114	1039	1061	1039
query58	240	243	249	243
query59	2822	2669	2555	2555
query60	302	313	270	270
query61	93	93	95	93
query62	808	633	643	633
query63	328	294	301	294
query64	10254	2209	1632	1632
query65	3190	3134	3149	3134
query66	1156	340	350	340
query67	15575	15004	14839	14839
query68	4567	535	535	535
query69	617	508	371	371
query70	1192	1110	1197	1110
query71	426	286	291	286
query72	7295	5695	5572	5572
query73	757	324	326	324
query74	6188	5818	5740	5740
query75	3433	2704	2707	2704
query76	3183	965	919	919
query77	645	329	339	329
query78	10175	9032	8975	8975
query79	3020	518	534	518
query80	1837	482	488	482
query81	584	224	221	221
query82	732	139	131	131
query83	318	173	165	165
query84	272	93	86	86
query85	1936	326	299	299
query86	462	343	319	319
query87	3288	3116	3137	3116
query88	4361	2449	2483	2449
query89	496	400	384	384
query90	1786	204	199	199
query91	136	103	102	102
query92	62	49	121	49
query93	4592	507	503	503
query94	982	217	219	217
query95	415	322	325	322
query96	619	272	282	272
query97	3212	3053	3062	3053
query98	221	204	192	192
query99	1669	1246	1234	1234
Total cold run time: 288978 ms
Total hot run time: 173627 ms

@doris-robot

ClickBench: Total hot run time: 30.68 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit eb9514de11a62e7d21a3efa9b5b808cc653e0863, data reload: false

query1	0.04	0.03	0.03
query2	0.07	0.04	0.04
query3	0.23	0.06	0.05
query4	1.67	0.08	0.08
query5	0.49	0.51	0.50
query6	1.16	0.73	0.72
query7	0.02	0.02	0.02
query8	0.05	0.04	0.04
query9	0.54	0.50	0.49
query10	0.54	0.54	0.55
query11	0.16	0.12	0.11
query12	0.15	0.12	0.13
query13	0.60	0.59	0.59
query14	0.76	0.79	0.78
query15	0.85	0.81	0.81
query16	0.34	0.37	0.38
query17	1.03	1.03	0.98
query18	0.24	0.22	0.22
query19	1.90	1.78	1.83
query20	0.01	0.01	0.03
query21	15.39	0.75	0.67
query22	4.71	7.11	1.70
query23	18.30	1.34	1.34
query24	2.08	0.22	0.21
query25	0.14	0.09	0.09
query26	0.31	0.21	0.20
query27	0.45	0.24	0.23
query28	13.39	1.01	0.99
query29	12.60	3.34	3.32
query30	0.25	0.06	0.06
query31	2.88	0.38	0.39
query32	3.26	0.48	0.47
query33	2.90	2.93	2.94
query34	16.98	4.40	4.38
query35	4.48	4.45	4.40
query36	0.66	0.46	0.51
query37	0.19	0.16	0.16
query38	0.15	0.15	0.14
query39	0.05	0.03	0.04
query40	0.16	0.12	0.13
query41	0.10	0.06	0.05
query42	0.06	0.04	0.05
query43	0.04	0.03	0.03
Total cold run time: 110.38 s
Total hot run time: 30.68 s

@yiguolei yiguolei merged commit 62913e0 into apache:master Jul 15, 2024
morningman pushed a commit to morningman/doris that referenced this pull request Jul 17, 2024
…chema three replica (apache#36130)

## Proposed changes

Issue Number: close #xxx

When I add computing nodes to the cluster, a large number of warning
logs will appear in the FE log, as shown below:

```
2024-06-11 17:50:04,360 WARN (InternalSchemaInitializer|137) [InternalSchemaInitializer.modifyTblReplicaCount():146] Failed to scale replica of stats tbl:column_statistics to 3
org.apache.doris.common.AnalysisException: errCode = 2, detailMessage = errCode = 2, detailMessage = Failed to find enough backend, please check the replication num,replication tag and storage medium and avail capacity of backends.
Create failed replications:
replication tag: {"location" : "default"}, replication num: 3, storage medium: null
    at org.apache.doris.common.util.PropertyAnalyzer.analyzeReplicaAllocationImpl(PropertyAnalyzer.java:1217) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.common.util.PropertyAnalyzer.analyzeReplicaAllocation(PropertyAnalyzer.java:1136) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.catalog.Env.modifyTableReplicaAllocation(Env.java:4868) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.catalog.InternalSchemaInitializer.modifyTblReplicaCount(InternalSchemaInitializer.java:123) ~[doris-fe.jar:1.2-SNAPSHOT]
    at org.apache.doris.catalog.InternalSchemaInitializer.run(InternalSchemaInitializer.java:93) ~[doris-fe.jar:1.2-SNAPSHOT]
```


![image](https://github.com/apache/doris/assets/24907215/53ea2f34-1012-4f96-a2c9-be9c2b39b772)

This is because the computing node is also considered a replica in the
code, and the computing node does not have storage resources, resulting
in an error message.

---------

Co-authored-by: camby <104178625@qq.com>
yiguolei pushed a commit that referenced this pull request Jul 17, 2024
…chema three replica (#36130) (#37961)

bp #36130

Co-authored-by: HB <137497191@qq.com>
Co-authored-by: camby <104178625@qq.com>
seawinde pushed a commit to seawinde/doris that referenced this pull request Jul 17, 2024
…chema three replica (apache#36130)

dataroaring pushed a commit that referenced this pull request Jul 17, 2024
…chema three replica (#36130)

@gavinchou gavinchou mentioned this pull request Aug 19, 2024

Labels

approved (indicates a PR has been approved by one committer), dev/2.1.5-merged, dev/3.0.1-merged, reviewed

9 participants