@zhannngchen
Contributor

@zhannngchen zhannngchen commented Jun 28, 2024

Proposed changes

Issue Number: close #xxx

Introduced by #31268.

A failed full clone may produce duplicate keys in a merge-on-write (MoW) table.
The bug is triggered by the following sequence:

  1. Replica 0 misses a version.
  2. Replica 0 tries to do a full clone from other replicas.
  3. The full clone fails, and the delete bitmap is incorrectly overwritten.
  4. Replica 0 then tries an incremental clone, and this time the clone succeeds.
  5. The incremental clone cannot repair the delete bitmap that the earlier failed full clone overwrote.
  6. Duplicate keys occur.

Solution:
For full clone, don't overwrite the delete bitmap; use the merge() method instead.
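
A minimal sketch of why merging is safer than overwriting, under stated assumptions. `ToyDeleteBitmap`, `BitmapKey`, `make_local()` and `make_cloned()` are hypothetical names made up for illustration; this is not the Doris class or the code changed in this PR. It also assumes that the bitmap brought in by the clone does not contain every mark the replica already holds.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <tuple>

// Hypothetical key: (rowset id, segment id, version).
using BitmapKey = std::tuple<std::string, uint32_t, int64_t>;

// Toy stand-in for a delete bitmap: for each key, the row ids marked deleted.
struct ToyDeleteBitmap {
    std::map<BitmapKey, std::set<uint32_t>> marks;

    void add(const BitmapKey& key, uint32_t row_id) { marks[key].insert(row_id); }

    bool contains(const BitmapKey& key, uint32_t row_id) const {
        auto it = marks.find(key);
        return it != marks.end() && it->second.count(row_id) > 0;
    }

    // Overwrite: discard local marks and take the other bitmap as-is.
    void overwrite(const ToyDeleteBitmap& other) { marks = other.marks; }

    // Merge: per-key union, so marks already held locally survive.
    void merge(const ToyDeleteBitmap& other) {
        for (const auto& entry : other.marks) {
            marks[entry.first].insert(entry.second.begin(), entry.second.end());
        }
    }
};

inline BitmapKey local_key() { return BitmapKey(std::string("rs1"), 0u, int64_t{5}); }
inline BitmapKey cloned_key() { return BitmapKey(std::string("rs2"), 0u, int64_t{6}); }

// Bitmap held by replica 0 before the clone: row 7 of rs1 is marked deleted.
inline ToyDeleteBitmap make_local() {
    ToyDeleteBitmap bm;
    bm.add(local_key(), 7);
    return bm;
}

// Bitmap carried by the clone (assumed not to include the local mark).
inline ToyDeleteBitmap make_cloned() {
    ToyDeleteBitmap bm;
    bm.add(cloned_key(), 3);
    return bm;
}
```

With `overwrite(make_cloned())`, the mark on row 7 of `rs1` disappears, so the old row becomes visible again next to its newer version, which is how a duplicate key can surface. With `merge(make_cloned())`, both marks are kept, and a later incremental clone has nothing to repair.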

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@zhannngchen
Contributor Author

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 39816 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b3ce80c177b3584ac509c8c60af876485a4ac8a0, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17603	4632	4287	4287
q2	2017	197	199	197
q3	10471	1177	1115	1115
q4	10202	819	870	819
q5	7464	2693	2604	2604
q6	216	138	136	136
q7	955	602	602	602
q8	9218	2072	2077	2072
q9	8780	6491	6469	6469
q10	8990	3735	3725	3725
q11	461	247	236	236
q12	507	238	228	228
q13	17777	3028	3001	3001
q14	271	236	221	221
q15	520	466	483	466
q16	494	391	381	381
q17	962	745	675	675
q18	7966	7415	7284	7284
q19	5528	1506	1541	1506
q20	669	323	331	323
q21	4889	3131	3319	3131
q22	399	338	351	338
Total cold run time: 116359 ms
Total hot run time: 39816 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4350	4260	4213	4213
q2	382	263	275	263
q3	2993	2833	2926	2833
q4	1942	1739	1648	1648
q5	5681	5483	5488	5483
q6	224	134	131	131
q7	2191	1905	1854	1854
q8	3243	3421	3423	3421
q9	8703	8666	8866	8666
q10	4113	3903	3736	3736
q11	603	505	531	505
q12	823	649	631	631
q13	17198	3197	3176	3176
q14	300	267	295	267
q15	529	475	485	475
q16	496	440	428	428
q17	1815	1526	1517	1517
q18	8144	7970	7754	7754
q19	1849	1716	1613	1613
q20	2081	1853	1832	1832
q21	5294	4963	4738	4738
q22	633	590	542	542
Total cold run time: 73587 ms
Total hot run time: 55726 ms

@doris-robot

TPC-DS: Total hot run time: 174671 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b3ce80c177b3584ac509c8c60af876485a4ac8a0, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	920	387	370	370
query2	6450	2510	2348	2348
query3	6629	209	212	209
query4	18958	17470	17288	17288
query5	3693	486	502	486
query6	264	184	176	176
query7	4590	293	300	293
query8	327	308	299	299
query9	8771	2459	2459	2459
query10	554	294	280	280
query11	10637	10093	10144	10093
query12	121	88	84	84
query13	1655	373	380	373
query14	10300	7694	7732	7694
query15	251	186	194	186
query16	7943	275	270	270
query17	1880	567	537	537
query18	2073	283	290	283
query19	201	168	157	157
query20	90	82	88	82
query21	261	135	124	124
query22	4410	4085	3973	3973
query23	33912	33577	33580	33577
query24	10556	2988	2770	2770
query25	582	390	373	373
query26	712	154	155	154
query27	2260	321	333	321
query28	6081	2208	2204	2204
query29	890	628	622	622
query30	267	160	157	157
query31	978	755	754	754
query32	100	60	58	58
query33	687	298	286	286
query34	878	478	492	478
query35	734	640	634	634
query36	1102	1016	996	996
query37	155	75	73	73
query38	2949	2898	2863	2863
query39	892	836	825	825
query40	214	128	123	123
query41	58	52	54	52
query42	119	102	104	102
query43	604	561	547	547
query44	1114	762	743	743
query45	193	168	168	168
query46	1082	708	737	708
query47	1857	1771	1771	1771
query48	379	303	299	299
query49	839	408	415	408
query50	766	380	385	380
query51	7033	6676	6720	6676
query52	98	110	89	89
query53	360	289	295	289
query54	872	451	440	440
query55	73	73	74	73
query56	290	257	273	257
query57	1119	1031	1046	1031
query58	259	245	239	239
query59	3304	3218	3186	3186
query60	327	275	280	275
query61	94	94	91	91
query62	584	435	453	435
query63	322	293	294	293
query64	8505	2265	1749	1749
query65	3194	3105	3161	3105
query66	746	336	386	336
query67	15351	14932	14947	14932
query68	4608	531	535	531
query69	476	326	320	320
query70	1123	1074	1182	1074
query71	392	271	285	271
query72	8024	5279	6096	5279
query73	751	329	320	320
query74	5976	5518	5519	5518
query75	3445	2682	2631	2631
query76	2462	979	924	924
query77	665	313	309	309
query78	10532	9828	9770	9770
query79	2332	514	536	514
query80	1134	475	475	475
query81	587	223	222	222
query82	855	109	111	109
query83	340	173	177	173
query84	263	90	87	87
query85	1890	293	282	282
query86	487	327	318	318
query87	3288	3116	3086	3086
query88	4213	2378	2338	2338
query89	477	400	394	394
query90	1759	191	191	191
query91	135	169	101	101
query92	61	49	55	49
query93	2398	517	505	505
query94	1177	189	187	187
query95	410	323	323	323
query96	598	266	271	266
query97	3221	3070	3074	3070
query98	228	202	195	195
query99	1114	866	833	833
Total cold run time: 269890 ms
Total hot run time: 174671 ms

@doris-robot

ClickBench: Total hot run time: 29.97 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit b3ce80c177b3584ac509c8c60af876485a4ac8a0, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.04	0.03	0.03
query2	0.08	0.04	0.04
query3	0.22	0.04	0.05
query4	1.67	0.07	0.07
query5	0.53	0.50	0.50
query6	1.13	0.72	0.72
query7	0.02	0.01	0.02
query8	0.06	0.04	0.04
query9	0.54	0.51	0.50
query10	0.54	0.54	0.55
query11	0.15	0.12	0.12
query12	0.15	0.12	0.12
query13	0.59	0.59	0.60
query14	0.75	0.77	0.78
query15	0.84	0.81	0.82
query16	0.35	0.36	0.37
query17	0.96	0.94	1.01
query18	0.25	0.22	0.27
query19	1.77	1.70	1.70
query20	0.02	0.01	0.01
query21	15.43	0.75	0.66
query22	3.56	8.42	1.56
query23	18.24	1.35	1.18
query24	2.09	0.22	0.22
query25	0.15	0.09	0.09
query26	0.27	0.18	0.18
query27	0.08	0.08	0.08
query28	13.21	1.03	1.00
query29	12.62	3.26	3.30
query30	0.26	0.06	0.06
query31	2.89	0.38	0.39
query32	3.25	0.47	0.48
query33	2.88	2.81	2.93
query34	16.95	4.41	4.41
query35	4.47	4.44	4.48
query36	0.65	0.46	0.49
query37	0.17	0.15	0.16
query38	0.15	0.15	0.14
query39	0.04	0.03	0.03
query40	0.18	0.15	0.14
query41	0.09	0.05	0.04
query42	0.05	0.05	0.04
query43	0.04	0.04	0.04
Total cold run time: 108.38 s
Total hot run time: 29.97 s

Contributor

@liaoxin01 liaoxin01 left a comment

LGTM

@github-actions github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) Jul 1, 2024
@github-actions
Contributor

github-actions bot commented Jul 1, 2024

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

github-actions bot commented Jul 1, 2024

PR approved by anyone and no changes requested.

Contributor

@dataroaring dataroaring left a comment

LGTM

@dataroaring dataroaring merged commit c972ea5 into apache:master Jul 2, 2024
dataroaring pushed a commit that referenced this pull request Jul 2, 2024
…ur (#37001)
zhannngchen added a commit to zhannngchen/incubator-doris that referenced this pull request Jul 3, 2024
…ur (apache#37001)
zhannngchen added a commit to zhannngchen/incubator-doris that referenced this pull request Jul 3, 2024
…ur (apache#37001)