Skip to content

Conversation

@zhannngchen
Copy link
Contributor

@zhannngchen zhannngchen commented Apr 11, 2024

Proposed changes

Issue Number: close #xxx

#29386 fixed an issue that may cause duplicate key, but it breaks some schema change cases.
Then #29733 reverted some change of #29386, which also brought back this issue.

issue description:

  1. all rowsets written to the new tablet with TABLET_NOTREADY state, are skipped the step of delete bitmap calculation
  2. after the new tablet created, double write start writing rowsets to new tablet
  3. alter process will calculate a version on base tablet, for the rowsets that need to be converted to new schema, and clean all rowsets older than that version on the new tablet, before start converting the historical data
  4. but the max_version of new tablet may lag behind base tablet, e.g. new tablet's max version is 6, and base tablet's max convert version is 10. So after the rowset clear operation of step3, double write may still write version 7,8,9,10 into new tablet
  5. after the historical data conversion, alter process will add new converted rowsets into new tablet, but it will fail while adding rowsets between version 7 and 10, because these version already existed
  6. the delete bitmap of rowsets between 7 and 10 is not calculated, so duplicate key happened.

solution:

  1. [fix](merge-on-write) fix schema change may result in delete bitmap incorrect #29386 return failure when such situation happened, but it will break some schema change cases
  2. In this PR, I propose to update the variable real_alter_version to 6 (see previous case), recalculate delete bitmaps on all rowsets that older than real_alter_version

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@doris-robot
Copy link

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@zhannngchen
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 35.58% (8903/25020)
Line Coverage: 27.31% (73121/267781)
Region Coverage: 26.42% (37821/143136)
Branch Coverage: 23.20% (19275/83082)
Coverage Report: http://coverage.selectdb-in.cc/coverage/96b3fda38507a845f289e76b2029d94035ad25f6_96b3fda38507a845f289e76b2029d94035ad25f6/report/index.html

liaoxin01
liaoxin01 previously approved these changes Apr 11, 2024
Copy link
Contributor

@liaoxin01 liaoxin01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Apr 11, 2024
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

@luwei16
Copy link
Contributor

luwei16 commented Apr 11, 2024

LGTM

@xiaokang xiaokang added the p0_w label Apr 11, 2024
@zhannngchen
Copy link
Contributor Author

run p0

@zhannngchen
Copy link
Contributor Author

run p1

@zhannngchen zhannngchen force-pushed the fix-sc-mow-inconsistent branch from 96b3fda to dd2744d Compare April 12, 2024 05:37
@zhannngchen
Copy link
Contributor Author

run buildall

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 35.59% (8902/25012)
Line Coverage: 27.31% (73091/267649)
Region Coverage: 26.43% (37794/142978)
Branch Coverage: 23.19% (19261/83058)
Coverage Report: http://coverage.selectdb-in.cc/coverage/dd2744d39eacfa9fe2bc75103376a709ebfb7255_dd2744d39eacfa9fe2bc75103376a709ebfb7255/report/index.html

@zhannngchen zhannngchen force-pushed the fix-sc-mow-inconsistent branch from dd2744d to 921c9e5 Compare April 12, 2024 09:16
@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@zhannngchen
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 35.58% (8902/25017)
Line Coverage: 27.31% (73096/267678)
Region Coverage: 26.44% (37803/142984)
Branch Coverage: 23.19% (19263/83062)
Coverage Report: http://coverage.selectdb-in.cc/coverage/921c9e5d78a8f2c14e1b72a81d8e0613b718c41e_921c9e5d78a8f2c14e1b72a81d8e0613b718c41e/report/index.html

@doris-robot
Copy link

TPC-H: Total hot run time: 38733 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 921c9e5d78a8f2c14e1b72a81d8e0613b718c41e, data reload: false

------ Round 1 ----------------------------------
q1	17604	4393	4251	4251
q2	2005	187	182	182
q3	10854	1327	1120	1120
q4	11274	820	835	820
q5	7550	2812	2702	2702
q6	219	144	133	133
q7	1026	653	620	620
q8	9432	2095	2072	2072
q9	8044	6747	6615	6615
q10	8584	3530	3508	3508
q11	462	229	231	229
q12	461	215	213	213
q13	18666	2978	2947	2947
q14	274	224	228	224
q15	520	477	486	477
q16	513	383	386	383
q17	981	681	677	677
q18	7436	6938	6704	6704
q19	1710	1538	1522	1522
q20	674	304	313	304
q21	3521	2731	2812	2731
q22	365	299	300	299
Total cold run time: 112175 ms
Total hot run time: 38733 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4241	4212	4211	4211
q2	380	273	264	264
q3	2953	2738	2805	2738
q4	1884	1596	1574	1574
q5	5364	5315	5276	5276
q6	206	123	123	123
q7	2210	1878	1907	1878
q8	3196	3366	3327	3327
q9	8645	8532	8567	8532
q10	3875	3689	3765	3689
q11	596	499	499	499
q12	767	613	601	601
q13	16399	2938	2933	2933
q14	301	272	293	272
q15	515	470	463	463
q16	465	421	418	418
q17	1750	1458	1432	1432
q18	7699	7378	7507	7378
q19	1646	1514	1533	1514
q20	1950	1762	1745	1745
q21	4912	4839	4817	4817
q22	542	477	465	465
Total cold run time: 70496 ms
Total hot run time: 54149 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 182934 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 921c9e5d78a8f2c14e1b72a81d8e0613b718c41e, data reload: false

query1	899	1128	1108	1108
query2	7823	2511	2503	2503
query3	6654	210	217	210
query4	37533	21827	21481	21481
query5	4151	403	419	403
query6	245	187	196	187
query7	4048	287	284	284
query8	221	179	188	179
query9	5786	2316	2326	2316
query10	556	256	251	251
query11	14584	14138	14177	14138
query12	143	98	90	90
query13	999	368	365	365
query14	9057	6821	6684	6684
query15	207	186	183	183
query16	7167	258	258	258
query17	1683	588	564	564
query18	1520	296	279	279
query19	201	151	151	151
query20	96	92	89	89
query21	208	125	130	125
query22	5011	4902	4934	4902
query23	33776	32929	32855	32855
query24	12641	2963	2913	2913
query25	544	369	377	369
query26	1864	153	156	153
query27	3062	309	312	309
query28	7678	2032	2008	2008
query29	855	605	576	576
query30	312	162	161	161
query31	934	693	702	693
query32	57	52	55	52
query33	601	246	235	235
query34	883	465	480	465
query35	827	692	699	692
query36	981	896	861	861
query37	279	70	72	70
query38	3554	3394	3424	3394
query39	1571	1534	1524	1524
query40	275	121	124	121
query41	46	44	49	44
query42	103	95	94	94
query43	568	521	549	521
query44	1357	695	700	695
query45	287	253	253	253
query46	1051	721	721	721
query47	1926	1862	1915	1862
query48	376	287	296	287
query49	1165	372	366	366
query50	748	387	389	387
query51	6635	6570	6608	6570
query52	103	95	90	90
query53	368	284	277	277
query54	283	237	232	232
query55	78	74	73	73
query56	253	227	228	227
query57	1247	1151	1137	1137
query58	231	206	207	206
query59	3287	3203	3082	3082
query60	254	240	246	240
query61	109	112	111	111
query62	638	446	444	444
query63	303	275	275	275
query64	4851	3915	3915	3915
query65	3104	3021	3013	3013
query66	1368	340	320	320
query67	15782	14932	14949	14932
query68	8324	541	553	541
query69	602	315	310	310
query70	1235	1160	1090	1090
query71	489	276	274	274
query72	6643	2752	2472	2472
query73	818	321	319	319
query74	7110	6336	6483	6336
query75	3444	2324	2318	2318
query76	4430	1078	1122	1078
query77	616	246	243	243
query78	10879	10123	10115	10115
query79	9669	514	530	514
query80	2124	433	407	407
query81	520	234	224	224
query82	1274	92	95	92
query83	282	171	175	171
query84	261	85	85	85
query85	1178	291	271	271
query86	462	307	293	293
query87	3674	3478	3544	3478
query88	6049	2267	2265	2265
query89	538	368	367	367
query90	1972	175	179	175
query91	124	95	98	95
query92	56	49	46	46
query93	7051	524	503	503
query94	1202	177	180	177
query95	374	296	293	293
query96	579	259	255	255
query97	2680	2472	2477	2472
query98	231	214	210	210
query99	1166	883	870	870
Total cold run time: 312063 ms
Total hot run time: 182934 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 30.48 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 921c9e5d78a8f2c14e1b72a81d8e0613b718c41e, data reload: false

query1	0.03	0.04	0.03
query2	0.08	0.04	0.04
query3	0.22	0.05	0.05
query4	1.67	0.07	0.06
query5	0.50	0.48	0.48
query6	1.12	0.65	0.64
query7	0.02	0.02	0.01
query8	0.05	0.04	0.04
query9	0.54	0.49	0.49
query10	0.55	0.55	0.55
query11	0.15	0.11	0.11
query12	0.14	0.13	0.12
query13	0.60	0.58	0.58
query14	0.76	0.76	0.80
query15	0.84	0.80	0.81
query16	0.36	0.37	0.36
query17	0.99	1.01	1.02
query18	0.22	0.21	0.26
query19	1.78	1.67	1.82
query20	0.01	0.01	0.01
query21	15.48	0.64	0.64
query22	4.12	7.21	2.06
query23	18.32	1.46	1.37
query24	1.79	0.25	0.21
query25	0.14	0.08	0.08
query26	0.27	0.16	0.16
query27	0.08	0.08	0.08
query28	13.39	1.02	0.97
query29	12.62	3.27	3.23
query30	0.26	0.06	0.06
query31	2.85	0.37	0.38
query32	3.29	0.48	0.47
query33	2.80	2.84	2.86
query34	17.05	4.48	4.43
query35	4.46	4.49	4.44
query36	0.65	0.47	0.46
query37	0.19	0.15	0.16
query38	0.15	0.14	0.15
query39	0.04	0.04	0.04
query40	0.17	0.15	0.14
query41	0.10	0.04	0.04
query42	0.06	0.05	0.05
query43	0.05	0.04	0.03
Total cold run time: 108.96 s
Total hot run time: 30.48 s

@doris-robot
Copy link

Load test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'

Load test result on commit 921c9e5d78a8f2c14e1b72a81d8e0613b718c41e with default session variables
Stream load json:         19 seconds loaded 2358488459 Bytes, about 118 MB/s
Stream load orc:          59 seconds loaded 1101869774 Bytes, about 17 MB/s
Stream load parquet:      33 seconds loaded 861443392 Bytes, about 24 MB/s
Insert into select:       13.4 seconds inserted 10000000 Rows, about 746K ops/s

@zhannngchen
Copy link
Contributor Author

run buildall

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Apr 12, 2024
@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot
Copy link

TeamCity be ut coverage result:
Function Coverage: 35.60% (8905/25017)
Line Coverage: 27.32% (73124/267684)
Region Coverage: 26.45% (37830/143007)
Branch Coverage: 23.20% (19274/83066)
Coverage Report: http://coverage.selectdb-in.cc/coverage/9f8b878541a5642d97d5d40609545bfc738306e3_9f8b878541a5642d97d5d40609545bfc738306e3/report/index.html

Copy link
Contributor

@dataroaring dataroaring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Apr 13, 2024
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@zhannngchen zhannngchen merged commit 8a94b01 into apache:master Apr 13, 2024
zhannngchen added a commit to zhannngchen/incubator-doris that referenced this pull request Apr 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. dev/2.0.9-merged dev/3.0.0-merged p0_w reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants