@hust-hhb
Contributor

@hust-hhb hust-hhb commented Jun 4, 2024

Proposed changes

Issue Number: close #xxx

My tests show that get_delete_bitmap from the meta service may take too long during sync_rowset in cloud mode, which can slow down delete bitmap calculation in the publish phase. We can use the existing delete bitmap cache to solve this problem.
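
The idea of consulting a local cache before calling the meta service can be illustrated with a minimal sketch. This is not the code from this PR: `TxnBitmapCache`, `sync_delete_bitmap`, and the `fetch_remote` callback are hypothetical names, and a delete bitmap is modeled as a plain set of row ids.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <set>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a delete bitmap: the row ids marked as deleted.
using DeleteBitmap = std::set<uint32_t>;

// Hypothetical local cache keyed by transaction id (not a Doris class).
class TxnBitmapCache {
public:
    void put(int64_t txn_id, DeleteBitmap bitmap) { _entries[txn_id] = std::move(bitmap); }

    // Returns std::nullopt on a miss so callers can tell "not cached"
    // apart from "cached, but empty".
    std::optional<DeleteBitmap> get(int64_t txn_id) const {
        auto it = _entries.find(txn_id);
        if (it == _entries.end()) {
            return std::nullopt;
        }
        return it->second;
    }

private:
    std::unordered_map<int64_t, DeleteBitmap> _entries;
};

// Cache-first lookup: the slow remote fetch runs only when the cache misses.
DeleteBitmap sync_delete_bitmap(int64_t txn_id, const TxnBitmapCache& cache,
                                const std::function<DeleteBitmap(int64_t)>& fetch_remote) {
    if (auto cached = cache.get(txn_id)) {
        return *cached;
    }
    return fetch_remote(txn_id);
}
```

In this sketch a cache hit never touches the remote path, which is where the time saving would come from. A miss must still fall through to the remote fetch.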

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@hust-hhb
Contributor Author

hust-hhb commented Jun 4, 2024

run buildall

@github-actions
Contributor

github-actions bot commented Jun 4, 2024

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.40% (9283/24824)
Line Coverage: 28.72% (76016/264659)
Region Coverage: 28.11% (39385/140105)
Branch Coverage: 24.52% (19920/81238)
Coverage Report: http://coverage.selectdb-in.cc/coverage/0372ffc13f012d71c5999ffc6d687ce5a876c3a9_0372ffc13f012d71c5999ffc6d687ce5a876c3a9/report/index.html

@doris-robot

TPC-H: Total hot run time: 41494 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 0372ffc13f012d71c5999ffc6d687ce5a876c3a9, data reload: false

------ Round 1 ----------------------------------
q1	17618	4515	4419	4419
q2	2024	195	191	191
q3	10508	1222	1197	1197
q4	10190	745	915	745
q5	7504	2746	2733	2733
q6	224	133	132	132
q7	977	615	616	615
q8	9223	2171	2141	2141
q9	9114	6797	6798	6797
q10	9288	3905	3864	3864
q11	444	242	265	242
q12	443	230	234	230
q13	17206	3229	3203	3203
q14	292	227	229	227
q15	514	469	455	455
q16	474	369	387	369
q17	1011	638	605	605
q18	8523	7922	7851	7851
q19	7340	1483	1571	1483
q20	674	323	321	321
q21	5070	4032	3345	3345
q22	389	335	329	329
Total cold run time: 119050 ms
Total hot run time: 41494 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4596	4471	4463	4463
q2	366	259	259	259
q3	3147	2958	2957	2957
q4	1892	1577	1573	1573
q5	5448	5504	5487	5487
q6	228	128	128	128
q7	2176	1822	1819	1819
q8	3260	3397	3389	3389
q9	8614	8603	8668	8603
q10	4092	3869	3804	3804
q11	593	503	486	486
q12	772	642	621	621
q13	16661	3128	3129	3128
q14	301	265	275	265
q15	531	462	465	462
q16	477	435	438	435
q17	1835	1487	1478	1478
q18	8120	7662	7373	7373
q19	1728	1686	1618	1618
q20	3020	1784	1781	1781
q21	6275	4677	4676	4676
q22	619	531	539	531
Total cold run time: 74751 ms
Total hot run time: 55336 ms

@doris-robot

TPC-DS: Total hot run time: 171359 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 0372ffc13f012d71c5999ffc6d687ce5a876c3a9, data reload: false

query1	894	380	371	371
query2	6454	2439	2233	2233
query3	6654	210	210	210
query4	19746	17383	17168	17168
query5	4100	454	452	452
query6	251	157	159	157
query7	4584	295	290	290
query8	303	281	275	275
query9	8641	2492	2451	2451
query10	431	298	277	277
query11	10646	10370	10081	10081
query12	135	89	92	89
query13	1625	363	364	363
query14	9484	7574	6118	6118
query15	225	186	177	177
query16	7807	286	268	268
query17	1577	528	506	506
query18	1957	273	303	273
query19	203	154	150	150
query20	92	83	86	83
query21	209	130	124	124
query22	4475	4376	4189	4189
query23	33653	33003	32783	32783
query24	11062	2806	2789	2789
query25	641	373	377	373
query26	1582	156	157	156
query27	3012	329	328	328
query28	7571	2099	2119	2099
query29	1022	629	627	627
query30	295	152	163	152
query31	980	745	720	720
query32	94	56	56	56
query33	775	294	300	294
query34	920	487	484	484
query35	745	618	616	616
query36	1101	941	911	911
query37	163	71	72	71
query38	2869	2757	2764	2757
query39	864	788	792	788
query40	246	128	127	127
query41	58	57	54	54
query42	120	102	102	102
query43	607	548	528	528
query44	1255	726	753	726
query45	199	159	166	159
query46	1073	736	715	715
query47	1850	1780	1805	1780
query48	447	309	305	305
query49	1072	401	407	401
query50	781	387	389	387
query51	6873	6715	6900	6715
query52	100	90	87	87
query53	361	291	283	283
query54	985	451	440	440
query55	74	75	74	74
query56	274	253	267	253
query57	1169	1059	1084	1059
query58	242	248	248	248
query59	3216	3162	2901	2901
query60	276	274	264	264
query61	92	92	89	89
query62	643	456	455	455
query63	341	292	298	292
query64	9881	2203	1680	1680
query65	3202	3090	3131	3090
query66	1406	332	316	316
query67	15286	15079	14860	14860
query68	4524	552	539	539
query69	477	357	330	330
query70	1169	1040	1108	1040
query71	407	283	289	283
query72	7049	5307	5740	5307
query73	748	330	331	330
query74	5955	5529	5474	5474
query75	3335	2645	2649	2645
query76	2416	963	978	963
query77	448	304	292	292
query78	10486	10019	9728	9728
query79	1719	535	526	526
query80	2480	468	468	468
query81	608	233	217	217
query82	907	105	99	99
query83	329	181	233	181
query84	272	81	90	81
query85	1397	272	270	270
query86	395	310	304	304
query87	3319	3080	3131	3080
query88	3275	2459	2460	2459
query89	476	383	385	383
query90	1792	195	188	188
query91	124	97	96	96
query92	60	47	49	47
query93	1397	524	506	506
query94	1234	191	187	187
query95	400	316	316	316
query96	588	273	276	273
query97	3185	3011	3071	3011
query98	239	220	210	210
query99	1130	827	835	827
Total cold run time: 272311 ms
Total hot run time: 171359 ms

@doris-robot

ClickBench: Total hot run time: 29.95 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 0372ffc13f012d71c5999ffc6d687ce5a876c3a9, data reload: false

query1	0.04	0.04	0.03
query2	0.08	0.03	0.04
query3	0.22	0.05	0.05
query4	1.69	0.07	0.07
query5	0.51	0.50	0.49
query6	1.12	0.73	0.72
query7	0.02	0.01	0.02
query8	0.06	0.04	0.04
query9	0.54	0.48	0.49
query10	0.54	0.56	0.54
query11	0.15	0.11	0.12
query12	0.14	0.12	0.12
query13	0.60	0.59	0.59
query14	0.76	0.77	0.77
query15	0.82	0.80	0.80
query16	0.34	0.37	0.38
query17	0.98	0.95	0.99
query18	0.24	0.23	0.24
query19	1.77	1.67	1.66
query20	0.02	0.01	0.02
query21	15.74	0.67	0.66
query22	4.45	8.22	1.32
query23	18.28	1.33	1.31
query24	1.50	0.31	0.21
query25	0.15	0.09	0.08
query26	0.26	0.18	0.17
query27	0.08	0.07	0.07
query28	13.43	1.01	1.01
query29	13.47	3.41	3.29
query30	0.24	0.06	0.06
query31	2.88	0.40	0.39
query32	3.24	0.47	0.48
query33	2.88	2.89	2.90
query34	17.02	4.43	4.41
query35	4.52	4.50	4.54
query36	0.68	0.46	0.46
query37	0.18	0.15	0.15
query38	0.15	0.15	0.15
query39	0.04	0.03	0.03
query40	0.17	0.14	0.15
query41	0.09	0.05	0.04
query42	0.05	0.04	0.04
query43	0.04	0.03	0.04
Total cold run time: 110.18 s
Total hot run time: 29.95 s

@hust-hhb hust-hhb force-pushed the use-delete-bitmap-cache branch from 0372ffc to 753f892 Compare June 13, 2024 10:08
@hust-hhb
Contributor Author

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@hust-hhb
Contributor Author

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 40248 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 6f2f281fe96d9979739ead9055ccb14eaa7c34f2, data reload: false

------ Round 1 ----------------------------------
q1	17877	5258	4407	4407
q2	3043	205	200	200
q3	11267	1159	1105	1105
q4	10312	742	804	742
q5	7656	2756	2656	2656
q6	224	145	141	141
q7	983	628	619	619
q8	9257	2122	2111	2111
q9	8925	6479	6543	6479
q10	8917	3760	3734	3734
q11	450	247	250	247
q12	454	240	250	240
q13	17773	3041	2974	2974
q14	269	222	232	222
q15	524	494	471	471
q16	517	389	380	380
q17	986	731	726	726
q18	8113	7493	7416	7416
q19	4308	1632	1537	1537
q20	646	312	340	312
q21	4998	4002	3191	3191
q22	384	338	343	338
Total cold run time: 117883 ms
Total hot run time: 40248 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4608	4282	4239	4239
q2	373	278	271	271
q3	3040	2744	2799	2744
q4	1882	1654	1666	1654
q5	5321	5306	5304	5304
q6	219	131	135	131
q7	2177	1788	1747	1747
q8	3231	3337	3334	3334
q9	8364	8362	8343	8343
q10	3933	3711	3735	3711
q11	605	491	479	479
q12	767	622	576	576
q13	16422	3021	3027	3021
q14	299	261	264	261
q15	517	478	470	470
q16	491	431	418	418
q17	1794	1490	1490	1490
q18	7795	7570	7436	7436
q19	1735	1666	1504	1504
q20	2005	1785	1760	1760
q21	5070	4853	4719	4719
q22	625	531	546	531
Total cold run time: 71273 ms
Total hot run time: 54143 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.44% (8987/24664)
Line Coverage: 28.01% (73665/262996)
Region Coverage: 27.49% (38268/139219)
Branch Coverage: 24.19% (19512/80664)
Coverage Report: http://coverage.selectdb-in.cc/coverage/6f2f281fe96d9979739ead9055ccb14eaa7c34f2_6f2f281fe96d9979739ead9055ccb14eaa7c34f2/report/index.html

@doris-robot

TPC-DS: Total hot run time: 172534 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 6f2f281fe96d9979739ead9055ccb14eaa7c34f2, data reload: false

query1	934	385	377	377
query2	6465	2333	2209	2209
query3	6655	209	212	209
query4	18837	17357	17087	17087
query5	4167	514	475	475
query6	254	157	161	157
query7	4592	312	303	303
query8	320	283	268	268
query9	8448	2442	2392	2392
query10	612	303	283	283
query11	10522	10022	10137	10022
query12	134	99	83	83
query13	1637	363	366	363
query14	9946	7314	7059	7059
query15	219	184	192	184
query16	7821	264	263	263
query17	1886	544	511	511
query18	1948	268	271	268
query19	186	154	160	154
query20	94	84	80	80
query21	207	129	125	125
query22	4334	4119	4097	4097
query23	33700	33036	33082	33036
query24	11884	2789	2851	2789
query25	662	351	359	351
query26	1797	153	154	153
query27	3003	326	312	312
query28	7618	2077	2043	2043
query29	1178	631	608	608
query30	264	147	150	147
query31	934	701	733	701
query32	95	52	58	52
query33	767	280	276	276
query34	972	482	463	463
query35	742	620	617	617
query36	1108	938	975	938
query37	286	71	73	71
query38	2876	2746	2748	2746
query39	867	797	797	797
query40	282	129	126	126
query41	56	54	50	50
query42	124	104	107	104
query43	585	549	540	540
query44	1199	736	741	736
query45	202	167	164	164
query46	1083	731	720	720
query47	1869	1775	1758	1758
query48	372	296	298	296
query49	1194	402	394	394
query50	761	388	389	388
query51	6765	6762	6759	6759
query52	106	91	100	91
query53	356	294	293	293
query54	977	435	448	435
query55	79	73	75	73
query56	291	260	261	260
query57	1133	1022	1056	1022
query58	255	250	253	250
query59	3256	3065	3259	3065
query60	299	274	302	274
query61	105	89	92	89
query62	677	438	447	438
query63	324	294	305	294
query64	9847	2212	1767	1767
query65	3202	3118	3147	3118
query66	1392	372	334	334
query67	15584	15039	14900	14900
query68	4598	540	537	537
query69	446	309	295	295
query70	1201	1093	1117	1093
query71	405	272	273	272
query72	7167	5807	5610	5610
query73	756	321	327	321
query74	6072	5573	5424	5424
query75	3493	2675	2649	2649
query76	2934	1033	879	879
query77	446	298	302	298
query78	10497	9746	9657	9657
query79	2461	516	520	516
query80	964	494	473	473
query81	581	222	239	222
query82	731	103	100	100
query83	237	173	184	173
query84	249	87	82	82
query85	1990	348	373	348
query86	495	323	334	323
query87	3258	3098	3077	3077
query88	4192	2344	2351	2344
query89	479	381	372	372
query90	1817	194	196	194
query91	132	99	97	97
query92	68	50	50	50
query93	2350	514	513	513
query94	1257	192	188	188
query95	418	321	324	321
query96	584	265	267	265
query97	3219	3077	3002	3002
query98	230	200	191	191
query99	1290	841	853	841
Total cold run time: 276363 ms
Total hot run time: 172534 ms

@doris-robot

ClickBench: Total hot run time: 30.58 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 6f2f281fe96d9979739ead9055ccb14eaa7c34f2, data reload: false

query1	0.03	0.03	0.04
query2	0.08	0.04	0.04
query3	0.22	0.05	0.05
query4	1.67	0.07	0.07
query5	0.50	0.48	0.47
query6	1.11	0.74	0.72
query7	0.01	0.01	0.01
query8	0.05	0.04	0.05
query9	0.54	0.49	0.48
query10	0.53	0.54	0.53
query11	0.16	0.11	0.11
query12	0.15	0.12	0.12
query13	0.60	0.60	0.61
query14	0.80	0.78	0.80
query15	0.83	0.81	0.82
query16	0.36	0.36	0.37
query17	0.99	1.03	0.99
query18	0.22	0.25	0.25
query19	1.76	1.75	1.73
query20	0.01	0.01	0.01
query21	15.44	0.66	0.65
query22	3.97	6.80	1.88
query23	18.26	1.37	1.24
query24	2.18	0.23	0.21
query25	0.14	0.09	0.09
query26	0.27	0.18	0.17
query27	0.08	0.08	0.08
query28	13.18	1.01	1.00
query29	12.63	3.25	3.24
query30	0.27	0.06	0.06
query31	2.86	0.40	0.39
query32	3.27	0.48	0.47
query33	2.91	2.97	2.90
query34	16.90	4.45	4.45
query35	4.49	4.51	4.55
query36	0.65	0.46	0.49
query37	0.19	0.16	0.15
query38	0.15	0.15	0.14
query39	0.05	0.03	0.04
query40	0.19	0.17	0.14
query41	0.09	0.05	0.05
query42	0.06	0.04	0.04
query43	0.05	0.03	0.04
Total cold run time: 108.9 s
Total hot run time: 30.58 s

Contributor

@zhannngchen zhannngchen left a comment

LGTM

@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jun 14, 2024
@github-actions
Contributor

PR approved by anyone and no changes requested.

Contributor

@dataroaring dataroaring left a comment

LGTM

@dataroaring dataroaring merged commit 9aa3d1e into apache:master Jun 14, 2024
dataroaring pushed a commit that referenced this pull request Jun 21, 2024
…35856)


My tests show that get_delete_bitmap from the meta service may take too
long during sync_rowset in cloud mode, which can slow down delete bitmap
calculation in the publish phase. We can use the existing delete bitmap
cache to solve this problem.
dataroaring pushed a commit that referenced this pull request Sep 26, 2024
…map from cache failed (#41309)

## Proposed changes

Issue Number: close #xxx

To speed up syncing the latest delete bitmap, #35856 tries to get
the delete bitmap from `CloudTxnDeleteBitmapCache` first.
In the following situation, compaction may get an empty delete bitmap,
which causes duplicate keys:
1. Compaction starts.
2. Several loads succeed during the compaction.
3. Compaction finishes merging data and starts to calculate the delete
bitmap generated by the latest load tasks.
4. Compaction tries to sync rowsets and the delete bitmap; it gets the
delete bitmap from `CloudTxnDeleteBitmapCache` first.
5. `CloudTxnDeleteBitmapCache::get_delete_bitmap()` finds the txn info
in its inner map, but the lookup misses when it tries to get the delete
bitmap from the LRU cache. It does not report an error and instead
returns an empty delete bitmap.
6. Compaction uses the wrong delete bitmap, and duplicate keys occur.
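
The failure mode in steps 5 and 6 can be sketched in isolation. The code below is a simplified model, not the Doris implementation: `TwoLevelCache`, `get_lossy`, and `get_checked` are hypothetical names, and the `std::optional` return is only one assumed way to signal a miss. It shows why a lookup that turns a miss into an empty bitmap is unsafe, and how a miss-aware return value lets the caller fall back to another source.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <set>
#include <unordered_map>
#include <unordered_set>
#include <utility>

// Hypothetical stand-in for a delete bitmap: the row ids marked as deleted.
using DeleteBitmap = std::set<uint32_t>;

// Hypothetical two-level cache: a txn-info set plus a separately evictable
// bitmap store that plays the role of the LRU cache.
class TwoLevelCache {
public:
    void put(int64_t txn_id, DeleteBitmap bitmap) {
        _known_txns.insert(txn_id);
        _bitmaps[txn_id] = std::move(bitmap);
    }

    // Simulates LRU eviction: the bitmap is dropped, the txn info stays.
    void evict_bitmap(int64_t txn_id) { _bitmaps.erase(txn_id); }

    // Problematic variant: an evicted bitmap is indistinguishable from an
    // empty one, so the caller silently proceeds with no deletes applied.
    DeleteBitmap get_lossy(int64_t txn_id) const {
        auto it = _bitmaps.find(txn_id);
        if (it == _bitmaps.end()) {
            return DeleteBitmap{};
        }
        return it->second;
    }

    // Safer variant: any miss, at either level, is reported as std::nullopt.
    std::optional<DeleteBitmap> get_checked(int64_t txn_id) const {
        if (_known_txns.count(txn_id) == 0) {
            return std::nullopt;
        }
        auto it = _bitmaps.find(txn_id);
        if (it == _bitmaps.end()) {
            return std::nullopt;
        }
        return it->second;
    }

private:
    std::unordered_set<int64_t> _known_txns;
    std::unordered_map<int64_t, DeleteBitmap> _bitmaps;
};
```

With `get_checked`, a genuinely empty bitmap still comes back as a present but empty value, while an evicted entry comes back as `std::nullopt`, so the caller can treat the latter as a failure rather than as "nothing was deleted".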
cjj2010 pushed a commit to cjj2010/doris that referenced this pull request Oct 12, 2024
…map from cache failed (apache#41309)

## Proposed changes

Issue Number: close #xxx

To accelerate the speed of sync latest delete bitmap, apache#35856 try to get
the delete bitmap from `CloudTxnDeleteBitmapCache` first.
In the following situation, compaction may get empty delete bitmap and
cause duplicate key:
1. compaction started
2. several load succeed during the compaction
3. compaction finished data merging and start to calculate delete bitmap
generated by latest load tasks
4. compaction try to sync rowset and delete bitmap, it get delete bitmap
first from `CloudTxnDeleteBitmapCache`
5. `CloudTxnDeleteBitmapCache::get_delete_bitmap()` can get txn infos
from it's inner map, but cache missed when it try to get delete bitmap
from LRU cache, it don't report error but returned an empty delete
bitmap
6. compaction used wrong delete bitmap, duplicate key occured.
Labels

approved (Indicates a PR has been approved by one committer.), dev/3.0.0-merged, reviewed

4 participants