@bobhan1
Contributor

@bobhan1 bobhan1 commented Jun 30, 2025

What problem does this PR solve?

#49710 added a check in MS that prevents a stale calc delete bitmap task from wrongly updating delete bitmaps in MS. However, this can cause loads to fail because of the check on FE.
This PR makes FE retry committing the txn whenever it encounters a stale calc delete bitmap response, regardless of the task's status code, to avoid the problem.
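
The retry rule described above can be illustrated with a minimal Java sketch. It is not the code from this PR: every name in it (`StaleResponseRetrySketch`, `Response`, `Decision`, `decide`, `commitWithRetry`) is hypothetical, and the `stale` flag is an assumption standing in for however FE actually detects a stale response. It only shows the ordering of the checks: staleness is examined first, so a stale response leads to a retry whether its status code is OK or an error.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

public class StaleResponseRetrySketch {
    // Hypothetical status codes; not the real Doris status enum.
    public enum StatusCode { OK, ERROR }

    // Hypothetical outcomes of handling one calc delete bitmap response.
    public enum Decision { COMMIT, RETRY_COMMIT, FAIL }

    // Hypothetical response shape: a status code plus a flag marking the
    // response as coming from a stale task (how staleness is detected is
    // an assumption and is not modeled here).
    public static final class Response {
        final StatusCode code;
        final boolean stale;

        public Response(StatusCode code, boolean stale) {
            this.code = code;
            this.stale = stale;
        }
    }

    // Staleness is checked before the status code, so a stale response
    // always asks for a retry, regardless of its status code.
    public static Decision decide(Response r) {
        if (r.stale) {
            return Decision.RETRY_COMMIT;
        }
        return r.code == StatusCode.OK ? Decision.COMMIT : Decision.FAIL;
    }

    // Retries the commit attempt while responses are stale, up to maxAttempts.
    public static Decision commitWithRetry(Supplier<Response> attempt, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            Decision d = decide(attempt.get());
            if (d != Decision.RETRY_COMMIT) {
                return d;
            }
        }
        return Decision.FAIL; // gave up: every attempt returned a stale response
    }

    public static void main(String[] args) {
        // First response is stale with an error code, second is fresh and OK.
        Iterator<Response> responses = Arrays.asList(
                new Response(StatusCode.ERROR, true),
                new Response(StatusCode.OK, false)).iterator();
        System.out.println(commitWithRetry(responses::next, 3));
    }
}
```

In this sketch a stale response with an error status no longer fails the load outright; it only consumes one retry attempt.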

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@bobhan1
Contributor Author

bobhan1 commented Jun 30, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 33984 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit bdaadf478893751a2175825200d141beba9197fa, data reload: false

------ Round 1 ----------------------------------
q1	17596	5191	5044	5044
q2	1928	285	191	191
q3	10310	1352	738	738
q4	10210	1028	531	531
q5	7542	2376	2367	2367
q6	178	161	131	131
q7	903	772	595	595
q8	9308	1299	1065	1065
q9	7060	5111	5041	5041
q10	6894	2379	1962	1962
q11	471	283	281	281
q12	346	368	209	209
q13	17763	3762	3123	3123
q14	228	238	212	212
q15	559	484	490	484
q16	434	436	383	383
q17	589	867	357	357
q18	7557	7212	7148	7148
q19	1218	981	561	561
q20	335	349	230	230
q21	4026	3172	2353	2353
q22	1072	1006	978	978
Total cold run time: 106527 ms
Total hot run time: 33984 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5159	5190	5133	5133
q2	254	334	219	219
q3	2208	2648	2284	2284
q4	1402	1770	1332	1332
q5	4235	4469	4377	4377
q6	210	165	129	129
q7	1974	1911	1833	1833
q8	2655	2588	2597	2588
q9	7157	7170	7142	7142
q10	3089	3268	2795	2795
q11	584	512	488	488
q12	680	781	594	594
q13	3619	3932	3308	3308
q14	273	292	286	286
q15	514	472	479	472
q16	428	470	464	464
q17	1202	1574	1342	1342
q18	7600	7209	7126	7126
q19	787	827	836	827
q20	1934	1960	1863	1863
q21	4924	4497	4295	4295
q22	1085	1039	996	996
Total cold run time: 51973 ms
Total hot run time: 49893 ms

@doris-robot

TPC-DS: Total hot run time: 184804 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit bdaadf478893751a2175825200d141beba9197fa, data reload: false

query1	980	408	389	389
query2	6515	1738	1680	1680
query3	6763	213	221	213
query4	26682	24163	23318	23318
query5	4699	600	434	434
query6	308	231	199	199
query7	4622	522	295	295
query8	269	218	213	213
query9	8627	2665	2648	2648
query10	469	325	278	278
query11	15480	15102	14951	14951
query12	157	106	104	104
query13	1657	524	404	404
query14	9071	5718	5644	5644
query15	202	177	174	174
query16	7517	635	482	482
query17	1166	688	551	551
query18	1988	399	291	291
query19	186	192	153	153
query20	125	115	104	104
query21	209	123	103	103
query22	4037	4112	3937	3937
query23	33983	32979	32964	32964
query24	8446	2372	2367	2367
query25	512	472	382	382
query26	1220	269	143	143
query27	2752	508	345	345
query28	4277	2149	2115	2115
query29	738	561	426	426
query30	283	215	184	184
query31	935	843	749	749
query32	70	63	60	60
query33	548	348	329	329
query34	794	826	519	519
query35	813	794	736	736
query36	948	983	872	872
query37	112	98	78	78
query38	4083	4115	4085	4085
query39	1464	1407	1420	1407
query40	207	115	105	105
query41	55	52	52	52
query42	127	134	105	105
query43	486	511	478	478
query44	1317	832	801	801
query45	175	171	162	162
query46	850	1006	630	630
query47	1732	1808	1753	1753
query48	391	417	309	309
query49	731	479	385	385
query50	633	683	408	408
query51	4139	4190	4086	4086
query52	116	110	101	101
query53	221	255	191	191
query54	581	561	492	492
query55	78	117	78	78
query56	291	293	298	293
query57	1193	1182	1147	1147
query58	261	245	250	245
query59	2662	2728	2601	2601
query60	326	315	299	299
query61	118	113	125	113
query62	803	706	651	651
query63	223	183	187	183
query64	4265	999	650	650
query65	4247	4143	4220	4143
query66	1088	412	308	308
query67	15911	15466	15313	15313
query68	7022	893	534	534
query69	488	300	278	278
query70	1152	1126	1113	1113
query71	397	323	288	288
query72	5624	4764	4817	4764
query73	659	638	343	343
query74	8944	9201	9005	9005
query75	3189	3199	2684	2684
query76	3232	1135	707	707
query77	455	380	291	291
query78	10026	10168	9346	9346
query79	2443	826	571	571
query80	653	498	429	429
query81	552	251	219	219
query82	477	126	100	100
query83	248	263	235	235
query84	255	105	82	82
query85	786	356	317	317
query86	381	328	284	284
query87	4477	4547	4353	4353
query88	3787	2242	2223	2223
query89	378	315	279	279
query90	1902	205	201	201
query91	139	139	107	107
query92	71	59	55	55
query93	2201	959	587	587
query94	681	409	294	294
query95	365	285	280	280
query96	510	578	275	275
query97	2697	2798	2662	2662
query98	232	208	202	202
query99	1370	1389	1288	1288
Total cold run time: 272727 ms
Total hot run time: 184804 ms

@doris-robot

ClickBench: Total hot run time: 29.22 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit bdaadf478893751a2175825200d141beba9197fa, data reload: false

query1	0.04	0.04	0.03
query2	0.07	0.04	0.04
query3	0.24	0.07	0.08
query4	1.62	0.11	0.11
query5	0.45	0.45	0.42
query6	1.16	0.68	0.65
query7	0.02	0.02	0.01
query8	0.05	0.03	0.03
query9	0.60	0.50	0.52
query10	0.57	0.57	0.56
query11	0.16	0.11	0.11
query12	0.15	0.12	0.11
query13	0.63	0.60	0.62
query14	0.79	0.81	0.81
query15	0.91	0.86	0.85
query16	0.38	0.39	0.39
query17	1.02	1.08	1.07
query18	0.22	0.21	0.21
query19	1.96	1.89	1.81
query20	0.01	0.02	0.01
query21	15.41	0.88	0.53
query22	0.75	1.31	0.89
query23	14.71	1.39	0.60
query24	9.01	0.67	0.31
query25	0.30	0.09	0.09
query26	0.60	0.18	0.15
query27	0.06	0.05	0.04
query28	8.62	0.93	0.43
query29	12.61	4.00	3.33
query30	0.25	0.10	0.06
query31	2.83	0.61	0.38
query32	3.26	0.56	0.47
query33	3.10	3.05	3.19
query34	15.96	5.44	4.79
query35	4.79	4.87	4.88
query36	0.70	0.50	0.49
query37	0.09	0.07	0.07
query38	0.05	0.04	0.04
query39	0.03	0.02	0.03
query40	0.17	0.15	0.14
query41	0.08	0.02	0.03
query42	0.04	0.03	0.03
query43	0.04	0.03	0.03
Total cold run time: 104.51 s
Total hot run time: 29.22 s

Contributor

@zhannngchen zhannngchen left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jul 1, 2025
@github-actions
Contributor

github-actions bot commented Jul 1, 2025

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

github-actions bot commented Jul 1, 2025

PR approved by anyone and no changes requested.

calcDeleteBitmapTask.countDownToZero(request.getTaskStatus().getStatusCode(),
        "backend: " + task.getBackendId() + ", error_tablet_size: " + request.getErrorTabletIdsSize()
                + ", error_tablets: " + request.getErrorTabletIds()
                + ", err_msg: " + request.getTaskStatus().getErrorMsgs().toString());
Contributor

Would you be willing to add a test case?

@bobhan1
Contributor Author

bobhan1 commented Jul 3, 2025

run p0

Contributor

@dataroaring dataroaring left a comment

LGTM

@dataroaring dataroaring merged commit 7efcf53 into apache:master Jul 7, 2025
32 of 35 checks passed
bobhan1 added a commit to bobhan1/doris that referenced this pull request Jul 7, 2025
… bitmap response regardless of status code (apache#52547)

apache#49710 add a check in MS to forbid
stale calc delete bitmap task to wrongly update delete bitmaps in MS.
But this may lead to load fail due to the check on FE.
This PR let FE retry to commit the txn when encounter stale calc delete
bitmap response regardless of task's status code to avoid the problem.
dataroaring pushed a commit that referenced this pull request Jul 8, 2025
… calc delete bitmap response regardless of status code (#52547) (#52848)

pick #52547
bobhan1 added a commit to bobhan1/doris that referenced this pull request Jul 8, 2025
… bitmap response regardless of status code (apache#52547)

morrySnow pushed a commit that referenced this pull request Jul 8, 2025
… calc delete bitmap response regardless of status code #52547 (#52847)

pick #52547

Labels

approved Indicates a PR has been approved by one committer. dev/3.0.7-merged dev/3.1.0-merged reviewed

7 participants