@kaijchen
Member

@kaijchen kaijchen commented Feb 28, 2025

What problem does this PR solve?

Issue Number: DORIS-18927

Related PR: #47860 and #47610

Problem Summary:

In memtable_on_sink_node mode, a DeltaWriter that has already failed can still be written to by other sinks.
Therefore, we cannot simply reset the MemTable to nullptr when an error occurs in write.

This PR fixes the core dump caused by calling into a null MemTable (note this=0x0 in frame #7):

#7  0x00005654bf075bf2 in doris::MemTable::insert (this=0x0, input_block=0x7fd13b4a7fd0, row_idxs=...) at /home/zcp/repo_center/doris_branch-3.0/doris/be/src/olap/memtable.cpp:186
#8  0x00005654bf0898e6 in doris::MemTableWriter::write (this=0x7fcdf6452c00, block=0x7fd13b4a7fd0, row_idxs=...) at /home/zcp/repo_center/doris_branch-3.0/doris/be/src/olap/memtable_writer.cpp:118
#9  0x00005654c8e21dc1 in doris::DeltaWriterV2::write (this=0x7fcdf6500800, block=0x7fd13b4a7fd0, row_idxs=...) at /home/zcp/repo_center/doris_branch-3.0/doris/be/src/olap/delta_writer_v2.cpp:166
#10 0x00005654c92832f5 in doris::vectorized::VTabletWriterV2::_write_memtable (this=this@entry=0x7fc995004400, block=..., tablet_id=1740267623826, rows=...) at /home/zcp/repo_center/doris_branch-3.0/doris/be/src/vec/sink/writer/vtablet_writer_v2.cpp:525
#11 0x00005654c9282acb in doris::vectorized::VTabletWriterV2::write (this=0x7fc995004400, state=<optimized out>, input_block=...) at /home/zcp/repo_center/doris_branch-3.0/doris/be/src/vec/sink/writer/vtablet_writer_v2.cpp:468
#12 0x00005654c9228a30 in doris::vectorized::AsyncResultWriter::process_block (this=0x7fc995004400, state=0x7fcca8f3fc00, profile=<optimized out>) at /home/zcp/repo_center/doris_branch-3.0/doris/be/src/vec/sink/writer/async_result_writer.cpp:134
#13 0x00005654c92292b9 in doris::vectorized::AsyncResultWriter::start_writer(doris::RuntimeState*, doris::RuntimeProfile*)::$_0::operator()() const (this=0x7fcd8c4248a0) at /home/zcp/repo_center/doris_branch-3.0/doris/be/src/vec/sink/writer/async_result_writer.cpp:93
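
To make the failure mode above concrete, here is a minimal sketch of a writer shared by several sinks. It is not the actual patch: `ToyMemTable`, `ToyWriter`, and the sticky `_error` string are hypothetical stand-ins, and the real `MemTableWriter` may handle errors differently. The only point it illustrates is that recording the failure, rather than resetting the table pointer to nullptr, keeps a later write from another sink from dereferencing a null pointer.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for a memtable; not the Doris class.
struct ToyMemTable {
    std::vector<int> rows;
    // Returns false to simulate a failed insert (negative values are rejected).
    bool insert(int v) {
        if (v < 0) return false;
        rows.push_back(v);
        return true;
    }
};

// Hypothetical writer that several sinks may call into.
class ToyWriter {
public:
    ToyWriter() : _mem_table(std::make_unique<ToyMemTable>()) {}

    // Returns an empty string on success, or an error message on failure.
    std::string write(int v) {
        std::lock_guard<std::mutex> guard(_lock);
        if (!_error.empty()) {
            // The writer has already failed: reject the write,
            // but never touch a null pointer.
            return _error;
        }
        if (!_mem_table->insert(v)) {
            // Remember the failure instead of calling _mem_table.reset(),
            // because another sink may still call write() afterwards.
            _error = "write failed";
            return _error;
        }
        return "";
    }

    bool has_mem_table() const { return _mem_table != nullptr; }

private:
    std::mutex _lock;
    std::unique_ptr<ToyMemTable> _mem_table;
    std::string _error; // assumption: a sticky error is enough for this sketch
};
```

With this shape, a sequence such as `write(1)`, `write(-1)`, `write(2)` returns success, then an error, then the same error again, and the table pointer stays valid throughout.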

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Feb 28, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@kaijchen
Member Author

run buildall

Contributor

@liaoxin01 liaoxin01 left a comment

LGTM

@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved and reviewed labels Feb 28, 2025
@github-actions
Contributor

PR approved by anyone and no changes requested.

@doris-robot

TPC-H: Total hot run time: 31748 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit aa9bb1b84b5657fe088cc34405908094a51d09fe, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	17599	5288	5102	5102
q2	2052	306	175	175
q3	10392	1343	722	722
q4	10215	1041	541	541
q5	7548	2443	2373	2373
q6	192	171	133	133
q7	927	758	615	615
q8	9329	1316	1153	1153
q9	5074	4600	4779	4600
q10	6807	2319	1899	1899
q11	468	276	259	259
q12	351	375	220	220
q13	17753	3745	3049	3049
q14	234	225	214	214
q15	501	481	465	465
q16	646	617	575	575
q17	579	870	345	345
q18	6883	6259	6346	6259
q19	1197	962	555	555
q20	337	335	189	189
q21	2917	2371	1995	1995
q22	371	340	310	310
Total cold run time: 102372 ms
Total hot run time: 31748 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	5132	5155	5132	5132
q2	240	334	238	238
q3	2180	2697	2286	2286
q4	1428	1824	1355	1355
q5	4248	4146	4141	4141
q6	211	168	127	127
q7	1879	1820	1740	1740
q8	2653	2549	2544	2544
q9	7267	7194	7224	7194
q10	2982	3206	2734	2734
q11	578	517	488	488
q12	709	772	643	643
q13	3577	3947	3288	3288
q14	297	292	290	290
q15	498	476	455	455
q16	640	697	631	631
q17	1178	1591	1347	1347
q18	7512	7494	7371	7371
q19	812	839	905	839
q20	1997	1985	1846	1846
q21	5482	5018	4949	4949
q22	631	611	569	569
Total cold run time: 52131 ms
Total hot run time: 50207 ms

@doris-robot

TPC-DS: Total hot run time: 184102 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit aa9bb1b84b5657fe088cc34405908094a51d09fe, data reload: false

query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
query1	995	406	373	373
query2	6531	1949	1961	1949
query3	6797	209	213	209
query4	26418	23576	23362	23362
query5	4374	681	511	511
query6	339	203	191	191
query7	4616	505	295	295
query8	311	255	244	244
query9	8627	2501	2517	2501
query10	476	323	252	252
query11	15282	15081	14878	14878
query12	156	114	108	108
query13	1676	532	402	402
query14	9484	6960	6486	6486
query15	212	200	182	182
query16	7789	633	470	470
query17	1237	723	558	558
query18	1998	409	304	304
query19	197	188	170	170
query20	120	112	116	112
query21	205	125	104	104
query22	4012	4319	4231	4231
query23	34003	32923	32994	32923
query24	7962	2381	2384	2381
query25	529	451	386	386
query26	1229	271	153	153
query27	2150	539	330	330
query28	3974	2439	2414	2414
query29	721	546	422	422
query30	237	194	152	152
query31	942	854	804	804
query32	74	67	69	67
query33	559	358	331	331
query34	791	861	501	501
query35	781	835	736	736
query36	931	994	880	880
query37	120	95	74	74
query38	4106	4128	3990	3990
query39	1467	1390	1411	1390
query40	208	121	106	106
query41	55	54	51	51
query42	122	104	108	104
query43	506	518	490	490
query44	1317	786	788	786
query45	191	167	165	165
query46	887	1030	655	655
query47	1751	1775	1710	1710
query48	379	435	313	313
query49	795	535	399	399
query50	666	744	410	410
query51	4192	4129	4133	4129
query52	116	108	97	97
query53	233	276	193	193
query54	496	491	404	404
query55	90	85	85	85
query56	274	272	269	269
query57	1125	1121	1080	1080
query58	246	265	239	239
query59	2627	2682	2687	2682
query60	264	271	254	254
query61	126	117	115	115
query62	815	771	679	679
query63	235	188	195	188
query64	4276	1053	652	652
query65	3252	3157	3200	3157
query66	1070	403	353	353
query67	15837	15532	15191	15191
query68	8659	878	503	503
query69	466	290	281	281
query70	1208	1087	1088	1087
query71	452	296	279	279
query72	5662	3588	3843	3588
query73	783	745	349	349
query74	8918	9191	8692	8692
query75	3894	3236	2686	2686
query76	3726	1178	747	747
query77	782	367	279	279
query78	9932	10031	9277	9277
query79	5601	829	565	565
query80	699	523	440	440
query81	490	280	248	248
query82	729	122	98	98
query83	209	181	156	156
query84	289	96	74	74
query85	770	340	300	300
query86	335	297	261	261
query87	4442	4442	4412	4412
query88	3401	2186	2189	2186
query89	457	331	289	289
query90	1985	199	192	192
query91	143	145	106	106
query92	75	62	55	55
query93	2769	1062	577	577
query94	661	420	305	305
query95	369	263	278	263
query96	477	565	270	270
query97	3338	3359	3278	3278
query98	224	199	200	199
query99	1484	1398	1278	1278
Total cold run time: 278008 ms
Total hot run time: 184102 ms

@doris-robot

ClickBench: Total hot run time: 31.37 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit aa9bb1b84b5657fe088cc34405908094a51d09fe, data reload: false

query	cold (s)	hot run 1 (s)	hot run 2 (s)
query1	0.03	0.03	0.04
query2	0.07	0.03	0.03
query3	0.24	0.07	0.07
query4	1.61	0.10	0.10
query5	0.54	0.54	0.54
query6	1.18	0.73	0.72
query7	0.03	0.02	0.01
query8	0.04	0.03	0.03
query9	0.62	0.52	0.52
query10	0.57	0.57	0.57
query11	0.16	0.11	0.11
query12	0.15	0.11	0.11
query13	0.62	0.61	0.61
query14	2.81	2.78	2.67
query15	0.93	0.86	0.85
query16	0.40	0.39	0.38
query17	1.05	1.02	1.05
query18	0.21	0.20	0.19
query19	1.91	1.83	1.94
query20	0.02	0.01	0.01
query21	15.44	0.89	0.56
query22	0.76	1.26	0.64
query23	14.91	1.39	0.61
query24	7.24	1.79	1.14
query25	0.55	0.26	0.11
query26	0.52	0.17	0.15
query27	0.06	0.05	0.05
query28	10.24	0.81	0.45
query29	12.61	3.99	3.32
query30	0.23	0.09	0.07
query31	2.81	0.63	0.40
query32	3.24	0.57	0.50
query33	3.06	3.03	3.11
query34	15.66	5.32	4.61
query35	4.60	4.53	4.56
query36	0.67	0.51	0.50
query37	0.09	0.06	0.06
query38	0.06	0.04	0.03
query39	0.03	0.02	0.02
query40	0.17	0.15	0.12
query41	0.08	0.02	0.02
query42	0.04	0.02	0.02
query43	0.04	0.04	0.03
Total cold run time: 106.3 s
Total hot run time: 31.37 s

Contributor

@dataroaring dataroaring left a comment

LGTM

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 0.00% (0/1) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 44.97% (12000/26686)
Line Coverage 34.46% (100817/292567)
Region Coverage 33.64% (51635/153493)
Branch Coverage 29.40% (26116/88836)

@dataroaring dataroaring merged commit ae78e19 into apache:master Mar 4, 2025
32 of 34 checks passed
github-actions bot pushed a commit that referenced this pull request Mar 4, 2025
### What problem does this PR solve?

Issue Number: DORIS-18927

Related PR: #47860 and #47610
dataroaring pushed a commit that referenced this pull request Mar 10, 2025
Cherry-picked from #48489

Co-authored-by: Kaijie Chen <chenkaijie@selectdb.com>
dataroaring pushed a commit that referenced this pull request Mar 19, 2025
Related PR: #48489 

Problem Summary:

The _mem_table_ptr_lock is already acquired inside _reset_mem_table, so it
does not need to be acquired again in MemTableWriter::write.
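
The locking point in this commit message can be sketched as follows. This is an illustration under stated assumptions, not the Doris code: `ToyMemTableWriter`, `kMaxRows`, `snapshot()`, and `flushed()` are hypothetical, only the member names `_mem_table_ptr_lock` and `_reset_mem_table` echo the message, and the sketch assumes a single writing thread with the lock guarding only replacement of the table pointer. Because `std::mutex` is not recursive, acquiring it in `write` and again inside `_reset_mem_table` on the same thread would be undefined behavior (typically a deadlock), so the caller leaves locking to the helper.

```cpp
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical, simplified writer; not the Doris MemTableWriter.
class ToyMemTableWriter {
public:
    using Table = std::vector<int>;

    ToyMemTableWriter() { _reset_mem_table(); }

    // Assumed to be called from one thread only in this sketch.
    void write(int v) {
        if (_mem_table->size() >= kMaxRows) {
            ++_flushed;
            // No lock is taken here: _reset_mem_table() locks
            // _mem_table_ptr_lock itself, and locking a std::mutex twice
            // on the same thread is undefined behavior.
            _reset_mem_table();
        }
        _mem_table->push_back(v);
    }

    // Reader side: copies the pointer under the lock.
    std::shared_ptr<const Table> snapshot() {
        std::lock_guard<std::mutex> guard(_mem_table_ptr_lock);
        return _mem_table;
    }

    std::size_t flushed() const { return _flushed; }

private:
    // Replaces the table pointer; acquires the lock internally.
    void _reset_mem_table() {
        std::lock_guard<std::mutex> guard(_mem_table_ptr_lock);
        _mem_table = std::make_shared<Table>();
    }

    static constexpr std::size_t kMaxRows = 2; // hypothetical flush threshold
    std::mutex _mem_table_ptr_lock;
    std::shared_ptr<Table> _mem_table;
    std::size_t _flushed = 0;
};
```

Keeping the lock inside the helper means every caller gets the same protection without having to remember whether the lock is already held.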
github-actions bot pushed a commit that referenced this pull request Mar 19, 2025
Related PR: #48489 

Problem Summary:

The _mem_table_ptr_lock is already acquired inside _reset_mem_table, so it
does not need to be acquired again in MemTableWriter::write.
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
### What problem does this PR solve?

Issue Number: DORIS-18927

Related PR: apache#47860 and apache#47610
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
Related PR: apache#48489 

Problem Summary:

The _mem_table_ptr_lock is already acquired inside _reset_mem_table, so it
does not need to be acquired again in MemTableWriter::write.

Labels

approved, dev/3.0.5-merged, p0_c, reviewed
