
@liaoxin01 liaoxin01 commented Feb 7, 2025

What problem does this PR solve?

*** Query id: 5447701417c13e4e-cea25b10f284c6a5 ***
*** is nereids: 0 ***
*** tablet id: 1738818748602 ***
*** Aborted at 1738820047 (unix time) try "date -d @1738820047" if you are using GNU date ***
*** Current BE git commitID: 512681c ***
*** SIGSEGV invalid permissions for mapped object (@0x7f112a5df53f) received by PID 6310 (TID 6765 OR 0x7f1384ed3640) from PID 710800703; stack trace: ***
0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_branch-3.0/doris/be/src/common/signal_handler.h:421
1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
2# JVM_handle_linux_signal in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
3# 0x00007F14815CC520 in /lib/x86_64-linux-gnu/libc.so.6
4# doris::vectorized::ColumnVector<unsigned char>::insert_indices_from(doris::vectorized::IColumn const&, unsigned int const*, unsigned int const*) at /home/zcp/repo_center/doris_branch-3.0/doris/be/src/vec/columns/column_vector.cpp:323
5# doris::vectorized::MutableBlock::add_rows(doris::vectorized::Block const*, unsigned int const*, unsigned int const*, std::vector<int, std::allocator<int> > const*) at /home/zcp/repo_center/doris_branch-3.0/doris/be/src/vec/core/block.cpp:1036
6# doris::MemTable::_put_into_output(doris::vectorized::Block&) at /home/zcp/repo_center/doris_branch-3.0/doris/be/src/olap/memtable.cpp:257
7# doris::MemTable::_to_block(std::unique_ptr<doris::vectorized::Block, std::default_delete<doris::vectorized::Block> >*) at /home/zcp/repo_center/doris_branch-3.0/doris/be/src/olap/memtable.cpp:513
8# doris::MemTable::to_block(std::unique_ptr<doris::vectorized::Block, std::default_delete<doris::vectorized::Block> >*) at /home/zcp/repo_center/doris_branch-3.0/doris/be/src/olap/memtable.cpp:532
9# doris::FlushToken::_do_flush_memtable(doris::MemTable*, int, long*) at /home/zcp/repo_center/doris_branch-3.0/doris/be/src/olap/memtable_flush_executor.cpp:144
10# doris::FlushToken::_flush_memtable(std::shared_ptr<doris::MemTable>, int, long) in /mnt/hdd01/PERFORMANCE_ENV/be/lib/doris_be
11# doris::MemtableFlushTask::run() at /home/zcp/repo_center/doris_branch-3.0/doris/be/src/olap/memtable_flush_executor.cpp:60
12# doris::ThreadPool::dispatch_thread() in /mnt/hdd01/PERFORMANCE_ENV/be/lib/doris_be
13# doris::Thread::supervise_thread(void*) at /home/zcp/repo_center/doris_branch-3.0/doris/be/src/util/thread.cpp:499
14# start_thread at ./nptl/pthread_create.c:442
15# 0x00007F14816B0850 at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:83

Problem Summary:

  • When a memtable insert fails (e.g., due to a memory allocation failure during add_rows),
    the memtable is left in an inconsistent state.
  • Under memory pressure, the system may trigger a flush of this failed memtable,
    which leads to the crash shown in the stack trace above.

Solution:

  • Reset the memtable immediately after an insert failure.
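
As a rough illustration of the fix described above, the following C++ sketch resets a memtable as soon as an insert fails, so that a later flush never sees half-written state. It is not the Doris implementation: `ToyMemTable`, `ToyWriter`, and the `simulate_alloc_failure` flag are hypothetical names introduced here, and "reset" is modelled as swapping in a fresh, empty memtable, which is an assumption rather than something this description states.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a memtable: rows and a per-row index that must stay in sync.
struct ToyMemTable {
    std::vector<std::string> rows;
    std::vector<std::size_t> row_index;

    // Simulates an insert that can fail halfway: the row is stored,
    // but its index entry is never written.
    bool insert(const std::string& row, bool simulate_alloc_failure) {
        rows.push_back(row);
        if (simulate_alloc_failure) {
            return false;  // rows and row_index now disagree
        }
        row_index.push_back(rows.size() - 1);
        return true;
    }

    bool consistent() const { return rows.size() == row_index.size(); }
};

// Hypothetical writer that owns the active memtable.
class ToyWriter {
public:
    ToyWriter() : _mem_table(std::make_unique<ToyMemTable>()) {}

    bool write(const std::string& row, bool simulate_alloc_failure = false) {
        if (!_mem_table->insert(row, simulate_alloc_failure)) {
            // Assumed meaning of "reset": drop the half-written memtable right away
            // so that no later flush can observe it.
            _mem_table = std::make_unique<ToyMemTable>();
            return false;
        }
        return true;
    }

    // Returns the number of rows handed to the flush and starts a new memtable.
    std::size_t flush() {
        std::size_t flushed = _mem_table->rows.size();
        _mem_table = std::make_unique<ToyMemTable>();
        return flushed;
    }

    bool active_memtable_consistent() const { return _mem_table->consistent(); }

private:
    std::unique_ptr<ToyMemTable> _mem_table;
};
```

In this sketch a failed `write` discards the whole memtable, including rows inserted earlier, so a `flush()` issued right after the failure reports zero rows instead of walking inconsistent data.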

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@liaoxin01
Contributor Author

run buildall

@github-actions github-actions bot added the "approved" label (Indicates a PR has been approved by one committer.) Feb 7, 2025
@github-actions
Contributor

github-actions bot commented Feb 7, 2025

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

github-actions bot commented Feb 7, 2025

PR approved by anyone and no changes requested.

@doris-robot

TPC-H: Total hot run time: 31418 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 250a3422e7ab332f3964039912a96923c95180fe, data reload: false

------ Round 1 ----------------------------------
q1	17570	5215	5062	5062
q2	2045	303	176	176
q3	10397	1281	708	708
q4	10221	1050	539	539
q5	7528	2400	2344	2344
q6	184	166	141	141
q7	925	743	591	591
q8	9298	1339	1078	1078
q9	5097	4517	4610	4517
q10	6814	2327	1888	1888
q11	478	284	266	266
q12	348	351	218	218
q13	17768	3694	3150	3150
q14	235	230	208	208
q15	501	466	451	451
q16	596	615	586	586
q17	570	884	335	335
q18	6943	6286	6237	6237
q19	1216	964	527	527
q20	323	325	191	191
q21	2797	2124	1901	1901
q22	365	314	304	304
Total cold run time: 102219 ms
Total hot run time: 31418 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5102	5100	5158	5100
q2	239	329	233	233
q3	2179	2650	2287	2287
q4	1491	1856	1404	1404
q5	4276	4167	4178	4167
q6	212	161	124	124
q7	1862	1826	1638	1638
q8	2662	2611	2522	2522
q9	7369	7085	7233	7085
q10	3003	3208	2786	2786
q11	579	512	483	483
q12	696	787	614	614
q13	3487	3895	3325	3325
q14	279	296	270	270
q15	502	468	466	466
q16	625	673	634	634
q17	1139	1625	1329	1329
q18	7587	7399	7177	7177
q19	819	798	876	798
q20	1935	2015	1858	1858
q21	5339	4922	4916	4916
q22	683	583	559	559
Total cold run time: 52065 ms
Total hot run time: 49775 ms

@doris-robot

TPC-DS: Total hot run time: 182924 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 250a3422e7ab332f3964039912a96923c95180fe, data reload: false

query1	985	372	370	370
query2	6524	1896	1871	1871
query3	6789	215	211	211
query4	26434	23758	22935	22935
query5	4324	663	492	492
query6	298	206	181	181
query7	4607	496	296	296
query8	291	228	214	214
query9	8625	2492	2488	2488
query10	468	329	252	252
query11	15298	15120	14994	14994
query12	156	108	110	108
query13	1689	528	382	382
query14	9556	6606	6100	6100
query15	212	192	184	184
query16	7121	610	470	470
query17	937	704	559	559
query18	1965	402	299	299
query19	196	186	155	155
query20	119	117	125	117
query21	207	121	102	102
query22	4267	4206	4616	4206
query23	34488	33279	32912	32912
query24	7619	2369	2383	2369
query25	520	437	380	380
query26	1233	261	148	148
query27	2141	492	328	328
query28	3903	2383	2348	2348
query29	738	528	457	457
query30	229	188	157	157
query31	923	847	792	792
query32	75	62	64	62
query33	560	355	287	287
query34	778	840	494	494
query35	787	834	728	728
query36	942	1005	887	887
query37	126	97	85	85
query38	4209	4120	4053	4053
query39	1424	1383	1417	1383
query40	210	117	103	103
query41	53	50	50	50
query42	122	101	101	101
query43	516	503	476	476
query44	1264	786	780	780
query45	177	174	164	164
query46	862	1052	644	644
query47	1728	1811	1706	1706
query48	389	397	301	301
query49	774	532	399	399
query50	680	716	417	417
query51	4150	4232	4128	4128
query52	106	102	90	90
query53	224	253	188	188
query54	501	484	407	407
query55	81	78	81	78
query56	274	277	247	247
query57	1131	1136	1061	1061
query58	270	244	241	241
query59	2591	2720	2552	2552
query60	277	275	253	253
query61	116	116	119	116
query62	806	723	651	651
query63	216	184	185	184
query64	4331	985	667	667
query65	3233	3106	3111	3106
query66	1144	409	302	302
query67	15936	15514	15426	15426
query68	8036	774	510	510
query69	525	292	261	261
query70	1193	1120	1111	1111
query71	414	287	262	262
query72	5819	3496	3636	3496
query73	704	715	344	344
query74	8988	9084	9163	9084
query75	3311	3145	2742	2742
query76	3263	1181	744	744
query77	554	380	278	278
query78	10044	10046	9353	9353
query79	2520	790	587	587
query80	648	532	450	450
query81	488	277	233	233
query82	680	147	119	119
query83	170	171	152	152
query84	286	94	74	74
query85	761	339	297	297
query86	337	292	284	284
query87	4449	4651	4485	4485
query88	3688	2160	2147	2147
query89	390	319	279	279
query90	1931	190	187	187
query91	131	135	111	111
query92	80	62	61	61
query93	1577	1014	575	575
query94	706	410	279	279
query95	350	266	254	254
query96	478	609	267	267
query97	2807	2812	2753	2753
query98	227	205	201	201
query99	1356	1421	1250	1250
Total cold run time: 270836 ms
Total hot run time: 182924 ms

@doris-robot

ClickBench: Total hot run time: 30.68 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 250a3422e7ab332f3964039912a96923c95180fe, data reload: false

query1	0.04	0.05	0.03
query2	0.07	0.03	0.03
query3	0.23	0.06	0.06
query4	1.63	0.10	0.10
query5	0.41	0.41	0.41
query6	1.17	0.66	0.65
query7	0.02	0.02	0.02
query8	0.04	0.04	0.03
query9	0.58	0.52	0.51
query10	0.58	0.58	0.57
query11	0.16	0.10	0.11
query12	0.14	0.11	0.11
query13	0.61	0.59	0.61
query14	2.76	2.80	2.68
query15	0.93	0.86	0.86
query16	0.38	0.37	0.40
query17	1.05	1.08	1.04
query18	0.22	0.20	0.20
query19	1.92	1.86	2.02
query20	0.01	0.02	0.01
query21	15.35	0.88	0.53
query22	0.75	1.39	0.80
query23	14.76	1.35	0.67
query24	7.12	1.13	0.75
query25	0.53	0.28	0.07
query26	0.55	0.15	0.14
query27	0.06	0.05	0.05
query28	9.74	0.88	0.43
query29	12.60	3.97	3.29
query30	0.25	0.09	0.06
query31	2.84	0.62	0.39
query32	3.23	0.54	0.47
query33	3.07	3.09	2.98
query34	15.82	5.15	4.51
query35	4.52	4.50	4.53
query36	0.67	0.49	0.49
query37	0.08	0.06	0.06
query38	0.05	0.04	0.03
query39	0.03	0.02	0.02
query40	0.17	0.14	0.14
query41	0.08	0.03	0.02
query42	0.03	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 105.29 s
Total hot run time: 30.68 s


@dataroaring dataroaring left a comment


LGTM

@dataroaring dataroaring merged commit 519f0ee into apache:master Feb 8, 2025
27 of 29 checks passed
github-actions bot pushed a commit that referenced this pull request Feb 8, 2025
…t crash (#47610)

dataroaring pushed a commit that referenced this pull request Feb 10, 2025
…re to prevent crash #47610 (#47636)

Cherry-picked from #47610

Co-authored-by: Xin Liao <liaoxin@selectdb.com>
dataroaring pushed a commit that referenced this pull request Feb 13, 2025
…7860)

Related PR: #47610

Problem Summary:
SIGSEGV address not mapped to object (@0x0) received by PID 340906 (TID
341622 OR 0x7f7f38784640) from PID 0; stack trace: ***
0# doris::signal::(anonymous namespace)::FailureSignalHandler(int,
siginfo_t*, void*) at
/home/zcp/repo_center/doris_branch-3.0/doris/be/src/common/signal_handler.h:421
1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0]
in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
2# JVM_handle_linux_signal in
/usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
 3# 0x00007F80B37D0520 in /lib/x86_64-linux-gnu/libc.so.6
4# doris::MemTableWriter::_flush_memtable_async() at
/home/zcp/repo_center/doris_branch-3.0/doris/be/src/olap/memtable_writer.cpp:157
5# doris::MemTableWriter::flush_async() at
/home/zcp/repo_center/doris_branch-3.0/doris/be/src/olap/memtable_writer.cpp:187
6# doris::MemTableMemoryLimiter::_flush_active_memtables(long) at
/home/zcp/repo_center/doris_branch-3.0/doris/be/src/olap/memtable_memory_limiter.cpp:190
7# doris::MemTableMemoryLimiter::handle_memtable_flush() at
/home/zcp/repo_center/doris_branch-3.0/doris/be/src/olap/memtable_memory_limiter.cpp:144
8# doris::LoadChannelMgr::add_batch(doris::PTabletWriterAddBlockRequest
const&, doris::PTabletWriterAddBlockResult*) at
/home/zcp/repo_center/doris_branch-3.0/doris/be/src/runtime/load_channel_mgr.cpp:154
9# std::_Function_handler<void (),
doris::PInternalService::tablet_writer_add_block(google::protobuf::RpcController*,
doris::PTabletWriterAddBlockRequest const*,
doris::PTabletWriterAddBlockResult*,
google::protobuf::Closure*)::$_0>::_M_invoke(std::_Any_data const&) at
/var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
10# doris::WorkThreadPool<false>::work_thread(int) at
/home/zcp/repo_center/doris_branch-3.0/doris/be/src/util/work_thread_pool.hpp:159
11# execute_native_thread_routine at
../../../../../libstdc++-v3/src/c++11/thread.cc:84
12# start_thread at ./nptl/pthread_create.c:442
13# 0x00007F80B38B4850 at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:83

This PR addresses potential null pointer dereference crashes that could
occur when a write operation fails and the memtable is reset. The changes
add defensive null checks so that the _mem_table state is handled safely
during a memtable flush.
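
The kind of defensive check this commit message describes can be sketched as follows. This is a minimal, single-threaded illustration and not the actual `MemTableWriter` code: `ToyMemTable`, `ToyMemTableWriter`, and `reset_after_failure` are hypothetical names, and the sketch assumes that a reset leaves the `_mem_table` pointer null, which the crash address `@0x0` suggests but the message does not spell out.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a memtable.
struct ToyMemTable {
    std::vector<int> rows;
};

// Hypothetical writer whose memtable pointer may be null after a failed write.
class ToyMemTableWriter {
public:
    // Lazily creates a memtable, then appends one row.
    void write(int row) {
        if (_mem_table == nullptr) {
            _mem_table = std::make_unique<ToyMemTable>();
        }
        _mem_table->rows.push_back(row);
    }

    // Models the reset done after a failed write (assumption: pointer becomes null).
    void reset_after_failure() { _mem_table.reset(); }

    // Models a flush requested from outside, e.g. by a memory limiter.
    // Returns the number of rows handed to the flush, or 0 when there is no memtable.
    std::size_t flush_async() {
        if (_mem_table == nullptr) {
            return 0;  // defensive null check: nothing to flush, do not dereference
        }
        std::size_t submitted = _mem_table->rows.size();
        _mem_table.reset();  // ownership would pass to the flush task in a real system
        return submitted;
    }

private:
    std::unique_ptr<ToyMemTable> _mem_table;
};
```

Without the null check, calling `flush_async()` after `reset_after_failure()` would dereference a null pointer, which is the failure mode the stack trace above points at.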
github-actions bot pushed a commit that referenced this pull request Feb 13, 2025
…7860)

lzyy2024 pushed a commit to lzyy2024/doris that referenced this pull request Feb 21, 2025
…t crash (apache#47610)

lzyy2024 pushed a commit to lzyy2024/doris that referenced this pull request Feb 21, 2025
…ache#47860)

dataroaring pushed a commit that referenced this pull request Mar 4, 2025
### What problem does this PR solve?

Issue Number: DORIS-18927

Related PR: #47860 and #47610
github-actions bot pushed a commit that referenced this pull request Mar 4, 2025
dataroaring pushed a commit that referenced this pull request Mar 10, 2025
Cherry-picked from #48489

Co-authored-by: Kaijie Chen <chenkaijie@selectdb.com>
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
…t crash (apache#47610)

koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
…ache#47860)

koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025

Labels

approved (Indicates a PR has been approved by one committer.), dev/3.0.4-merged, p0_c, reviewed
