@Gabriel39
Contributor

…epare execution (#51492)

Issue Number: close #51491

Problem Summary:
When the queue of the FragmentMgrAsync thread pool is full, newly submitted tasks are rejected and the submitting call returns early. However, tasks that were submitted earlier may still be scheduled and executed later. Objects that those tasks reference, such as PipelineFragmentContext and TPipelineFragmentParams, can therefore be destroyed prematurely, so the tasks dereference invalid pointers when they run, which ultimately causes a coredump.

This PR fixes the problem by waiting until all previously submitted tasks have completed before returning.

*** SIGSEGV address not mapped to object (@0x1c8) received by PID 3941201 (TID 2115617 OR 0xfe1685bb97f0) from PID 456; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/jenkins_agent/workspace/BigDataComponent_doris-unified-arm-release/be/src/common/signal_handler.h:421
 1# os::Linux::chained_handler(int, siginfo_t*, void*) in /usr/jdk64/current/jre/lib/aarch64/server/libjvm.so
 2# JVM_handle_linux_signal in /usr/jdk64/current/jre/lib/aarch64/server/libjvm.so
 3# signalHandler(int, siginfo_t*, void*) in /usr/jdk64/current/jre/lib/aarch64/server/libjvm.so
 4# 0x0000FFFF6B2A07C0 in linux-vdso.so.1
 5# doris::TUniqueId::TUniqueId(doris::TUniqueId const&) at /home/jenkins_agent/workspace/BigDataComponent_doris-unified-arm-release/gensrc/build/gen_cpp/Types_types.cpp:2354
 6# doris::AttachTask::AttachTask(doris::QueryContext*) at /home/jenkins_agent/workspace/BigDataComponent_doris-unified-arm-release/be/src/runtime/thread_context.cpp:60
 7# std::_Function_handler<void (), doris::pipeline::PipelineXFragmentContext::_build_pipeline_x_tasks(doris::TPipelineFragmentParams const&, doris::ThreadPool*)::$_0>::_M_invoke(std::_Any_data const&) at /usr/lib/gcc/aarch64-linux-gnu/13/../../../../include/c++/13/bits/std_function.h:290
 8# doris::ThreadPool::dispatch_thread() at /home/jenkins_agent/workspace/BigDataComponent_doris-unified-arm-release/be/src/util/threadpool.cpp:552
 9# doris::Thread::supervise_thread(void*) at /home/jenkins_agent/workspace/BigDataComponent_doris-unified-arm-release/be/src/util/thread.cpp:499
10# 0x0000FFFF6AF187AC in /lib64/libpthread.so.0
11# 0x0000FFFF6B16548C in /lib64/libc.so.6
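
The wait-before-return policy described in the problem summary can be illustrated with a minimal sketch. It is not the code from this PR: `BoundedPool`, `FragmentState`, and `prepare_all` are hypothetical names, and the pool is a toy stand-in for Doris's `ThreadPool`. It only shows the idea that, after a rejected submission, the caller blocks until every task it already queued has finished, so the objects those tasks reference stay alive.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical bounded pool (not Doris's ThreadPool): try_submit() returns
// false when the queue is already full.
class BoundedPool {
public:
    BoundedPool(std::size_t workers, std::size_t max_queue) : _max_queue(max_queue) {
        for (std::size_t i = 0; i < workers; ++i) {
            _threads.emplace_back([this] { run(); });
        }
    }
    ~BoundedPool() {
        {
            std::lock_guard<std::mutex> l(_mu);
            _stop = true;
        }
        _cv.notify_all();
        for (auto& t : _threads) t.join();
    }
    bool try_submit(std::function<void()> fn) {
        std::lock_guard<std::mutex> l(_mu);
        if (_queue.size() >= _max_queue) return false; // queue full: rejected
        _queue.push(std::move(fn));
        _cv.notify_one();
        return true;
    }

private:
    void run() {
        for (;;) {
            std::function<void()> fn;
            {
                std::unique_lock<std::mutex> l(_mu);
                _cv.wait(l, [this] { return _stop || !_queue.empty(); });
                if (_queue.empty()) return; // stopping and nothing left to run
                fn = std::move(_queue.front());
                _queue.pop();
            }
            fn();
        }
    }
    std::mutex _mu;
    std::condition_variable _cv;
    std::queue<std::function<void()>> _queue;
    std::vector<std::thread> _threads;
    std::size_t _max_queue;
    bool _stop = false;
};

// Hypothetical stand-in for state that queued tasks reference
// (PipelineFragmentContext plays this role in the PR description).
struct FragmentState {
    std::atomic<int> prepared{0};
};

// Submits up to num_tasks tasks and returns how many were accepted. On a
// rejection it stops submitting, but it still blocks until every accepted
// task has finished, so `state` and the locals below outlive all tasks.
int prepare_all(BoundedPool& pool, FragmentState& state, int num_tasks) {
    assert(num_tasks >= 0);
    std::mutex mu;
    std::condition_variable cv;
    int pending = 0; // accepted but unfinished tasks, guarded by mu
    int accepted = 0;
    for (int i = 0; i < num_tasks; ++i) {
        {
            std::lock_guard<std::mutex> l(mu);
            ++pending; // count before submitting so the task cannot finish first
        }
        bool ok = pool.try_submit([&state, &mu, &cv, &pending] {
            state.prepared.fetch_add(1);
            std::lock_guard<std::mutex> l(mu);
            if (--pending == 0) cv.notify_all();
        });
        if (!ok) {
            std::lock_guard<std::mutex> l(mu);
            --pending; // undo the count for the rejected task
            break;
        }
        ++accepted;
    }
    // Wait here instead of returning early on rejection.
    std::unique_lock<std::mutex> l(mu);
    cv.wait(l, [&pending] { return pending == 0; });
    return accepted;
}
```

Without the final wait, a rejected submission would let `prepare_all` return while accepted tasks still hold references to `state`, `mu`, `cv`, and `pending`, which is the lifetime problem the summary describes. Notifying while holding `mu` ensures the waiter cannot return and destroy the condition variable before the last task has finished with it.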

Co-authored-by: XLPE <weiwh1@chinatelecom.cn>
@Gabriel39 Gabriel39 requested a review from dataroaring as a code owner June 26, 2025 09:17
@Gabriel39
Contributor Author

run buildall

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@doris-robot

TPC-H: Total hot run time: 41971 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 759542ad5e649e21bfde2b3974cf09d0543ea022, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17614	7568	7449	7449
q2	2092	175	164	164
q3	10554	1143	1235	1143
q4	10579	822	887	822
q5	7787	3070	3067	3067
q6	236	132	133	132
q7	1081	612	607	607
q8	9350	2088	2143	2088
q9	7144	6818	6830	6818
q10	7025	2334	2402	2334
q11	469	269	254	254
q12	439	213	215	213
q13	17768	2952	3014	2952
q14	249	203	202	202
q15	534	465	477	465
q16	492	407	388	388
q17	1035	604	597	597
q18	7485	6586	6635	6586
q19	1389	1197	1157	1157
q20	513	208	205	205
q21	4169	3441	3351	3351
q22	1127	997	977	977
Total cold run time: 109131 ms
Total hot run time: 41971 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	7383	7321	7314	7314
q2	342	235	236	235
q3	3192	3075	3095	3075
q4	2241	1901	1842	1842
q5	5946	5944	5943	5943
q6	230	129	129	129
q7	2313	1798	1761	1761
q8	3675	3806	3833	3806
q9	9116	9051	8995	8995
q10	3738	3757	3670	3670
q11	634	487	494	487
q12	843	654	564	564
q13	9644	3124	3148	3124
q14	314	267	291	267
q15	523	469	461	461
q16	529	443	429	429
q17	2046	1716	1718	1716
q18	8434	7846	7744	7744
q19	1851	1743	1804	1743
q20	2198	1878	1847	1847
q21	5385	5259	5224	5224
q22	1168	1022	1034	1022
Total cold run time: 71745 ms
Total hot run time: 61398 ms

@doris-robot

TPC-DS: Total hot run time: 197489 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 759542ad5e649e21bfde2b3974cf09d0543ea022, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	1339	903	905	903
query2	6254	1922	1933	1922
query3	10882	4510	4176	4176
query4	65267	28726	24026	24026
query5	4970	455	463	455
query6	402	180	182	180
query7	5540	314	316	314
query8	318	223	220	220
query9	8603	2599	2570	2570
query10	440	274	265	265
query11	17364	15039	15814	15039
query12	154	107	108	107
query13	1477	453	464	453
query14	10468	7493	7326	7326
query15	206	186	187	186
query16	7307	505	503	503
query17	1089	614	614	614
query18	1854	329	321	321
query19	211	174	159	159
query20	115	110	116	110
query21	215	109	103	103
query22	4619	4521	4394	4394
query23	34761	33907	33824	33824
query24	6200	2971	3020	2971
query25	546	432	429	429
query26	651	174	171	171
query27	1699	364	361	361
query28	3877	2184	2134	2134
query29	725	449	456	449
query30	231	155	153	153
query31	996	818	836	818
query32	70	60	60	60
query33	413	322	296	296
query34	911	532	526	526
query35	816	741	733	733
query36	1087	973	952	952
query37	109	66	71	66
query38	4085	3957	4052	3957
query39	1537	1475	1490	1475
query40	198	103	103	103
query41	48	47	47	47
query42	114	105	98	98
query43	530	496	480	480
query44	1158	809	820	809
query45	185	169	172	169
query46	1141	744	760	744
query47	2007	1872	1902	1872
query48	536	400	384	384
query49	732	419	384	384
query50	847	430	449	430
query51	7429	7271	7314	7271
query52	103	103	94	94
query53	285	193	188	188
query54	590	468	468	468
query55	84	81	77	77
query56	280	253	249	249
query57	1295	1154	1140	1140
query58	217	207	206	206
query59	3049	2973	2875	2875
query60	274	253	257	253
query61	117	107	111	107
query62	788	668	652	652
query63	224	196	197	196
query64	1338	675	681	675
query65	3261	3185	3252	3185
query66	701	321	299	299
query67	15745	15696	15632	15632
query68	4190	594	572	572
query69	434	265	270	265
query70	1184	1134	1137	1134
query71	343	266	269	266
query72	6383	4033	3982	3982
query73	774	348	361	348
query74	10311	9121	9141	9121
query75	3366	2650	2638	2638
query76	1973	1082	1113	1082
query77	495	272	285	272
query78	10650	9627	9614	9614
query79	1251	606	606	606
query80	842	429	421	421
query81	476	218	221	218
query82	1299	92	89	89
query83	226	145	147	145
query84	284	84	76	76
query85	893	318	299	299
query86	328	301	290	290
query87	4406	4242	4234	4234
query88	3482	2408	2354	2354
query89	410	292	293	292
query90	1999	195	189	189
query91	182	155	146	146
query92	69	51	53	51
query93	1332	555	549	549
query94	789	291	288	288
query95	358	261	259	259
query96	622	280	284	280
query97	3381	3136	3147	3136
query98	214	202	198	198
query99	1629	1267	1289	1267
Total cold run time: 315659 ms
Total hot run time: 197489 ms

@doris-robot

ClickBench: Total hot run time: 29.68 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 759542ad5e649e21bfde2b3974cf09d0543ea022, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.03	0.03	0.03
query2	0.06	0.03	0.03
query3	0.23	0.07	0.06
query4	1.64	0.11	0.11
query5	0.52	0.51	0.51
query6	1.14	0.73	0.73
query7	0.02	0.02	0.02
query8	0.04	0.04	0.04
query9	0.56	0.49	0.49
query10	0.55	0.54	0.54
query11	0.15	0.10	0.10
query12	0.14	0.11	0.11
query13	0.61	0.60	0.59
query14	0.76	0.81	0.81
query15	0.84	0.82	0.82
query16	0.36	0.38	0.37
query17	1.01	1.05	0.99
query18	0.24	0.21	0.21
query19	1.96	1.76	1.79
query20	0.02	0.01	0.01
query21	15.48	0.62	0.60
query22	2.42	1.98	1.66
query23	17.11	0.82	0.86
query24	2.62	1.13	0.80
query25	0.28	0.22	0.04
query26	0.43	0.14	0.15
query27	0.04	0.03	0.04
query28	10.81	0.52	0.46
query29	12.57	3.22	3.20
query30	0.25	0.06	0.07
query31	2.86	0.39	0.39
query32	3.24	0.46	0.46
query33	2.96	2.98	3.09
query34	17.23	4.49	4.44
query35	4.53	4.54	4.51
query36	0.65	0.48	0.47
query37	0.09	0.06	0.06
query38	0.04	0.03	0.03
query39	0.04	0.02	0.03
query40	0.17	0.13	0.12
query41	0.08	0.02	0.03
query42	0.04	0.02	0.01
query43	0.04	0.03	0.03
Total cold run time: 104.86 s
Total hot run time: 29.68 s

Contributor

@dataroaring dataroaring left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jun 27, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@dataroaring dataroaring merged commit 35a2633 into apache:branch-3.0 Jun 27, 2025
23 of 26 checks passed
koarz pushed a commit to koarz/doris that referenced this pull request Jul 3, 2025
apache#52365)

Co-authored-by: XLPE <crykix@gmail.com>
Co-authored-by: XLPE <weiwh1@chinatelecom.cn>

Labels

approved Indicates a PR has been approved by one committer. reviewed
