Describe the bug
Segfault on unexpected swarm node shutdown
To Reproduce
During a long-running cluster query, one of the swarm nodes shut down unexpectedly; this caused a segfault on the initiator. A repro sketch follows.
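A minimal repro sketch, assuming a swarm cluster named `static_swarm_cluster` and the data-lake catalog table captured in the crash log below (any sufficiently long query routed through `object_storage_cluster` should behave the same): run the query on the initiator, then stop one swarm node while it is still executing.

```sql
-- Long-running distributed query over the swarm cluster;
-- sleepEachRow(1) keeps the remote nodes busy long enough
-- to shut one of them down mid-query.
SELECT count(), hostName()
FROM datalakecatalog_db_dfdfb603_ceb5_11f0_af75_e0c26496f172.`namespace_dfdfc02a_ceb5_11f0_88ee_e0c26496f172.table_dfdfc060_ceb5_11f0_af22_e0c26496f172`
WHERE NOT ignore(sleepEachRow(1))
GROUP BY hostName()
SETTINGS
    object_storage_cluster = 'static_swarm_cluster',
    max_threads = 1;
```

While the query is running, stop one of the swarm nodes (e.g. `clickhouse2`, the node named in the log); the initiator then crashes with the segfault shown below.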
Additional context
```
2025.12.01 14:01:47.774942 [ 35 ] {} <Error> TCPHandler: Code: 394. DB::Exception: Received from clickhouse2:9000. DB::Exception: Query was cancelled. Stack trace:
0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x00000000133dc59f
1. DB::Exception::Exception(String&&, int, String, bool) @ 0x000000000c88738e
2. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x000000000c886e40
3. DB::Exception::Exception<>(int, FormatStringHelperImpl<>) @ 0x000000000c895eeb
4. DB::QueryStatus::throwQueryWasCancelled() const @ 0x00000000180d90ef
5. DB::QueryStatus::throwProperExceptionIfNeeded(unsigned long const&, unsigned long const&) @ 0x00000000180d8fec
6. DB::PipelineExecutor::finalizeExecution() @ 0x0000000019c0c6e3
7. DB::PipelineExecutor::execute(unsigned long, bool) @ 0x0000000019c0c1fd
8. void std::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<true, true>::ThreadFromGlobalPoolImpl<DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0>(DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0&&)::'lambda'(), void ()>>(std::__function::__policy_storage const*) @ 0x0000000019c2669a
9. ThreadPoolImpl<std::thread>::ThreadFromThreadPool::worker() @ 0x0000000013538512
10. void* std::__thread_proxy[abi:ne190107]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*>>(void*) @ 0x000000001353ffda
11. ? @ 0x0000000000094ac3
12. ? @ 0x00000000001268c0
. (QUERY_WAS_CANCELLED), Stack trace (when copying this message, always include the lines below):
0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x00000000133dc59f
1. DB::Exception::Exception(String const&, int, String, bool) @ 0x00000000120bc3ce
2. DB::readException(DB::ReadBuffer&, String const&, bool) @ 0x00000000134e3833
3. DB::Connection::receiveException() const @ 0x00000000199ca475
4. DB::Connection::receivePacket() @ 0x00000000199d4559
5. DB::MultiplexedConnections::receivePacketUnlocked(std::function<void (int, Poco::Timespan, DB::AsyncEventTimeoutType, String const&, unsigned int)>) @ 0x0000000019a1da9f
6. DB::RemoteQueryExecutorReadContext::Task::run(std::function<void (int, Poco::Timespan, DB::AsyncEventTimeoutType, String const&, unsigned int)>, std::function<void ()>) @ 0x0000000016f155c4
7. void boost::context::detail::fiber_entry<boost::context::detail::fiber_record<boost::context::fiber, FiberStack&, Fiber::RoutineImpl<DB::AsyncTaskExecutor::Routine>>>(boost::context::detail::transfer_t) @ 0x0000000016f14b03
2025.12.01 14:01:59.172140 [ 30 ] {} <Fatal> BaseDaemon: ########## Short fault info ############
2025.12.01 14:01:59.172152 [ 30 ] {} <Fatal> BaseDaemon: (version 25.8.9.20496.altinityantalya (altinity build), build id: 7B3059EF2805AEA3332657A247B0D61BC252306D, git hash: f5fb292ae0cc37a2f2f4bbdb10b21328ee363eae, architecture: x86_64) (from thread 759) Received signal 11
2025.12.01 14:01:59.172154 [ 30 ] {} <Fatal> BaseDaemon: Signal description: Segmentation fault
2025.12.01 14:01:59.172158 [ 30 ] {} <Fatal> BaseDaemon: Address: 0x8. Access: read. Address not mapped to object.
2025.12.01 14:01:59.172160 [ 30 ] {} <Fatal> BaseDaemon: Stack trace: 0x00000000199d3cad 0x0000000019a1da9f 0x0000000016f155c4 0x0000000016f14b03
2025.12.01 14:01:59.172163 [ 30 ] {} <Fatal> BaseDaemon: ########################################
2025.12.01 14:01:59.172220 [ 30 ] {} <Fatal> BaseDaemon: (version 25.8.9.20496.altinityantalya (altinity build), build id: 7B3059EF2805AEA3332657A247B0D61BC252306D, git hash: f5fb292ae0cc37a2f2f4bbdb10b21328ee363eae) (from thread 759) (query_id: 1de83611-3dfa-490a-aab1-b0d9c4ce0fd5) (query: SELECT count(), hostName()
FROM datalakecatalog_db_dfdfb603_ceb5_11f0_af75_e0c26496f172.`namespace_dfdfc02a_ceb5_11f0_88ee_e0c26496f172.table_dfdfc060_ceb5_11f0_af22_e0c26496f172`
WHERE NOT ignore(sleepEachRow(1))
GROUP BY hostName()
SETTINGS
object_storage_cluster='static_swarm_cluster',
max_threads=1
) Received signal Segmentation fault (11)
2025.12.01 14:01:59.172238 [ 30 ] {} <Fatal> BaseDaemon: Address: 0x8. Access: read. Address not mapped to object.
2025.12.01 14:01:59.172251 [ 30 ] {} <Fatal> BaseDaemon: Stack trace: 0x00000000199d3cad 0x0000000019a1da9f 0x0000000016f155c4 0x0000000016f14b03
2025.12.01 14:01:59.172292 [ 30 ] {} <Fatal> BaseDaemon: 2. DB::Connection::receivePacket() @ 0x00000000199d3cad
2025.12.01 14:01:59.172316 [ 30 ] {} <Fatal> BaseDaemon: 3. DB::MultiplexedConnections::receivePacketUnlocked(std::function<void (int, Poco::Timespan, DB::AsyncEventTimeoutType, String const&, unsigned int)>) @ 0x0000000019a1da9f
2025.12.01 14:01:59.172336 [ 30 ] {} <Fatal> BaseDaemon: 4. DB::RemoteQueryExecutorReadContext::Task::run(std::function<void (int, Poco::Timespan, DB::AsyncEventTimeoutType, String const&, unsigned int)>, std::function<void ()>) @ 0x0000000016f155c4
2025.12.01 14:01:59.172353 [ 30 ] {} <Fatal> BaseDaemon: 5. void boost::context::detail::fiber_entry<boost::context::detail::fiber_record<boost::context::fiber, FiberStack&, Fiber::RoutineImpl<DB::AsyncTaskExecutor::Routine>>>(boost::context::detail::transfer_t) @ 0x0000000016f14b03
2025.12.01 14:01:59.321705 [ 30 ] {} <Fatal> BaseDaemon: Integrity check of the executable successfully passed (checksum: C4DC71C257DFD69671A014F759569F7D)
2025.12.01 14:01:59.321894 [ 30 ] {} <Fatal> BaseDaemon: Report this error to https://github.com/Altinity/ClickHouse/issues
2025.12.01 14:01:59.322044 [ 30 ] {} <Fatal> BaseDaemon: Changed settings: max_threads = 1, use_uncompressed_cache = false, load_balancing = 'random', max_memory_usage = 10000000000, parallel_replicas_for_cluster_engines = false, object_storage_cluster = 'static_swarm_cluster'
```