Segfault on unexpected swarm node shutdown #1200

@ianton-ru

Description

Describe the bug
Segfault on unexpected swarm node shutdown

To Reproduce
While a long-running cluster query was executing, one of the swarm nodes shut down unexpectedly; this caused a segfault on the initiator node.

Additional context

2025.12.01 14:01:47.774942 [ 35 ] {} <Error> TCPHandler: Code: 394. DB::Exception: Received from clickhouse2:9000. DB::Exception: Query was cancelled. Stack trace:

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x00000000133dc59f
1. DB::Exception::Exception(String&&, int, String, bool) @ 0x000000000c88738e
2. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x000000000c886e40
3. DB::Exception::Exception<>(int, FormatStringHelperImpl<>) @ 0x000000000c895eeb
4. DB::QueryStatus::throwQueryWasCancelled() const @ 0x00000000180d90ef
5. DB::QueryStatus::throwProperExceptionIfNeeded(unsigned long const&, unsigned long const&) @ 0x00000000180d8fec
6. DB::PipelineExecutor::finalizeExecution() @ 0x0000000019c0c6e3
7. DB::PipelineExecutor::execute(unsigned long, bool) @ 0x0000000019c0c1fd
8. void std::__function::__policy_invoker<void ()>::__call_impl[abi:ne190107]<std::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<true, true>::ThreadFromGlobalPoolImpl<DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0>(DB::PullingAsyncPipelineExecutor::pull(DB::Chunk&, unsigned long)::$_0&&)::'lambda'(), void ()>>(std::__function::__policy_storage const*) @ 0x0000000019c2669a
9. ThreadPoolImpl<std::thread>::ThreadFromThreadPool::worker() @ 0x0000000013538512
10. void* std::__thread_proxy[abi:ne190107]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void (ThreadPoolImpl<std::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::thread>::ThreadFromThreadPool*>>(void*) @ 0x000000001353ffda
11. ? @ 0x0000000000094ac3
12. ? @ 0x00000000001268c0
. (QUERY_WAS_CANCELLED), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x00000000133dc59f
1. DB::Exception::Exception(String const&, int, String, bool) @ 0x00000000120bc3ce
2. DB::readException(DB::ReadBuffer&, String const&, bool) @ 0x00000000134e3833
3. DB::Connection::receiveException() const @ 0x00000000199ca475
4. DB::Connection::receivePacket() @ 0x00000000199d4559
5. DB::MultiplexedConnections::receivePacketUnlocked(std::function<void (int, Poco::Timespan, DB::AsyncEventTimeoutType, String const&, unsigned int)>) @ 0x0000000019a1da9f
6. DB::RemoteQueryExecutorReadContext::Task::run(std::function<void (int, Poco::Timespan, DB::AsyncEventTimeoutType, String const&, unsigned int)>, std::function<void ()>) @ 0x0000000016f155c4
7. void boost::context::detail::fiber_entry<boost::context::detail::fiber_record<boost::context::fiber, FiberStack&, Fiber::RoutineImpl<DB::AsyncTaskExecutor::Routine>>>(boost::context::detail::transfer_t) @ 0x0000000016f14b03

2025.12.01 14:01:59.172140 [ 30 ] {} <Fatal> BaseDaemon: ########## Short fault info ############
2025.12.01 14:01:59.172152 [ 30 ] {} <Fatal> BaseDaemon: (version 25.8.9.20496.altinityantalya (altinity build), build id: 7B3059EF2805AEA3332657A247B0D61BC252306D, git hash: f5fb292ae0cc37a2f2f4bbdb10b21328ee363eae, architecture: x86_64) (from thread 759) Received signal 11
2025.12.01 14:01:59.172154 [ 30 ] {} <Fatal> BaseDaemon: Signal description: Segmentation fault
2025.12.01 14:01:59.172158 [ 30 ] {} <Fatal> BaseDaemon: Address: 0x8. Access: read. Address not mapped to object.
2025.12.01 14:01:59.172160 [ 30 ] {} <Fatal> BaseDaemon: Stack trace: 0x00000000199d3cad 0x0000000019a1da9f 0x0000000016f155c4 0x0000000016f14b03
2025.12.01 14:01:59.172163 [ 30 ] {} <Fatal> BaseDaemon: ########################################
2025.12.01 14:01:59.172220 [ 30 ] {} <Fatal> BaseDaemon: (version 25.8.9.20496.altinityantalya (altinity build), build id: 7B3059EF2805AEA3332657A247B0D61BC252306D, git hash: f5fb292ae0cc37a2f2f4bbdb10b21328ee363eae) (from thread 759) (query_id: 1de83611-3dfa-490a-aab1-b0d9c4ce0fd5) (query: SELECT count(), hostName() 
            FROM datalakecatalog_db_dfdfb603_ceb5_11f0_af75_e0c26496f172.`namespace_dfdfc02a_ceb5_11f0_88ee_e0c26496f172.table_dfdfc060_ceb5_11f0_af22_e0c26496f172` 
            WHERE NOT ignore(sleepEachRow(1)) 
            GROUP BY hostName()
            SETTINGS 
                object_storage_cluster='static_swarm_cluster', 
                max_threads=1
        
) Received signal Segmentation fault (11)
2025.12.01 14:01:59.172238 [ 30 ] {} <Fatal> BaseDaemon: Address: 0x8. Access: read. Address not mapped to object.
2025.12.01 14:01:59.172251 [ 30 ] {} <Fatal> BaseDaemon: Stack trace: 0x00000000199d3cad 0x0000000019a1da9f 0x0000000016f155c4 0x0000000016f14b03
2025.12.01 14:01:59.172292 [ 30 ] {} <Fatal> BaseDaemon: 2. DB::Connection::receivePacket() @ 0x00000000199d3cad
2025.12.01 14:01:59.172316 [ 30 ] {} <Fatal> BaseDaemon: 3. DB::MultiplexedConnections::receivePacketUnlocked(std::function<void (int, Poco::Timespan, DB::AsyncEventTimeoutType, String const&, unsigned int)>) @ 0x0000000019a1da9f
2025.12.01 14:01:59.172336 [ 30 ] {} <Fatal> BaseDaemon: 4. DB::RemoteQueryExecutorReadContext::Task::run(std::function<void (int, Poco::Timespan, DB::AsyncEventTimeoutType, String const&, unsigned int)>, std::function<void ()>) @ 0x0000000016f155c4
2025.12.01 14:01:59.172353 [ 30 ] {} <Fatal> BaseDaemon: 5. void boost::context::detail::fiber_entry<boost::context::detail::fiber_record<boost::context::fiber, FiberStack&, Fiber::RoutineImpl<DB::AsyncTaskExecutor::Routine>>>(boost::context::detail::transfer_t) @ 0x0000000016f14b03
2025.12.01 14:01:59.321705 [ 30 ] {} <Fatal> BaseDaemon: Integrity check of the executable successfully passed (checksum: C4DC71C257DFD69671A014F759569F7D)
2025.12.01 14:01:59.321894 [ 30 ] {} <Fatal> BaseDaemon: Report this error to https://github.com/Altinity/ClickHouse/issues
2025.12.01 14:01:59.322044 [ 30 ] {} <Fatal> BaseDaemon: Changed settings: max_threads = 1, use_uncompressed_cache = false, load_balancing = 'random', max_memory_usage = 10000000000, parallel_replicas_for_cluster_engines = false, object_storage_cluster = 'static_swarm_cluster'
