
BE crashes when a bitmap is too large #5849

@xqinghu

Description


Describe the bug
After loading data, a single ad_channel_id has a pv of about 160 million. Querying report_test1 crashes every BE, and compaction on report_test2 crashes the BE.
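For a rough sense of scale, the sketch below estimates how much container memory a Roaring bitmap holding about 160 million values might need. It is a simplified model, not Doris code: it assumes a single 32-bit Roaring bitmap, ignores run containers and per-container headers, and assumes the ids are spread evenly. None of that is confirmed by this report, and `estimate_bytes` and its parameters are hypothetical names.

```python
# Simplified size model for a 32-bit Roaring bitmap (assumptions: no run
# containers, no header overhead, values spread evenly over `universe`).

CHUNK = 1 << 16        # each container covers 65536 consecutive values
ARRAY_MAX = 4096       # above this cardinality a bitmap container is used
BITMAP_BYTES = 8192    # fixed size of a bitmap container (65536 bits)


def estimate_bytes(n_values: int, universe: int = 1 << 32) -> int:
    """Estimate container bytes for n_values spread evenly over universe."""
    chunks = (universe + CHUNK - 1) // CHUNK     # containers that can exist
    used_chunks = max(1, min(chunks, n_values))  # containers actually populated
    per_chunk = n_values / used_chunks           # average cardinality per container
    if per_chunk > ARRAY_MAX:
        return used_chunks * BITMAP_BYTES        # dense: one 8 KiB bitmap each
    return int(used_chunks * per_chunk * 2)      # sparse: 2 bytes per value


n = 160_000_000
sparse = estimate_bytes(n)               # ids spread over the full 32-bit range
dense = estimate_bytes(n, universe=n)    # ids packed into a contiguous range
print(f"sparse: {sparse / 2**20:.0f} MiB, dense: {dense / 2**20:.0f} MiB")
```

Under these assumptions the estimate is about 305 MiB for ids spread over the full 32-bit range and about 19 MiB for a contiguous range. The stack traces in this report show `Roaring64Map`, so the real footprint may differ from this 32-bit model.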

version: https://github.com/baidu-doris/incubator-doris/releases/tag/DORIS-0.13.15-release

report_test1:
SELECT ad_channel_id,bitmap_union_count(pv) FROM report_test1 WHERE ad_channel_id=111 GROUP BY ad_channel_id

Table schema:
CREATE TABLE report_test1 (
ad_channel_id int(11) NOT NULL COMMENT "",
aid bigint(20) NOT NULL COMMENT "",
n bigint(20) SUM NULL DEFAULT "0" COMMENT "",
pv bitmap BITMAP_UNION NOT NULL COMMENT ""
) ENGINE=OLAP
AGGREGATE KEY(ad_channel_id, aid)
COMMENT "OLAP"
DISTRIBUTED BY HASH(aid) BUCKETS 32;

CREATE TABLE report_test2 (
ad_channel_id int(11) NOT NULL COMMENT "",
n bigint(20) SUM NULL DEFAULT "0" COMMENT "",
pv bitmap BITMAP_UNION NOT NULL COMMENT ""
) ENGINE=OLAP
AGGREGATE KEY(ad_channel_id)
COMMENT "OLAP"
DISTRIBUTED BY HASH(ad_channel_id) BUCKETS 32;

Error log from the query crash:
PC: @ 0x7fa43730b866 __memcpy_ssse3_back
*** SIGSEGV (@0x0) received by PID 22647 (TID 0x7fa3ed56e700) from PID 0; stack trace: ***
@ 0x1bb6aa1 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7fa437eb65d0 (unknown)
@ 0x7fa43730b866 __memcpy_ssse3_back
@ 0x2127b3f array_container_clone
@ 0x2122e78 ra_copy
@ 0xf2934f Roaring::Roaring()
@ 0xf298e9 doris::BitmapValue::write()
@ 0xf26a2c doris::BitmapFunctions::bitmap_serialize()
@ 0x16d530f doris::NewAggFnEvaluator::SerializeOrFinalize()
@ 0x164a495 doris::PartitionedAggregationNode::GetOutputTuple()
@ 0x1650600 doris::PartitionedAggregationNode::GetRowsFromPartition()
@ 0x1650ac6 doris::PartitionedAggregationNode::GetNextInternal()
@ 0x1650c3f doris::PartitionedAggregationNode::get_next()
@ 0x11cdcb9 doris::PlanFragmentExecutor::get_next_internal()
@ 0x11ce778 doris::PlanFragmentExecutor::open_internal()
@ 0x11ceeff doris::PlanFragmentExecutor::open()
@ 0x115791e doris::FragmentExecState::execute()
@ 0x115a556 doris::FragmentMgr::_exec_actual()
@ 0x115fa1d std::_Function_handler<>::_M_invoke()
@ 0x12a1f42 doris::ThreadPool::dispatch_thread()
@ 0x129bc65 doris::Thread::supervise_thread()
@ 0x7fa437eaedd5 start_thread
@ 0x7fa4372b402d __clone

Core dump stack from the query crash:
[New LWP 28096]
[New LWP 28261]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/local/doris/be/lib/palo_be'.
Program terminated with signal 6, Aborted.
#0 0x00007f232aa7f2c7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7_6.6.x86_64 libgcc-4.8.5-39.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0 0x00007f232aa7f2c7 in raise () from /lib64/libc.so.6
#1 0x00007f232aa809b8 in abort () from /lib64/libc.so.6
#2 0x00007f232aa780e6 in __assert_fail_base () from /lib64/libc.so.6
#3 0x00007f232aa78192 in __assert_fail () from /lib64/libc.so.6
#4 0x0000000002129da5 in container_clone ()
#5 0x0000000002122e78 in ra_copy ()
#6 0x0000000000f2934f in Roaring::Roaring (this=0x7f22e0768eb8, r=...) at /data/tmp/incubator-doris-DORIS-0.13.15-release/thirdparty/installed/include/roaring/roaring.h:52
#7 0x0000000000f298e9 in pair<unsigned int const, Roaring> (__p=..., this=0x7f22e0768eb0) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/util/bitmap_value.h:596
#8 for_each<std::_Rb_tree_const_iterator<std::pair<unsigned int const, Roaring> >, doris::detail::Roaring64Map::write(char*) const::<lambda(const std::pair<unsigned int, Roaring>&)> > (__f=..., __last=..., __first=...)
at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_algo.h:3882
#9 write (buf=0x1004ab2b "", this=0x10e9a508) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/util/bitmap_value.h:603
#10 doris::BitmapValue::write (this=0x10e9a500, dst=) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/util/bitmap_value.h:1158
#11 0x0000000000f26a2c in serialize (value=0x10e9a500, ctx=0x12aa9390) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/exprs/bitmap_function.cpp:160
#12 doris::BitmapFunctions::bitmap_serialize (ctx=0x12aa9390, src=...) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/exprs/bitmap_function.cpp:380
#13 0x00000000016d530f in doris::NewAggFnEvaluator::SerializeOrFinalize (this=, src=src@entry=0x2ff42c000, dst_slot_desc=..., dst=dst@entry=0x2ff42c000, fn=)
at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/exprs/new_agg_fn_evaluator.cc:616
#14 0x000000000164a495 in Serialize (tuple=0x2ff42c000, this=) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/exprs/agg_fn.h:118
#15 Serialize (dst=, evals=...) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/exprs/new_agg_fn_evaluator.h:310
#16 doris::PartitionedAggregationNode::GetOutputTuple (this=this@entry=0x11c35600, agg_fn_evals=..., tuple=0x2ff42c000, pool=) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/exec/partitioned_aggregation_node.cc:1048
#17 0x0000000001650600 in doris::PartitionedAggregationNode::GetRowsFromPartition(doris::RuntimeState*, doris::RowBatch*) () at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/runtime/row_batch.h:236
#18 0x0000000001650ac6 in doris::PartitionedAggregationNode::GetNextInternal(doris::RuntimeState*, doris::RowBatch*, bool*) () at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/exec/partitioned_aggregation_node.cc:429
#19 0x0000000001650c3f in doris::PartitionedAggregationNode::get_next(doris::RuntimeState*, doris::RowBatch*, bool*) () at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/exec/partitioned_aggregation_node.cc:349
#20 0x00000000011cdcb9 in doris::PlanFragmentExecutor::get_next_internal(doris::RowBatch**) () at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/runtime/plan_fragment_executor.cpp:476
#21 0x00000000011ce778 in doris::PlanFragmentExecutor::open_internal() () at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/runtime/plan_fragment_executor.cpp:287
#22 0x00000000011ceeff in doris::PlanFragmentExecutor::open() () at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/runtime/plan_fragment_executor.cpp:253
#23 0x000000000115791e in doris::FragmentExecState::execute() () at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/runtime/fragment_mgr.cpp:219
#24 0x000000000115a556 in doris::FragmentMgr::_exec_actual(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::PlanFragmentExecutor*)>) () at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/runtime/fragment_mgr.cpp:422
#25 0x000000000115fa1d in __invoke_impl<void, void (doris::FragmentMgr::*&)(std::shared_ptr<doris::FragmentExecState>, std::function<void(doris::PlanFragmentExecutor*)>), doris::FragmentMgr*&, std::shared_ptr<doris::FragmentExecState>&, std::function<void(doris::PlanFragmentExecutor*)>&> (__t=@0x10e9b490: 0xcdffe00, __f=
@0x10e9b450: (void (doris::FragmentMgr::*)(doris::FragmentMgr * const, std::shared_ptr<doris::FragmentExecState>, std::function<void(doris::PlanFragmentExecutor*)>)) 0x115a530 <doris::FragmentMgr::_exec_actual(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::PlanFragmentExecutor*)>)>) at /opt/rh/devtoolset-8/root/usr/include/c++/8/ext/atomicity.h:96
#26 __invoke<void (doris::FragmentMgr::*&)(std::shared_ptr<doris::FragmentExecState>, std::function<void(doris::PlanFragmentExecutor*)>), doris::FragmentMgr*&, std::shared_ptr<doris::FragmentExecState>&, std::function<void(doris::PlanFragmentExecutor*)>&> (__fn=
@0x10e9b450: (void (doris::FragmentMgr::*)(doris::FragmentMgr * const, std::shared_ptr<doris::FragmentExecState>, std::function<void(doris::PlanFragmentExecutor*)>)) 0x115a530 <doris::FragmentMgr::_exec_actual(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::PlanFragmentExecutor*)>)>) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/invoke.h:95
#27 __call<void, 0, 1, 2> (__args=..., this=0x10e9b450) at /opt/rh/devtoolset-8/root/usr/include/c++/8/functional:565
#28 operator()<> (this=0x10e9b450) at /opt/rh/devtoolset-8/root/usr/include/c++/8/functional:651
#29 std::_Function_handler<void (), std::_Bind_result<void, void (doris::FragmentMgr::*(doris::FragmentMgr*, std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::PlanFragmentExecutor*)>))(std::shared_ptr<doris::FragmentExecState>, std::function<void (doris::PlanFragmentExecutor*)>)> >::_M_invoke(std::_Any_data const&) (__functor=...) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:297
#30 0x00000000012a1f42 in operator() (this=0x11b59b98) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:260
#31 run (this=0x11b59b90) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/util/threadpool.cpp:41
#32 doris::ThreadPool::dispatch_thread() () at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/util/threadpool.cpp:545
#33 0x000000000129bc65 in operator() (this=0x6bd7ed8) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:260
#34 doris::Thread::supervise_thread(void*) () at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/util/thread.cpp:386
#35 0x00007f232b741dd5 in start_thread () from /lib64/libpthread.so.0
#36 0x00007f232ab4702d in clone () from /lib64/libc.so.6
(gdb) f 20
#20 0x00000000011cdcb9 in doris::PlanFragmentExecutor::get_next_internal(doris::RowBatch**) () at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/runtime/plan_fragment_executor.cpp:476
476 /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/runtime/plan_fragment_executor.cpp: No such file or directory.
(gdb)

Error log from the compaction crash:
`
tcmalloc: large alloc 1413791744 bytes == 0x1539ac000 @ 0x26391f0 0x27a31a4 0x1088384 0x1737151 0x173b580 0x173bec0 0x173bf8f 0x170c3fa 0x1711134 0x10cbad9 0x1052f87 0x1053093 0x1053480 0x1036b9e 0x103a705 0x10268b7 0x1018a7f 0x1019d9c 0x101ac8b 0x1016dd2 0xfaac9f 0xf8859c 0x12a1f42 0x129bc65 0x7f43ec6a1dd5
tcmalloc: large alloc 2147483648 bytes == 0x1a7df8000 @ 0x26391f0 0x27a3994 0x27a3d3c 0x1202951 0x11b9be6 0x1142856 0x109e8be 0x173c06c 0x170c3fa 0x1711134 0x10cbad9 0x1052f87 0x1053093 0x1053480 0x1036b9e 0x103a705 0x10268b7 0x1018a7f 0x1019d9c 0x101ac8b 0x1016dd2 0xfaac9f 0xf8859c 0x12a1f42 0x129bc65 0x7f43ec6a1dd5
tcmalloc: large alloc 2153267200 bytes == 0x267556000 @ 0x26391f0 0x27a31a4 0x1088384 0x1737151 0x173b580 0x173bec0 0x173bf8f 0x170c3fa 0x1711134 0x10cbad9 0x1052f87 0x1053093 0x1053480 0x1036b9e 0x103a705 0x10268b7 0x1018a7f 0x1019d9c 0x101ac8b 0x1016dd2 0xfaac9f 0xf8859c 0x12a1f42 0x129bc65 0x7f43ec6a1dd5
tcmalloc: large alloc 4294967296 bytes == 0x3182a2000 @ 0x26391f0 0x27a3994 0x27a3d3c 0x1202951 0x11b9be6 0x1142856 0x109e8be 0x173c06c 0x170c3fa 0x1711134 0x10cbad9 0x1052f87 0x1053093 0x1053480 0x1036b9e 0x103a705 0x10268b7 0x1018a7f 0x1019d9c 0x101ac8b 0x1016dd2 0xfaac9f 0xf8859c 0x12a1f42 0x129bc65 0x7f43ec6a1dd5
I failed to find one of the right cookies. Found 16
terminate called after throwing an instance of 'std::runtime_error'
what(): failed alloc while reading
*** Aborted at 1621478894 (unix time) try "date -d @1621478894" if you are using GNU date ***
PC: @ 0x7f43eb9df2c7 __GI_raise
*** SIGABRT (@0x5756) received by PID 22358 (TID 0x7f4391d4e700) from PID 22358; stack trace: ***
@ 0x1bb6aa1 google::(anonymous namespace)::FailureSignalHandler()
@ 0x7f43ec6a95d0 (unknown)
@ 0x7f43eb9df2c7 __GI_raise
@ 0x7f43eb9e09b8 __GI_abort
@ 0xd1639e __gnu_cxx::__verbose_terminate_handler()
@ 0x2709126 __cxxabiv1::__terminate()
@ 0x2709161 std::terminate()
@ 0x2707f83 __cxa_throw
@ 0x1051f91 doris::AggregateFuncTraits<>::update()
@ 0x1035b17 doris::Reader::_agg_key_next_row()
@ 0x1026a33 doris::Merger::merge_rowsets()
@ 0x1018a7f doris::Compaction::do_compaction_impl()
@ 0x1019d9c doris::Compaction::do_compaction()
@ 0x101ac8b doris::CumulativeCompaction::execute_compact_impl()
@ 0x1016dd2 doris::Compaction::execute_compact()
@ 0xfaac9f doris::Tablet::execute_compaction()
@ 0xf8859c _ZNSt17_Function_handlerIFvvEZN5doris13StorageEngine35_compaction_tasks_producer_callbackEvEUlvE0_E9_M_invokeERKSt9_Any_data
@ 0x12a1f42 doris::ThreadPool::dispatch_thread()
@ 0x129bc65 doris::Thread::supervise_thread()
@ 0x7f43ec6a1dd5 start_thread
@ 0x7f43ebaa702d __clone

`
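To make the allocation sizes in the log above easier to read, here is a small sketch that pulls the byte counts out of the `tcmalloc: large alloc` lines and prints them in GiB. The log lines are copied from this report with the return addresses trimmed; `large_allocs` is a hypothetical helper name, not part of Doris or tcmalloc.

```python
import re

# "tcmalloc: large alloc" lines from the compaction error log (addresses trimmed).
LOG = """\
tcmalloc: large alloc 1413791744 bytes == 0x1539ac000 @
tcmalloc: large alloc 2147483648 bytes == 0x1a7df8000 @
tcmalloc: large alloc 2153267200 bytes == 0x267556000 @
tcmalloc: large alloc 4294967296 bytes == 0x3182a2000 @
"""

ALLOC_RE = re.compile(r"tcmalloc: large alloc (\d+) bytes")


def large_allocs(text: str) -> list[int]:
    """Return the byte count of every tcmalloc large-alloc line, in order."""
    return [int(m.group(1)) for m in ALLOC_RE.finditer(text)]


sizes = large_allocs(LOG)
for size in sizes:
    print(f"{size:>12} bytes = {size / 2**30:.2f} GiB")
```

Two of the four requests are exactly 2 GiB and 4 GiB, which may be consistent with a buffer that doubles as it grows, though the log alone does not confirm which component requested them.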

Core dump stack from the compaction crash:
[New LWP 12100]
[New LWP 12099]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/local/doris/be/lib/palo_be'.
Program terminated with signal 6, Aborted.
#0 0x00007f485474e2c7 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-260.el7_6.6.x86_64 libgcc-4.8.5-39.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0 0x00007f485474e2c7 in raise () from /lib64/libc.so.6
#1 0x00007f485474f9b8 in abort () from /lib64/libc.so.6
#2 0x0000000000d1639e in __gnu_cxx::__verbose_terminate_handler() [clone .cold.1] ()
#3 0x0000000002709126 in __cxxabiv1::__terminate(void (*)()) ()
#4 0x0000000002709161 in std::terminate() ()
#5 0x0000000002707f83 in __cxa_throw ()
#6 0x0000000001051f91 in read (portable=true, buf=) at /data/tmp/incubator-doris-DORIS-0.13.15-release/thirdparty/installed/include/roaring/roaring.hh:455
#7 read (buf=) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/util/bitmap_value.h:636
#8 deserialize (src=, this=0x7f47fb34ae10) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/util/bitmap_value.h:1182
#9 deserialize (src=, this=0x7f47fb34ae10) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/util/bitmap_value.h:1165
#10 BitmapValue (src=, this=0x7f47fb34ae10) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/util/bitmap_value.h:969
#11 doris::AggregateFuncTraits<(doris::FieldAggregationMethod)7, (doris::FieldType)25>::update (dst=, src=..., mem_pool=) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/olap/aggregate_func.h:528
#12 0x0000000001035b17 in update (this=, mem_pool=0x0, src=..., dst=0x7f47fb34aed8) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/olap/aggregate_func.h:62
#13 agg_update (this=, mem_pool=0x0, src=..., dest=0x7f47fb34aed8) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/olap/field.h:72
#14 agg_update_row<doris::RowCursor, doris::RowCursor> (src=..., dst=0x7f47fb34afb0, cids=...) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/olap/row.h:185
#15 doris::Reader::_agg_key_next_row(doris::RowCursor*, doris::MemPool*, doris::ObjectPool*, bool*) () at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/olap/reader.cpp:224
#16 0x0000000001026a33 in next_row_with_aggregation (eof=0x7f47fb34af5e, agg_pool=0x7f47fb34af90, mem_pool=0x25ad49d80, row_cursor=0x7f47fb34afb0, this=0x7f47fb34b0a0) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/olap/reader.h:98
#17 doris::Merger::merge_rowsets(std::shared_ptr<doris::Tablet>, doris::ReaderType, std::vector<std::shared_ptr<doris::RowsetReader>, std::allocator<std::shared_ptr<doris::RowsetReader> > > const&, doris::RowsetWriter*, doris::Merger::Statistics*) ()
at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/olap/merger.cpp:60
#18 0x0000000001018a7f in doris::Compaction::do_compaction_impl(long) () at /opt/rh/devtoolset-8/root/usr/include/c++/8/ext/atomicity.h:96
#19 0x0000000001019d9c in doris::Compaction::do_compaction (this=this@entry=0x8f285b550, permits=5) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/olap/compaction.cpp:58
#20 0x000000000101194c in doris::BaseCompaction::execute_compact_impl() () at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/olap/base_compaction.cpp:69
#21 0x0000000001016dd2 in doris::Compaction::execute_compact (this=0x8f285b550) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/olap/compaction.cpp:47
#22 0x0000000000faa9a1 in doris::Tablet::execute_compaction(doris::CompactionType) () at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/olap/tablet.cpp:1417
#23 0x0000000000f8859c in operator() (__closure=0xc5a9fc150) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/olap/olap_server.cpp:364
#24 std::_Function_handler<void (), doris::StorageEngine::_compaction_tasks_producer_callback()::{lambda()#2}>::_M_invoke(std::_Any_data const&) () at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:297
#25 0x00000000012a1f42 in operator() (this=0x25ad49698) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:260
#26 run (this=0x25ad49690) at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/util/threadpool.cpp:41
#27 doris::ThreadPool::dispatch_thread() () at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/util/threadpool.cpp:545
#28 0x000000000129bc65 in operator() (this=0xedd09d8) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/std_function.h:260
#29 doris::Thread::supervise_thread(void*) () at /data/tmp/incubator-doris-DORIS-0.13.15-release/be/src/util/thread.cpp:386
#30 0x00007f4855410dd5 in start_thread () from /lib64/libpthread.so.0
#31 0x00007f485481602d in clone () from /lib64/libc.so.6
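The frames above end in the Roaring `read` call with `portable=true`, and the error log reports "I failed to find one of the right cookies. Found 16". In the CRoaring portable format the first 32-bit word of a serialized bitmap is a cookie, so that message suggests the bytes handed to the deserializer did not start with a valid one. The sketch below shows that check in isolation. It is an illustration, not Doris or CRoaring code: the function names are hypothetical, the sample buffers are made up, and it does not show how to locate the Roaring payload inside a Doris `BitmapValue` buffer.

```python
import struct

# Cookie values defined by the CRoaring portable serialization format.
SERIAL_COOKIE_NO_RUNCONTAINER = 12346
SERIAL_COOKIE = 12347


def read_cookie(buf: bytes) -> int:
    """Return the first 32-bit little-endian word of a serialized bitmap."""
    if len(buf) < 4:
        raise ValueError("buffer too short to hold a cookie")
    return struct.unpack_from("<I", buf, 0)[0]


def has_valid_cookie(buf: bytes) -> bool:
    """True if the buffer starts with a cookie the portable format accepts."""
    cookie = read_cookie(buf)
    return (cookie & 0xFFFF) == SERIAL_COOKIE or cookie == SERIAL_COOKIE_NO_RUNCONTAINER


# Hypothetical buffers for illustration only; not data from this issue.
good = struct.pack("<I", SERIAL_COOKIE_NO_RUNCONTAINER) + b"\x00" * 4
bad = struct.pack("<I", 16) + b"\x00" * 4   # 16 is the value reported in the log
print(has_valid_cookie(good), has_valid_cookie(bad))
```

A cookie of 16 fails both conditions, which would match the message in the log. Whether the stored bytes were truncated, overwritten, or read from the wrong offset cannot be determined from this report.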