Skip to content

Conversation

@vagetablechicken
Copy link
Contributor

@vagetablechicken vagetablechicken commented Jun 18, 2020

If NodeChannel is internal cancelled, it may has some bytes in _cur_batch or _pending_batches. And it's possible that we close these internal cancelled NodeChannels by mark_close() & close(), cuz the OlapTableSink may be succeed and closed with Status::OK().

  1. In TEST_F(OlapTableSinkTest, add_batch_failed), make ObjectPool created before RuntimeState, make NodeChannels has not empty batches, to simulate this scenario.
  2. UT test couldn't detect it in normal build type, cuz the memory block usually don't be modified after free. We can use ASAN to detect use-after-free errors.
  3. add a temp function for clear all batches in NodeChannel. It should be removed after making all MemTrackers shared(No worries about MemTracker dtor order).

Incidentally, fix bugs reported by ASAN/LSAN.

@vagetablechicken
Copy link
Contributor Author

vagetablechicken commented Jun 18, 2020

The detected error:

==11791==ERROR: AddressSanitizer: heap-use-after-free on address 0x61400000d4d0 at pc 0x00000173f25c bp 0x7ffc769fb940 sp 0x7ffc769fb938
READ of size 8 at 0x61400000d4d0 thread T0
I0618 02:30:07.237269 11803 socket.cpp:2260] Checking Socket{id=17179869185 addr=127.0.0.1:4357} (0x631000014a00)
    #0 0x173f25b in doris::MemTracker::release(long) /root/git/incubator-doris/be/src/runtime/mem_tracker.h:222
    #1 0x173770f in doris::MemPool::free_all() /root/git/incubator-doris/be/src/runtime/mem_pool.cpp:78
    #2 0x177cfe4 in doris::RowBatch::clear() /root/git/incubator-doris/be/src/runtime/row_batch.cpp:276
    #3 0x177d8c5 in doris::RowBatch::~RowBatch() /root/git/incubator-doris/be/src/runtime/row_batch.cpp:301
    #4 0x177d9de in doris::RowBatch::~RowBatch() /root/git/incubator-doris/be/src/runtime/row_batch.cpp:302
    #5 0x1365cea in std::default_delete<doris::RowBatch>::operator()(doris::RowBatch*) const /usr/include/c++/7.3.0/bits/unique_ptr.h:78
    #6 0x1356cd4 in std::unique_ptr<doris::RowBatch, std::default_delete<doris::RowBatch> >::~unique_ptr() /usr/include/c++/7.3.0/bits/unique_ptr.h:268
    #7 0x1325425 in doris::stream_load::NodeChannel::~NodeChannel() /root/git/incubator-doris/be/src/exec/tablet_sink.cpp:41
    #8 0x1391887 in doris::ObjectPool::SpecificElement<doris::stream_load::NodeChannel>::~SpecificElement() (/root/git/incubator-doris/be/ut_build/test/exec/tablet_sink_test+0x1391887)
    #9 0x13918e0 in doris::ObjectPool::SpecificElement<doris::stream_load::NodeChannel>::~SpecificElement() (/root/git/incubator-doris/be/ut_build/test/exec/tablet_sink_test+0x13918e0)
    #10 0x12af245 in doris::ObjectPool::clear() /root/git/incubator-doris/be/src/common/object_pool.h:54
    #11 0x12aefb8 in doris::ObjectPool::~ObjectPool() /root/git/incubator-doris/be/src/common/object_pool.h:37
    #12 0x12a1c9d in doris::stream_load::OlapTableSinkTest_add_batch_failed_Test::TestBody() /root/git/incubator-doris/be/test/exec/tablet_sink_test.cpp:790
    #13 0x368e3ae in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (/root/git/incubator-doris/be/ut_build/test/exec/tablet_sink_test+0x368e3ae
)
    #14 0x36897ed in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (/root/git/incubator-doris/be/ut_build/test/exec/tablet_sink_test+0x36897ed)
    #15 0x366f64f in testing::Test::Run() (/root/git/incubator-doris/be/ut_build/test/exec/tablet_sink_test+0x366f64f)
    #16 0x366fed1 in testing::TestInfo::Run() (/root/git/incubator-doris/be/ut_build/test/exec/tablet_sink_test+0x366fed1)
    #17 0x3670528 in testing::TestCase::Run() (/root/git/incubator-doris/be/ut_build/test/exec/tablet_sink_test+0x3670528)
    #18 0x367709c in testing::internal::UnitTestImpl::RunAllTests() (/root/git/incubator-doris/be/ut_build/test/exec/tablet_sink_test+0x367709c)
    #19 0x368f24e in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (/root/git/incubator-
doris/be/ut_build/test/exec/tablet_sink_test+0x368f24e)
    #20 0x368a3e5 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) (/root/git/incubator-dor
is/be/ut_build/test/exec/tablet_sink_test+0x368a3e5)
    #21 0x3675d73 in testing::UnitTest::Run() (/root/git/incubator-doris/be/ut_build/test/exec/tablet_sink_test+0x3675d73)
    #22 0x12b997e in RUN_ALL_TESTS() (/root/git/incubator-doris/be/ut_build/test/exec/tablet_sink_test+0x12b997e)
    #23 0x12aa24d in main /root/git/incubator-doris/be/test/exec/tablet_sink_test.cpp:983
    #24 0x7fca8a9e1504 in __libc_start_main (/lib64/libc.so.6+0x22504)
    #25 0x118adce  (/root/git/incubator-doris/be/ut_build/test/exec/tablet_sink_test+0x118adce)

report about heap-use-after-free address:

previously allocated by thread T0 here:
    #0 0x1245480 in operator new(unsigned long) ../../.././libsanitizer/asan/asan_new_delete.cc:80
    #1 0x179cb06 in doris::RuntimeState::init_mem_trackers(doris::TUniqueId const&) /root/git/incubator-doris/be/src/runtime/runtime_state.cpp:239

It's state->instance_mem_tracker().

The root cause is, RowBatch in NodeChannel use the deleted tracker parent(state->instance_mem_tracker()).
So unregister OlapTableSink tracker from parent can slove it. But MemTracker::unregister_from_parent() won't delete the parent ref in MemTracker. We'll make all MemTracker shared #3714. So in future, we don't need worry about the MemTracker destruction order or register/unregister, blabla.
Here, I don't choose to modify MemTracker, and add a temp func for clear all batches in NodeChannel.

@vagetablechicken vagetablechicken marked this pull request as ready for review June 18, 2020 10:18
@chaoyli
Copy link
Contributor

chaoyli commented Jun 19, 2020

The patch am not very understand. You meaning is _cur_batch and _pending_batches is not emtpy when destructor? So you clear_all_batches when destructor. @vagetablechicken

@vagetablechicken
Copy link
Contributor Author

The patch am not very understand. You meaning is _cur_batch and _pending_batches is not emtpy when destructor? So you clear_all_batches when destructor. @vagetablechicken

Yes, we should clear _cur_batch and _pending_batches before NodeChannel's destruction.(state->instance_mem_tracker() has already be deleted) So it can be clear at OlapTableSink's dtor.

@chaoyli
Copy link
Contributor

chaoyli commented Jun 19, 2020

The patch am not very understand. You meaning is _cur_batch and _pending_batches is not emtpy when destructor? So you clear_all_batches when destructor. @vagetablechicken

Yes, we should clear _cur_batch and _pending_batches before NodeChannel's destruction.(state->instance_mem_tracker() has already be deleted) So it can be clear at OlapTableSink's dtor.

I understand. I have on problem

@morningman morningman added the approved Indicates a PR has been approved by one committer. label Jun 19, 2020
@chaoyli chaoyli merged commit fdd65c5 into apache:master Jun 20, 2020
morningman pushed a commit to morningman/doris that referenced this pull request Jun 22, 2020
suxiaogang223 pushed a commit to suxiaogang223/doris that referenced this pull request Jul 15, 2025
…if alter version < 2 apache#49062 (apache#49921) (apache#3899)

Cherry-picked from apache#49062

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants