Skip to content

[Bug] fragment mgr thread pool stuck on VTabletSlink::close_wait,bthread worker also stuck on it #16606

@Tanya-W

Description

@Tanya-W

Search before asking

  • I had searched in the issues and found no similar issues.

Version

master

What's Wrong?

Here are two problem

one:

  • Problem phenomenon:
    all thread in fragment thread pool in VNodeChannel::close_wait call std::this_thread::sleep_for to wait for finished.

  • Problem analysis:
    In function FragmentMgr::_exec_actual will call exec_state->execute(), but not process its return status, if _executor.open() failed, there is no place to call cancel actively, and in function FragmentMgr::_exec_actual after exec_state->execute() erase the fragment_instance_id from _fragment_map, leads to also cannot cancel through timeout, when deconstruction for exec_state, will call VNodeChannel::close_wait, and in this function call std::this_thread::sleep_for to wait for finished, but the variable _add_batches_finished and _cancelled value is always false because of executor open failed and not cancel, the thread will hang.

not process error return status, and erase fragment instance id from map directly:

void FragmentMgr::_exec_actual(std::shared_ptr<FragmentExecState> exec_state, FinishCallback cb) {
...
    exec_state->execute();

...

    // remove exec state after this fragment finished
    {
        std::lock_guard<std::mutex> lock(_lock);
        _fragment_map.erase(exec_state->fragment_instance_id());
...
    }

...
}

not process error return status, only print warning log:

Status FragmentExecState::execute() {
...

    {
...

        WARN_IF_ERROR(_executor.open(),
                      strings::Substitute("Got error while opening fragment $0, query id: $1",
                                          print_id(_fragment_instance_id), print_id(_query_id)));

...
    }

...
    return Status::OK();
}
Status VNodeChannel::close_wait(RuntimeState* state) {
...

    // waiting for finished, it may take a long time, so we couldn't set a timeout
    while (!_add_batches_finished && !_cancelled) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }

...
}

two:

  • Problem phenomenon:
    bthread workers are exhausted, leads to BE cannot receive new rpc requests, FE send rpc timeout.

  • Problem analysis:
    When the brpc request reaches BE FragmentMgr::exec_plan_fragment, if the pthread pool is full, submit thread pool failed, will need bthread to destruct the local variable std::shared_ptr<FragmentExecState> exec_state, and then VNodeChannel::close_wait will be called, but in VNodeChannel::close_wait call std::this_thread::sleep_for to wait finish, when the variable _add_batches_finished and _cancelled value is always false, the bthread cannot switch out in time, which leads to the exhaustion of the bthread worker, and finally leads to BE cannot receive new rpc requests.

Status VNodeChannel::close_wait(RuntimeState* state) {
...

    // waiting for finished, it may take a long time, so we couldn't set a timeout
    while (!_add_batches_finished && !_cancelled) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }

...
}

What You Expected?

For problem one:

process the return status for exec_state->execute() in function FragmentMgr::_exec_actual

For problem two:

  • define member variable _cancelled in FragmentExecState as atomic
  • use use bthread_usleep instead of std::this_thread::sleep_for in function VNodeChannel::close_wait

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions