[Bug] fragment mgr thread pool stuck on VTabletSlink::close_wait, bthread worker also stuck on it #16606

### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.


### Version

master

### What's Wrong?

# There are two problems
## Problem one:
- Symptom:
All threads in the fragment thread pool are stuck in `VNodeChannel::close_wait`, which calls `std::this_thread::sleep_for` in a loop while waiting for the channel to finish.

- Analysis:
`FragmentMgr::_exec_actual` calls `exec_state->execute()` but does not handle its return status. If `_executor.open()` fails, nothing cancels the fragment actively. After `exec_state->execute()` returns, `FragmentMgr::_exec_actual` also erases the `fragment_instance_id` from `_fragment_map`, so the fragment can no longer be cancelled through the timeout path either. When `exec_state` is destructed, `VNodeChannel::close_wait` is called, and it loops on `std::this_thread::sleep_for` while waiting for completion. Because the executor failed to open and nothing cancelled it, `_add_batches_finished` and `_cancelled` both stay false, so the thread hangs.

The error return status is not handled, and the fragment instance id is erased from the map directly:
```
void FragmentMgr::_exec_actual(std::shared_ptr<FragmentExecState> exec_state, FinishCallback cb) {
...
    exec_state->execute();

...

    // remove exec state after this fragment finished
    {
        std::lock_guard<std::mutex> lock(_lock);
        _fragment_map.erase(exec_state->fragment_instance_id());
...
    }

...
}
```

The error return status is not handled here either; only a warning is logged:
```
Status FragmentExecState::execute() {
...

    {
...

        WARN_IF_ERROR(_executor.open(),
                      strings::Substitute("Got error while opening fragment $0, query id: $1",
                                          print_id(_fragment_instance_id), print_id(_query_id)));

...
    }

...
    return Status::OK();
}
```

```
Status VNodeChannel::close_wait(RuntimeState* state) {
...

    // waiting for finished, it may take a long time, so we couldn't set a timeout
    while (!_add_batches_finished && !_cancelled) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }

...
}
```


## Problem two:
- Symptom:
The bthread workers are exhausted, so BE cannot receive new RPC requests and RPCs sent by FE time out.

- Analysis:
When a brpc request reaches `FragmentMgr::exec_plan_fragment` on BE and the pthread pool is full, submitting the task to the thread pool fails. The bthread handling the request then has to destruct the local variable `std::shared_ptr<FragmentExecState> exec_state`, which calls `VNodeChannel::close_wait`. That function waits by calling `std::this_thread::sleep_for`. If `_add_batches_finished` and `_cancelled` both stay false, the bthread cannot switch out in time, the bthread workers are exhausted, and BE can no longer receive new RPC requests.

```
Status VNodeChannel::close_wait(RuntimeState* state) {
...

    // waiting for finished, it may take a long time, so we couldn't set a timeout
    while (!_add_batches_finished && !_cancelled) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }

...
}
```

### What You Expected?

## For problem one:
Handle the return status of `exec_state->execute()` in `FragmentMgr::_exec_actual`.
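
A minimal sketch of what handling the status could look like, using stand-in types only. `SketchStatus`, `SketchExecState`, and `exec_actual_sketch` are hypothetical names for illustration and are not the Doris classes; the sketch also assumes `execute()` is changed to return the `open()` error instead of only logging it.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for a status type (not the Doris Status class).
class SketchStatus {
public:
    static SketchStatus OK() { return SketchStatus(true, ""); }
    static SketchStatus Error(std::string msg) {
        return SketchStatus(false, std::move(msg));
    }
    bool ok() const { return _ok; }
    const std::string& message() const { return _msg; }

private:
    SketchStatus(bool ok, std::string msg) : _ok(ok), _msg(std::move(msg)) {}
    bool _ok;
    std::string _msg;
};

// Hypothetical stand-in for the fragment execution state.
class SketchExecState {
public:
    explicit SketchExecState(bool open_fails) : _open_fails(open_fails) {}

    // Assumption: execute() propagates the open() failure to the caller.
    SketchStatus execute() {
        if (_open_fails) {
            return SketchStatus::Error("open failed");
        }
        return SketchStatus::OK();
    }

    void cancel() { _cancelled.store(true); }
    bool is_cancelled() const { return _cancelled.load(); }

private:
    bool _open_fails;
    std::atomic<bool> _cancelled{false};
};

// Mirrors the shape of _exec_actual: run, react to a failure, then let the
// caller clean up. Returns true if execution succeeded.
bool exec_actual_sketch(SketchExecState& state) {
    SketchStatus st = state.execute();
    if (!st.ok()) {
        // Cancel before the state is released, so that any later wait loop
        // that checks the cancelled flag can exit.
        state.cancel();
        assert(state.is_cancelled());
        return false;
    }
    return true;
}
```

With this shape, a state constructed with `open_fails = true` ends up cancelled after `exec_actual_sketch` returns, so a wait loop that checks the cancelled flag would not spin forever.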

## For problem two:
- Define the member variable `_cancelled` in `FragmentExecState` as atomic
- Use `bthread_usleep` instead of `std::this_thread::sleep_for` in `VNodeChannel::close_wait`
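
The two changes above can be sketched with standard C++ only. `ChannelFlagsSketch`, `SleepFn`, and `close_wait_sketch` are hypothetical names for illustration; the sleep call is passed in as a hook so that a brpc build could supply a wrapper around `bthread_usleep`, while this sketch compiles without brpc.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>

// Sleep hook (hypothetical). On a bthread worker this would wrap
// bthread_usleep; in a plain pthread it could wrap std::this_thread::sleep_for.
using SleepFn = std::function<void(std::chrono::microseconds)>;

// Both flags are atomic so that a store made by another thread is visible
// to the waiting loop.
struct ChannelFlagsSketch {
    std::atomic<bool> add_batches_finished{false};
    std::atomic<bool> cancelled{false};
};

// Waits until the batches are finished or the channel is cancelled.
// Returns true if the batches finished, false if the loop exited only
// because of cancellation.
bool close_wait_sketch(const ChannelFlagsSketch& flags, const SleepFn& sleep_fn) {
    assert(sleep_fn);
    while (!flags.add_batches_finished.load() && !flags.cancelled.load()) {
        sleep_fn(std::chrono::microseconds(1000));
    }
    return flags.add_batches_finished.load();
}
```

Injecting the sleep function keeps the loop itself unchanged between a pthread and a bthread context; only the hook differs.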

### How to Reproduce?

_No response_

### Anything Else?

_No response_

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

