[Bug] too many "send fragment timeout. backend id" error #8942

@morningman

Search before asking

  • I have searched the issues and found no similar issues.

Version

trunk

What's Wrong?

In some high-concurrency scenarios, a large number of "send fragment timeout. backend id xxx" errors appear in fe.log. Once this starts, all subsequent requests keep reporting the same error and the system does not recover.

What You Expected?

This happens because, under high load, the execution thread pool on the BE side is full, so new requests are placed in the thread pool's waiting queue.
However, the timeout for the plan fragment request that FE sends to BE is only 5 seconds. A request may sit in the waiting queue for much longer than that, which produces a large number of "send fragment timeout" RPC errors.

Subsequent requests keep entering the same waiting queue, so they time out as well.
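
The queueing effect described above can be illustrated with a small deterministic model. This is only a sketch, not Doris code: `simulate` and all of its parameters are hypothetical names, the pool size, service time and arrival rate are made-up numbers, and it assumes that a request whose RPC has already timed out on the FE side still occupies its slot in the BE queue. Only the 5-second timeout comes from the report.

```python
import heapq

def simulate(num_workers, service_time_s, arrival_interval_s,
             num_requests, rpc_timeout_s=5.0):
    """Model a fixed-size thread pool with an unbounded FIFO waiting queue.

    Returns one (queue_wait_seconds, timed_out) tuple per request.
    Assumption: timed-out requests are NOT removed from the queue.
    """
    # Min-heap holding the time at which each worker becomes free.
    free_at = [0.0] * num_workers
    heapq.heapify(free_at)

    results = []
    for i in range(num_requests):
        arrival = i * arrival_interval_s
        # The request is served by whichever worker frees up first.
        worker_free = heapq.heappop(free_at)
        start = max(arrival, worker_free)
        wait = start - arrival
        heapq.heappush(free_at, start + service_time_s)
        results.append((wait, wait > rpc_timeout_s))
    return results

if __name__ == "__main__":
    # Overloaded pool: 2 workers, 4 s per fragment, one new request per second.
    for i, (wait, timed_out) in enumerate(simulate(2, 4.0, 1.0, 20)):
        print(f"request {i}: waited {wait:.0f}s, timed out: {timed_out}")
```

In this model the queue wait never shrinks while arrivals outpace the pool, so once one request crosses the 5-second limit every later request does too, which matches the "cannot be recovered" behaviour in the report.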

How to Reproduce?

Run complex queries with high concurrency.
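
A minimal load driver for this step might look like the sketch below. `run_concurrently` and `fake_query` are hypothetical names introduced here for illustration; `fake_query` is only a stand-in and would need to be replaced by a function that runs a real complex query against the cluster with whatever client you use. The concurrency level and query count are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor

def run_concurrently(query_fn, concurrency, total_queries):
    """Call query_fn(i) total_queries times on `concurrency` threads.

    Returns (number_of_successes, list_of_error_messages).
    """
    def attempt(i):
        try:
            query_fn(i)
            return None
        except Exception as exc:  # collect every failure instead of stopping
            return str(exc)

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        outcomes = list(pool.map(attempt, range(total_queries)))

    errors = [o for o in outcomes if o is not None]
    return total_queries - len(errors), errors

def fake_query(i):
    # Stand-in for a real query (hypothetical): fails on every third call
    # so that the error-counting path is exercised.
    if i % 3 == 0:
        raise RuntimeError("send fragment timeout. backend id xxx")

if __name__ == "__main__":
    ok, errors = run_concurrently(fake_query, concurrency=4, total_queries=9)
    print(f"succeeded: {ok}, failed: {len(errors)}")
```

Against a real cluster, raising `concurrency` until the number of collected errors jumps is one way to find the point at which the BE thread pool saturates.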

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
