Skip to content

Timeout not honored #1415

@drcrallen

Description

@drcrallen

Currently If I specify a timeout query context when querying a broker, that timeout is not honored. It is entirely possible to set a timeout of 90s and have the broker hog resources for the query for 15m while it tries to complete the request.

The expected behavior is that the broker terminates the returning data stream once timeout has expired, and also that it cancels any outstanding query efforts on the historicals related to the query.

The current state of timeouts seems to be enforced on a per-segment basis, and not a per-query basis. Meaning that if historical nodes take an abnormally long time on queries, the overall query can take a very long time compared to the timeout.

This stems from 2 things:

  1. Lack of checking in QueryResource for commands taking too long. (see Add basic timeout functionality to QueryResource #1416 )
  2. Lack of ability to interrupt streaming results properly.

Part of this comes from the assumption stated in io.druid.query.AsyncQueryRunner::run

            //Note: this is assumed that baseRunner does most of the work eagerly on call to the
            //run() method and resulting sequence accumulate/yield is fast.

because the a lot of the query workload is actually lazily done in yield.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions