
[Proposal] dump query processing performance metrics from various stages #3324

@himanshug

While executing a query, the Druid Broker and Historicals (and realtime tasks) emit very useful metrics, such as:

At the Broker:
query/time
query/bytes
query/node/time
query/node/ttfb

At Historicals:
query/time
query/bytes
query/segment/time
query/wait/time
...

All of these metrics carry queryId, host, etc. as dimensions, so if Druid metrics are ingested into another Druid cluster, users can see where the time for a query execution was spent. We do run such a Druid cluster (a.k.a. the metrics cluster) to debug performance issues.
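To make the per-query drill-down concrete, here is a minimal Java sketch of what a metrics cluster effectively lets you do: keep only the events for one queryId and group their values by host and metric name. `MetricEvent` and `QueryTimeBreakdown` are hypothetical, simplified stand-ins (not Druid's emitter classes), and the events are assumed to have already been collected somewhere.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified view of one emitted metric event:
// metric name, value, and the two dimensions used for drill-down.
record MetricEvent(String metric, long value, String queryId, String host) {}

final class QueryTimeBreakdown {
  private QueryTimeBreakdown() {}

  /**
   * Keeps only the events for the given queryId and groups their values
   * by host and then by metric name. Repeated events for the same host and
   * metric (e.g. one query/segment/time event per segment) are summed.
   */
  static Map<String, Map<String, Long>> byHost(List<MetricEvent> events, String queryId) {
    Map<String, Map<String, Long>> result = new TreeMap<>();
    for (MetricEvent e : events) {
      if (!queryId.equals(e.queryId())) {
        continue; // event belongs to a different query
      }
      result.computeIfAbsent(e.host(), h -> new TreeMap<>())
            .merge(e.metric(), e.value(), Long::sum);
    }
    return result;
  }
}
```

Once metrics are pre-aggregated without the queryId dimension, the filter in the first step of `byHost` is no longer possible, which is the gap described next.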

However,

  1. Some users do not have the bandwidth to maintain a separate Druid cluster for metrics, so they push aggregated metrics to monitoring systems such as Graphite. Once metrics are aggregated, it becomes difficult to diagnose performance issues for a specific queryId.

  2. Even with a Druid metrics cluster, it takes some time for metrics to be ingested into it. Sometimes we want to debug interactively, that is, send a query and see all of its performance metrics in one place.

#3319 (introduce /druid/v3 query endpoint that gives query responseContext) and #3323 (WIP: optionally configure DirectDruidClient to use /druid/v3 instead of /druid/v2) make it possible for the Broker to return a large responseContext, including the contexts accumulated from all Historicals.

This proposal is to dump the query performance metrics into the responseContext when the query context contains the flag "dumpPerformance".
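A minimal sketch of how a server might check the proposed flag, assuming the query context is available as a plain `Map` (for example, for a query posted with `"context": {"dumpPerformance": true}`). `PerformanceDumpFlag.isEnabled` is a hypothetical helper, and accepting the string `"true"` as well as a JSON boolean is an assumption about how lenient the check should be, not something this proposal specifies.

```java
import java.util.Map;

final class PerformanceDumpFlag {
  // Query context key proposed in this issue.
  static final String DUMP_PERFORMANCE = "dumpPerformance";

  private PerformanceDumpFlag() {}

  /**
   * Hypothetical helper: returns true only if the query context explicitly
   * asks for performance metrics. Missing context or missing key means false.
   */
  static boolean isEnabled(Map<String, ?> queryContext) {
    if (queryContext == null) {
      return false;
    }
    Object value = queryContext.get(DUMP_PERFORMANCE);
    if (value instanceof Boolean b) {
      return b; // JSON boolean, e.g. "dumpPerformance": true
    }
    // Assumed leniency: also accept the string "true" (case-insensitive).
    return value instanceof String s && Boolean.parseBoolean(s);
  }
}
```

Defaulting to false keeps the extra metrics out of the responseContext unless a client opts in, which matters because the dumped section can be large.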

With the flag set, the end user would see a responseContext like the one below, which would be very useful for debugging query performance problems:

{
    "result": [ .... ],
    "context": {
        ....
        "broker": {
            "query/time" : 783,
            "query/bytes": 1234,
            "historical1": {
                "query/node/ttfb": 124,
                "query/node/time": 567,
                "query/node/bytes": 3564
            },
            "historical2": {
                "query/node/ttfb": 379,
                "query/node/time": 685,
                "query/node/bytes": 5632
            }
        },
        "historical1": {
            "query/time": 554,
            "query/bytes": 3564,
            "segments": {
                "segment_id1": {
                    "query/segment/time": 324,
                    "query/wait/time": 87
                },
                "segment_id2": {
                    "query/segment/time": 314,
                    "query/wait/time": 79
                }
            }
        },
        "historical2": { .... }
    }
}
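The nested layout above could be assembled on the Broker roughly as follows. This is a hypothetical sketch using plain `LinkedHashMap`s: `PerformanceContextBuilder` and its methods are illustrative names, and it assumes each Historical's own section (its `query/time`, `query/bytes`, and `segments`) has already been received, for example through the larger responseContext enabled by #3319 and #3323.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical builder for the "dumpPerformance" part of the responseContext.
final class PerformanceContextBuilder {
  private final Map<String, Long> brokerMetrics = new LinkedHashMap<>();
  private final Map<String, Map<String, Long>> nodeMetrics = new LinkedHashMap<>();
  private final Map<String, Object> historicalSections = new LinkedHashMap<>();

  /** Broker-level metric, e.g. query/time or query/bytes. */
  void brokerMetric(String metric, long value) {
    brokerMetrics.put(metric, value);
  }

  /** The Broker's view of one data node, e.g. query/node/ttfb for historical1. */
  void nodeMetric(String host, String metric, long value) {
    nodeMetrics.computeIfAbsent(host, h -> new LinkedHashMap<>()).put(metric, value);
  }

  /** The section a Historical reported about itself, stored under its host name. */
  void historicalSection(String host, Map<String, ?> section) {
    historicalSections.put(host, section);
  }

  /** Produces {"broker": {metrics..., host: {...}}, host: {...}, ...}. */
  Map<String, Object> build() {
    Map<String, Object> broker = new LinkedHashMap<>(brokerMetrics);
    broker.putAll(nodeMetrics); // per-node entries sit beside broker metrics
    Map<String, Object> context = new LinkedHashMap<>();
    context.put("broker", broker);
    context.putAll(historicalSections);
    return context;
  }
}
```

Because host names and metric names share one namespace inside the `"broker"` section in this layout, a host key could in principle collide with a metric key; nesting per-node entries under a dedicated key would avoid that at the cost of one more level.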

Depends on #3319 and #3323
