[Proposal] Allow Spark to act as the execution engine for druid queries #2330

@jisookim0513

Description

Currently, Druid struggles when a single long-running query prevents other, shorter queries from being processed. To mitigate the effect of long queries, I'd like to enable Spark to serve as the execution engine for Druid queries. I don't have a concrete (or settled) design for what I would implement yet, but here are the two approaches I am considering:

  1. Make Spark query historical nodes and merge the results like a broker does.
  2. Make Spark download segments from S3 and do all the computations by itself.
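To make approach 1 concrete, here is a minimal sketch of the broker-style pattern it describes: fan a query out to several historical nodes in parallel, then merge the partial aggregates. Everything here is an illustrative assumption, not Druid's actual API: the host names are hypothetical, and `query_historical` stands in for an HTTP POST to a node's query endpoint (faked locally so the sketch is self-contained).

```python
# Sketch of approach 1: Spark (or any driver) fanning a query out to
# historical nodes and merging partial results the way a broker does.
# Hosts, query shape, and the faked per-node responses are assumptions.
from concurrent.futures import ThreadPoolExecutor

HISTORICALS = ["historical-1:8083", "historical-2:8083"]  # hypothetical hosts

def query_historical(host, query):
    # Stand-in for an HTTP POST of `query` to the node; here we return
    # faked per-node partial aggregates so the sketch runs anywhere.
    fake = {
        "historical-1:8083": {"events": 120},
        "historical-2:8083": {"events": 80},
    }
    return fake[host]

def merge(partials):
    # Broker-style merge: sum each partial aggregate across nodes.
    merged = {}
    for partial in partials:
        for name, value in partial.items():
            merged[name] = merged.get(name, 0) + value
    return merged

def run(query):
    # Query all historicals concurrently, then merge their partials.
    with ThreadPoolExecutor(max_workers=len(HISTORICALS)) as pool:
        partials = list(pool.map(lambda h: query_historical(h, query), HISTORICALS))
    return merge(partials)

result = run({"queryType": "timeseries",
              "aggregations": [{"type": "count", "name": "events"}]})
print(result)  # -> {'events': 200}
```

In a real implementation each fan-out task would run as a Spark task rather than a local thread, and the merge would need to respect each aggregator's semantics (e.g. min/max rather than sum), but the scatter-gather shape is the same. Approach 2 would replace `query_historical` with code that pulls segment files from S3 and computes over them directly.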

Links related to this issue/proposal:

https://groups.google.com/forum/#!searchin/druid-development/spark/druid-development/ULdKYZeven4/NIM5ySTbAQAJ

https://groups.google.com/forum/#!msg/druid-development/Jp9Lv0mGFlg/f9m1U9MBBAAJ

Related PRs:
