Currently, Druid has a rough time when a long-running query prevents other, shorter queries from being processed. To mitigate the effect of long queries, I'd like to enable Spark to serve as the execution engine for Druid queries. I do not have a concrete (or settled) design for what I would try to implement, but here are the two approaches I am considering:
- Make Spark query historical nodes and merge the results like a broker does.
- Make Spark download segments from S3 and do all the computations by itself.
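To illustrate the first approach, here is a minimal sketch of the broker-style scatter/gather merge that Spark tasks could perform over partial results returned by historical nodes. This is plain Python standing in for the Spark side; every name below is hypothetical, and nothing here is a real Druid or Spark API.

```python
# Hypothetical sketch of approach 1: merge per-historical-node partial
# timeseries results the way a Druid broker does. In a real implementation
# each inner list would be one node's response to the fanned-out query.

from collections import defaultdict

def merge_partial_results(partials):
    """Combine partial aggregates from several nodes, summing counts
    that share the same timestamp (illustrative only)."""
    merged = defaultdict(int)
    for node_rows in partials:
        for row in node_rows:
            merged[row["timestamp"]] += row["count"]
    # Return rows ordered by timestamp, as a broker would.
    return [{"timestamp": ts, "count": c} for ts, c in sorted(merged.items())]

# Two simulated historical-node responses to the same query:
node_a = [{"timestamp": "2016-01-01T00:00Z", "count": 10},
          {"timestamp": "2016-01-01T01:00Z", "count": 5}]
node_b = [{"timestamp": "2016-01-01T00:00Z", "count": 7}]

print(merge_partial_results([node_a, node_b]))
```

In Spark terms, each historical node's response would be a partition and the merge would be a `reduceByKey`-style combine; the second approach would instead skip the historicals entirely and scan segment files pulled from S3.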
Links related to this issue/proposal:
https://groups.google.com/forum/#!searchin/druid-development/spark/druid-development/ULdKYZeven4/NIM5ySTbAQAJ
https://groups.google.com/forum/#!msg/druid-development/Jp9Lv0mGFlg/f9m1U9MBBAAJ
Related PRs: