Skip to content

Dart: Async queries to realtime servers.#18241

Merged
abhishekagarwal87 merged 4 commits intoapache:masterfrom
gianm:msq-realtime-nonblock
Jul 16, 2025
Merged

Dart: Async queries to realtime servers.#18241
abhishekagarwal87 merged 4 commits intoapache:masterfrom
gianm:msq-realtime-nonblock

Conversation

@gianm
Copy link
Copy Markdown
Contributor

@gianm gianm commented Jul 12, 2025

Prior to this patch, queries from MSQ workers to realtime servers would be initiated in the processing pool, and would block the processing pool until results started coming in. This patch addresses it with this strategy:

  1. Update DataServerClient to return a future that resolves when the response starts being written.

  2. Split DataServerQueryHandler into DartDataServerQueryHandler and IndexerDataServerQueryHandler. The Dart version doesn't do retries and doesn't follow segments to other data servers. It just returns the async future from DataServerClient. The Indexer (tasks) version retains the prior logic and isn't really async. I didn't attempt to asyncify its retry logic in this patch. This is hopefully a situation that will be improved in the future, ideally by having realtime servers participate in MSQ queries as workers themselves.

  3. Add ReturnOrAwait.awaitAllFutures, which allows processors to wait for a future to resolve.

  4. Update ScanQueryFrameProcessor and GroupByPreShuffleFrameProcessor to give up the processing thread when waiting for a data server query to come back.

Additionally, to simplify DataServerClient, cancellations are now issued without using a scheduled executor. There should be no need for this, because the service client is async.

Tagged this as a bug because without this change, using Dart on Historicals would lead to the processing pool getting jammed up if there is realtime data in play. It seems serious enough to be classified as a bug.

Prior to this patch, queries from MSQ workers to data servers would
be initiated in the processing pool, and would block the processing
pool until results started coming in. This patch addresses it with
the strategy:

1) Update DataServerClient to return a future that resolves when the
   response starts being written.

2) Split DataServerQueryHandler into DartDataServerQueryHandler and
   IndexerDataServerQueryHandler. The Dart version doesn't do retries
   and doesn't follow segments to other data servers. It just returns
   the async future from DataServerClient. The Indexer (tasks) version
   retains the prior logic and isn't really async. I didn't attempt to
   asyncify its retry logic in this patch.

3) Add ReturnOrAwait.awaitAllFutures, which allows processors to wait
   for a future to resolve.

4) Update ScanQueryFrameProcessor and GroupByPreShuffleFrameProcessor
   to give up the processing thread when waiting for a data server query
   to come back.

Additionally, to simplify DataServerClient, cancellations are now
issued without using a scheduled executor. There should be no need for
this, because the service client is async.
@gianm gianm added this to the 34.0.0 milestone Jul 12, 2025
@gianm gianm added the Bug label Jul 12, 2025
@github-actions github-actions Bot added Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 labels Jul 12, 2025
Copy link
Copy Markdown
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm 👍 (I didn't look at IndexerDataServerQueryHandler because i think it was all the code that was previously in DataServerQueryHandler)

@abhishekagarwal87 abhishekagarwal87 merged commit 01dd8d1 into apache:master Jul 16, 2025
76 checks passed
@gianm gianm deleted the msq-realtime-nonblock branch July 16, 2025 20:57
@gianm
Copy link
Copy Markdown
Contributor Author

gianm commented Jul 16, 2025

lgtm 👍 (I didn't look at IndexerDataServerQueryHandler because i think it was all the code that was previously in DataServerQueryHandler)

Yes, it was just moved from DataServerQueryHandler. Thanks for taking a look.

capistrant pushed a commit to capistrant/incubator-druid that referenced this pull request Jul 17, 2025
* Dart: Async queries to realtime servers.

Prior to this patch, queries from MSQ workers to data servers would
be initiated in the processing pool, and would block the processing
pool until results started coming in. This patch addresses it with
the strategy:

1) Update DataServerClient to return a future that resolves when the
   response starts being written.

2) Split DataServerQueryHandler into DartDataServerQueryHandler and
   IndexerDataServerQueryHandler. The Dart version doesn't do retries
   and doesn't follow segments to other data servers. It just returns
   the async future from DataServerClient. The Indexer (tasks) version
   retains the prior logic and isn't really async. I didn't attempt to
   asyncify its retry logic in this patch.

3) Add ReturnOrAwait.awaitAllFutures, which allows processors to wait
   for a future to resolve.

4) Update ScanQueryFrameProcessor and GroupByPreShuffleFrameProcessor
   to give up the processing thread when waiting for a data server query
   to come back.

Additionally, to simplify DataServerClient, cancellations are now
issued without using a scheduled executor. There should be no need for
this, because the service client is async.

* Fix tests and checkstyle.

* Fix exception checking.
capistrant added a commit that referenced this pull request Jul 18, 2025
* Dart: Async queries to realtime servers.

Prior to this patch, queries from MSQ workers to data servers would
be initiated in the processing pool, and would block the processing
pool until results started coming in. This patch addresses it with
the strategy:

1) Update DataServerClient to return a future that resolves when the
   response starts being written.

2) Split DataServerQueryHandler into DartDataServerQueryHandler and
   IndexerDataServerQueryHandler. The Dart version doesn't do retries
   and doesn't follow segments to other data servers. It just returns
   the async future from DataServerClient. The Indexer (tasks) version
   retains the prior logic and isn't really async. I didn't attempt to
   asyncify its retry logic in this patch.

3) Add ReturnOrAwait.awaitAllFutures, which allows processors to wait
   for a future to resolve.

4) Update ScanQueryFrameProcessor and GroupByPreShuffleFrameProcessor
   to give up the processing thread when waiting for a data server query
   to come back.

Additionally, to simplify DataServerClient, cancellations are now
issued without using a scheduled executor. There should be no need for
this, because the service client is async.

* Fix tests and checkstyle.

* Fix exception checking.

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
ashibhardwaj pushed a commit to ashibhardwaj/druid that referenced this pull request Jul 23, 2025
* Dart: Async queries to realtime servers.

Prior to this patch, queries from MSQ workers to data servers would
be initiated in the processing pool, and would block the processing
pool until results started coming in. This patch addresses it with
the strategy:

1) Update DataServerClient to return a future that resolves when the
   response starts being written.

2) Split DataServerQueryHandler into DartDataServerQueryHandler and
   IndexerDataServerQueryHandler. The Dart version doesn't do retries
   and doesn't follow segments to other data servers. It just returns
   the async future from DataServerClient. The Indexer (tasks) version
   retains the prior logic and isn't really async. I didn't attempt to
   asyncify its retry logic in this patch.

3) Add ReturnOrAwait.awaitAllFutures, which allows processors to wait
   for a future to resolve.

4) Update ScanQueryFrameProcessor and GroupByPreShuffleFrameProcessor
   to give up the processing thread when waiting for a data server query
   to come back.

Additionally, to simplify DataServerClient, cancellations are now
issued without using a scheduled executor. There should be no need for
this, because the service client is async.

* Fix tests and checkstyle.

* Fix exception checking.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area - Batch Ingestion Area - MSQ For multi stage queries - https://github.com/apache/druid/issues/12262 Bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants