Non querying tasks shouldn't use processing buffers / merge buffers by LakshSingla · Pull Request #16887 · apache/druid

LakshSingla · 2024-08-13T04:26:32Z

Description

Tasks that do not support querying or query processing (i.e., supportsQueries = false) do not require processing threads, processing buffers, or merge buffers.

The following tasks don't support queries:

Native batch ingestion tasks and subtasks
Native compaction tasks and subtasks
MSQ Controller tasks - MSQControllerTask
Other miscellaneous tasks - ArchiveTask, KillUnusedSegmentsTask, MoveTask, RestoreTask

Release note

Reduce the direct memory requirement of non-query-processing tasks by not reserving query buffers for them.
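To put the saving in numbers: Druid's documented direct-memory sizing guideline for a process that runs queries is `druid.processing.buffer.sizeBytes * (druid.processing.numMergeBuffers + druid.processing.numThreads + 1)`. The sketch below applies that guideline and assumes that a task with `supportsQueries = false` reserves no processing or merge buffers at all. `DirectMemoryEstimate` and its parameters are hypothetical and not part of Druid.

```java
// Hypothetical helper, not part of Druid: estimates the direct memory reserved
// for query buffers, following Druid's documented sizing guideline.
final class DirectMemoryEstimate
{
  private DirectMemoryEstimate() {}

  static long queryBufferBytes(
      boolean supportsQueries,
      long bufferSizeBytes,   // druid.processing.buffer.sizeBytes
      int numThreads,         // druid.processing.numThreads
      int numMergeBuffers     // druid.processing.numMergeBuffers
  )
  {
    if (supportsQueries) {
      // One buffer per processing thread, one per merge buffer, plus one.
      return bufferSizeBytes * (numMergeBuffers + numThreads + 1L);
    } else {
      // Assumption: non-query tasks reserve no query buffers.
      return 0L;
    }
  }
}
```

For example, with 400,000,000-byte buffers, two processing threads and two merge buffers, the guideline asks for 2,000,000,000 bytes for a query-capable task and nothing for a non-query task.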


cryptoe

Changes LGTM

cryptoe · 2024-08-13T13:35:26Z

                        command.addSystemProperty("druid.indexer.task.tmpStorageBytesPerTask", storageSlot.getNumBytes());

+                        if (!task.supportsQueries()) {
+                          // Processing threads, processing buffers and merging buffers are not required on tasks which


Nit: could you add the same comment in the K8s task adapter as well?

gianm · 2024-08-13T17:17:19Z

Is it possible to do the logic in the peon itself rather than in the runners? i.e., if a peon is launched with a task that doesn't support queries, it doesn't create a merge pool or processing pool? That way, each way of launching a peon wouldn't need to be aware of this.

LakshSingla · 2024-08-14T04:48:14Z

Is it possible to do the logic in the peon itself rather than in the runners

I looked at the following approaches but didn't find a suitable one:

We can't modify DruidProcessingConfig, since it's shared by historicals etc. as well. We also can't add Task.class, which we need in order to check task.supportsQueries, to the module, since Task.class resides in the indexing module, which isn't a dependency of druid-server, where DruidProcessingConfig resides.
I looked for a way to override the providers for CliPeon, and while I can do that (check out the last commit), I need to duplicate some of the DruidProcessingModule code too, for the case where the task supportsQueries. I didn't find a way to conditionally override the binding (since the Task instance isn't available when we add our bindings), or to access the overridden bindings at runtime before overriding them.

LMK if there's a way that I am missing. Otherwise, there's some duplication in the CliPeon code, which we'd need to keep in sync with DruidProcessingModule.

kfaraz · 2024-08-19T03:55:01Z

@LakshSingla, while I completely agree with not installing the DruidProcessingModule if it is not needed, I was wondering if installing it actually consumes any resources.
IIUC, the buffers are all provided as lazy singletons and would be initialized/allocated only if needed by the task.

kfaraz

Left some suggestions.

kfaraz · 2024-08-19T04:03:14Z

  {
    return ImmutableList.of(
-        new DruidProcessingModule(),
+        Modules.override(new DruidProcessingModule()).with(


Rather than doing Modules.override(), another option could be to write up a class TaskQueryProcessingModule extends DruidProcessingModule (or even just inline it here), where you could just call the super implementation, thus avoiding code duplication.

Thanks for the idea!

kfaraz · 2024-08-19T04:06:34Z

+                if (!task.supportsQueries()) {
+                  return new ForwardingQueryProcessingPool(Execs.dummy());
+                }
+                return new MetricsEmittingQueryProcessingPool(
+                    PrioritizedExecutorService.create(
+                        lifecycle,
+                        config
+                    ),
+                    executorServiceMonitor
+                );


Do not use a ForwardingQueryProcessingPool since it is not meant to be used anyway.

Suggested change

-                if (!task.supportsQueries()) {
-                  return new ForwardingQueryProcessingPool(Execs.dummy());
-                }
-                return new MetricsEmittingQueryProcessingPool(
-                    PrioritizedExecutorService.create(
-                        lifecycle,
-                        config
-                    ),
-                    executorServiceMonitor
-                );
+                if (task.supportsQueries()) {
+                  return super.getProcessingPoolExecutor(args);
+                } else {
+                  // I wonder if we shouldn't just throw an exception or return null here
+                  return DirectQueryProcessingPool.INSTANCE;
+                }

A similar simplification can be done for other methods too.

Returning null would look better than returning the direct processing pool, since using DirectQueryProcessingPool.INSTANCE looks wrong: IMO it means doing everything in the calling thread, which isn't the expected behaviour. Also, I'll test whether throwing an exception works, but I think that would cause a Guice initialization error.

since it is not meant to be used anyway

I didn't understand this part. Why should we not be using the ForwardingQueryProcessingPool? The benefit of my approach is that the calling code doesn't need to assume that the processing pool can be null anywhere and handle that case separately. Moreover, it also acts as a safeguard in case any non-querying task tries to submit a task to the pool, instead of silently executing the task in the same thread (as with the direct processing pool).
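The behavioural difference being debated can be shown with plain `java.util.concurrent.Executor` stand-ins: a direct pool runs submitted work synchronously in the calling thread, while a pool that must never be used rejects the work outright. `PoolBehaviour` and its fields are hypothetical illustrations, not Druid's `DirectQueryProcessingPool` or `Execs.dummy()`.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-ins contrasting a "direct" pool with a rejecting pool.
final class PoolBehaviour
{
  private PoolBehaviour() {}

  // Direct pool: the task runs synchronously in the caller's thread.
  static final Executor DIRECT = Runnable::run;

  // Rejecting pool: every submission fails loudly instead of running anywhere.
  static final Executor FAILING = task -> {
    throw new UnsupportedOperationException("This task does not support query processing");
  };

  // Returns the name of the thread that actually ran the submitted task.
  static String runAndReportThread(Executor executor)
  {
    AtomicReference<String> threadName = new AtomicReference<>();
    executor.execute(() -> threadName.set(Thread.currentThread().getName()));
    return threadName.get();
  }
}
```

With `DIRECT`, the reported thread is the caller's own thread, so an illegal submission from a non-query task would go unnoticed; with `FAILING`, the same submission surfaces immediately as an exception.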


I meant that if we know upfront that this task is not meant to use the query processing pool, then we should never return an instance that can be used at all, even if it causes the task to fail (since it was doing something illegal anyway).

I agree with your point about null.
How about we add a NoopQueryProcessingPool that throws an unsupported-operation exception when anything is submitted to it?

Also, I wish the QueryProcessingPool didn't extend ListeningExecutorService.
It would make for a cleaner interface and it would have been much easier to write dummy implementations.
Are the executor service methods ever called on the query processing pool?

druid/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/GroupByMergingQueryRunner.java

Line 214 in 2198001

queryProcessingPool, // Passed as executor service

This is one usage of the processing pool as an executor service. Its javadoc also mentions such usages

druid/processing/src/main/java/org/apache/druid/query/QueryProcessingPool.java

Lines 31 to 34 in 2198001

* This interface extends {@link ListeningExecutorService} as well. It has a separate
* method to submit query execution tasks so that implementations can differentiate those tasks from any regular async
* tasks. One example is {@link org.apache.druid.query.groupby.GroupingEngine#mergeRunners(QueryProcessingPool, Iterable)}
* where different kind of tasks are submitted to same processing pool.

I think a cleaner design would have been to have a method getExecutor() in the QueryProcessingPool interface. But since this is an @ExtensionPoint, I suppose we should leave it as is for now.
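The alternative design mentioned here, where the pool exposes its executor through a `getExecutor()` method instead of extending `ListeningExecutorService`, could look roughly like the sketch below. `ProcessingPoolSketch`, `FixedProcessingPoolSketch` and the method names are hypothetical and may not match Druid's API; the point is only that query work units and general async work go through separate, explicit entry points.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical interface illustrating the suggested design; NOT Druid's QueryProcessingPool.
interface ProcessingPoolSketch
{
  // Query execution work units go through a dedicated method...
  <T> Future<T> submitRunnerTask(Callable<T> runnerTask);

  // ...while general-purpose async work uses an explicitly exposed executor.
  ExecutorService getExecutor();
}

// Hypothetical implementation backed by a fixed-size thread pool.
final class FixedProcessingPoolSketch implements ProcessingPoolSketch
{
  private final ExecutorService executor;

  FixedProcessingPoolSketch(int numThreads)
  {
    this.executor = Executors.newFixedThreadPool(numThreads);
  }

  @Override
  public <T> Future<T> submitRunnerTask(Callable<T> runnerTask)
  {
    return executor.submit(runnerTask);
  }

  @Override
  public ExecutorService getExecutor()
  {
    return executor;
  }
}
```

With an interface this narrow, a dummy implementation only has to cover two methods rather than the whole `ExecutorService` surface.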

then we should never return an instance that can be used at all, even if it causes the task to fail

The ForwardingQueryProcessingPool(Execs.dummy()) would do exactly that unless I am mistaken. The task would be delegated to the dummy executor which throws UOE on any attempt to submit the task.
I attempted to create a NoopQueryProcessingPool while raising the PR, but it was doing the same thing. Maybe I can rename and make it clearer to read, or subclass the forwarding pool explicitly.


While this is true, there are small differences in using a dedicated NoopQueryProcessingPool:

The intent is clearer to someone reading the code. Using the Noop implementation implies that it is meant to do nothing. Using a Forwarding pool with a dummy executor could mean that it is supposed to have partial functionality.

The error message (and perhaps the stack trace too) would be more user-friendly. When using Noop pool, the exception is thrown by the processing pool itself rather than the underlying dummy executor service.

That said, this is not a blocker for this PR as it is a style choice really.
There are already some quirks of the QueryProcessingPool interface that could use some cleanup. We could address this then.

kfaraz · 2024-08-19T04:08:58Z

+            new Module()
+            {
+              @Override
+              public void configure(Binder binder)


Need not override this method if extending DruidProcessingModule.

LakshSingla · 2024-08-19T04:31:59Z

@kfaraz

buffers are all provided as lazy singletons and would be initialized/allocated only if needed by the task.

The pool is created lazily, i.e., when the various query toolchests/runners/engines are created. Whether the buffers themselves are allocated lazily depends on the type of the pool.

For the blocking pool, all the buffers are allocated upfront in the constructor.
For the non-blocking pool, the buffers are allocated as you go. There's also an initialization count, which determines the number of buffers to allocate at the beginning. This is set to the number of processing threads, which means that we still allocate buffers at startup.

I have verified the above by looking at one of the controller logs, which shouldn't be using the buffers.

2024-08-06T15:21:43,532 INFO [main] org.apache.druid.offheap.OffheapBufferGenerator - Allocating new intermediate processing buffer[0] of size[400,000,000]
2024-08-06T15:21:43,655 INFO [main] org.apache.druid.offheap.OffheapBufferGenerator - Allocating new intermediate processing buffer[1] of size[400,000,000]
2024-08-06T15:21:43,775 INFO [main] org.apache.druid.offheap.OffheapBufferGenerator - Allocating new result merging buffer[0] of size[400,000,000]
2024-08-06T15:21:43,894 INFO [main] org.apache.druid.offheap.OffheapBufferGenerator - Allocating new result merging buffer[1] of size[400,000,000]
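The two allocation behaviours described above can be sketched with simplified pools that only count allocations. `EagerPool` (blocking-style, allocates everything in the constructor) and `LazyPool` (non-blocking-style, allocates on demand but pre-allocates `initCount` buffers) are hypothetical stand-ins, not Druid's actual pool classes, and use small heap buffers purely for illustration.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Blocking-style stand-in: every buffer is allocated up front in the constructor.
final class EagerPool
{
  final Deque<ByteBuffer> buffers = new ArrayDeque<>();
  int allocations = 0;

  EagerPool(Supplier<ByteBuffer> generator, int size)
  {
    for (int i = 0; i < size; i++) {
      buffers.push(generator.get());
      allocations++;
    }
  }
}

// Non-blocking-style stand-in: allocates on demand, but pre-allocates initCount buffers.
final class LazyPool
{
  final Deque<ByteBuffer> buffers = new ArrayDeque<>();
  final Supplier<ByteBuffer> generator;
  int allocations = 0;

  LazyPool(Supplier<ByteBuffer> generator, int initCount)
  {
    this.generator = generator;
    for (int i = 0; i < initCount; i++) {
      buffers.push(generator.get());
      allocations++;
    }
  }

  ByteBuffer take()
  {
    ByteBuffer buffer = buffers.poll();
    if (buffer == null) {
      // Pool is empty: allocate a new buffer on demand.
      buffer = generator.get();
      allocations++;
    }
    return buffer;
  }

  void giveBack(ByteBuffer buffer)
  {
    buffers.push(buffer);
  }
}
```

Even if nothing is ever taken from it, a `LazyPool` built with `initCount` equal to the number of processing threads has already allocated that many buffers, which mirrors the startup allocations in the controller log above.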

LakshSingla · 2024-08-19T04:32:52Z

Thanks for the suggestion, that is much better than what I was trying to achieve with the latest commit.

LakshSingla · 2024-08-20T19:15:47Z

@kfaraz
It seems that overriding @Provides methods isn't allowed by Guice. Unfortunately, using Modules.override() is the only way. Going ahead with the NoopQueryProcessingPool suggestion and reverting the latest commit.

1) Overriding @Provides methods is not allowed.
        @Provides method: org.apache.druid.guice.DruidProcessingModule.getIntermediateResultsPool()
        overridden by: org.apache.druid.guice.PeonProcessingModule.getIntermediateResultsPool()
  at com.google.inject.internal.ProviderMethodsModule.getProviderMethods(ProviderMethodsModule.java:163)

2) Overriding @Provides methods is not allowed.


kfaraz · 2024-08-21T03:20:09Z

Ah, thanks for the clarification, @LakshSingla. Nice of Guice to give clear error messages.

Going ahead with the NoopQueryProcessingPool suggestion along with reverting the latest commit.

You are too quick to jump between commits 😛 .

There are still other things that can be done, like:
a) All the provider methods in DruidProcessingModule internally call a corresponding static creator method. PeonProcessingModule could use the same methods. Thus no code duplication, no override.
OR b) Both DruidProcessingModule and PeonProcessingModule extend a common base class which has the actual (non-provider) methods. Both the modules could override and annotate the methods appropriately.

Although, I think out of these two, option (a) is better.
I can't think of other better ways right now, will let you know if something comes to mind.

For now, do you think the above suggestion seems viable?
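Option (a) can be sketched without any Guice machinery: each module's provider method delegates to a shared static creator, so the peon module never overrides a `@Provides` method, which is the pattern the Guice error above rejects. All classes below are hypothetical, simplified stand-ins (the annotations are only noted in comments, and the task check is passed as a boolean), and the condition is written in the non-negated form.

```java
// Hypothetical stand-ins; not Druid's actual classes.
final class ProcessingConfigSketch
{
  final int numThreads;

  ProcessingConfigSketch(int numThreads)
  {
    this.numThreads = numThreads;
  }
}

interface BufferPoolSketch
{
  int preallocatedBuffers();
}

class DruidProcessingModuleSketch
{
  // Shared static creator holding the construction logic exactly once.
  // Stand-in: it only reports how many buffers a real pool would pre-allocate.
  static BufferPoolSketch createIntermediateResultsPool(ProcessingConfigSketch config)
  {
    return () -> config.numThreads;
  }

  // In real code this would carry @Provides (and a scope annotation).
  BufferPoolSketch getIntermediateResultsPool(ProcessingConfigSketch config)
  {
    return createIntermediateResultsPool(config);
  }
}

// Does NOT extend DruidProcessingModuleSketch, so no provider method is overridden.
class PeonProcessingModuleSketch
{
  // Stand-in for a pool that never allocates anything.
  static final BufferPoolSketch DUMMY_POOL = () -> 0;

  // In real code this would carry @Provides and receive the Task instance.
  BufferPoolSketch getIntermediateResultsPool(boolean taskSupportsQueries, ProcessingConfigSketch config)
  {
    if (taskSupportsQueries) {
      return DruidProcessingModuleSketch.createIntermediateResultsPool(config);
    } else {
      return DUMMY_POOL;
    }
  }
}
```

The construction logic lives in one place, and the peon module only decides whether to call it.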

LakshSingla · 2024-08-21T04:48:36Z

I think the first one seems neat. Lemme try it out.

kfaraz

Thanks for incorporating the feedback, @LakshSingla! Left some more suggestions.

kfaraz · 2024-08-21T09:27:38Z

+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+
+public class ProcessingModuleHelper


You need not add a new class for the static methods. I think it is cleaner to just keep these methods in DruidProcessingModule itself. It would help with the review as well.

After this change, DruidProcessingModule is more like the Historical+Indexer processing module. The same logic for caching etc. is copied everywhere. I feel that it's neater to have these methods in a separate class, so that they can be used by other processing modules as well.

kfaraz · 2024-08-21T09:31:15Z

+/**
+ * Implementation of {@link QueryProcessingPool} that throws when it is given any query execution task unit
+ */
+public class NoopQueryProcessingPool extends ForwardingQueryProcessingPool


If we are writing a Noop implementation, it should not extend the Forwarding pool, rather implement the QueryProcessingPool directly and throw unsupported or equivalent exception in all methods.

kfaraz

Minor comments, rest looks good.

kfaraz · 2024-09-02T15:51:00Z

+public class NoopQueryProcessingPool implements QueryProcessingPool
+{
+  private static final QueryProcessingPool INSTANCE = new NoopQueryProcessingPool();
+  private static final DruidException UNSUPPORTED_EXCEPTION =


I am not sure if keeping an exception constant is desirable. You can keep the exception message as a constant but throw a fresh exception wherever needed.

kfaraz · 2024-09-02T15:51:11Z

+  private static final DruidException UNSUPPORTED_EXCEPTION =
+      DruidException.defensive("Unexpected call made to NoopQueryProcessingPool");
+
+  public static QueryProcessingPool instance()


Suggested change

-  public static QueryProcessingPool instance()
+  public static NoopQueryProcessingPool instance()

kfaraz · 2024-09-02T15:51:20Z

+ */
+public class NoopQueryProcessingPool implements QueryProcessingPool
+{
+  private static final QueryProcessingPool INSTANCE = new NoopQueryProcessingPool();


Suggested change

-  private static final QueryProcessingPool INSTANCE = new NoopQueryProcessingPool();
+  private static final NoopQueryProcessingPool INSTANCE = new NoopQueryProcessingPool();
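Taken together, the review suggestions on the no-op pool (implement the interface directly instead of extending the forwarding pool, throw in every method, type the singleton and `instance()` as the concrete class, keep only the message constant and create a fresh exception per call) might produce a shape like the following. Because the real `QueryProcessingPool` extends Guava's `ListeningExecutorService` and `DruidException` is Druid-specific, this self-contained sketch substitutes a hypothetical two-method `PoolSketch` interface and `UnsupportedOperationException`.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Future;

// Hypothetical, trimmed-down stand-in for QueryProcessingPool.
interface PoolSketch
{
  <T> Future<T> submitRunnerTask(Callable<T> task);

  void shutdown();
}

final class NoopPoolSketch implements PoolSketch
{
  // Singleton typed as the concrete class.
  private static final NoopPoolSketch INSTANCE = new NoopPoolSketch();

  // Only the message is constant; each call builds its own exception and stack trace.
  private static final String ERROR_MESSAGE = "Unexpected call made to NoopQueryProcessingPool";

  private NoopPoolSketch() {}

  static NoopPoolSketch instance()
  {
    return INSTANCE;
  }

  private static UnsupportedOperationException unsupported()
  {
    return new UnsupportedOperationException(ERROR_MESSAGE);
  }

  @Override
  public <T> Future<T> submitRunnerTask(Callable<T> task)
  {
    throw unsupported();
  }

  @Override
  public void shutdown()
  {
    // Assumption: per the review, every method throws, including lifecycle ones.
    throw unsupported();
  }
}
```

A fresh exception per call means the stack trace points at the offending caller, which a shared exception constant would not do.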

kfaraz · 2024-09-02T15:53:28Z

+    if (!task.supportsQueries()) {
+      return DummyNonBlockingPool.instance();
+    }
+    return DruidProcessingModule.createIntermediateResultsPool(config);


Maybe invert the condition for readability:

Suggested change

-    if (!task.supportsQueries()) {
-      return DummyNonBlockingPool.instance();
-    }
-    return DruidProcessingModule.createIntermediateResultsPool(config);
+    if (task.supportsQueries()) {
+      return DruidProcessingModule.createIntermediateResultsPool(config);
+    } else {
+      return DummyNonBlockingPool.instance();
+    }

Same comment in other methods.

kfaraz

Thanks for the changes, @LakshSingla !

LakshSingla · 2024-09-09T03:51:45Z

Tests are failing due to insufficient coverage of the changes made to the processing module.

LakshSingla added 2 commits August 13, 2024 00:51

Controllers don't use buffers

7306952

add note

779b036

github-actions bot added the Kubernetes and Area - Ingestion labels Aug 13, 2024

LakshSingla marked this pull request as ready for review August 13, 2024 08:42

cryptoe approved these changes Aug 13, 2024

review

83085d4

kfaraz reviewed Aug 19, 2024

Revert "review"

c93de2c

This reverts commit 83085d4.

LakshSingla added 2 commits August 21, 2024 14:39

helper class for guice bindings

776c400

add back import

a25dc45

kfaraz reviewed Aug 21, 2024

LakshSingla added 2 commits September 2, 2024 18:55

review

c997b5b

remove helper

5ed1be6

kfaraz reviewed Sep 2, 2024

reviews

275cd6e

kfaraz approved these changes Sep 6, 2024

ci

6737ca6

LakshSingla merged commit 72fbaf2 into apache:master Sep 10, 2024

LakshSingla deleted the controller-no-buffers branch September 10, 2024 06:06

LakshSingla mentioned this pull request Oct 9, 2024

Druid 31.0.0 release notes #17092

Merged

adarshsanjeev added this to the 32.0.0 milestone Jan 16, 2025

adarshsanjeev mentioned this pull request Jan 28, 2025

[DRAFT] 32.0.0 release notes #17677

Closed
