Follow up to #13365.
People often want to evaluate Druid by installing it on a single server, ingesting a bunch of data, and then doing queries. In the common case, ingestion and querying are not happening simultaneously, unless of course it's realtime ingestion. But let's focus on batch for now.
There's a problem here: our out-of-the-box single-server configs, and the `bin/start-druid` script as well, are designed such that MiddleManager/Indexer occupies a relatively small portion of the resources of the server. On many common server sizes, `druid.worker.capacity` is set smaller than the number of processors on the machine, meaning we aren't able to use the full resources of the hardware to do ingestion. (Most ingestion tasks are single-threaded; parallelism comes from launching more tasks.)
I suggest we solve this using a combination of three approaches:
- Use a 512MB heap per task (or even 256MB) rather than 1GB when the CPU-to-memory ratio is high, so we can launch more tasks and therefore use more processors. This would work for both MiddleManager and Indexer.
- Allow tasks to use some memory that we had set aside for segment page cache. Currently, that's half the memory of the server, as long as it's above a minimum threshold. The beauty of tasks is that they only use memory if they are actually launched. So, it's OK to eat into that 50% buffer; we'll only actually use it when all task slots are full. Then, when they're not, it all becomes available for page cache again. (This doesn't apply to Indexer, where the memory is always allocated; we can't pull the same trick there.)
- Cap `druid.worker.capacity` at the number of processors; it doesn't help much to have it be higher.
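For illustration, on a hypothetical 16-core, 64GB server, the MiddleManager's runtime.properties under this scheme might end up looking something like the following. The values are examples of the shape of the change, not a tuned recommendation:

```properties
# One task slot per processor (third approach): extra slots beyond the
# core count don't speed up mostly-single-threaded tasks.
druid.worker.capacity=16

# Smaller per-task heap (first approach): 512MB rather than 1GB, so more
# task slots fit within the memory left after the broker/historical heaps.
druid.indexer.runner.javaOptsArray=["-server","-Xms512m","-Xmx512m"]
```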
The goal should be that in most cases, for a single-server setup, `druid.worker.capacity` ends up getting set to roughly the number of processors.
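To make the interaction between the three approaches concrete, here's a rough sizing sketch in Python. The function name and the heuristic itself are illustrative, not the actual `bin/start-druid` logic:

```python
def suggested_worker_capacity(num_processors: int,
                              server_mem_gb: float,
                              heap_per_task_gb: float = 0.5) -> int:
    """Illustrative heuristic for druid.worker.capacity on a single server.

    - Caps capacity at the processor count (third approach), since most
      ingestion tasks are single-threaded.
    - Caps capacity at what fits in memory at the smaller 512MB per-task
      heap (first approach), treating the page-cache reserve as borrowable
      because idle task slots consume no memory (second approach).
    """
    by_memory = int(server_mem_gb // heap_per_task_gb)
    return min(num_processors, by_memory)


# On a memory-rich box, the processor count is the binding limit:
print(suggested_worker_capacity(16, 64))   # 16
# On a small box, memory is the binding limit despite having more cores:
print(suggested_worker_capacity(32, 8))    # 16
```

With the old 1GB default heap, the second call would drop to 8 slots; halving the per-task heap is what lets capacity approach the processor count on memory-constrained machines.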