Follow up to #13365.
People often want to evaluate Druid by installing it on a single server, ingesting a bunch of data, and then doing queries. In the common case, ingestion and querying are not happening simultaneously, unless of course it's realtime ingestion. But let's focus on batch for now.
There's a problem here: our out-of-the-box single-server configs, and the `bin/start-druid` script as well, are designed such that MiddleManager/Indexer occupies a relatively small portion of the resources of the server. On many common server sizes, `druid.worker.capacity` is set smaller than the number of processors on the machine, meaning we aren't able to use the full resources of the hardware to do ingestion. (Most ingestion tasks are single-threaded; parallelism comes from launching more tasks.)
I suggest we solve this using a combination of three approaches:
- Use a 512MB heap per task (or even 256MB) rather than 1GB when the CPU-to-memory ratio is high, so we can launch more tasks and therefore use more processors. This would work for both MiddleManager and Indexer.
- Allow tasks to use some memory that we had set aside for segment page cache. Currently, that's half the memory of the server, as long as it's above a minimum threshold. The beauty of tasks is that they only use memory if they are actually launched. So, it's OK to eat into that 50% buffer; we'll only actually use it when all task slots are full. Then, when they're not, it all becomes available for page cache again. (This doesn't apply to Indexer, where the memory is always allocated; we can't pull the same trick there.)
- Cap `druid.worker.capacity` at the number of processors; it doesn't help much to have it be higher.
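For illustration, on a hypothetical 16-core, 64GB server, the MiddleManager's runtime.properties under this scheme might end up looking something like the following. The values are examples of the shape of the change, not a tuned recommendation:

```properties
# One task slot per processor (third approach): extra slots beyond the
# core count don't speed up mostly-single-threaded tasks.
druid.worker.capacity=16

# Smaller per-task heap (first approach): 512MB rather than 1GB, so more
# task slots fit within the memory left after the broker/historical heaps.
druid.indexer.runner.javaOptsArray=["-server","-Xms512m","-Xmx512m"]
```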
The goal should be that in most cases, for a single-server setup, `druid.worker.capacity` ends up getting set to roughly the number of processors.
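To make the interaction between the three approaches concrete, here's a rough sizing sketch in Python. The function name and the heuristic itself are illustrative, not the actual `bin/start-druid` logic:

```python
def suggested_worker_capacity(num_processors: int,
                              server_mem_gb: float,
                              heap_per_task_gb: float = 0.5) -> int:
    """Illustrative heuristic for druid.worker.capacity on a single server.

    - Caps capacity at the processor count (third approach), since most
      ingestion tasks are single-threaded.
    - Caps capacity at what fits in memory at the smaller 512MB per-task
      heap (first approach), treating the page-cache reserve as borrowable
      because idle task slots consume no memory (second approach).
    """
    by_memory = int(server_mem_gb // heap_per_task_gb)
    return min(num_processors, by_memory)


# On a memory-rich box, the processor count is the binding limit:
print(suggested_worker_capacity(16, 64))   # 16
# On a small box, memory is the binding limit despite having more cores:
print(suggested_worker_capacity(32, 8))    # 16
```

With the old 1GB default heap, the second call would drop to 8 slots; halving the per-task heap is what lets capacity approach the processor count on memory-constrained machines.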