This proposal suggests that we introduce the following "server type" concepts to the Druid documentation and default packaging, where a "server type" is a deployment grouping of the existing Druid processes:
Master server
Coordinator + Overlord processes
Manages data ingestion and storage: responsible for starting new ingestion jobs and coordinating availability of data on the "Data servers" described below
Query server
Broker + Router processes
The endpoints that users and client applications interact with; they route queries to data servers or other query servers (and can optionally proxy master server requests as well)
Data server
Historical + Middle manager processes
Executes ingestion jobs and stores all queryable data
We have been using this master/query/data server organization in our docs and default packaging at Imply, and we've found in practice that this structure helps users grasp Druid's architecture more quickly.
The Druid docs and packaging would be updated to guide a new user in thinking of a cluster in terms of these larger process groupings:
Introduce a new page or section describing these server types at a high level
Rework docs to relate discussion of specific processes to larger "server type" grouping where appropriate
Update quickstart and config templates
Create a "master" server config template, running a combined Coordinator and Overlord with druid.coordinator.asOverlord.enabled set to true
Create a "query" server config template, including the broker and possibly a colocated router too (maybe when we feel like moving this out of experimental status?)
Create a "data" server config template, including a colocated historical and middle manager
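As a rough illustration of the "master" template above, a combined Coordinator and Overlord configuration might look like the following sketch. The port value and file layout are assumptions for illustration only and are not part of this proposal; the `druid.coordinator.asOverlord.*` properties are the existing settings that run the Overlord inside the Coordinator process.

```properties
# Hypothetical runtime.properties for a combined "master" server (illustrative only)
druid.service=druid/coordinator
druid.plaintextPort=8081

# Run the Overlord inside the Coordinator process, as described above
druid.coordinator.asOverlord.enabled=true
# Service name under which the embedded Overlord announces itself
druid.coordinator.asOverlord.overlordService=druid/overlord
```

The "query" and "data" templates would follow the same pattern, each starting its two colocated processes with their own runtime.properties files.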
For users with more complex resource allocation requirements, the documentation should clearly describe how/why the processes within these "server types" can be deployed/scaled individually. The docs would frame deployments with separated processes as a more "advanced" architecture, suggesting the simpler consolidated deployments for most users.
New or Changed Public Interfaces
No public interfaces are changed.
Compatibility
These are conceptual changes to the docs and packaged templates only; existing clusters would not be affected.
For simplicity, the broker and router could be consolidated into a single process as well.
More hypothetically, we could consider consolidating the historical and middle manager into one process as well; this could, for example, help enable better dynamic resource allocation decisions.
Motivation
Druid currently has 6 process types:
For someone getting started with Druid, this can be a lot of new and distinct concepts to understand. Additionally, it is not always clear from the process names what each process type is responsible for. For example, "coordinator", "overlord", and "middle manager" (and possibly even "broker") all suggest some kind of cluster management functionality.
As a result, some initial confusion is not uncommon when trying to understand Druid's cluster architecture.
Such concerns re: process organization and naming have been discussed in the Druid community in the past, e.g.:
Proposed Changes
Potential future work
Alternatives
None