Scheduled batch supervisor #17353
… reusability (#17542)

This PR contains non-functional / refactoring changes of the following classes in the sql module:
1. Move ExplainPlan and ExplainAttributes from sql/src/main/java/org/apache/druid/sql/http to processing/src/main/java/org/apache/druid/query/explain.
2. Move sql/src/main/java/org/apache/druid/sql/SqlTaskStatus.java to processing/src/main/java/org/apache/druid/query/http/SqlTaskStatus.java.
3. Add a new class processing/src/main/java/org/apache/druid/query/http/ClientSqlQuery.java, which is effectively a thin POJO version of SqlQuery in the sql module but without any of the Calcite functionality and business logic.
4. Move BrokerClient, BrokerClientImpl and Broker classes from sql/src/main/java/org/apache/druid/sql/client to server/src/main/java/org/apache/druid/client/broker.
5. Remove BrokerServiceModule, which provided the BrokerClient. The functionality is now contained in ServiceClientModule in the server package itself, which provides all the clients as well.

This is done so that we can reuse these classes in #17353 without bringing Calcite and other dependencies into the Overlord.
The jobsExecutor is responsible for submitting jobs to the broker for all scheduled batch supervisors. The cronExecutor only decoupled the submission of jobs from the actual scheduled running of jobs. The service client library already has async threads, so we remove the cronExecutor to keep things simple, since the service client handles submission asynchronously. If we ever see evidence of bottlenecks around task submission, etc., we should be able to make the jobsExecutor multi-threaded instead of single-threaded.
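The single-executor arrangement described in this commit message can be sketched roughly as follows. This is an illustration only: `JobsExecutorSketch`, `scheduleOnce`, `shutdownAndWait` and the `Function<String, CompletableFuture<String>>` standing in for the async service client are all hypothetical names, not classes or methods from this PR.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical illustration only; not the scheduler class added in this PR.
class JobsExecutorSketch
{
  // One thread submits jobs for every supervisor; there is no separate cron executor.
  private final ScheduledExecutorService jobsExecutor =
      Executors.newSingleThreadScheduledExecutor();

  // Assumed stand-in for an async client call that eventually yields a task ID.
  private final Function<String, CompletableFuture<String>> asyncClient;

  private final List<String> submittedTaskIds = new CopyOnWriteArrayList<>();

  JobsExecutorSketch(Function<String, CompletableFuture<String>> asyncClient)
  {
    this.asyncClient = asyncClient;
  }

  // Schedules a single submission. The job only hands the query to the client
  // and registers a callback, so it does not wait for the response itself.
  void scheduleOnce(String query, long delayMillis)
  {
    jobsExecutor.schedule(
        () -> {
          asyncClient.apply(query).thenAccept(submittedTaskIds::add);
        },
        delayMillis,
        TimeUnit.MILLISECONDS
    );
  }

  // Stops accepting new work, lets already scheduled jobs run, and waits for them.
  boolean shutdownAndWait()
  {
    jobsExecutor.shutdown();
    try {
      return jobsExecutor.awaitTermination(5, TimeUnit.SECONDS);
    }
    catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  List<String> submittedTaskIds()
  {
    return List.copyOf(submittedTaskIds);
  }

  public static void main(String[] args)
  {
    JobsExecutorSketch sketch =
        new JobsExecutorSketch(q -> CompletableFuture.completedFuture("task-for-" + q));
    sketch.scheduleOnce("q1", 0);
    sketch.scheduleOnce("q2", 20);
    sketch.shutdownAndWait();
    System.out.println(sketch.submittedTaskIds());
  }
}
```

If submission ever became a bottleneck, the sketch would only need `Executors.newScheduledThreadPool(n)` in place of the single-threaded factory, which mirrors the fallback mentioned above.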
// do nothing
}

public enum State implements SupervisorStateManager.State
Check notice
Code scanning / CodeQL
Class has same name as super class
kfaraz
left a comment
Sorry for the delay on this, @abhishekrb19!
I have tried to do a complete review. Once we have addressed these comments, we should be good to merge this PR.
Thanks a lot for your patience!! 🙂
@kfaraz, thank you for the reviews! I believe I have addressed and/or responded to all of your comments.
kfaraz
left a comment
Minor non-blocking comments. 🚀
@JsonProperty("taskStatus") TaskStatus taskStatus,
@JsonProperty("updatedTime") DateTime updatedTime
Nit: Since this constructor is not used for deserializing right now, we don't need the @JsonProperty annotation.
}

/**
 * Used for internal tracking. So this field is *not* Jackson serialized to avoid
Nit: I am not sure if markdown is honored in javadocs. Use <i>, <b> or caps instead for emphasis.
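For illustration, traditional `/** ... */` Javadoc comments are processed as HTML, so `*not*` would show up with literal asterisks in the generated docs. A small sketch of the suggested style follows; the class name `TaskTrackerSketch` and the field `internalNote` are hypothetical and only the comment style is the point.

```java
// Hypothetical class and field names; not taken from this PR.
class TaskTrackerSketch
{
  /**
   * Used for internal tracking. This field is <b>not</b> Jackson serialized.
   */
  private final String internalNote;

  TaskTrackerSketch(String internalNote)
  {
    this.internalNote = internalNote;
  }

  String internalNote()
  {
    return internalNote;
  }
}
```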
@abhishekrb19, the changes look good to me. You may proceed with the merge. When you get the time though, please take another look at this comment.
Sounds good, just responded in the comment inline.
This change introduces a scheduled batch supervisor in Druid. The supervisor periodically wakes up to submit an MSQ ingest query, allowing users to automate batch ingestion directly within Druid. Think of it as simple batch task workflows natively integrated into Druid, though it doesn't replace more sophisticated workflow management systems like Apache Airflow. This is an experimental feature.
Summary of changes:
The `scheduled_batch` supervisor can be configured as follows:

{
  "type": "scheduled_batch",
  "schedulerConfig": {
    "type": "unix",
    "schedule": "*/5 * * * *"
  },
  "spec": {
    "query": "REPLACE INTO foo OVERWRITE ALL SELECT * FROM bar PARTITIONED BY DAY"
  },
  "suspended": false
}

The supervisor will submit the `REPLACE` SQL query repeatedly every 5 minutes. The supervisor supports two types of cron scheduler configurations:
- `unix`: `*/5 * * * *` to schedule the SQL task every 5 minutes; `@daily`, `@hourly`, `@monthly`, etc.
- `quartz`: `0 0 0 ? 3,6,9,12 MON-FRI` to schedule tasks at midnight on weekdays during March, June, September, and December.

Key points:
- … `query` along with any context in the `spec`. This structure is identical to what the MSQ task engine accepts.
- … `spec` as-is on its schedule.

Some implementation details:
a. Validate and parse the user-specified query.
b. Submit MSQ queries to the `/druid/v2/sql/task/` endpoint.
- `ScheduledBatchScheduler` is injected in the Overlord, which is responsible for scheduling and unscheduling all scheduled batch instances.
- `BrokerClient` implementation has been added, leveraging the `ServiceClient` functionality.
- `SqlTaskStatus` and its unit test `SqlTaskStatusTest` have been moved from the msq module to the sql module so they can be reused by the `BrokerClient` implementation in the sql module.
- … `ExplainPlanInformation` class, which is used to deserialize the explain plan response from the Broker.

The status API response for the supervisor contains the scheduler state along with active and completed tasks:
{
  "supervisorId": "scheduled_batch__foo__3764d545-62b8-4d52-a20c-568b009268e1",
  "state": "RUNNING",
  "lastTaskSubmittedTime": "2025-02-11T01:39:00.119Z",
  "nextTaskSubmissionTime": "2025-02-11T01:40:00.000Z",
  "taskReport": {
    "totalSubmittedTasks": 5,
    "totalSuccessfulTasks": 3,
    "totalFailedTasks": 2,
    "recentTasks": [
      {
        "taskStatus": {
          "id": "query-bb0fc147-2cde-4935-95ed-bf40a26568de",
          "status": "SUCCESS",
          "duration": 26628,
          "errorMsg": null,
          "location": { "host": null, "port": -1, "tlsPort": -1 }
        },
        "updatedTime": "2025-02-11T01:37:26.780Z"
      },
      {
        "taskStatus": {
          "id": "query-1d51d1fc-ece5-4376-b64d-716ccecbd708",
          "status": "SUCCESS",
          "duration": 26367,
          "errorMsg": null,
          "location": { "host": null, "port": -1, "tlsPort": -1 }
        },
        "updatedTime": "2025-02-11T01:38:26.493Z"
      },
      {
        "taskStatus": {
          "id": "query-baa8d243-ff57-4c18-b513-865687dd3e93",
          "status": "FAILED",
          "duration": 26197,
          "errorMsg": "Task execution process exited unsuccessfully with code[2]. See middleManager logs for more details.",
          "location": { "host": null, "port": -1, "tlsPort": -1 }
        },
        "updatedTime": "2025-02-11T01:39:26.355Z"
      }
    ]
  }
}

Future Improvements:
Release Note
This change introduces an experimental feature called the scheduled batch supervisor in Druid. The supervisor periodically wakes up to submit an MSQ ingest query, allowing users to automate batch ingestion directly within Druid. Think of it as simple batch task workflows natively integrated into Druid, though it doesn't replace more sophisticated workflow management systems like Apache Airflow. Currently, the supervisor will repeatedly submit the same MSQ query as-is. The payload for the supervisor is below; please check the PR summary for additional details:
{
  "type": "scheduled_batch",
  "schedulerConfig": {
    "type": "unix",
    "schedule": "*/5 * * * *"
  },
  "spec": {
    "query": "REPLACE INTO foo OVERWRITE ALL SELECT * FROM bar PARTITIONED BY DAY"
  },
  "suspended": false
}

This PR has: