-
Notifications
You must be signed in to change notification settings - Fork 4.8k
job controller proposal document added #2000
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,226 @@ | ||
| **Abstract** | ||
|
|
||
| First basic implementation proposal for a Job controller. Several exiting issues were already created regarding that particular subject: | ||
|
|
||
| Distributed CRON jobs in k8s #2156 | ||
| Job Controller #1624 | ||
|
|
||
| Several features also already exist that could be used by external tools to trigger batch execution on one pod within k8 cluster. | ||
|
|
||
| Create a standalone pod (not linked to any existing replication controllers) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you point to those places? |
||
| Execute a new process within a given container from a running pod. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If you're referring here to remote command execution, then I'd say that's not related to Job controller. If you're talking about something different, then I'd say to make it more clear here. |
||
|
|
||
| **Motivation** | ||
|
|
||
| The main goal is to provide a new pod controller type, called Job, with the following characteristics: | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To be consistent with k8s, we'll probably name this a JobController, and single running instance of a piece of work will be a Job.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. FYI ReplicationController is going to be renamed to ReplicaSet for the v2 API |
||
|
|
||
| * Be able to start of one or several pods tracked as a single entity representing a one execution of this job. Final job execution status will reflect the completion status of all pods started for this run. | ||
|
|
||
| * Ability to trigger a new job execution either on demand (triggered through server API by an external agent or by another k8 subsystem), or periodically based on a time-based schedule defined in job specification part (using ISO8601 syntax notation). | ||
|
|
||
| * Be able to restart a limited number of time failing containers of pods created by one job execution (using "Restart on failure" logic extended by a maximal restart count). | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd extend this as an option to react to the output, we could then have two of them:
We should provide some structure allowing to configure those actions. |
||
|
|
||
| * Be able to limit the execution time of pods created by this job. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Here all the admission control mechanics should be applied as well. I'm seeing time limiting as yet another option of admission control. @derekwaynecarr is my thinking correct? |
||
|
|
||
| * Be able to provide status information about pending and current job executions, along with a (limited) history of previous job runs (containing start/stop times with completion exit code) | ||
|
|
||
| * (Future evolution) Trigger jobs based on success/failure of others. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK, here's what I was referring as reaction to OnFailure and OnSuccess. I think we could provide some more information around that.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm thinking here of a use case where you could chain one job after another OnSuccess. |
||
|
|
||
|
|
||
| **Job basic definition** | ||
|
|
||
| The new Job object json definition for a basic implementation will have the following content: | ||
| ``` | ||
| { | ||
| "apiVersion": "v1beta3", | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Currently that'll be v1, I guess. |
||
| "kind": "Job", | ||
| "metadata" : { | ||
| "name": "myJob" | ||
| "labels" : { | ||
| "name": "myJob" | ||
| } | ||
| } | ||
| "spec": { | ||
| "schedule": { | ||
| "timeSpec": "R5/T01:00:00/PT01", | ||
| "skipOutdated": true | ||
| "blockOnFailure": false | ||
| }, | ||
| "allowConcurrentRun": false, | ||
| "maxRestart": 2, | ||
| "execTimeout": 100, | ||
| "PodCount": 1, | ||
| "selector": { "name": "myjob"}, | ||
| "template": { | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd propose to use |
||
| "metadata" : { | ||
| "name": "myJobPodTemplate" | ||
| "labels" : { | ||
| "name": "myJobPodTemplate" | ||
| } | ||
| } | ||
| "spec": { | ||
| "restartPolicy": "OnFailure" | ||
| "containers": [{ | ||
| "name": "job-container", | ||
| "image": "app/job", | ||
| ... | ||
| }] | ||
| } | ||
| } | ||
| } | ||
| } | ||
| ``` | ||
| **Fields definition:** | ||
|
|
||
| Compared to replication controller, replica count is replaced by the PodCount field (number of Pods started simultaneously for a single job execution run). | ||
| All other fields are characteritic of this new controller type: | ||
|
|
||
| * "maxRestart": (default to 1) maximal number of times a failing container is allowed to be restarted. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. By default it should not restart, imho. |
||
| * "execTimeout": (not restricted by default) maximal duration a container within a pod started by the Job controller is allowed to run. Reaching this limit leads to the container to be killed. | ||
| * "PodCount": (default to 1) number of pods started by a single run of the Job. | ||
| * "allowConcurrentRun": (default to true) dictates if concurrent runs of this Job are allowed or not. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe instead of boolean value, we could have a number defining how many concurrent runs are allowed. 0 - would mean no limits, whereas number > 0 would specify that actual limit. |
||
|
|
||
| * "schedule": (optional) sub structure defining scheduling details | ||
| * "timeSpec": describes the job schedule, expressed (using the ISO 8601) with a 3 parts dash separated string: "R[n]/[start date]/[duration]" | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Dash is |
||
| The first part gives the number of repetitions of the job (omitting this number means infinite repetitions) | ||
| The second part gives the starting time of the scheduling period. | ||
| The third part is the duration between successive runs of the scheduled pod(s). | ||
| * "suspendOnFailure": dictated if job schedule should be suspended in case of failure of the last completed job run. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What do you mean by that? |
||
| * "skipOutdated": dictates if pending scheduled job runs should be discarded in favour of the latest one. This covers the case of a job having suspended its execution after a failed run, with several schedule points reached in the meantime (or the case of a long lasting run with concurrent execution disable). Instead of stacking pending executions, only the latest one is kept (and intermediate pending runs discarded). | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That can be applied even without previous option. I can imagine the system came down (for whatever reason), a couple of jobs might miss their fire time. |
||
|
|
||
| **Implementation details** | ||
|
|
||
| Job objects are stored in a dedicated registry directory key (separated from the directory key used for replication controllers). | ||
|
|
||
| Similarly to replication controllers, a dedicated manager takes care of all jobs existing in the system, performing the following tasks: | ||
|
|
||
| * Executes a loop on the full list of defined jobs, calling on each a defined synchronization call-back, each execution loop being spaced by a configuration defined time interval (with default value of 5 seconds). | ||
| * Performs a (recursive) watch of the registry directory key containing jobs definition. For each job newly defined or updated in the system, the same synchronization routine is invoked. | ||
|
|
||
| This job manager provides a way for each job defined with schedule information to setup a dedicated timer for next schedule time. This allows a better accuracy of start execution time for pods launched by jobs and remove dependency on the interval value set for the synchronization loop. | ||
|
|
||
| The maximal restart count and maximal execution time for pods containers started by a job will be enforced at kubelet level, using dedicated resource limits added by the job in pod's containers definition. Two new resource types ("execution time" and "execution count") are introduced, that are used to add in the container's resources limits the figures defined in job specification and taken into account by the kubelet module (container is killed if the execution time is reached and restarted in case of failure exit up to this maximal count). | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's not clear to me if the |
||
|
|
||
| In order to precisely track job pods, 2 specific labels are pushed in created pod metadata: | ||
| * the job name ("jobName") | ||
| * A tag ("jobTag", either a generated UID, or simply the schedule date of this job execution) allowing to group all pods started started for a given job execution. | ||
|
|
||
| Along with labels, specific environment variables are also dynamically added in created pods definition: the number of pods started for this job execution and for each one its index number in the started pods list. | ||
|
|
||
| Job has the responsibility to advertise job run completion status (success or failure) through events, to perform the necessary clean-up in pods registry, to provide a detailed status of active job runs and the completion status of previous completed runs up to a given amount of history). | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Events are not guaranteed to reach its destination. I agree about the responsibility to advertise, but its the role of the manager (JobController here) to clean-up pods registry, provide detailed status of all jobs at any time. |
||
| Collecting the standard output/error of pod's containers is not covered by this design (a common solution for containers started by any controller is needed). | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
|
|
||
| **API types definition:** | ||
|
|
||
| The following API objects can be introduced: | ||
|
|
||
| ``` | ||
| type JobScheduleSpec struct { | ||
|
|
||
| // TimeSpec contains the schedule in ISO8601 format. | ||
| TimeSpec string `json:"timeSpec"` | ||
|
|
||
| // SkipOutdated specifies that if AllowConcurrent is false, only the latest pending job | ||
| // that is waiting for the currently executing one to complete must be started (default: true) | ||
| SkipOutdated bool `json:"skipOutdated"` | ||
|
|
||
| // BlockOnFailure suspends scheduling of next job runs after a failed one | ||
| BlockOnFailure bool `json:"blockOnFailure"` | ||
| } | ||
|
|
||
| type JobSpec struct { | ||
|
|
||
| // JobScheduleSpec contains scheduling related infos | ||
| JobScheduleSpec ScheduleSpec `json:"scheduleSpec"` | ||
|
|
||
| // AllowConcurrent specified whether concurrent jobs may be started | ||
| // (covers the case where new job schedule time is reached, while | ||
| // other previously started jobs are still running) | ||
| AllowConcurrent bool `json:"allowConcurrent"` | ||
|
|
||
| // ExecTimeout specifies the max amount of time a job is allowed to run (in seconds). | ||
| ExecTimeout int `json:"execTimeout"` | ||
|
|
||
| // MaxRestart specifies the max number of restarts for a failing scheduled pod. | ||
| MaxRestart int `json:"maxRestart"` | ||
|
|
||
| // PodCount specifies the numbe of pods to start for each job run | ||
| PodCount int `json:"podCount"` | ||
|
|
||
| // Pod selector for that controller | ||
| Selector map[string]string `json:"selector"` | ||
|
|
||
| // Reference to stand alone PodTemplate | ||
| TemplateRef *ObjectReference `json:"templateRef,omitempty"` | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No need for that I guess. |
||
|
|
||
| // Embedded pod template | ||
| Template *PodTemplateSpec `json:"template,omitempty"` | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's |
||
| } | ||
|
|
||
| type JobExecStatus string | ||
| const ( | ||
| JobExecPending JobExecStatus = "pending" | ||
| JobExecRunning JobExecStatus = "running" | ||
| JobExecCompleted JobExecStatus = "completed" | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. At least missing |
||
| ) | ||
|
|
||
| type JobExecution struct { | ||
| // CreatedAt gives job creation time | ||
| CreatedAt util.Time | ||
|
|
||
| // StartedAt gives job starting time | ||
| StartedAt util.Time | ||
|
|
||
| // CompletedAt | ||
| CompletedAt util.Time | ||
|
|
||
| // Success gives job completion result | ||
| Success bool | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The next one, namely |
||
|
|
||
| // JobExecStatus | ||
| Status JobExecStatus | ||
|
|
||
| // Tag is added in labels of pod(s) created for this job execution. It allows | ||
| // job object to safely group/track all pods started for one given job execution. | ||
| Tag util.UID | ||
| } | ||
|
|
||
| // JobStatus represents the current state of Job | ||
| type JobStatus struct { | ||
| // PendingExecutions are the job runs pending for execution | ||
| PendingExecutions []JobExecution `json:"pendingExecutions"` | ||
|
|
||
| // CurrentExecutions are the job run currently executing | ||
| // (With pods scheduled in the kubernetes cluster) | ||
| RunningExecutions []JobExecution `json:"runningExecutions"` | ||
|
|
||
| // PreviousRunStates tracks previously scheduled jobs (up to a limited amount) | ||
| CompletedExecution []JobExecution `json:"completedExecution"` | ||
|
|
||
| // ScheduleCount tracks the count of run for this job | ||
| ScheduleCount int `json:"scheduleCount"` | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it future schedule count or past? |
||
| } | ||
|
|
||
| // Job represents the configuration of a job controller. | ||
| type Job struct { | ||
| TypeMeta `json:",inline"` | ||
| ObjectMeta `json:"metadata,omitempty"` | ||
|
|
||
| // Spec defines the desired behavior of this job controller. | ||
| Spec JobSpec `json:"spec,omitempty"` | ||
|
|
||
| // Status is the current status of this job controller. | ||
| Status JobStatus `json:"status,omitempty"` | ||
| } | ||
|
|
||
| // JobList is a collection of job controllers. | ||
| type JobList struct { | ||
| TypeMeta `json:",inline"` | ||
| ListMeta `json:"metadata,omitempty"` | ||
|
|
||
| Items []Job `json:"items"` | ||
| } | ||
| ``` | ||
|
|
||
| JobExecution structure is defined to contain all informations for pending, running and completed job executions. The Job status structure holds objects of this type in three separated ordered lists (one by execution status). The "pending" execution list is used as a way to stack executions for job preventing concurrent runs, and also to trigger execution for job defined without time based scheduling. A new API server entry point is introduced to allow this on-demand job execution. Changes in this pending list is detected by the watch process in controller-manager module and the job instance called back takes the appropriate action (in the common case, creates pods and move this job execution in the running list) | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to propose a architecture design, rather than implementation. Additionally I'd rather see that as a complete set of features, at least the most crucial ones we will implement. The implementation doesn't have to cover all from start, but this document should.