-
Notifications
You must be signed in to change notification settings - Fork 58
Add template execution and secret creation to the operator #16
Conversation
010d38a to
95aaaf0
Compare
|
Added a test step command. Fixed a bug where multiple identical Requires links caused multiple executions of the step. Moved from |
|
Added |
Templates are passed to the command, parsed, and then used as arbitrary
execution units in the CI graph. The operator requires that each
template contain at least one restart=Never pod and will set up
dependencies on earlier stages based on the parameters passed as input.
The following parameters are automatic:
* JOB_NAME: the name of the job from the Job spec
* JOB_NAME_HASH: a short hash of the job name
* JOB_NAME_SAFE: the job name converted to a Kubernetes safe name
* NAMESPACE: the target namespace
* IMAGE_FORMAT: requires the release stage be defined
* RPM_REPO: a URL pointing to the hosted RPM repo
* IMAGE_*: depends on the release stage and places the wildcard value
into IMAGE_FORMAT as ${component}
Environment variables can underfill the job.
The pods in each template are considered successful when they reach
phase=Succeeded and considered failed if any container or init container
has a non-zero exit code. The template name is used as the unique key.
--secret-dir may be specified multiple times and creates a secret with the contents of each directory. The secret name is the name of the directory.
Add an example of collecting JUnit results
Not just the release step.
Only dependencies of the named targets are returned. Steps return names that correspond to their images in general.
If the stage is completed, we delete the pod/templateinstance and wait for it to complete. If it's running, we wait for it to complete. Assumption is that on cancellation our pods get marked deleted anyway.
This guarantees a Prow cancellation stops pending work.
| func printBuildLogs(buildClient BuildClient, name string) { | ||
| if s, err := buildClient.Logs(name, &buildapi.BuildLogOptions{ | ||
| NoWait: true, | ||
| Timestamps: true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Timestamps were confusing and ragged. Wait was an accident.
| return nil, nil | ||
| } | ||
| return api.ParameterMap{ | ||
| fmt.Sprintf("IMAGE_%s", strings.ToUpper(strings.Replace(s.config.To.As, "-", "_", -1))): func() (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this and why is it not going to be a pain to depend on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This avoids complex replacement logic (which is impossible) based on format inside of the template. You depend on IMAGE_ANSIBLE and you get IMAGE_FORMAT replaced by ansible.
| func (s *releaseImagesTagStep) Provides() (api.ParameterMap, api.StepLink) { | ||
| return api.ParameterMap{ | ||
| "IMAGE_FORMAT": func() (string, error) { | ||
| registry := "REGISTRY" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When is this a valid output? Why not var registry string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to return an error actually, thanks.
| } | ||
| } | ||
| var format string | ||
| if len(s.config.Name) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear to me what this implies
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was previously added, the godoc should make it clear.
pkg/steps/template.go
Outdated
| if err != nil { | ||
| return err | ||
| } | ||
| s.template.Parameters[i].Value = strings.Replace(format, "${component}", component, -1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should make a constant for "${component}"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
cmd/ci-operator/main.go
Outdated
|
|
||
| func (o *options) Complete() error { | ||
| if err := json.Unmarshal([]byte(o.rawBuildConfig), &o.buildConfig); err != nil { | ||
| configSpec := os.Getenv("CONFIG_SPEC") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
os.LookupEnv?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah.
| From PipelineImageStreamTagReference `json:"from"` | ||
| // Commands are the shell commands to run in | ||
| // the repository root to execute tests. | ||
| Commands string `json:"commands"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it on us to do word splitting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it? We take this single string and need to feed it into a []string{} for the Pod, so isn't every entry in that array treated as a single arg?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We’re feeding a scriptlet to bash as arg[2]. The same as the build command.
| { | ||
| Name: "test", | ||
| Image: fmt.Sprintf("%s:%s", PipelineImageStream, s.config.From), | ||
| Command: []string{"/bin/bash", "-c", "#!/bin/bash\nset -euo pipefail\n" + s.config.Commands}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am weary of this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean having to deal with shell? We already require this for most other things.
cmd/ci-operator/main.go
Outdated
|
|
||
| func bindOptions() *options { | ||
| opt := &options{} | ||
| flag.Var(&opt.targets, "target", "A set of names in the config to target. Only steps that are required for these targets will be run.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the use-case for more than one leaf here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Build two images in the same repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you are building images, just run the build job to completion. When you need a leaf, you are running a single test. If the test requies multiple parents, make the tree relationship as necessary? Using --target for non-test runs seems wrong.
| return nil, nil | ||
| } | ||
|
|
||
| func (s *pipelineImageCacheStep) Name() string { return string(s.config.To) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we want any steps other than leaves to be targeted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Build just a specific image.
|
Addressed or commented, changes pushed. |
|
Added a |
The hash used for the namespace name needs to include the external dependencies we pull for images as well as the build configuration and source code in order to ensure we have reproducible builds. Add a new Step method `Inputs(...) api.InputDefinition` which returns an opaque list of inputs that can be combined into the final hash. After loading the steps we resolve all inputs and calculate the hash for the inputs, then use that in the input name. Because namespace is an input to the steps but can only be determined after we resolve inputs defined in other namespaces, we must switch steps to lazily include their namespace (so we pass cluster scoped clients to steps instead of namespace scoped steps). This makes the ci-operator hermetic with respect to its inputs.
|
Made the graph include the images we will tag in for prereqs part of the input hash, which required two changes:
I debated whether to change the signature of Run and decided against it because it keeps the run action independent of the actual graph being executed. Steps are already lazily evaluating state from the source of truth (the cluster) so the Run() command just asks the JobSpec what namespace it should run from. We don't attach things to context generally. After this commit, builds are hermetic which is a nice win (two different users with the same source, build definition, and base images will get the same output). This was required to make retries work:
After this change since the base image is changed, the input hash changes, which means a new namespace is used and new artifacts are created. |
|
@stevekuznetsov I get lonely without your comments... :( |
|
/test pull-ci-operator-unit |
1 similar comment
|
/test pull-ci-operator-unit |
stevekuznetsov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incoming changes LGTM but I am still not clear on the Creates() API and why we ever want multiple leaves for --target
|
I’ll make target singular. I can unify creates and provides in a follow up? |
|
Disabled multiple targets. Will come back to creates and provides. |
Because while two seconds may be sufficient teardown time for unit tests and such, it's too short for e2e teardown (where cluster logs should be collected, cluster resources need to be reaped, and collected assets need to be uploaded). The two-second timeout is from c2ac3ab (When interrupted, mark any in progress pods/templates as deleted, 2018-05-04, openshift#16). And actually, I'm not sure how the 2-second default worked even for unit tests and such, because Kubernetes pods have a 30-second grace period by default [1]. A recent aborted e2e job that did not run teardown to completion or collect any cluster assets is [2,3]. A recent successful e2e job showing current teardown timing is [4,5]. From [4]: 2019/02/14 18:19:54 Container setup in pod e2e-aws completed successfully 2019/02/14 18:46:08 Container test in pod e2e-aws completed successfully 2019/02/14 18:51:38 Container teardown in pod e2e-aws completed successfully So 5.5 minutes to teardown. And from [5]: time="2019-02-14T18:47:40Z" level=debug msg="search for and delete matching resources by tag in us-east-1 matching aws.Filter{\"openshiftClusterID\":\"1341ccc2-66ac-46bb-ae15-0295d4a126ba\"}" ... time="2019-02-14T18:51:38Z" level=debug msg="Purging asset \"Cluster\" from disk" about 4 minutes of that is resource reaping (with the previous 1.5 minutes being log collection). [1]: https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods [2]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/1255/pull-ci-openshift-installer-master-e2e-aws/3707/build-log.txt [3]: https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/pr-logs/pull/openshift_installer/1255/pull-ci-openshift-installer-master-e2e-aws/3707/artifacts/ [4]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/1045/pull-ci-openshift-installer-master-e2e-aws/3706/build-log.txt [5]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/1045/pull-ci-openshift-installer-master-e2e-aws/3706/artifacts/e2e-aws/installer/.openshift_install.log
Because while two seconds may be sufficient teardown time for unit tests and such, it's too short for e2e teardown (where cluster logs should be collected, cluster resources need to be reaped, and collected assets need to be uploaded). The two-second timeout is from c2ac3ab (When interrupted, mark any in progress pods/templates as deleted, 2018-05-04, openshift#16). And actually, I'm not sure how the 2-second default worked even for unit tests and such, because Kubernetes pods have a 30-second grace period by default [1]. A recent aborted e2e job that did not run teardown to completion or collect any cluster assets is [2,3]. A recent successful e2e job showing current teardown timing is [4,5]. From [4]: 2019/02/14 18:19:54 Container setup in pod e2e-aws completed successfully 2019/02/14 18:46:08 Container test in pod e2e-aws completed successfully 2019/02/14 18:51:38 Container teardown in pod e2e-aws completed successfully So 5.5 minutes to teardown. And from [5]: time="2019-02-14T18:47:40Z" level=debug msg="search for and delete matching resources by tag in us-east-1 matching aws.Filter{\"openshiftClusterID\":\"1341ccc2-66ac-46bb-ae15-0295d4a126ba\"}" ... time="2019-02-14T18:51:38Z" level=debug msg="Purging asset \"Cluster\" from disk" So about 4 minutes of that is resource reaping (with the previous 1.5 minutes being log collection). [1]: https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods [2]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/1255/pull-ci-openshift-installer-master-e2e-aws/3707/build-log.txt [3]: https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/pr-logs/pull/openshift_installer/1255/pull-ci-openshift-installer-master-e2e-aws/3707/artifacts/ [4]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/1045/pull-ci-openshift-installer-master-e2e-aws/3706/build-log.txt [5]: https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/1045/pull-ci-openshift-installer-master-e2e-aws/3706/artifacts/e2e-aws/installer/.openshift_install.log
Templates are passed to the command, parsed, and then used as arbitrary
execution units in the CI graph. The operator requires that each
template contain at least one restart=Never pod and will set up
dependencies on earlier stages based on the parameters passed as input.
The following parameters are automatic:
into IMAGE_FORMAT as ${component}
Environment variables can underfill the job.
The pods in each template are considered successful when they reach
phase=Succeeded and considered failed if any container or init container
has a non-zero exit code. The template name is used as the unique key.
Also support
--secret-dirwhich converts a directory into a secret with the same name as the directory.