ci-operator: add build pending timeout period #2875
bbguimaraes wants to merge 2 commits into openshift:master
/cc @jupierce
/test e2e
  inputImages[conf.InputImage] = struct{}{}
  } else if rawStep.PipelineImageCacheStepConfiguration != nil {
- step = steps.PipelineImageCacheStep(*rawStep.PipelineImageCacheStepConfiguration, config.Resources, buildClient, jobSpec, pullSecret)
+ step = steps.PipelineImageCacheStep(*rawStep.PipelineImageCacheStepConfiguration, config.Resources, buildClient, podClient, jobSpec, pullSecret)
Food for thought: we really should implement a stepFactory or something similar. I hate that we keep passing clients everywhere.
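To make the suggestion concrete, here is a minimal sketch of what such a factory could look like. Everything in it is hypothetical (`StepFactory`, the `BuildClient` and `PodClient` interfaces, `fakeClient`, `cacheStep`); it is not ci-operator's actual API, only an illustration of holding the clients in one place so that each constructor takes just its per-step configuration.

```go
package main

import "fmt"

// BuildClient and PodClient are hypothetical stand-ins for the real clients.
type BuildClient interface{ Name() string }
type PodClient interface{ Name() string }

// fakeClient is a trivial implementation used only for this sketch.
type fakeClient string

func (c fakeClient) Name() string { return string(c) }

// Step is a hypothetical, minimal step interface.
type Step interface{ Describe() string }

// StepFactory stores the shared dependencies once, so that adding a new
// client does not change the signature of every step constructor.
type StepFactory struct {
	builds BuildClient
	pods   PodClient
}

type cacheStep struct {
	from    string
	clients []string
}

func (s cacheStep) Describe() string {
	return fmt.Sprintf("cache step from %s using %v", s.from, s.clients)
}

// PipelineImageCacheStep takes only the per-step configuration; the clients
// come from the factory.
func (f StepFactory) PipelineImageCacheStep(from string) Step {
	return cacheStep{from: from, clients: []string{f.builds.Name(), f.pods.Name()}}
}

func main() {
	f := StepFactory{builds: fakeClient("buildClient"), pods: fakeClient("podClient")}
	fmt.Println(f.PipelineImageCacheStep("src").Describe())
}
```

With this shape, a change like the one in this diff (adding `podClient`) would touch the factory's fields rather than every call site.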
/hold cancel
This follows the similar implementation in `pkg/steps/template.go` and
uses the same timeout period.
---
Example (with timeout changed to `1s`):
```yaml
resources:
  src:
    requests:
      memory: 1T
```
```
INFO[2022-06-17T18:22:59Z] Building src
INFO[2022-06-17T18:23:05Z] build didn't start running within 1s (phase: Pending):
Found 1 events for Pod src-build:
* 0x : 0/23 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/ci-builds-worker: ci-builds-worker}, that the pod didn't tolerate, 1 node(s) had taint {node-role.kubernetes.io/ci-prowjobs-worker: ci-prowjobs-worker}, that the pod didn't tolerate, 1 node(s) were unschedulable, 2 Insufficient memory, 2 node(s) had taint {ci.openshift.io/ci-search: true}, that the pod didn't tolerate, 3 node(s) had taint {node-role.kubernetes.io/infra: }, that the pod didn't tolerate, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 4 node(s) had taint {node-role.kubernetes.io/ci-longtests-worker: ci-longtests-worker}, that the pod didn't tolerate, 6 node(s) had taint {node-role.kubernetes.io/ci-tests-worker: ci-tests-worker}, that the pod didn't tolerate.
```
---
There is an unfortunate layering violation here in that we are forced to
get to the build pod through the `pod-name` annotation and examine it.
I could not find a way to do this through the `Build` object (as is
possible for logs, for example). A pending build has very little
information:
```
status:
  conditions:
  - lastTransitionTime: "2022-06-17T11:06:41Z"
    lastUpdateTime: "2022-06-17T11:06:41Z"
    status: "False"
    type: New
  - lastTransitionTime: "2022-06-17T11:06:41Z"
    lastUpdateTime: "2022-06-17T11:06:41Z"
    status: "True"
    type: Pending
  output: {}
  outputDockerImageReference: image-registry.openshift-image-registry.svc:5000/bbguimaraes0/pipeline:src
  phase: Pending
```
It is only by examining the build pod (reusing the existing code which
handles test pods, at least) that the cause can be determined.
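
The lookup described above can be sketched as below. The `build` and `event` types, `eventLister`, and `describePending` are hypothetical simplifications, not the types used in ci-operator. The full annotation key `openshift.io/build.pod-name` is an assumption based on the OpenShift build API, since the text here only calls it the `pod-name` annotation. The output format imitates the log in the example.

```go
package main

import (
	"fmt"
	"strings"
)

// Assumption: the build pod name is stored under this annotation key.
const buildPodNameAnnotation = "openshift.io/build.pod-name"

// build is a hypothetical, reduced view of a Build object.
type build struct {
	Annotations map[string]string
}

// event is a hypothetical, reduced view of a pod event.
type event struct {
	Count   int
	Message string
}

// eventLister is a hypothetical stand-in for listing the events of a pod.
type eventLister func(podName string) []event

// describePending resolves the build pod through the annotation and
// summarizes its events, since the Build status alone says little.
func describePending(b build, list eventLister) (string, error) {
	name := b.Annotations[buildPodNameAnnotation]
	if name == "" {
		return "", fmt.Errorf("build has no %q annotation", buildPodNameAnnotation)
	}
	events := list(name)
	var sb strings.Builder
	fmt.Fprintf(&sb, "Found %d events for Pod %s:", len(events), name)
	for _, e := range events {
		fmt.Fprintf(&sb, "\n* %dx : %s", e.Count, e.Message)
	}
	return sb.String(), nil
}

func main() {
	b := build{Annotations: map[string]string{buildPodNameAnnotation: "src-build"}}
	list := func(string) []event {
		return []event{{Count: 0, Message: "0/23 nodes are available: 2 Insufficient memory"}}
	}
	msg, err := describePending(b, list)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(msg)
}
```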
Issues go stale after 90d of inactivity.
/lifecycle stale
@bbguimaraes: PR needs rebase.
Stale issues rot after 30d of inactivity.
/lifecycle rotten
Rotten issues close after 30d of inactivity.
/close
@openshift-bot: Closed this PR.
/reopen
@bbguimaraes: Failed to re-open PR: state cannot be changed. The build_pending branch was force-pushed or recreated.
https://issues.redhat.com/browse/DPTP-2836
/hold
It's Friday.