pkg/steps: emit detailed build errors in debug log #3385
Merged
openshift-ci[bot] merged 6 commits into openshift:master on Apr 24, 2023
This parameter is only ever set in tests.
It can be difficult to identify when builds are reused based on the main
output:
```
…
INFO[2023-04-20T13:18:59Z] Building failure
INFO[2023-04-20T13:19:31Z] Build failure failed, printing logs:
…
STEP 3/5: RUN false
error: build error: error building at STEP "RUN false": error while running runtime: exit status 1
INFO[2023-04-20T13:19:32Z] Ran for 44s
ERRO[2023-04-20T13:19:32Z] Some steps failed:
ERRO[2023-04-20T13:19:32Z]
* could not run steps: step failure failed: error occurred handling build failure: the build failure failed after 32s with reason DockerBuildFailed: Dockerfile build strategy has failed.
…
```
Other than a few accidental log messages from the build environment, there is
no temporal information in the build logs. This is especially problematic for
permanent failures, which will never succeed unless the build is deleted,
either directly or by deleting the test namespace.
Identifying this type of scenario requires searching the artifacts for the
generated `Build` objects, which is laborious at best and can be impossible in
the infamous case of a reused build that was scheduled on a node which has
since been removed from the cluster.
To make these cases easier to identify, more information is now emitted:
- a message in the main log indicates whether a build was created or reused
- a message in the debug log details the failure conditions
```
INFO[2023-04-20T13:23:57Z] Building failure
INFO[2023-04-20T13:23:57Z] Found existing build "failure"
INFO[2023-04-20T13:23:57Z] Build failure failed, printing logs:
```
```
$ jq --raw-output '[.time,.msg]|join(" ")' ci-operator.log
2023-04-20T13:23:50Z Building failure
…
2023-04-20T13:23:57Z Building failure
2023-04-20T13:23:57Z Found existing build "failure"
2023-04-20T13:23:57Z Waiting for build to be complete.
2023-04-20T13:23:57Z Build failure failed, printing logs:
2023-04-20T13:23:58Z Build "failure" (created at 2023-04-20 13:20:45 +0000 UTC) classified as legitimate failure, will not be retried
…
```
When the build pod no longer exists:
```
$ ci-operator …
…
INFO[2023-04-20T13:26:54Z] Building failure
INFO[2023-04-20T13:26:55Z] Found existing build "failure"
INFO[2023-04-20T13:26:55Z] Build failure failed, printing logs:
WARN[2023-04-20T13:26:55Z] Unable to retrieve logs from failed build error=pod "failure-build" not found
…
```
```
$ jq … ci-operator.log
2022-08-15T10:21:58Z unset version 0
…
2023-04-20T13:26:54Z Building failure
2023-04-20T13:26:55Z Found existing build "failure"
2023-04-20T13:26:55Z Waiting for build to be complete.
2023-04-20T13:26:55Z Build failure failed, printing logs:
2023-04-20T13:26:55Z Unable to retrieve logs from failed build
2023-04-20T13:26:55Z Build "failure" (created at 2023-04-20 13:20:45 +0000 UTC) classified as legitimate failure, will not be retried
…
```
droslean approved these changes on Apr 24, 2023
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: bbguimaraes, droslean.
@bbguimaraes: all tests passed!
https://issues.redhat.com/browse/DPTP-2836