tests/fedora32: retry dnf #2511

Merged: mrunalp merged 2 commits into opencontainers:master from kolyshkin:fedora-dnf-fix on Jul 9, 2020

@kolyshkin
Contributor

@kolyshkin kolyshkin commented Jul 7, 2020

Fedora mirrors have not been very stable recently, leading to CI failures
(seen in multiple recent PRs) that usually look like this:

> sudo: make: command not found

In fact, it is caused by dnf failing to read metadata from the mirrors:

> Errors during downloading metadata for repository 'updates':
>    - Downloading successful, but checksum doesn't match. Calculated: <....>
> Error: Failed to download metadata for repo 'updates': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried

The error went undetected because the script did not check dnf's exit code.
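A minimal illustration of why the failure went unnoticed, assuming the provisioning script runs under bash: without `set -e`, a failed command does not stop the script, and the script's exit status comes from its last command. The `status_of` helper is hypothetical, and `false` stands in for the failing dnf call.

```shell
#!/bin/bash
# Hypothetical helper: run SCRIPT in a fresh bash, discard its output and
# print only its exit status.
status_of() {
    bash -c "$1" >/dev/null 2>&1
    echo "$?"
}

# Without `set -e`, bash carries on after a failed command, and the script's
# exit status is that of its last command. `false` stands in for the failed
# dnf call; the `echo` stands in for the later steps that ran anyway.
status_of 'false; echo "carrying on"'          # prints 0

# With `set -e`, the script stops at the first failed command and exits with
# that command's status, so the failure surfaces immediately.
status_of 'set -e; false; echo "carrying on"'  # prints 1
```

This is why the real failure only showed up later as `make: command not found`: the install step failed silently and the next step tripped over its absence.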

This PR:

  • adds `set -e -u -o pipefail` so the script fails early (and does the same for the centos7 Vagrantfile);
  • adds a retry loop with a sleep around the dnf invocation.
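The retry loop could look roughly like the sketch below. It is not the exact code from this PR: `retry` is a hypothetical helper name, and the attempt count and delay are caller-supplied assumptions. Running the command on the left of `&&` keeps `set -e` from aborting the script on a failed attempt, while the final failure is still returned to the caller.

```shell
#!/bin/bash
set -e -u -o pipefail

# Hypothetical helper, not the exact code from this PR.
# Usage: retry ATTEMPTS DELAY CMD [ARGS...]
# Runs CMD up to ATTEMPTS times, sleeping DELAY seconds after each failed
# attempt except the last. Returns 0 as soon as CMD succeeds, otherwise the
# exit status of the last attempt.
retry() {
    local attempts=$1 delay=$2 i rc=1
    shift 2
    for ((i = 1; i <= attempts; i++)); do
        # CMD on the left of `&&` is exempt from `set -e`, so a failed
        # attempt does not abort the whole script.
        "$@" && return 0
        rc=$?
        echo "attempt $i/$attempts failed (exit $rc): $*" >&2
        if ((i < attempts)); then
            sleep "$delay"
        fi
    done
    return "$rc"
}
```

For example, `retry 5 10 dnf -y install make` would attempt the install up to five times, sleeping ten seconds between failures (the package name is only illustrative). If every attempt fails, `retry` returns the last exit status, so under `set -e` the script still stops there instead of carrying on.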

Fedora mirrors have not been very stable recently, leading to CI failures
that usually look like this:

> sudo: make: command not found

In fact, it is caused by dnf failing to read metadata from the mirrors:

> Errors during downloading metadata for repository 'updates':
>    - Downloading successful, but checksum doesn't match. Calculated: <....>
> Error: Failed to download metadata for repo 'updates': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried

The error went undetected because the script did not check dnf's exit code.

This commit:
 - adds `set -e -u -o pipefail` so the script will fail early;
 - adds a retry loop with a sleep around the dnf invocation.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
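What the `-u` and `-o pipefail` parts of that `set` line add on top of `-e` can be shown with a small bash sketch. The `status_of` helper is hypothetical, and `false | cat` stands in for any pipeline whose first command fails.

```shell
#!/bin/bash
# Hypothetical helper: run SCRIPT in a fresh bash, discard its output and
# print only its exit status.
status_of() {
    bash -c "$1" >/dev/null 2>&1
    echo "$?"
}

# Without pipefail, a pipeline's status is that of its last command, so the
# failing `false` on the left of `|` is hidden and `set -e` never fires.
status_of 'set -e; false | cat; echo reached'              # prints 0

# With pipefail, the pipeline reports the failure and `set -e` stops the script.
status_of 'set -e -o pipefail; false | cat; echo reached'  # prints 1

# With -u, expanding an unset variable is an error rather than an empty
# string, so the shell exits with a non-zero status instead of running `echo`.
status_of 'set -e -u; unset v; echo "$v"'
```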
@kolyshkin
Contributor Author

I plan to kick CI a few times in order to reproduce the dnf failure and check that the retry kicks in.

@kolyshkin kolyshkin marked this pull request as draft July 7, 2020 20:11
@kolyshkin kolyshkin force-pushed the fedora-dnf-fix branch 2 times, most recently from 495f58a to 904d3c1 July 7, 2020 20:52
@kolyshkin
Contributor Author

kolyshkin commented Jul 7, 2020

It has now failed on

curl -o /usr/local/bin/umoci -fsSL https://github.com/opencontainers/umoci/releases/download/v0.4.5/umoci.amd64

with "429 Too Many Requests", which I have seen before.

Not sure what to do about it. Also retry? This is getting ugly :-\
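One option for the umoci download, sketched here rather than taken from this PR, is curl's built-in `--retry`: it retries errors curl considers transient, which in recent curl versions include HTTP 429, and `--retry-delay` replaces curl's default exponential backoff with a fixed wait in seconds. The `download_umoci` function name and the retry count and delay are assumptions.

```shell
#!/bin/bash
set -e -u -o pipefail

# Hypothetical wrapper around the umoci download from the CI setup.
# --retry 5: retry up to 5 times on errors curl treats as transient
#            (timeouts and certain HTTP codes such as 429 in recent curl).
# --retry-delay 10: wait a fixed 10 seconds between attempts.
# -f makes HTTP errors fail the command, so `set -e` notices them.
download_umoci() {
    curl --retry 5 --retry-delay 10 -fsSL \
        -o /usr/local/bin/umoci \
        https://github.com/opencontainers/umoci/releases/download/v0.4.5/umoci.amd64
}
```

Calling `download_umoci` keeps the retry policy next to the URL, and avoids a hand-written loop for this one command.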

@kolyshkin
Contributor Author

I ran the fedora32 Vagrant CI 6 times and was not able to reproduce the repo failure.

Yet, I am pretty sure the retry code works as intended.

@AkihiroSuda @mrunalp PTAL

mrunalp previously approved these changes Jul 7, 2020
AkihiroSuda previously approved these changes Jul 8, 2020
@AkihiroSuda
Member

Travis is not responding :(

Add `set -e -u -o pipefail` so the script will fail early
if there's an error.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin kolyshkin dismissed stale reviews from AkihiroSuda and mrunalp via ffe9f0b July 8, 2020 14:33
@kolyshkin
Contributor Author

force-pushed to re-kick CI

@kolyshkin
Contributor Author

CI is actually green, but the status did not propagate to GitHub (see https://travis-ci.org/github/opencontainers/runc/builds/706190244)

@mrunalp PTAL

@mrunalp
Contributor

mrunalp commented Jul 8, 2020

Hmm, another force push maybe?

@kolyshkin
Contributor Author

This time I have a link to Travis, so I just clicked "restart build" there.

@kolyshkin
Contributor Author

CI failed on the `checkpoint --lazy-pages` test, which I have seen fail before; this is partially addressed in #2509. Here is the failure (I think it is unrelated):

not ok 12 checkpoint --lazy-pages and restore
# (from function `__runc' in file tests/integration/helpers.bash, line 57,
#  in test file tests/integration/checkpoint.bats, line 182)
#   `__runc --criu "$CRIU" restore -d --work-path ./image-dir --image-path ./image-dir --lazy-pages test_busybox_restore <&60 >&51 2>&51' failed
# runc list (status=0):
# ID          PID         STATUS      BUNDLE      CREATED     OWNER
# runc spec (status=0):
# 
# runc state test_busybox (status=0):
# {
#   "ociVersion": "1.0.2-dev",
#   "id": "test_busybox",
#   "pid": 3982,
#   "status": "running",
#   "bundle": "/tmp/busyboxtest",
#   "rootfs": "/tmp/busyboxtest/rootfs",
#   "created": "2020-07-09T01:08:08.450097724Z",
#   "owner": ""
# }
# Warn  (criu/kerndat.c:869): Can't keep kdat cache on non-tempfs
# runc list (status=0):
# ID             PID         STATUS      BUNDLE             CREATED                          OWNER
# test_busybox   3982        running     /tmp/busyboxtest   2020-07-09T01:08:08.450097724Z   root
# runc kill test_busybox KILL (status=0):
# 
# runc delete test_busybox (status=0):

@mrunalp mrunalp merged commit 545ebdd into opencontainers:master Jul 9, 2020

3 participants