tests/integration/checkpoint.bats: fix showing errors, rm unused code by kolyshkin · Pull Request #2581 · opencontainers/runc

kolyshkin · 2020-09-10T03:50:49Z

Another set of fixes to aim in debugging of #2475.

For previous fixes, see #2476, #2509, and #2548.

Individual commits description follows

tests/int/checkpoint: don't hide stderr

For test cases where we used pipes for container's stdin/stdout/stderr,
stderr was redirected to the same pipe as stdout, which practically
means it is lost.

These redirects to fd is needed not because we check that container is
working by writing to its stdin and reading from stdout (see
check_pipes), but also because bats redirects test stdout/stderr to a
file, which makes c/r impossible (as the file is outside of container).
This is why we can't just do something like 2>stderr.log, and have
to do what is done in this commit.

Introduce and use another pipe for stdout, to be used for both runc run
and runc restore, so it will be shown in case of errors.

Since its handling is somewhat complicated and is used from 4 places
(2 for run, 2 for restore), separate it into a helper functions.

NOTE the code assumes that runc exits with non-zero exit code in case
there is anything that needs to be shown to a user from runc's stderr.

While at it, add error checking to runc run calls.

Hopefully, this will help debug those rare checkpoint failures in CI.

tests/int/checkpoint: rm useless code

Commit 417f5ff added a code to close fds and kill processes, which
should have helped in an event of test case failure.

In fact, each test case is executed in a subshell, so

any variables set there can't reach teardown();
all the fds are closed (as the process is gone).

Now, I am not sure about the processes, but the code being removed
has never worked anyway, so it does not make sense to keep it.
Normally, those are waited for. In case of a test case failure,
well, the subsequent cases might fail, too.

Fixes: 417f5ff

tests/int/checkpoint.bats: fix showing logs on fail

This fixes the following issue with the test:

        not ok 12 checkpoint --lazy-pages and restore
        # (in test file tests/integration/checkpoint.bats, line 202)
        #   `[ $ret -eq 0 ]' failed
        ...
        # grep: ./work-dir/restore.log: No such file or directory

One might think that --work-path ./image-dir is a mistake, but it's
not, since criu lazy-pages -D ./image-dir creates a socket in
./image-dir, and then criu restore --lazy-pages expects a socket in
the work-dir.

Fixes: e232a71

This fixes the following issue with the test: not ok 12 checkpoint --lazy-pages and restore # (in test file tests/integration/checkpoint.bats, line 202) # `[ $ret -eq 0 ]' failed ... # grep: ./work-dir/restore.log: No such file or directory One might think that `--work-path ./image-dir` is a mistake, but it's not, since `criu lazy-pages -D ./image-dir` creates a socket in ./image-dir, and then `criu restore --lazy-pages` expects a socket in the workdir. Fixes: e232a71 Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

Commit 417f5ff added a code to close fds and kill processes, which should have helped in an event of test case failure. In fact, each test case is executed in a subshell, so - any variables set there can't reach teardown(); - all the fds are closed (as the process is gone). Now, I am not sure about the processes, but the code being removed has never worked anyway, so it does not make sense to keep it. Normally, those are waited for. In case of a test case failure, well, the subsequent cases might fail, too. Fixes: 417f5ff Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

For test cases where we used pipes for container's stdin/stdout/stderr, stderr was redirected to the same pipe as stdout, which practically means it is lost. These redirects to fd is needed not because we check that container is working by writing to its stdin and reading from stdout (see check_pipes), but also because bats redirects test stdout/stderr to a file, which makes c/r impossible (as the file is outside of container). This is why we can't just do something like `2>stderr.log`, and have to do what is done in this commit. Introduce and use another pipe for stdout, to be used for both runc run and runc restore, so it will be shown in case of errors. Since its handling is somewhat complicated and is used from 4 places (2 for run, 2 for restore), separate it into a helper functions. NOTE the code assumes that runc exits with non-zero exit code in case there is anything that needs to be shown to a user from runc's stderr. While at it, add error checking to runc run calls. Hopefully, this will help debug those rare checkpoint failures in CI. Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>

kolyshkin · 2020-09-10T03:52:14Z

@adrianreber @avagin PTAL

adrianreber · 2020-09-10T07:31:38Z

I cannot really say much about your shell pipe magic, but if it makes the test less flaky: 👍

kolyshkin · 2020-09-10T20:33:32Z

if it makes the test less flaky

It does not. What it does is it makes print stderr in case the exit code is non-zero.

Currently, stderr is appended to stdout and it is never shown, making it impossible to debug occasional CI failures (see #2475).

Also, I really hate the complexity of it, but I was unable to come up with something simpler.

kolyshkin added 3 commits September 9, 2020 18:24

kolyshkin mentioned this pull request Sep 10, 2020

criu int test failures #2475

Closed

AkihiroSuda approved these changes Sep 10, 2020

View reviewed changes

mrunalp approved these changes Sep 15, 2020

View reviewed changes

mrunalp merged commit 892477c into opencontainers:master Sep 15, 2020

kolyshkin deleted the fix-cpt-again branch September 16, 2020 15:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tests/integration/checkpoint.bats: fix showing errors, rm unused code#2581

tests/integration/checkpoint.bats: fix showing errors, rm unused code#2581
mrunalp merged 3 commits into
opencontainers:masterfrom
kolyshkin:fix-cpt-again

kolyshkin commented Sep 10, 2020

Uh oh!

kolyshkin commented Sep 10, 2020

Uh oh!

adrianreber commented Sep 10, 2020

Uh oh!

kolyshkin commented Sep 10, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

kolyshkin commented Sep 10, 2020

tests/int/checkpoint: don't hide stderr

tests/int/checkpoint: rm useless code

tests/int/checkpoint.bats: fix showing logs on fail

Uh oh!

kolyshkin commented Sep 10, 2020

Uh oh!

adrianreber commented Sep 10, 2020

Uh oh!

kolyshkin commented Sep 10, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants