Skip to content

tests/integration/checkpoint.bats: fix showing errors, rm unused code#2581

Merged
mrunalp merged 3 commits into
opencontainers:masterfrom
kolyshkin:fix-cpt-again
Sep 15, 2020
Merged

tests/integration/checkpoint.bats: fix showing errors, rm unused code#2581
mrunalp merged 3 commits into
opencontainers:masterfrom
kolyshkin:fix-cpt-again

Conversation

@kolyshkin
Copy link
Copy Markdown
Contributor

Another set of fixes to aim in debugging of #2475.

For previous fixes, see #2476, #2509, and #2548.

Individual commits description follows

tests/int/checkpoint: don't hide stderr

For test cases where we used pipes for container's stdin/stdout/stderr,
stderr was redirected to the same pipe as stdout, which practically
means it is lost.

These redirects to fd is needed not because we check that container is
working by writing to its stdin and reading from stdout (see
check_pipes), but also because bats redirects test stdout/stderr to a
file, which makes c/r impossible (as the file is outside of container).
This is why we can't just do something like 2>stderr.log, and have
to do what is done in this commit.

Introduce and use another pipe for stdout, to be used for both runc run
and runc restore, so it will be shown in case of errors.

Since its handling is somewhat complicated and is used from 4 places
(2 for run, 2 for restore), separate it into a helper functions.

NOTE the code assumes that runc exits with non-zero exit code in case
there is anything that needs to be shown to a user from runc's stderr.

While at it, add error checking to runc run calls.

Hopefully, this will help debug those rare checkpoint failures in CI.

tests/int/checkpoint: rm useless code

Commit 417f5ff added a code to close fds and kill processes, which
should have helped in an event of test case failure.

In fact, each test case is executed in a subshell, so

  • any variables set there can't reach teardown();
  • all the fds are closed (as the process is gone).

Now, I am not sure about the processes, but the code being removed
has never worked anyway, so it does not make sense to keep it.
Normally, those are waited for. In case of a test case failure,
well, the subsequent cases might fail, too.

Fixes: 417f5ff

tests/int/checkpoint.bats: fix showing logs on fail

This fixes the following issue with the test:

        not ok 12 checkpoint --lazy-pages and restore
        # (in test file tests/integration/checkpoint.bats, line 202)
        #   `[ $ret -eq 0 ]' failed
        ...
        # grep: ./work-dir/restore.log: No such file or directory

One might think that --work-path ./image-dir is a mistake, but it's
not, since criu lazy-pages -D ./image-dir creates a socket in
./image-dir, and then criu restore --lazy-pages expects a socket in
the work-dir.

Fixes: e232a71

This fixes the following issue with the test:

	not ok 12 checkpoint --lazy-pages and restore
	# (in test file tests/integration/checkpoint.bats, line 202)
	#   `[ $ret -eq 0 ]' failed
	...
	# grep: ./work-dir/restore.log: No such file or directory

One might think that `--work-path ./image-dir` is a mistake, but it's
not, since `criu lazy-pages -D ./image-dir` creates a socket in
./image-dir, and then `criu restore --lazy-pages` expects a socket in
the workdir.

Fixes: e232a71
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Commit 417f5ff added a code to close fds and kill processes, which
should have helped in an event of test case failure.

In fact, each test case is executed in a subshell, so
 - any variables set there can't reach teardown();
 - all the fds are closed (as the process is gone).

Now, I am not sure about the processes, but the code being removed
has never worked anyway, so it does not make sense to keep it.
Normally, those are waited for. In case of a test case failure,
well, the subsequent cases might fail, too.

Fixes: 417f5ff
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
For test cases where we used pipes for container's stdin/stdout/stderr,
stderr was redirected to the same pipe as stdout, which practically
means it is lost.

These redirects to fd is needed not because we check that container is
working by writing to its stdin and reading from stdout (see
check_pipes), but also because bats redirects test stdout/stderr to a
file, which makes c/r impossible (as the file is outside of container).
This is why we can't just do something like `2>stderr.log`, and have
to do what is done in this commit.

Introduce and use another pipe for stdout, to be used for both runc run
and runc restore, so it will be shown in case of errors.

Since its handling is somewhat complicated and is used from 4 places
(2 for run, 2 for restore), separate it into a helper functions.

NOTE the code assumes that runc exits with non-zero exit code in case
there is anything that needs to be shown to a user from runc's stderr.

While at it, add error checking to runc run calls.

Hopefully, this will help debug those rare checkpoint failures in CI.

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin
Copy link
Copy Markdown
Contributor Author

@adrianreber @avagin PTAL

@adrianreber
Copy link
Copy Markdown
Contributor

I cannot really say much about your shell pipe magic, but if it makes the test less flaky: 👍

@kolyshkin
Copy link
Copy Markdown
Contributor Author

if it makes the test less flaky

It does not. What it does is it makes print stderr in case the exit code is non-zero.

Currently, stderr is appended to stdout and it is never shown, making it impossible to debug occasional CI failures (see #2475).

Also, I really hate the complexity of it, but I was unable to come up with something simpler.

@mrunalp mrunalp merged commit 892477c into opencontainers:master Sep 15, 2020
@kolyshkin kolyshkin deleted the fix-cpt-again branch September 16, 2020 15:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants