This repository was archived by the owner on May 12, 2021. It is now read-only.

[WIP: DNM] CI: zuul: add fake ZUUL.md to check Zuul CI#154

Closed
grahamwhaley wants to merge 1 commit into kata-containers:master from grahamwhaley:20190319_test_zuul


@grahamwhaley
Contributor

The previous, already-open Zuul test PR is getting a bit long in the tooth and has collected some fluff, so here is a new one to test out the new Zuul integration.

Add a fake fixes line as well ;-)

Fixes: kata-containers#1

Signed-off-by: Graham Whaley <graham.whaley@intel.com>
@grahamwhaley grahamwhaley requested a review from a team as a code owner March 19, 2019 16:26
@grahamwhaley
Contributor Author

Yay, I see a kata-containers/PR-check pending... let's see how it flies, and where it ends up pointing its 'details' URL to.

@grahamwhaley
Contributor Author

FTR, the job is failing to run like this:

+ export WORKSPACE=/home/zuul
+ WORKSPACE=/home/zuul
+ export GOPATH=/home/zuul/go
+ GOPATH=/home/zuul/go
+ export CI=true
+ CI=true
+ export ZUUL=true
+ ZUUL=true
+ export PATH=/home/zuul/go/bin:/usr/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
+ PATH=/home/zuul/go/bin:/usr/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
+ .ci/install_go.sh -p -f
/bin/bash: line 8: .ci/install_go.sh: No such file or directory
+ .ci/setup.sh
.ci/lib.sh: line 18: go: command not found
.ci/setup.sh: line 14: pushd: /home/zuul/go/src/github.com/kata-containers/tests: No such file or directory

I need to go stare at the code to work out where/why the tests repo code is not present when we executed. I probably missed something (obvious) when translating/re-orging from the old Zuul jobs.

@grahamwhaley
Contributor Author

Ah, I see... this is invoking the .ci/setup.sh from the repo under test, in this case the proxy repo. That script has done the right thing and tried to use go to fetch the full tests repo... but we don't have go installed. My mistake: I tried to get the scripts to use the tests repo's .ci/install_go.sh, but there is the catch-22: we can't call that script until we have used go to pull the repo :-)

In the former Zuul scripts, a distro-specific pre-script pulled in the minimum components required, including golang. That lived in a Zuul role and hard-wired the Go version to 1.10, which is not ideal. Let me go have a think/look at that, unless @ttx or @chavafg want to have a peek. I know they (as well as I) are all a bit busy right now, so I'll try to chip away at this over the next week...
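A small guard like the one below would turn the confusing `go: command not found` from deep inside `lib.sh` into an explicit early failure. It is only a sketch: `require_cmd` is a hypothetical helper, not a function in the Kata `.ci` scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Kata .ci scripts): fail early, with a
# clear message, when a prerequisite command is missing from PATH.
require_cmd() {
    local cmd="$1"
    if ! command -v "$cmd" >/dev/null 2>&1; then
        echo "ERROR: '${cmd}' not found in PATH; install it before cloning the tests repo" >&2
        return 1
    fi
}

# Usage sketch: check for a Go toolchain before anything tries to "go get"
# the tests repo.
# require_cmd go || exit 1
```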

@chavafg
Contributor

chavafg commented Mar 19, 2019

hmm, seems like there is also something wrong with our scripts...
install_go.sh shouldn't need to look for go

@chavafg
Contributor

chavafg commented Mar 19, 2019

@grahamwhaley is there any link to the failed job? I don't see it here

@grahamwhaley
Contributor Author

@chavafg - it's not install_go.sh that is causing it; the failure happens before that. The repo-local setup.sh calls the lib.sh clone_tests_repo(), which tries to go get the tests repo... but we don't have go installed on the slave. Hence, above we see:

.ci/lib.sh: line 18: go: command not found

which took me a moment to realise. See:
https://github.com/kata-containers/proxy/blob/master/.ci/lib.sh#L9

@grahamwhaley
Contributor Author

@chavafg - yeah, let me go write that wiki page the build links to; in the meantime, try:
https://zuul.opendev.org/t/kata-containers/builds?job_name=QA-check-Ubuntu-16.04

@chavafg
Contributor

chavafg commented Mar 19, 2019

yeah, you are right, we are trying to go get the tests repo when go is not installed.
Also, the .ci/install_go.sh will not work unless it is being called on the tests repo.

One option would be to add go as a prerequisite, just like we had before:
https://git.openstack.org/cgit/openstack-infra/openstack-zuul-jobs/tree/roles/kata-setup/tasks/setup/Ubuntu.yaml#n6
but that will hardcode a version that we might change in the future.

@grahamwhaley
Contributor Author

recheck
with https://review.openstack.org/#/c/650313/ now landed.

@grahamwhaley
Contributor Author

/zuul-recheck

@grahamwhaley
Contributor Author

Looks like I have a path wrong somewhere to invoke the go installation:

+ PATH=/home/zuul/go/bin:/usr/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
+ .ci/install_go.sh -p -f
/bin/bash: line 8: .ci/install_go.sh: No such file or directory

I'll go stare and PR to Zuul.

@grahamwhaley
Contributor Author

OK, I think this is a golang version issue. We (still) incorrectly try to run install_go.sh from the .ci dir of the repo under test, rather than pulling and then using the tests repo. Mea culpa. Ideally, I think install_go should happen as part of the chained setup.sh sequence, and not be hand-invoked by the CIs. So, I've opened
kata-containers/tests#1465
to see if we can make that a reality.

If that works, then we (I) can rework the Zuul code to cater for that.

@grahamwhaley
Contributor Author

/zuul-recheck

@grahamwhaley
Contributor Author

Update, then. With the newly landed kata-containers/tests#1465 we still get the initial install_go.sh failure, but that is benign and just needs deleting from the Zuul files:

+ .ci/install_go.sh -p -f
/bin/bash: line 8: .ci/install_go.sh: No such file or directory

We do now see the new golang installed:

+ .ci/setup.sh
~/go/src/github.com/kata-containers/tests ~/src/github.com/kata-containers/proxy
/tmp/install-go-tmp.B2xGYqnYU4 ~/go/src/github.com/kata-containers/tests
/usr/bin/go
INFO: removing go version go1.6.2 linux/amd64
INFO: Download go version 1.11.1
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
 21  121M   21 26.5M    0     0  38.9M      0  0:00:03 --:--:--  0:00:03 38.9M
100  121M  100  121M    0     0  87.5M      0  0:00:01  0:00:01 --:--:-- 87.5M
INFO: Install go

But, we still fail the agent build:

INFO: Building github.com/kata-containers/agent
go build -buildmode=pie -tags "" -o kata-agent \
	-ldflags "-X main.version=1.7.0-alpha0-7720b93c0cbef25490d3f4c88b2c796a5a895cd7 -X main.seccompSupport=no "
channel.go:10:2: cannot find package "context" in any of:
	/home/zuul/go/src/github.com/kata-containers/agent/vendor/context (vendor tree)
	/usr/lib/go-1.6/src/context (from $GOROOT)
	/home/zuul/go/src/context (from $GOPATH)

I suspect a GOPATH setup item somewhere. Any immediate thoughts @chavafg ? I'll go dig later otherwise.
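One reading of the log above: the `$GOROOT` entry is `/usr/lib/go-1.6`, so the agent build used the distro Go 1.6 rather than the freshly installed 1.11.1, and the `context` package only joined the Go standard library in Go 1.7. That would happen if the new toolchain's `bin` directory is missing from `PATH` or comes after `/usr/bin`, because the first `go` found on `PATH` wins. The sketch below shows that precedence with two fake `go` scripts in temp directories (placeholders, not real toolchains).

```shell
#!/usr/bin/env bash
# Sketch: the first "go" on PATH wins. Two fake "go" scripts stand in for
# the distro's Go 1.6.2 and a newly installed Go 1.11.1.
set -eu

tmp=$(mktemp -d)
mkdir -p "$tmp/old" "$tmp/new"
printf '#!/bin/sh\necho "go version go1.6.2 linux/amd64"\n' > "$tmp/old/go"
printf '#!/bin/sh\necho "go version go1.11.1 linux/amd64"\n' > "$tmp/new/go"
chmod +x "$tmp/old/go" "$tmp/new/go"

# Old toolchain earlier on PATH: this is the one a build would use.
old_first=$(PATH="$tmp/old:$tmp/new:$PATH"; go version)
# New toolchain earlier on PATH: the 1.11.1 placeholder is picked up instead.
new_first=$(PATH="$tmp/new:$tmp/old:$PATH"; go version)

echo "old first -> $old_first"
echo "new first -> $new_first"
rm -rf "$tmp"
```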

@grahamwhaley
Contributor Author

/zuul-recheck
as https://review.openstack.org/#/c/653711/ merged. Oops, I forgot to remove the now-unnecessary extra 'install go' script call, but I believe that is benign and will not error out. I guess we'll see!

@grahamwhaley
Contributor Author

/zuul-recheck

as it doesn't appear to have triggered last time - I can find no build record on Zuul...
if it doesn't trigger again then we'll investigate why.

@grahamwhaley
Contributor Author

(zuul had some maintenance etc., but it looks like this PR-check chain is firing again now...)
From a build on another PR on this repo, it seems we still have some golang path-type issues somewhere:

     BUILD    /home/zuul/go/src/github.com/kata-containers/runtime/kata-runtime
     GENERATE cli/config/configuration-qemu.toml
     GENERATE cli/config/configuration-fc.toml
     BUILD    /home/zuul/go/src/github.com/kata-containers/runtime/containerd-shim-kata-v2
     BUILD    /home/zuul/go/src/github.com/kata-containers/runtime/kata-netmon
~/go/src/github.com/kata-containers/tests
~/go/src/github.com/kata-containers/runtime ~/go/src/github.com/kata-containers/tests
INFO: Installing github.com/kata-containers/runtime in No GO command or GOPATH not set mode
golang.mk:60: *** "ERROR: golang minor version too old: got 1.6.2, need atleast 1.10".  Stop.

So, we built a lot of things, including the runtime, but failed to install it. I'll have a stare and try and work out what we are still missing. I suspect this might have something to do with installs being done with sudo as well.
/cc @chavafg
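The `golang.mk` failure is a minimum-version guard doing its job: whatever `go` it ran reported 1.6.2. A comparable check in plain shell might look like the sketch below; it is not the actual `golang.mk` logic, and it assumes GNU `sort -V` is available for version ordering.

```shell
#!/usr/bin/env bash
# Sketch of a minimum-Go-version guard (not the real golang.mk logic).
# Assumes GNU "sort -V" for version ordering.

# Extract the version from "go version" output,
# e.g. "go version go1.6.2 linux/amd64" -> "1.6.2".
go_version_from_output() {
    local out="$1"
    out="${out#go version go}"   # drop the "go version go" prefix
    echo "${out%% *}"            # drop the " linux/amd64" suffix
}

# version_ge A B: succeed if version A is at least version B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage sketch (the minimum mirrors the 1.10 in the error above):
# have=$(go_version_from_output "$(go version)")
# version_ge "$have" 1.10 || { echo "go $have is too old, need 1.10+" >&2; exit 1; }
```

Comparing with `sort -V` rather than as plain strings matters here: lexically "1.6.2" sorts after "1.10", which would wrongly pass the check.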

@grahamwhaley
Contributor Author

@chavafg @jodh-intel , over in the tests .ci/lib.sh we have this function:

https://github.com/kata-containers/tests/blob/master/.ci/lib.sh#L90-L105

function build_and_install() {
	github_project="$1"
	make_target="$2"
	test_not_gopath_set="$3"

	build "${github_project}" "${make_target}"
	pushd "${GOPATH}/src/${github_project}"
	if [ "$test_not_gopath_set" = "true" ]; then
		info "Installing ${github_project} in No GO command or GOPATH not set mode"
		sudo -E KATA_RUNTIME="${KATA_RUNTIME}" make install
		[ $? -ne 0 ] && die "Fail to install ${github_project} in No GO command or GOPATH not set mode"
	fi
	info "Installing ${github_project}"
	sudo -E PATH="$PATH" KATA_RUNTIME="${KATA_RUNTIME}" make install
	popd
}

I'm wondering if that 'nogopath set' sudo should also have a PATH=$PATH in its -E section - and then it would pick up the correct version of go in the zuul run I think.
WDYT?
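A likely mechanism: on Ubuntu, sudoers usually sets `secure_path`, which replaces `PATH` for the command even when `-E` is given, so the first `sudo -E ... make install` finds `/usr/bin/go` again. The sketch below demonstrates the principle without needing sudo: `env -i` with a fixed `PATH` stands in for sudo's reset, and passing `PATH="$PATH"` explicitly (as the second sudo call in `build_and_install` already does) restores the caller's lookup. The fake `go` script is only a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: a reset environment loses the caller's PATH unless it is passed
# explicitly. "env -i" stands in for sudo's secure_path reset here.
set -eu

tmp=$(mktemp -d)
printf '#!/bin/sh\necho "go version go1.11.1 linux/amd64"\n' > "$tmp/go"
chmod +x "$tmp/go"
PATH="$tmp:$PATH"   # the "newly installed" go comes first for the caller

# Fixed system PATH (like secure_path): the placeholder go is not seen.
reset_env=$(env -i PATH=/usr/sbin:/usr/bin:/sbin:/bin sh -c 'command -v go || echo missing')
# Caller's PATH handed through explicitly: the placeholder go is found.
passed_path=$(env -i PATH="$PATH" sh -c 'command -v go')

echo "reset PATH  -> $reset_env"
echo "passed PATH -> $passed_path"
rm -rf "$tmp"
```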

@chavafg
Contributor

chavafg commented Apr 24, 2019


@grahamwhaley yes, seems like PATH=$PATH is missing.

@grahamwhaley
Contributor Author

right, let me PR that @chavafg thx

@grahamwhaley
Contributor Author

/zuul-recheck
as kata-containers/tests#1498 landed to try and fix the go/path/sudo issue.

@grahamwhaley
Contributor Author

/zuul-recheck

@grahamwhaley
Contributor Author

Hooray - that looks to have triggered (I see the status change here to pending for the relevant Zuul items). Now to keep an eye on the PR-check to see how it flies.

@grahamwhaley
Contributor Author

@ttx @cboylan @fungi - that last PR-check trigger ended in a NODE_FAILURE, which then seems to not link to any logs I can look at:
https://zuul.opendev.org/t/kata-containers/builds?pipeline=PR-check
We seem to have had 4 of those node failures in a row now on the PR-check job - can any of you dig deeper and work out what is failing?

@cboylan

cboylan commented May 1, 2019


This state happens when zuul is unable to get a valid test node from Nodepool. Currently the ubuntu-xenial-vexxhost label which this job uses is only provided by the vexxhost-sjc1 region. Nodepool will attempt to boot a working node three times (as configured by our install) before giving up on it and attempting the next region (but there is no next region in this case).

Looking at logs I see there are SSH connection errors over IPv6 to the test node.

nodepool.exceptions.ConnectionTimeoutException: Timeout waiting for connection to 2604:e100:3:0:f816:3eff:fede:2089 on port 22

Yesterday we had DNS resolution issues over IPv6 as well. Could these be growing pains with the new IPv6 connectivity in that region? @mnaser, were you able to find anything on that issue? Could the two be related?

As for what we can do: @mnaser, is it possible to make the kata test resources available in ca-ymq-1 again so that we have multiple regions available to provide these resources? We'll also need to dig in further and see if this is an IPv6-specific issue. If so, one option is to force nodepool to use IPv4.
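The Nodepool behaviour described above amounts to a nested retry loop. The shell below only illustrates that policy: Nodepool itself is not written in shell, and `boot_in_regions` and the stub commands are hypothetical. It shows why a single region that keeps failing ends in `NODE_FAILURE`, and why a second region such as ca-ymq-1 would help.

```shell
#!/usr/bin/env bash
# Illustration of the retry policy described above (not Nodepool code):
# try each region up to MAX_ATTEMPTS times, then move on to the next one.
MAX_ATTEMPTS=3

# boot_in_regions CMD REGION...
# Runs "CMD REGION" until it succeeds and prints the region that worked.
# CMD is a hypothetical stand-in for "boot a node and wait for SSH".
boot_in_regions() {
    local cmd="$1" region attempt
    shift
    for region in "$@"; do
        for (( attempt = 1; attempt <= MAX_ATTEMPTS; attempt++ )); do
            if "$cmd" "$region"; then
                echo "$region"
                return 0
            fi
            echo "attempt ${attempt} in ${region} failed" >&2
        done
    done
    # No regions left to try: this is where Zuul would report NODE_FAILURE.
    return 1
}

# Hypothetical stubs for the usage example below.
ssh_timeout_everywhere() { return 1; }
only_ymq_works() { [ "$1" = "ca-ymq-1" ]; }
```

With only one region, `boot_in_regions ssh_timeout_everywhere vexxhost-sjc1` gives up after three attempts; `boot_in_regions only_ymq_works vexxhost-sjc1 ca-ymq-1` prints `ca-ymq-1` after three failed attempts in sjc1.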

@mnaser
Member

mnaser commented May 5, 2019

I believe this was actually an issue due to a bad image that was uploaded.

There shouldn't be an issue any more, AFAIK.

@grahamwhaley
Contributor Author

/zuul-recheck
thanks - re-testing...

@grahamwhaley
Contributor Author

/zuul-recheck
afaict, the last PR-check ran, but seems to have had some sort of timeout, and I can't see an obvious thing to check in the ara report.
Let's try again and see if that was a one off or is reproducible.

@cboylan

cboylan commented May 7, 2019


It does indeed appear to be a timeout. By default you get half an hour in the run playbook. You can increase this if it needs to be longer though.

Here is an example of how to do that.

@grahamwhaley
Contributor Author

Ah, thanks @cboylan - pretty sure we might need more than 30 minutes - I'll look at our historical record at present, probably add 50%, and then respin. thx!
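For the "historical runtime plus 50%" approach the arithmetic is simple. The sketch below takes past run durations in seconds and returns the longest one plus 50%; the sample durations in the usage line are made up. The result would go into the Zuul job's `timeout` attribute, which is expressed in seconds (the half-hour default mentioned above is 1800 seconds).

```shell
#!/usr/bin/env bash
# Sketch: pick a job timeout as the longest historical run plus 50%.
# Inputs and output are in seconds.
suggest_timeout() {
    local max=0 d
    for d in "$@"; do
        if (( d > max )); then
            max=$d
        fi
    done
    echo $(( max * 3 / 2 ))   # integer arithmetic: +50% headroom
}
```

For example, `suggest_timeout 1500 2400 2000` prints `3600`, i.e. one hour.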

@grahamwhaley
Contributor Author

PR adding timeout submitted at https://review.opendev.org/#/c/657776/

@grahamwhaley
Contributor Author

/zuul-recheck

@grahamwhaley
Contributor Author

/zuul-recheck
Looks like Zuul run failed the 'state test', which I think is a known sporadic CI failure case, so re-triggering:

/home/zuul/go/src/github.com/kata-containers/tests/functional/state_test.go:26
  container
  /home/zuul/go/src/github.com/kata-containers/tests/vendor/github.com/onsi/ginkgo/extensions/table/table.go:92
    with workload [true], timeWait 5 [It]
    /home/zuul/go/src/github.com/kata-containers/tests/vendor/github.com/onsi/ginkgo/extensions/table/table_entry.go:46

    Expected
        <int>: 1
    to equal
        <int>: 0

    /home/zuul/go/src/github.com/kata-containers/tests/functional/state_test.go:45
------------------------------


Summarizing 1 Failure:

[Fail] state container [It] with workload [true], timeWait 5
/home/zuul/go/src/github.com/kata-containers/tests/functional/state_test.go:45

@grahamwhaley
Contributor Author

/zuul-recheck

@grahamwhaley
Contributor Author

Looks like the build/install works OK, but we seem to fail the first test run. @chavafg - any quick ideas about what might be happening here? Do you remember maybe any Zuul specific things we may have missed?
Logs are at http://logs.openstack.org/54/154/89c8bf8c82ff50332aba412046b3fdfa8ddc3f4d/PR-check/QA-check-Ubuntu-16.04/d4cd36c/ara-report/
Looks like we get one of the fairly anonymous 'grpc errors' on the first test.
I'll see if I can spot more in the logs, and find if the full logs got attached to the build. I'll also do an in-head diff between current Zuul setup and the old one in the other Zuul instance configs.

@egernst egernst added the do-not-merge PR has problems or depends on another label Jun 11, 2019
@egernst
Member

egernst commented Jun 11, 2019

eric-stale-bot: so... what's next here?

@grahamwhaley
Contributor Author

As per the last comment @egernst, the Zuul job for Kata now builds and installs, but fails the first test when it runs. It is not obvious why, so it needs some hard diagnosis. If that fails, the options are to embellish a PR to try and extract more info, to try to reproduce it locally, or to capture a failing run on a vexxhost instance and hand-diagnose it.

@grahamwhaley
Contributor Author

We've moved on and shut this CI functionality down now. Closing this PR.

@mnaser
Member

mnaser commented Mar 12, 2020

@grahamwhaley that's a bit of a bummer, is there a reason why it was shut down?

@grahamwhaley
Contributor Author

Hi @mnaser. We still use Zuul to run a couple of the first-pass CI checks, for instance looking for Signed-off-by tags and WIP/RFC labels. The Zuul job we shut down was trying to run a full build/run CI test of Kata. It was a WIP that I didn't quite get working some time (>1y) ago, and it was only tied to this one repo as a test. I shut it down because it was still 'alive' and would always post a 'fail' tag on the PRs, which just confused folks (in effect a false negative) and was eating Zuul resources for no reason.

The Kata CI infra has been running on Jenkins since inception, and has been backed by a number of clouds - currently Azure I believe. We basically evaluated Zuul and held/hold it as a reserve option for if/when we found Jenkins did not scale or work for the project any longer - but, presently, Jenkins is coping.
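For illustration, first-pass checks of the kind mentioned above can be very small. The functions below are hypothetical sketches, not the actual Kata Zuul job definitions; the patterns are assumptions about what such a check might look for.

```shell
#!/usr/bin/env bash
# Hypothetical first-pass checks (not the real Kata Zuul jobs).

# has_signoff MSG: succeed if the commit message has a Signed-off-by trailer
# of the form "Signed-off-by: Name <email>".
has_signoff() {
    printf '%s\n' "$1" | grep -Eq '^Signed-off-by: .+ <[^>]+>$'
}

# is_wip_title TITLE: succeed if the PR title marks it as WIP, RFC or DNM
# (case-insensitive, matched as whole words).
is_wip_title() {
    printf '%s\n' "$1" | grep -Eiq '(^|[^[:alnum:]])(WIP|RFC|DNM)([^[:alnum:]]|$)'
}
```

Run against this PR, `has_signoff` accepts the commit message above and `is_wip_title` flags the "[WIP: DNM]" title.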


Labels

do-not-merge PR has problems or depends on another


6 participants