[WIP: DNM] CI: zuul: add fake ZUUL.md to check Zuul CI #154
grahamwhaley wants to merge 1 commit into kata-containers:master
Add a fake fixes line as well ;-) Fixes: kata-containers#1 Signed-off-by: Graham Whaley <graham.whaley@intel.com>
|
Yay, I see a |
|
ftr, the job is failing to run like:
+ export WORKSPACE=/home/zuul
+ WORKSPACE=/home/zuul
+ export GOPATH=/home/zuul/go
+ GOPATH=/home/zuul/go
+ export CI=true
+ CI=true
+ export ZUUL=true
+ ZUUL=true
+ export PATH=/home/zuul/go/bin:/usr/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
+ PATH=/home/zuul/go/bin:/usr/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
+ .ci/install_go.sh -p -f
/bin/bash: line 8: .ci/install_go.sh: No such file or directory
+ .ci/setup.sh
.ci/lib.sh: line 18: go: command not found
.ci/setup.sh: line 14: pushd: /home/zuul/go/src/github.com/kata-containers/tests: No such file or directory
I need to go stare at the code to work out where/why the tests repo code is not present when we executed. I probably missed something (obvious) when translating/re-orging from the old Zuul jobs. |
|
Ah, I see .... this is invoking the In the former Zuul scripts they had a distro specific pre-script that pulled the minimum components required, including golang. That was in a zuul role before, and hard wired the go version to 1.10, which is not ideal. Let me go have a think/look at that - unless @ttx or @chavafg want to have a peek - but, I know they (as well as I) are all a bit busy right now - I'll try to chip away at this over the next week... |
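The two failures in the log above (no `go` binary, no tests repo checkout under GOPATH) could be caught up front with a small pre-flight check. This is only a sketch: the function name `missing_prereqs` and the labels it prints are made up for illustration and are not part of the Zuul job or the Kata scripts.

```shell
#!/bin/bash
# Hypothetical pre-flight check, not taken from the Kata or Zuul repos.
tests_repo="github.com/kata-containers/tests"

# Print a space-separated list of what is missing for the given GOPATH.
missing_prereqs() {
    local gopath="$1"
    local missing=""
    # Corresponds to ".ci/lib.sh: line 18: go: command not found".
    command -v go >/dev/null 2>&1 || missing="${missing} go"
    # Corresponds to the failing pushd into the tests repo checkout.
    [ -d "${gopath}/src/${tests_repo}" ] || missing="${missing} tests-repo"
    # Strip the leading space before printing.
    echo "${missing# }"
}
```

Calling `missing_prereqs "$GOPATH"` before `.ci/setup.sh` would print an empty line when both pieces are in place, and name the missing ones otherwise.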
|
hmm, seems like there is also something wrong with our scripts... |
|
@grahamwhaley is there any link to the failed job? I don't see it here |
|
@chavafg - it's not
.ci/lib.sh: line 18: go: command not found
which took me a moment to realise. See: |
|
@chavafg - yeah, let me go write that wiki page the build links to - but, in the mean time, try: |
|
yeah, you are right, we are trying to One option would be to add |
|
recheck |
|
/zuul-recheck |
|
Looks like I have a path wrong somewhere to invoke the go installation:
+ PATH=/home/zuul/go/bin:/usr/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games
+ .ci/install_go.sh -p -f
/bin/bash: line 8: .ci/install_go.sh: No such file or directory
I'll go stare and PR to Zuul. |
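Since `.ci/install_go.sh` is a relative path, whether it resolves depends on the directory the job happens to be in. A sketch of resolving it against an explicit repo directory and failing with a clearer message follows. The helper name `resolve_ci_script` is hypothetical, and which directory the real job should use is not stated in this thread.

```shell
#!/bin/bash
# Hypothetical helper: resolve a .ci script against a known repo directory
# instead of relying on the current working directory.
resolve_ci_script() {
    local repo_dir="$1"
    local script="$2"
    local path="${repo_dir}/.ci/${script}"
    if [ ! -f "${path}" ]; then
        # Report both the expected path and where we actually are.
        echo "ERROR: ${path} not found (cwd: $(pwd))" >&2
        return 1
    fi
    echo "${path}"
}
```

With this, the job could run something like `"$(resolve_ci_script "$repo_dir" install_go.sh)" -p -f`, where `repo_dir` is whichever checkout actually holds the script.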
|
OK, I think this is a golang version issue. We incorrectly (still) try to run the If that works, then we (I) can rework the Zuul code to cater for that. |
|
/zuul-recheck |
|
Update then. With the newly landed kata-containers/tests#1465 We do now see the new golang installed: But, we still fail the agent build: I suspect a GOPATH setup item somewhere. Any immediate thoughts @chavafg ? I'll go dig later otherwise. |
|
/zuul-recheck |
|
/zuul-recheck as it doesn't appear to have triggered last time - I can find no build record on Zuul... |
|
(zuul had some maintenance etc., but it looks like this PR-check chain is firing again now...) So, we built a lot of things, including the runtime, but failed to install it. I'll have a stare and try and work out what we are still missing. I suspect this might have something to do with installs being done with |
|
@chavafg @jodh-intel, over in the tests .ci/lib.sh we have this function: https://github.com/kata-containers/tests/blob/master/.ci/lib.sh#L90-L105
function build_and_install() {
    github_project="$1"
    make_target="$2"
    test_not_gopath_set="$3"
    build "${github_project}" "${make_target}"
    pushd "${GOPATH}/src/${github_project}"
    if [ "$test_not_gopath_set" = "true" ]; then
        info "Installing ${github_project} in No GO command or GOPATH not set mode"
        sudo -E KATA_RUNTIME="${KATA_RUNTIME}" make install
        [ $? -ne 0 ] && die "Fail to install ${github_project} in No GO command or GOPATH not set mode"
    fi
    info "Installing ${github_project}"
    sudo -E PATH="$PATH" KATA_RUNTIME="${KATA_RUNTIME}" make install
    popd
}
I'm wondering if that 'nogopath set' |
@grahamwhaley yes, seems like |
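To make the control flow of `build_and_install` easier to see, here is a stub-only trace of which install steps run for each value of the third argument. `trace_installs` and the strings it prints are illustrative stand-ins for the two `sudo ... make install` lines. Nothing is built or installed, and this is not the fix that was eventually submitted.

```shell
#!/bin/bash
# Stub trace of the branch structure in build_and_install (illustrative only).
trace_installs() {
    local test_not_gopath_set="$1"
    if [ "$test_not_gopath_set" = "true" ]; then
        # Stands in for: sudo -E KATA_RUNTIME=... make install (no PATH passed)
        echo "install-without-PATH"
    fi
    # Stands in for: sudo -E PATH="$PATH" KATA_RUNTIME=... make install
    echo "install-with-PATH"
}
```

Note that with `true` both lines are printed: the function as quoted has no `else`, so the "no GOPATH" install is followed by the regular one.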
|
right, let me PR that @chavafg thx |
|
/zuul-recheck |
|
/zuul-recheck |
|
Hooray - that looks to have triggered (I see the status change here to pending for the relevant Zuul items). Now to keep an eye on the |
|
@ttx @cboylan @fungi - that last |
This state happens when zuul is unable to get a valid test node from Nodepool. Currently the ubuntu-xenial-vexxhost label which this job uses is only provided by the vexxhost-sjc1 region. Nodepool will attempt to boot a working node three times (as configured by our install) before giving up on it and attempting the next region (but there is no next region in this case). Looking at logs I see there are ssh connection errors over ipv6 to the test node. Yesterday we had dns resolution issues over ipv6 as well. Possible that these are growing pains with the new ipv6 connectivity in that region? @mnaser were you able to find anything on that issue? Possible that these are related? As for what we can do: @mnaser is it possible to make the kata test resources available in ca-ymq-1 again so that we have multiple regions available to provide these resources? We'll also need to dig in further and see if this is an ipv6 specific issue. If so one option is to force nodepool to use ipv4. |
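The "try three times, then give up" behaviour described above can be pictured with a generic bounded-retry loop. This is purely an illustration in shell and is not how Nodepool itself is implemented. The function name `retry` and its arguments are made up here.

```shell
#!/bin/bash
# Generic bounded retry: run a command up to max_attempts times.
retry() {
    local max_attempts="$1"
    shift
    local attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Stop at the first success.
        if "$@"; then
            return 0
        fi
        echo "attempt ${attempt}/${max_attempts} failed" >&2
        attempt=$((attempt + 1))
    done
    # All attempts used up; with no other region to fall back on, give up.
    return 1
}
```

With a single region providing the label, a final failure here corresponds to the job never getting a node at all.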
|
I believe this was actually an issue due to a bad image that was uploaded. There shouldn't be an issue at this point right now AFAIK. |
|
/zuul-recheck |
|
/zuul-recheck |
It does indeed appear to be a timeout. By default you get half an hour in the run playbook. You can increase this if it needs to be longer though. Here is an example of how to do that. |
|
Ah, thanks @cboylan - pretty sure we might need more than 30 minutes - I'll look at our historical record at present, probably add 50%, and then respin. thx! |
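For the arithmetic, a quick sketch of turning an observed run time plus a margin into a timeout value. As far as I know Zuul job timeouts are given in seconds, hence the conversion. The function name and the 40-minute figure in the usage note are hypothetical, not numbers from the Kata CI history.

```shell
#!/bin/bash
# Compute a timeout in seconds from an observed run time (in minutes)
# plus a percentage safety margin. Integer arithmetic only.
zuul_timeout_seconds() {
    local observed_min="$1"
    local margin_pct="$2"
    local timeout_min=$(( observed_min * (100 + margin_pct) / 100 ))
    echo $(( timeout_min * 60 ))
}
```

For example, `zuul_timeout_seconds 40 50` prints `3600`: a 40-minute run plus 50% gives 60 minutes.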
|
PR adding timeout submitted at https://review.opendev.org/#/c/657776/ |
|
/zuul-recheck |
|
/zuul-recheck |
|
/zuul-recheck |
|
Looks like the build/install works OK, but we seem to fail the first test run. @chavafg - any quick ideas about what might be happening here? Do you remember maybe any Zuul specific things we may have missed? |
|
eric-stale-bot: so... what's next here? |
|
as per the last comment @egernst, the Zuul job for Kata now builds and installs, but fails the first test when it runs - it is not obvious why, so it needs some hard diagnosis. If that fails, the options are to embellish a PR to try and extract more info, to try to reproduce locally, or to capture a failing run on a vexxhost instance and hand-diagnose. |
|
We've moved on and shut this CI functionality down now. Closing this PR. |
|
@grahamwhaley that's a bit of a bummer, is there a reason why it was shut down? |
|
Hi @mnaser. We still use Zuul to run a couple of the first-pass CI checks - looking for Signed-off-by tags and WIP/RFC labels for instance. The Zuul job we shut down was to try and run a full build/run CI test of Kata. It was a WIP that I didn't quite get to work some time (>1y) ago, and was only tied to this one repo as a test. I shut it down as it was still 'alive', but would always post a 'fail' tag on the PRs, which just confused folks (a false negative in effect), and was eating Zuul resources for no reason. The Kata CI infra has been running on Jenkins since inception, and has been backed by a number of clouds - currently Azure I believe. We basically evaluated Zuul and held/hold it as a reserve option for if/when we found Jenkins did not scale or work for the project any longer - but, presently, Jenkins is coping. |
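As a rough picture of what such first-pass checks look at, here is a sketch in shell. It is not the actual Zuul job. The function names are made up, and matching WIP/RFC/DNM markers in the PR title (rather than GitHub labels) is an assumption made only to keep the example self-contained.

```shell
#!/bin/bash
# Succeeds if the commit message has a Signed-off-by trailer line.
has_signoff() {
    printf '%s\n' "$1" | grep -Eq '^Signed-off-by: .+ <[^<>]+@[^<>]+>$'
}

# Succeeds if the title carries a do-not-merge style marker.
# Assumption: markers are read from the title text, not from labels.
is_wip() {
    case "$1" in
        *WIP*|*RFC*|*DNM*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Run against this PR, `is_wip "[WIP: DNM] CI: zuul: add fake ZUUL.md to check Zuul CI"` would succeed, and the commit message would pass `has_signoff`.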
Previous already open zuul test PR is getting a bit long in the tooth and collecting some fluff. Make a new one to test out the new Zuul integration.