This repository was archived by the owner on Oct 24, 2023. It is now read-only.

chore: create parallel Jenkins pipeline for e2e tests #1875

Merged: acs-bot merged 3 commits into Azure:master from devigned:jenkins-pipeline on Aug 30, 2019

devigned (Member) commented Aug 29, 2019

Reason for Change:
We are moving from one Jenkins cluster to another and it would be nice to tidy up our collection of jobs. In the longer term, I suspect we'll move over to DevOps or Actions, but for the short term, hopefully, this will work.

This PR is a WIP: I would like to use it to gather feedback and incorporate that feedback into the design of the job.

Current functionality

The Jenkinsfile does the following:

  • sets up default parameters for the build
  • produces a matrix of jobs: [orchestrator versions] x [api models]
    • this yields lots of jobs
    • looks for api models (in practice, job_config.json files) under the ./test/e2e/test_cluster_configs dir
      • job configs consist of {"env":{}, "options":{}, "apiModel":{}} which themselves can customize the orchestrator versions they are run for or other job environmental configuration. A good example of an interesting job config is the gpu config
  • runs the ./test/e2e/cluster.sh script to validate each cluster in parallel across available nodes (mostly courtesy of @jackfrancis)
  • persists _output and _logs as Jenkins artifacts
  • cleans up after the Docker container, which leaves behind root-owned folders
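
To make the matrix step concrete, here is a minimal Python sketch (not the Groovy used in the Jenkinsfile) of crossing orchestrator versions with job configs. Only the {"env", "options", "apiModel"} shape and the idea of per-config version overrides come from the description above. The function names, the default version list, and the "orchestratorVersions" option key are hypothetical and chosen purely for illustration.

```python
import json
from pathlib import Path

# Hypothetical defaults; the real pipeline defines its own build parameters.
DEFAULT_VERSIONS = ("1.13", "1.14", "1.15", "1.16")


def load_job_configs(config_dir):
    """Return {relative_name: config} for every JSON file under config_dir."""
    root = Path(config_dir)
    configs = {}
    for path in sorted(root.rglob("*.json")):
        with path.open() as fh:
            cfg = json.load(fh)
        # Make sure the three top-level sections always exist.
        for key in ("env", "options", "apiModel"):
            cfg.setdefault(key, {})
        # e.g. network_policy/calico.json -> "network_policy/calico"
        name = path.relative_to(root).with_suffix("").as_posix()
        configs[name] = cfg
    return configs


def build_matrix(configs, default_versions=DEFAULT_VERSIONS):
    """Cross versions with configs, letting a config narrow its own versions."""
    jobs = []
    for name, cfg in configs.items():
        # "orchestratorVersions" is an assumed option key, not the real one.
        versions = cfg["options"].get("orchestratorVersions", default_versions)
        for version in versions:
            jobs.append({
                "name": f"v{version}/{name}",
                "version": version,
                "env": dict(cfg["env"]),
                "apiModel": cfg["apiModel"],
            })
    return jobs
```

Each entry of the returned list would then correspond to one parallel branch that runs the cluster validation script. A config that sets its own version list (as the gpu config is described as doing) produces fewer jobs than the full cross product.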

Request for reviewers

  • Is the job testing too much? Are the runs not focused enough?
  • Should the job be structured differently? Is it too parallel? Is it too rigid?
  • What's missing? We are not currently scaling or upgrading. Should we in this job?
  • How long should the job run? What signal should be delivered and in what time frame?

Requirements:

codecov bot commented Aug 29, 2019

Codecov Report

Merging #1875 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #1875   +/-   ##
=======================================
  Coverage   76.39%   76.39%           
=======================================
  Files         134      134           
  Lines       20245    20245           
=======================================
  Hits        15467    15467           
  Misses       3861     3861           
  Partials      917      917

devigned force-pushed the jenkins-pipeline branch 2 times, most recently from 2abce6d to 0a9a9e1, on August 30, 2019 00:16
devigned (Member, Author) commented

OK, after adding the ws("${env.JOB_NAME}-${jobName}") {...} wrapper, the jobs seem to be functioning as intended. I think this is a good start toward a more robust and thorough test suite. There are certainly improvements to be had, but it seems like a good starting point. Thoughts? Ready to merge?
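
The comment does not say why the ws(...) wrapper helped. One plausible reading, which is an assumption here, is that it gives every matrix entry its own workspace directory. The Python sketch below only mirrors the string interpolation in ws("${env.JOB_NAME}-${jobName}") and checks that the resulting names do not collide. The helper names and the sample job names are hypothetical.

```python
def workspace_name(job_name, matrix_job_name):
    """Mirror the interpolation "${env.JOB_NAME}-${jobName}"."""
    return f"{job_name}-{matrix_job_name}"


def find_workspace_collisions(job_name, matrix_job_names):
    """Return {workspace: [matrix jobs]} for workspaces claimed more than once."""
    claimed = {}
    for name in matrix_job_names:
        claimed.setdefault(workspace_name(job_name, name), []).append(name)
    return {ws: names for ws, names in claimed.items() if len(names) > 1}
```

With distinct matrix job names the collision map is empty, so every parallel branch would work in a directory of its own.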

devigned (Member, Author) commented Aug 30, 2019

OK, run 51 of the k8s-matrix job produced the output I would expect. We have 3 repeated failures (v1.16/flannel/docker, v1.16/network_policy/azure, and v1.13/network_policy/calico). All other configurations passed the test suite.

v1.16/flannel/docker (legit error)

10:18:31  $ k get nodes -o json
10:18:46  2019/08/30 05:18:45 NAME                                 STATUS     ROLES    AGE   VERSION          INTERNAL-IP    EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
10:18:46  k8s-agentpool1-22867765-vmss000000   NotReady   agent    21m   v1.16.0-beta.1   10.240.0.4     <none>        Ubuntu 16.04.6 LTS   4.15.0-1055-azure   docker://3.0.6
10:18:46  k8s-agentpool1-22867765-vmss000001   NotReady   agent    21m   v1.16.0-beta.1   10.240.0.5     <none>        Ubuntu 16.04.6 LTS   4.15.0-1055-azure   docker://3.0.6
10:18:46  k8s-agentpool1-22867765-vmss000002   NotReady   agent    21m   v1.16.0-beta.1   10.240.0.6     <none>        Ubuntu 16.04.6 LTS   4.15.0-1055-azure   docker://3.0.6
10:18:46  k8s-master-22867765-0                NotReady   master   21m   v1.16.0-beta.1   10.240.255.5   <none>        Ubuntu 16.04.6 LTS   4.15.0-1055-azure   docker://3.0.6
10:18:46  

v1.16/network_policy/azure (test error, maybe legit)

10:37:31  STEP: Ensuring that the correct resources have been applied for azure-npm
10:37:32  
10:37:32  •! Panic [94.121 seconds]
10:37:32  Azure Container Cluster using the Kubernetes Orchestrator
10:37:32  /go/src/github.com/Azure/aks-engine/test/e2e/kubernetes/kubernetes_test.go:121
10:37:32    regardless of agent pool type
10:37:32    /go/src/github.com/Azure/aks-engine/test/e2e/kubernetes/kubernetes_test.go:122
10:37:32      should have addons running [It]
10:37:32      /go/src/github.com/Azure/aks-engine/test/e2e/kubernetes/kubernetes_test.go:678
10:37:32  
10:37:32      Test Panicked
10:37:32      runtime error: index out of range
10:37:32      /usr/local/go/src/runtime/panic.go:44
10:37:32  
10:37:32      Full Stack Trace
10:37:32      	/usr/local/go/src/runtime/panic.go:522 +0x1b5
10:37:32      github.com/Azure/aks-engine/test/e2e/kubernetes.glob..func2.1.22()
10:37:32      	/go/src/github.com/Azure/aks-engine/test/e2e/kubernetes/kubernetes_test.go:703 +0xdde
10:37:32      github.com/Azure/aks-engine/vendor/github.com/onsi/ginkgo/internal/leafnodes.(*runner).runSync(0xc00035aa20, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
10:37:32      	/go/src/github.com/Azure/aks-engine/test/e2e/kubernetes/kubernetes_suite_test.go:17 +0x10e
10:37:32      testing.tRunner(0xc000105600, 0xaf03b8)
10:37:32      	/usr/local/go/src/testing/testing.go:865 +0xc0
10:37:32      created by testing.(*T).Run
10:37:32      	/usr/local/go/src/testing/testing.go:916 +0x35a
10:37:32      
10:37:32  ------------------------------
10:37:32  SSSSSSSSSSSSSSSSSSSSSSSSS
10:37:32  
10:37:32  Summarizing 1 Failure:
10:37:32  
10:37:32  [Panic!] Azure Container Cluster using the Kubernetes Orchestrator regardless of agent pool type [It] should have addons running
10:37:32  /usr/local/go/src/runtime/panic.go:44

v1.13/network_policy/calico (test error, maybe legit)

10:33:30  • Failure [43.641 seconds]
10:33:30  Azure Container Cluster using the Kubernetes Orchestrator
10:33:30  /go/src/github.com/Azure/aks-engine/test/e2e/kubernetes/kubernetes_test.go:121
10:33:30    regardless of agent pool type
10:33:30    /go/src/github.com/Azure/aks-engine/test/e2e/kubernetes/kubernetes_test.go:122
10:33:30      should have core kube-system componentry running [It]
10:33:30      /go/src/github.com/Azure/aks-engine/test/e2e/kubernetes/kubernetes_test.go:625
10:33:30  
10:33:30      Expected error:
10:33:30          <*exec.ExitError | 0xc00000fb40>: {
10:33:30              ProcessState: {
10:33:30                  pid: 4386,
10:33:30                  status: 256,
10:33:30                  rusage: {
10:33:30                      Utime: {Sec: 0, Usec: 148541},
10:33:30                      Stime: {Sec: 0, Usec: 13699},
10:33:30                      Maxrss: 40212,
10:33:30                      Ixrss: 0,
10:33:30                      Idrss: 0,
10:33:30                      Isrss: 0,
10:33:30                      Minflt: 1729,
10:33:30                      Majflt: 0,
10:33:30                      Nswap: 0,
10:33:30                      Inblock: 0,
10:33:30                      Oublock: 8,
10:33:30                      Msgsnd: 0,
10:33:30                      Msgrcv: 0,
10:33:30                      Nsignals: 0,
10:33:30                      Nvcsw: 888,
10:33:30                      Nivcsw: 373,
10:33:30                  },
10:33:30              },
10:33:30              Stderr: nil,
10:33:30          }
10:33:30          exit status 1
10:33:30      not to have occurred
10:33:30  
10:33:30      /go/src/github.com/Azure/aks-engine/test/e2e/kubernetes/kubernetes_test.go:633
10:33:30  ------------------------------
10:33:30  SSSSSSSSSSSSSSSSSSSSSSSSSSSS
10:33:30  
10:33:30  Summarizing 1 Failure:
10:33:30  
10:33:30  [Fail] Azure Container Cluster using the Kubernetes Orchestrator regardless of agent pool type [It] should have core kube-system componentry running

devigned changed the title from "[wip] chore: create parallel Jenkins pipeline for e2e tests" to "chore: create parallel Jenkins pipeline for e2e tests" on Aug 30, 2019

jackfrancis (Member) left a comment

/lgtm

acs-bot added the lgtm label Aug 30, 2019
acs-bot merged commit 5bca2b4 into Azure:master Aug 30, 2019

acs-bot commented Aug 30, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: devigned, jackfrancis

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [devigned,jackfrancis]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
