Skip to content
This repository was archived by the owner on Oct 24, 2023. It is now read-only.

E2E: rationalize node check + kube-system check, no kms#33

Merged
jackfrancis merged 20 commits intoAzure:masterfrom
jackfrancis:e2e-print-kubectl-get-nodes
Nov 21, 2018
Merged

E2E: rationalize node check + kube-system check, no kms#33
jackfrancis merged 20 commits intoAzure:masterfrom
jackfrancis:e2e-print-kubectl-get-nodes

Conversation

@jackfrancis
Copy link
Member

@jackfrancis jackfrancis commented Nov 16, 2018

What this PR does / why we need it: An active "intermittent zero byte /etc/kubernetes/azure.json on master nodes issue" reveals that we aren't properly (1) testing for the integrity of this cloudprovider file, and (2) our E2E tests are implemented in a way that masks this.

This PR re-arranges node readiness checks so that they more clearly communicate the state of nodes before testing, and during the E2E run.

We haven't root caused the azure.json issue, but we have repro'd it in isolation w/ KMS encryption enabled, so we're disabling that feature on standard cluster definitions.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #

Special notes for your reviewer:

Testing:

  • disabled rescheduler addon
  • disabled flexvol addons
  • disabled KMS <-- this is the one :)

Release note:

E2E: rationalize node check + kube-system check, no kms

@jackfrancis jackfrancis changed the title E2E: print nodes DO-NOT-MERGE E2E: print nodes Nov 16, 2018
@codecov
Copy link

codecov bot commented Nov 16, 2018

Codecov Report

Merging #33 into master will decrease coverage by 0.66%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master      #33      +/-   ##
==========================================
- Coverage   47.55%   46.89%   -0.67%     
==========================================
  Files          91       91              
  Lines       13974    13800     -174     
==========================================
- Hits         6646     6471     -175     
  Misses       6721     6721              
- Partials      607      608       +1

@jackfrancis jackfrancis changed the title DO-NOT-MERGE E2E: print nodes E2E: print nodes Nov 16, 2018
@acs-bot acs-bot added size/S and removed size/XS labels Nov 16, 2018
@tariq1890
Copy link
Contributor

/lgtm

@acs-bot acs-bot removed the lgtm label Nov 17, 2018
UseDeployCommand bool `envconfig:"USE_DEPLOY_COMMAND"`
GinkgoFocus string `envconfig:"GINKGO_FOCUS"`
GinkgoSkip string `envconfig:"GINKGO_SKIP"`
NewCluster bool `envconfig:"NEW_CLUSTER" default:"false"` // track when we create a new cluster via E2E
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this equivalent to Name == ""?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nevermind, this is needed because you are using it after Name is set. I don't think we should make it an envconfig var though since its value will be set in https://github.com/Azure/aks-engine/pull/33/files#diff-881ae33659acba7c8c55e6efbf5edd3dR107 with os.Setenv("NEW_CLUSTER", "true"). This isn't a configurable input but a variable determined by other values.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure that's the only way that the env config parser works. It has a whitelist of env vars to parse and turn into a typed struct.

Copy link
Contributor

@CecileRobertMichon CecileRobertMichon Nov 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it can follow the same pattern as CurrentWorkingDir

@jackfrancis jackfrancis added the bug Something isn't working label Nov 20, 2018
@jackfrancis jackfrancis changed the title E2E: print nodes Zero byte azure.json Nov 20, 2018
@acs-bot acs-bot added size/L and removed size/M labels Nov 20, 2018
"containers": [
{
"name": "tiller",
"image": "gcr.io/kubernetes-helm/tiller:v2.8.1",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious. Why are we doing this? Defaults should populate the exact same thing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm testing

@acs-bot acs-bot added size/M and removed size/L labels Nov 21, 2018
@jackfrancis jackfrancis changed the title Zero byte azure.json E2E: rationalize node check + kube-system check Nov 21, 2018
}

if config.EnableKMSEncryption && config.ClientObjectID != "" {
if prop.OrchestratorProfile.KubernetesConfig == nil {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a nil panic vector in e2e runner if no kubernetesConfig object is present in input api model

@jackfrancis jackfrancis changed the title E2E: rationalize node check + kube-system check E2E: rationalize node check + kube-system check, no kms Nov 21, 2018
@jackfrancis
Copy link
Member Author

Filed #49 to mark KMS as buggy

@@ -74,14 +74,22 @@ type List struct {
// AreAllReady returns a bool depending on cluster state
func AreAllReady(nodeCount int) bool {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of these changes is to make this func more scrupulous: instead of "looking for specific failure conditions and returning true otherwise" we are now "looking for specific success conditions and returning false otherwise".

Copy link
Contributor

@CecileRobertMichon CecileRobertMichon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@acs-bot
Copy link

acs-bot commented Nov 21, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: CecileRobertMichon, jackfrancis, tariq1890

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [CecileRobertMichon,jackfrancis,tariq1890]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@jackfrancis jackfrancis merged commit 994fcd0 into Azure:master Nov 21, 2018
@jackfrancis jackfrancis deleted the e2e-print-kubectl-get-nodes branch November 21, 2018 16:41
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

approved bug Something isn't working lgtm size/M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants