
feat: use go template for cluster resource names #552

Closed
sylr wants to merge 5 commits into Azure:master from sylr:resources-name-template

Conversation

@sylr
Contributor

@sylr sylr commented Feb 21, 2019

New attempt at #56

Follow up of #202
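
For context, the gist of the feature is rendering cluster resource names through Go's text/template. A minimal, self-contained sketch of that idea follows; the template string and the data fields (ClusterName, PoolName, Index) are hypothetical placeholders for illustration only, not the actual schema introduced by this PR.

package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// nameData holds the values exposed to a naming template.
// The field names here are hypothetical, for illustration only.
type nameData struct {
	ClusterName string
	PoolName    string
	Index       int
}

func main() {
	// A user-supplied naming template, e.g. coming from the API model.
	tmpl := template.Must(template.New("vmName").Parse("{{.ClusterName}}-{{.PoolName}}-vm-{{.Index}}"))

	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, nameData{ClusterName: "k8s", PoolName: "agent1", Index: 0}); err != nil {
		panic(err)
	}
	fmt.Println(buf.String()) // prints: k8s-agent1-vm-0
}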

@acs-bot

acs-bot commented Feb 21, 2019

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull request has been approved by: sylr
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: jackfrancis

If they are not already assigned, you can assign the PR to them by writing /assign @jackfrancis in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@sylr sylr force-pushed the resources-name-template branch from a109faf to 97cb720 on February 21, 2019 14:26
@sylr sylr force-pushed the resources-name-template branch from 0182c40 to 4e5ec7d on February 21, 2019 15:10
@sylr sylr marked this pull request as ready for review February 21, 2019 15:18
@sylr
Contributor Author

sylr commented Feb 21, 2019

@jackfrancis @CecileRobertMichon @tariq1890 could you please review this?

It is a fairly big change in terms of volume of code, so I would appreciate it if we could move along quickly so that it does not conflict with other PRs.

Thank you.

@jackfrancis jackfrancis force-pushed the resources-name-template branch from 4e5ec7d to 039c7ed on February 21, 2019 16:51
@sylr sylr force-pushed the resources-name-template branch from 039c7ed to 92e6c89 on February 21, 2019 17:04
@mboersma
Member

@sylr could you git commit --amend at least the first commit so that it starts with a category and a colon? If it's not a semantic commit, it won't show up in the CHANGELOG. Something like:

feat: use go template for cluster resource names

@jackfrancis
Member

Also FYI, this was rebased/force-pushed after master was changed to no longer require generated code.

Sylvain Rabot added 3 commits February 21, 2019 20:43
Signed-off-by: Sylvain Rabot <s.rabot@lectra.com>
Signed-off-by: Sylvain Rabot <s.rabot@lectra.com>
Signed-off-by: Sylvain Rabot <s.rabot@lectra.com>
@sylr sylr force-pushed the resources-name-template branch from 92e6c89 to efe2c6e on February 21, 2019 19:44
@sylr
Contributor Author

sylr commented Feb 21, 2019

@mboersma Done.

@jackfrancis
Member

This change is not backwards compatible. See this upgrade operation on this branch against a cluster built with v0.31.1:

INFO[0004] Upgrading cluster with name suffix: 10648402 
time="2019-02-21T22:15:55Z" level=info msg="Evaluating VM: k8s-agent1-10648402-0 in pool: agent1..."
time="2019-02-21T22:15:55Z" level=info msg="Adding Agent VM: k8s-agent1-10648402-0, orchestrator: Kubernetes:1.10.13 to pool: agent1 (AgentVMs)\n"
time="2019-02-21T22:15:55Z" level=info msg="Evaluating VM: k8s-agent2-10648402-0 in pool: agent2..."
time="2019-02-21T22:15:55Z" level=info msg="Adding Agent VM: k8s-agent2-10648402-0, orchestrator: Kubernetes:1.10.13 to pool: agent2 (AgentVMs)\n"
time="2019-02-21T22:15:55Z" level=info msg="Master VM name: k8s-master-10648402-0, orchestrator: 1.10.13 (MasterVMs)\n"
time="2019-02-21T22:15:55Z" level=info msg="Master VM name: k8s-master-10648402-1, orchestrator: 1.10.13 (MasterVMs)\n"
time="2019-02-21T22:15:55Z" level=info msg="Master VM name: k8s-master-10648402-2, orchestrator: 1.10.13 (MasterVMs)\n"
time="2019-02-21T22:15:55Z" level=info msg="Upgrading to Kubernetes version 1.11.7\n"
time="2019-02-21T22:15:55Z" level=info msg="Master nodes StorageProfile: ManagedDisks"
time="2019-02-21T22:15:55Z" level=info msg="Prepping master nodes for upgrade..."
time="2019-02-21T22:15:55Z" level=info msg="Resource count before running NormalizeResourcesForK8sMasterUpgrade: 19"
time="2019-02-21T22:15:55Z" level=info msg="Removing nic: [concat(variables('agent1VMNamePrefix'), 'nic-', copyIndex(variables('agent1Offset')))] from template"
time="2019-02-21T22:15:55Z" level=info msg="Evaluating if agent pool: agent1, resource: [concat(variables('agent1VMNamePrefix'), copyIndex(variables('agent1Offset')))] needs to be removed"
time="2019-02-21T22:15:55Z" level=info msg="agentPoolsToPreserve: map[]..."
time="2019-02-21T22:15:55Z" level=info msg="Removing agent pool: agent1, resource: [concat(variables('agent1VMNamePrefix'), copyIndex(variables('agent1Offset')))] from template"
time="2019-02-21T22:15:55Z" level=info msg="Evaluating if extension: [concat(variables('agent1VMNamePrefix'), copyIndex(variables('agent1Offset')),'/cse', '-agent-', copyIndex(variables('agent1Offset')))] needs to be removed"
time="2019-02-21T22:15:55Z" level=info msg="agentPoolsToPreserve: map[]..."
time="2019-02-21T22:15:55Z" level=info msg="Removing extension: [concat(variables('agent1VMNamePrefix'), copyIndex(variables('agent1Offset')),'/cse', '-agent-', copyIndex(variables('agent1Offset')))] from template"
time="2019-02-21T22:15:55Z" level=info msg="Evaluating if extension: [concat(variables('agent1VMNamePrefix'), copyIndex(variables('agent1Offset')), '/computeAksLinuxBilling')] needs to be removed"
time="2019-02-21T22:15:55Z" level=info msg="agentPoolsToPreserve: map[]..."
time="2019-02-21T22:15:55Z" level=info msg="Removing extension: [concat(variables('agent1VMNamePrefix'), copyIndex(variables('agent1Offset')), '/computeAksLinuxBilling')] from template"
time="2019-02-21T22:15:55Z" level=info msg="Removing nic: [concat(variables('agent2VMNamePrefix'), 'nic-', copyIndex(variables('agent2Offset')))] from template"
time="2019-02-21T22:15:55Z" level=info msg="Evaluating if agent pool: agent2, resource: [concat(variables('agent2VMNamePrefix'), copyIndex(variables('agent2Offset')))] needs to be removed"
time="2019-02-21T22:15:55Z" level=info msg="agentPoolsToPreserve: map[]..."
time="2019-02-21T22:15:55Z" level=info msg="Removing agent pool: agent2, resource: [concat(variables('agent2VMNamePrefix'), copyIndex(variables('agent2Offset')))] from template"
time="2019-02-21T22:15:55Z" level=info msg="Evaluating if extension: [concat(variables('agent2VMNamePrefix'), copyIndex(variables('agent2Offset')),'/cse', '-agent-', copyIndex(variables('agent2Offset')))] needs to be removed"
time="2019-02-21T22:15:55Z" level=info msg="agentPoolsToPreserve: map[]..."
time="2019-02-21T22:15:55Z" level=info msg="Removing extension: [concat(variables('agent2VMNamePrefix'), copyIndex(variables('agent2Offset')),'/cse', '-agent-', copyIndex(variables('agent2Offset')))] from template"
time="2019-02-21T22:15:55Z" level=info msg="Evaluating if extension: [concat(variables('agent2VMNamePrefix'), copyIndex(variables('agent2Offset')), '/computeAksLinuxBilling')] needs to be removed"
time="2019-02-21T22:15:55Z" level=info msg="agentPoolsToPreserve: map[]..."
time="2019-02-21T22:15:55Z" level=info msg="Removing extension: [concat(variables('agent2VMNamePrefix'), copyIndex(variables('agent2Offset')), '/computeAksLinuxBilling')] from template"
time="2019-02-21T22:15:55Z" level=info msg="Evaluating if agent pool: master, resource: [concat(variables('masterVMNamePrefix'), copyIndex(variables('masterOffset')))] needs to be removed"
time="2019-02-21T22:15:55Z" level=info msg="Evaluating if extension: [concat(variables('masterVMNamePrefix'), copyIndex(variables('masterOffset')),'/cse', '-master-', copyIndex(variables('masterOffset')))] needs to be removed"
time="2019-02-21T22:15:55Z" level=info msg="Evaluating if extension: [concat(variables('masterVMNamePrefix'), copyIndex(variables('masterOffset')), '/computeAksLinuxBilling')] needs to be removed"
time="2019-02-21T22:15:55Z" level=info msg="Resource count after running NormalizeResourcesForK8sMasterUpgrade: 11"
time="2019-02-21T22:15:55Z" level=info msg="Total expected master count: 3"
time="2019-02-21T22:15:55Z" level=info msg="Master nodes that need to be upgraded: 3"
time="2019-02-21T22:15:55Z" level=info msg="Master nodes that have been upgraded: 0"
time="2019-02-21T22:15:55Z" level=info msg="Starting upgrade of master nodes..."
time="2019-02-21T22:15:55Z" level=info msg="masterNodesInCluster: 3"
time="2019-02-21T22:15:55Z" level=info msg="Upgrading Master VM: k8s-master-10648402-0"
time="2019-02-21T22:15:55Z" level=info msg="fetching VM: kubernetes-northcentralus-59941/k8s-master-10648402-0"
time="2019-02-21T22:15:55Z" level=info msg="found nic name for VM (kubernetes-northcentralus-59941/k8s-master-10648402-0): k8s-master-10648402-nic-0"
time="2019-02-21T22:15:55Z" level=info msg="deleting VM: kubernetes-northcentralus-59941/k8s-master-10648402-0"
time="2019-02-21T22:15:55Z" level=info msg="waiting for vm deletion: kubernetes-northcentralus-59941/k8s-master-10648402-0"
time="2019-02-21T22:17:26Z" level=info msg="deleting nic: kubernetes-northcentralus-59941/k8s-master-10648402-nic-0"
time="2019-02-21T22:17:26Z" level=info msg="waiting for nic deletion: kubernetes-northcentralus-59941/k8s-master-10648402-nic-0"
time="2019-02-21T22:17:27Z" level=info msg="deleting managed disk: kubernetes-northcentralus-59941/k8s-master-10648402-0_OsDisk_1_fbcaf25c65294f41ac39a8280e85a78a"
time="2019-02-21T22:17:28Z" level=info msg="Master offset: 0\n"
time="2019-02-21T22:17:28Z" level=info msg="Master pool set count to: 1 temporarily during upgrade...\n"
INFO[0097] Starting ARM Deployment (master-19-02-21T22.17.28-1469035424). This will take some time... 
time="2019-02-21T22:18:10Z" level=error msg="Error creating upgraded master VM: k8s-master-10648402-0, err: Code=\"DeploymentFailed\" Message=\"At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-debug for usage details.\" Details=[{\"code\":\"BadRequest\",\"message\":\"{\\r\\n  \\\"error\\\": {\\r\\n    \\\"code\\\": \\\"PrivateIPAddressIsAllocated\\\",\\r\\n    \\\"message\\\": \\\"IP configuration /subscriptions/3014546b-7d1c-4f80-8523-f24a9976fe6a/resourceGroups/kubernetes-northcentralus-59941/providers/Microsoft.Network/loadBalancers/k8s-master-10648402-internal-lb/frontendIPConfigurations/k8s-master-10648402-internal-lb-frontend is using the private IP address 10.239.255.249 which is already allocated to resource /subscriptions/3014546b-7d1c-4f80-8523-f24a9976fe6a/resourceGroups/kubernetes-northcentralus-59941/providers/Microsoft.Network/loadBalancers/k8s-master-internal-lb-10648402.\\\",\\r\\n    \\\"details\\\": []\\r\\n  }\\r\\n}\"},{\"code\":\"BadRequest\",\"message\":\"{\\r\\n  \\\"error\\\": {\\r\\n    \\\"code\\\": \\\"DnsRecordInUse\\\",\\r\\n    \\\"message\\\": \\\"DNS record kubernetes-northcentralus-59941.northcentralus.cloudapp.azure.com is already used by another public IP.\\\",\\r\\n    \\\"details\\\": []\\r\\n  }\\r\\n}\"}]"
ERRO[0139] Finished ARM Deployment (master-19-02-21T22.17.28-1469035424). Error: Code="DeploymentFailed" Message="At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-debug for usage details." Details=[{"code":"BadRequest","message":"{\r\n  \"error\": {\r\n    \"code\": \"PrivateIPAddressIsAllocated\",\r\n    \"message\": \"IP configuration /subscriptions/3014546b-7d1c-4f80-8523-f24a9976fe6a/resourceGroups/kubernetes-northcentralus-59941/providers/Microsoft.Network/loadBalancers/k8s-master-10648402-internal-lb/frontendIPConfigurations/k8s-master-10648402-internal-lb-frontend is using the private IP address 10.239.255.249 which is already allocated to resource /subscriptions/3014546b-7d1c-4f80-8523-f24a9976fe6a/resourceGroups/kubernetes-northcentralus-59941/providers/Microsoft.Network/loadBalancers/k8s-master-internal-lb-10648402.\",\r\n    \"details\": []\r\n  }\r\n}"},{"code":"BadRequest","message":"{\r\n  \"error\": {\r\n    \"code\": \"DnsRecordInUse\",\r\n    \"message\": \"DNS record kubernetes-northcentralus-59941.northcentralus.cloudapp.azure.com is already used by another public IP.\",\r\n    \"details\": []\r\n  }\r\n}"}] 
FATA[0139] Error upgrading cluster: Code="DeploymentFailed" Message="At least one resource deployment operation failed. Please list deployment operations for details. Please see https://aka.ms/arm-debug for usage details." Details=[{"code":"BadRequest","message":"{\r\n  \"error\": {\r\n    \"code\": \"PrivateIPAddressIsAllocated\",\r\n    \"message\": \"IP configuration /subscriptions/3014546b-7d1c-4f80-8523-f24a9976fe6a/resourceGroups/kubernetes-northcentralus-59941/providers/Microsoft.Network/loadBalancers/k8s-master-10648402-internal-lb/frontendIPConfigurations/k8s-master-10648402-internal-lb-frontend is using the private IP address 10.239.255.249 which is already allocated to resource /subscriptions/3014546b-7d1c-4f80-8523-f24a9976fe6a/resourceGroups/kubernetes-northcentralus-59941/providers/Microsoft.Network/loadBalancers/k8s-master-internal-lb-10648402.\",\r\n    \"details\": []\r\n  }\r\n}"},{"code":"BadRequest","message":"{\r\n  \"error\": {\r\n    \"code\": \"DnsRecordInUse\",\r\n    \"message\": \"DNS record kubernetes-northcentralus-59941.northcentralus.cloudapp.azure.com is already used by another public IP.\",\r\n    \"details\": []\r\n  }\r\n}"}] 

The API model used to build the original cluster:

{   "apiVersion": "vlabs",   "properties": {     "orchestratorProfile": {       "orchestratorType": "Kubernetes", "orchestratorRelease": "1.10",       "kubernetesConfig": {         "clusterSubnet": "10.239.0.0/16",         "addons": [           {             "name": "tiller",             "enabled": true,             "config": {               "max-history": "10"             },             "containers": [               {                 "name": "tiller",                 "cpuRequests": "1",                 "memoryRequests": "1Gi",                 "cpuLimits": "1",                 "memoryLimits": "1Gi"               }             ]           },           {             "name": "kubernetes-dashboard",             "enabled": true,             "containers": [               {                 "name": "kubernetes-dashboard",                 "cpuRequests": "50m",                 "memoryRequests": "512Mi",                 "cpuLimits": "50m",                 "memoryLimits": "512Mi"               }             ]           },           {             "name": "rescheduler",             "enabled": true,             "containers": [               {                 "name": "rescheduler",                 "cpuRequests": "20m",                 "memoryRequests": "200Mi",                 "cpuLimits": "20m",                 "memoryLimits": "200Mi"               }             ]           }         ]       }     },     "masterProfile": {       "count": 3,       "dnsPrefix": "",       "vmSize": "Standard_D2_v2",       "OSDiskSizeGB": 200,       "vnetSubnetId": "/subscriptions/SUB_ID/resourceGroups/RG_NAME/providers/Microsoft.Network/virtualNetworks/VNET_NAME/subnets/SUBNET_NAME",       "firstConsecutiveStaticIP": "10.239.255.239",       "vnetCidr": "10.239.0.0/16"     },     "agentPoolProfiles": [       {         "name": "agent1",         "count": 1,         "vmSize": "Standard_D2_v2",         "OSDiskSizeGB": 200,         "storageProfile": "ManagedDisks",         "diskSizesGB": [           128,           128,           128,           128         ],         "availabilityProfile": "AvailabilitySet",         "vnetSubnetId": "/subscriptions/SUB_ID/resourceGroups/RG_NAME/providers/Microsoft.Network/virtualNetworks/VNET_NAME/subnets/SUBNET_NAME"       },       {         "name": "agent2",         "count": 1,         "vmSize": "Standard_D2_v3",         "OSDiskSizeGB": 200,         "storageProfile": "ManagedDisks",         "diskSizesGB": [           128         ],         "availabilityProfile": "AvailabilitySet",         "vnetSubnetId": "/subscriptions/SUB_ID/resourceGroups/RG_NAME/providers/Microsoft.Network/virtualNetworks/VNET_NAME/subnets/SUBNET_NAME"       }     ],     "linuxProfile": {       "adminUsername": "azureuser",       "ssh": {         "publicKeys": [           {             "keyData": ""           }         ]       }     },     "servicePrincipalProfile": {       "clientId": "",       "secret": ""     }   } }

@serbrech
Member

serbrech commented Feb 22, 2019

We read the VM name suffix from the azuredeploy.json ARM template and use it to find the VMs to upgrade in the targeted resource group. This means that upgrading an existing cluster with this new naming convention will not reliably find the VMs, because the naming pattern does not match what the ARM template contains.

See:

for _, vm := range vmListPage.Values() {
	// Windows VMs contain a substring of the name suffix
	if !strings.Contains(*(vm.Name), uc.NameSuffix) && !strings.Contains(*(vm.Name), uc.NameSuffix[:4]+"k8s") {
		uc.Logger.Infof("Skipping VM: %s for upgrade as it does not belong to cluster with expected name suffix: %s\n",
			*vm.Name, uc.NameSuffix)
		continue
	}
	// ...
}

@sylr
Contributor Author

sylr commented Feb 22, 2019

@serbrech I removed that https://github.com/Azure/aks-engine/pull/552/files#diff-95227b07e97ba7cbd04cfd8b2cdb6980L168

I think I'll remove the selection of VMs based on resource name and replace it with tag matching.
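
A rough sketch of what tag-based selection could look like, assuming the cluster stamps an identifying tag on its VMs; the tag key "aksEngineClusterID" and the helper below are assumptions for illustration, not this PR's actual implementation:

// belongsToCluster reports whether a VM's tags identify it as part of the
// target cluster. The tag key "aksEngineClusterID" is a hypothetical example;
// the tags argument matches the map[string]*string shape of Azure VM tags.
func belongsToCluster(tags map[string]*string, clusterID string) bool {
	v, ok := tags["aksEngineClusterID"]
	return ok && v != nil && *v == clusterID
}

The upgrade loop quoted above would then call something like belongsToCluster(vm.Tags, clusterID) instead of matching on strings.Contains(*vm.Name, uc.NameSuffix).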

Sylvain Rabot added 2 commits February 22, 2019 15:24
Signed-off-by: Sylvain Rabot <s.rabot@lectra.com>
Signed-off-by: Sylvain Rabot <s.rabot@lectra.com>
@CecileRobertMichon CecileRobertMichon changed the title from "Resources name template" to "feat: use go template for cluster resource names" on Feb 27, 2019
@CecileRobertMichon
Contributor

I don't think we should add a dependency on VM tags in upgrade. VM tags are often changed by users and are not sticky; removing them would cause a no-op upgrade. We ran into the same issue in the past, which generated a large number of support tickets for AKS because customers' VMs weren't getting upgraded after they had changed or removed the VM tags, and we eventually worked to remove that dependency. @mboersma might be able to bring more context since he implemented the PR to fix it.

See Azure/acs-engine#3663 for background.

@sylr
Contributor Author

sylr commented Feb 27, 2019

This is becoming too difficult for me: it requires implementing something that complies with AKS needs, of which I'm completely unaware.

I'm currently happy with the PR as it is for the needs of my company, and I can hardly make the case internally to spend more time working on this as it will not directly benefit my company. It would have been nice to have this feature merged, but I believe I will have to keep it alive in a soft fork and rebase it when needed.

@jackfrancis
Member

@sylr Totally understood. Do we have a comprehensive statement of functional requirements? We can file an issue and gather interest in terms of priority there.

As you point out, it's difficult for you to have a full grasp of the back-compat ramifications as they relate to AKS Engine being a downstream dependency for AKS, and implementation of this work may have to be guided by AKS.

@stale

stale bot commented Mar 29, 2019

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Mar 29, 2019
@stale stale bot closed this Apr 5, 2019