diff --git a/docs/deployment/Azure/cloud_init_readme.md b/docs/deployment/Azure/cloud_init_readme.md
index 0882e863e..c7995d0d9 100755
--- a/docs/deployment/Azure/cloud_init_readme.md
+++ b/docs/deployment/Azure/cloud_init_readme.md
@@ -74,13 +74,26 @@ If you run into a deployment issue, please check [here](FAQ.md) first.
 `./ctl.py` is the main file used for maintaining a cluster. It provides several handy tools that provide more convenient interface for developers. We introduce some commands below:
+## back up and restore cluster information
+```
+./ctl.py backuptodir (e.g., ./ctl.py backuptodir ~/Deployment/Azure-EASTUS-V100)
+./ctl.py restorefromdir (e.g., ./ctl.py restorefromdir ~/Deployment/Azure-EASTUS-V100)
+```
+Rendered files and binaries often occupy quite some space on disk, and most of them are not needed after the deployment. We back up several yaml files:
+ 1. config.yaml, which describes the "fixed and firm" configuration of a cluster, such as NSG rules, alert email addresses, docker registry, etc.
+ 2. action.yaml, which describes a one-time deployment action.
+ 3. status.yaml, which describes the up-to-date machine info of the cluster. Whoever changed the cluster last (added/removed machines, etc.) is responsible for updating this file, backing it up, and letting colleagues know.
+Besides the yamls, we also need the cluster ID, the ssh key and the k8s basic authentication. These are all fixed, and independent of later deployments.
+
 ## adding more machines
 This might be the only maintain task where need `cloud_init_aztools.py` instead of `ctl.py`.
-To add more machines (worker/elasticsearch nodes etc., NFS/infra nodes not supported), re-configure `azure_cluster.virtual_machines` in `config.yaml` and use below command:
+To add more machines (multi-infra not supported now), either
+1. re-configure `azure_cluster.virtual_machines` in `config.yaml` (leave/uncomment only the items corresponding to the machines you want to add; delete/comment the previously existing items that generated the machine list for an earlier deployment/maintenance run), then use the command below:
 ```
 ./cloud_init_aztools.py prerender
 ```
-to generate new machine list. You can also edit `az_complementary.yaml` directly.
+or
+2. edit `action.yaml` directly, keeping only the machine items that you want to deploy this time. You may want to save the previous config files in advance.
 
 After reconfiguration, you may use below commands to finish the new deployment of several nodes to the existing cluster:
@@ -108,17 +121,6 @@ specify "dynamic_worker_num" in config.yaml, and use `./cloud_init_aztools.py dynamic_around`. the monitoring frequency is specified by "monitor_again_after" in config.yaml
-## back up and restore cluster information
-```
-./ctl.py backuptodir (e.g., ./ctl.py backuptodir ~/Deployment/Azure-EASTUS-V100)
-./ctl.py restorefromdir (e.g., ./ctl.py restorefromdir ~/Deployment/Azure-EASTUS-V100)
-```
-Rendered files and binaries would often occupy quite some space on disk. We don't need most of those files after the deployment. We backup several yaml files:
- 1. config.yaml, which describes the "fixed and firm" configuration of a cluster, such as NSG rules, alert email addresses, docker registry etc.
- 2. az_complementary.yaml, which describes one-time deployment action.
- 3. status.yaml, which describes the up-to-date machine info of the cluster. Whoever changed the cluster(added/removed machines, etc.) last would be responsible of updating this file and backup/let colleagues know.
-besides yamls, we also need a cluster ID, sshkey and k8s basic authentication. These are all fixed, and independent of later deployment.
-
 ## connect to nodes
 ```
 ./ctl.py connect (e.g. ./ctl.py connect infra 0)
@@ -170,26 +172,30 @@ this subcommand would retire after we use configmap to configure parameters for
 Some advanced tricks are possible if you are familiar with options. In general, `-cnf` specifies what config files to use, we try our best to eliminating overlapping content, but if there's any confilict, the later loaded configuration would override the previous ones. `-s` specifies sudo mode, which should be used when you want to copy a certain file to sub directory of `/etc/` etc. on remote machines. `-v` is set to enable verbose mode. `-d` would mean `dryrun` -- az cli wouldn't be executed, only render some files. Dryrun mode would usually be used together when `-o` option is on so you can dump the commands to a file without actually executing them. For instance:
 ```
-./cloud_init_aztools.py -v -cnf config.yaml -cnf az_complementary.yaml -d -o scripts/addmachines.sh addmachines
+./cloud_init_aztools.py -v -cnf config.yaml -cnf action.yaml -d -o scripts/addmachines.sh addmachines
 ```
 
 # Details in deploy.sh
 We will explain the operations behind `deploy.sh` in this section.
-Clean up existing binaries/certificates etc. and complementary yaml files:
+Clean up existing binaries/certificates etc. and action yaml files:
 ```
-#!/bin/bash
-rm -rf deploy/* cloudinit* az_complementary.yaml
+shopt -s extglob
+rm -rf deploy/!(bin) cloudinit* !(config).yaml
 ```
-Generate complementary yaml file `az_complementary.yaml` based on given configuration file `config.yaml` of a cluster (machine names are generated if not specified):
+Generate action yaml file `action.yaml` based on the given configuration file `config.yaml` of a cluster (machine names are generated if not specified):
 ```
-# render
 ./cloud_init_deploy.py clusterID
 ./cloud_init_aztools.py prerender
 ```
+Deploy the framework of a cluster, including everything but VMs:
+```
+./cloud_init_aztools.py -v deployframework
+```
+
 Render templates, generate certificates and prepare binaries for cluster setup:
 ```
 ./cloud_init_deploy.py render
@@ -205,13 +211,17 @@ Push docker images that are required by services specified in configuration:
 ./cloud_init_deploy.py docker servicesprerequisite
 ```
 
-Deploy a cluster:
+Deploy VMs in the cluster:
 ```
-./cloud_init_aztools.py -v deploy
-./cloud_init_aztools.py interconnect
+./cloud_init_aztools.py -v addmachines
 ```
 
-Generate a yaml file `brief.yaml` for cluster maintenance:
+List VMs in the cluster in `status.yaml` for cluster maintenance:
 ```
 ./cloud_init_aztools.py listcluster
 ```
+
+Make sure that all VMs in the cluster are connected:
+```
+./cloud_init_aztools.py interconnect
+```
diff --git a/src/ClusterBootstrap/cloud_init_deploy.py b/src/ClusterBootstrap/cloud_init_deploy.py
index 32d369747..9b29449ab 100755
--- a/src/ClusterBootstrap/cloud_init_deploy.py
+++ b/src/ClusterBootstrap/cloud_init_deploy.py
@@ -8,6 +8,7 @@ import yaml
 import uuid
 import utils
+import time
 import textwrap
 import argparse
@@ -191,48 +192,12 @@ def load_node_list_by_role_from_config(config, roles, with_domain=True):
     return Nodes, config
 
 
-# Get the list of nodes for a particular service
-def get_node_lists_for_service(service, config):
-    if "etcd_node" not in config or "worker_node" not in config:
-        print("cluster not ready! nodes unknown!")
-    labels = fetch_config(config, ["kubelabels"])
-    nodetype = labels[service] if service in labels else labels["default"]
-    if nodetype == "worker_node":
-        nodes = config["worker_node"]
-    elif nodetype == "etcd_node":
-        nodes = config["etcd_node"]
-    elif nodetype.find("etcd_node_") >= 0:
-        nodenumber = int(nodetype[nodetype.find(
-            "etcd_node_") + len("etcd_node_"):])
-        if len(config["etcd_node"]) >= nodenumber:
-            nodes = [config["etcd_node"][nodenumber-1]]
-        else:
-            nodes = []
-    elif nodetype == "all":
-        nodes = config["worker_node"] + config["etcd_node"]
-    else:
-        machines = fetch_config(config, ["machines"])
-        if machines is None:
-            print("Service %s has a nodes type %s, but there is no machine configuration to identify node" % (
-                service, nodetype))
-            exit(-1)
-        allnodes = config["worker_node"] + config["etcd_node"]
-        nodes = []
-        for node in allnodes:
-            nodename = kubernetes_get_node_name(node)
-            if nodename in machines and nodetype in machines[nodename]:
-                nodes.append(node)
-    return nodes
-
-
 def load_default_config(config):
     apply_config_mapping(config, default_config_mapping)
     if ("mysql_node" not in config):
-        config["mysql_node"] = None if len(get_node_lists_for_service("mysql", config)) == 0 \
-            else get_node_lists_for_service("mysql", config)[0]
+        config["mysql_node"] = config["infra_node"][0]
     if ("host" not in config["prometheus"]):
-        config["prometheus"]["host"] = None if len(get_node_lists_for_service("prometheus", config)) == 0 \
-            else get_node_lists_for_service("prometheus", config)[0]
+        config["prometheus"]["host"] = config["infra_node"][0]
     config = update_docker_image_config(config)
     config["admin_username"] = config.get("admin_username", config["cloud_config_nsg_rules"]["default_admin_username"])
     config["api_servers"] = "https://" + \
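
The `load_default_config` change above drops the per-service node lookup in favor of a fixed fallback to the first infra node. A minimal sketch of the new defaulting rule, assuming a hypothetical stand-alone helper `apply_infra_defaults` (the real `load_default_config` also applies config mappings, docker image config, the admin username, etc.):

```python
# Hypothetical helper mirroring the new defaulting rule in
# load_default_config(): mysql and prometheus now simply fall back
# to the first infra node instead of a per-service node lookup.
def apply_infra_defaults(config):
    first_infra = config["infra_node"][0]
    # mysql runs on the first infra node unless explicitly configured
    config.setdefault("mysql_node", first_infra)
    # the prometheus host likewise defaults to the first infra node
    config.setdefault("prometheus", {}).setdefault("host", first_infra)
    return config

if __name__ == "__main__":
    cfg = apply_infra_defaults({"infra_node": ["infra-0.eastus.cloudapp.azure.com"]})
    print(cfg["mysql_node"])
    print(cfg["prometheus"]["host"])
```

An explicitly configured `mysql_node` or `prometheus.host` is left untouched, matching the `not in config` guards in the diff.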