56 changes: 33 additions & 23 deletions docs/deployment/Azure/cloud_init_readme.md
If you run into a deployment issue, please check [here](FAQ.md) first.

`./ctl.py` is the main file used for maintaining a cluster. It provides several handy tools that offer a more convenient interface for developers. We introduce some of its commands below:

## back up and restore cluster information
```
./ctl.py backuptodir <path> (e.g., ./ctl.py backuptodir ~/Deployment/Azure-EASTUS-V100)
./ctl.py restorefromdir <path> (e.g., ./ctl.py restorefromdir ~/Deployment/Azure-EASTUS-V100)
```
Rendered files and binaries often occupy quite some disk space, and most of them are not needed after the deployment. We back up only a few yaml files:
1. config.yaml, which describes the "fixed and firm" configuration of a cluster, such as NSG rules, alert email addresses, docker registry, etc.
2. action.yaml, which describes a one-time deployment action.
3. status.yaml, which describes the up-to-date machine info of the cluster. Whoever changed the cluster last (added/removed machines, etc.) is responsible for updating this file, backing it up, and letting colleagues know.

Besides the yamls, we also need the cluster ID, ssh key, and k8s basic authentication. These are all fixed and independent of later deployments.
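As a rough illustration only (this is not the actual `ctl.py` implementation, and `clusterID.yml` and `deploy/sshkey` are guessed names for the cluster ID and ssh key files), a manual backup amounts to something like:

```
# Hypothetical sketch of what backing up to a directory amounts to;
# the real file set and layout are defined by ctl.py.
BACKUP_DIR="${BACKUP_DIR:-$HOME/Deployment/Azure-EASTUS-V100}"
mkdir -p "$BACKUP_DIR"
for f in config.yaml action.yaml status.yaml clusterID.yml deploy/sshkey; do
    if [ -e "$f" ]; then
        # copy only what exists; rendered binaries are deliberately skipped
        cp -r "$f" "$BACKUP_DIR/"
    fi
done
```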

## adding more machines
This might be the only maintenance task that requires `cloud_init_aztools.py` instead of `ctl.py`.
To add more machines (multi-infra not supported for now), either
1. re-configure `azure_cluster.virtual_machines` in `config.yaml` (leave/uncomment only the items corresponding to the machines you want to add; delete/comment out the previously existing items that were used to generate the machine list for the previous deployment/maintenance), then run the command below to generate the new machine list:
```
./cloud_init_aztools.py prerender
```

or
2. Edit `action.yaml` directly, keeping only the machine items that you want to deploy this time.
You may want to save the previous config files in advance.
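For orientation, a machine item under `azure_cluster.virtual_machines` looks roughly like the following; the field names below are illustrative guesses, not the authoritative schema, so consult a rendered `action.yaml` from your own deployment:

```
azure_cluster:
  virtual_machines:
    # illustrative entry only -- field names may differ in your rendered file
    - vm_size: Standard_NC6
      role:
        - worker
      number_of_instance: 1
```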

After reconfiguration, use the commands below to finish deploying the new nodes into the existing cluster:
Specify "dynamic_worker_num" in `config.yaml`,
and use `./cloud_init_aztools.py dynamic_around`.
The monitoring frequency is specified by "monitor_again_after" in `config.yaml`.
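For illustration, the two knobs mentioned above sit in `config.yaml`; the values here are examples only, and the exact semantics and units are defined by `cloud_init_aztools.py`, not here:

```
# illustrative values only
dynamic_worker_num: 4      # example value; exact semantics defined by cloud_init_aztools.py
monitor_again_after: 300   # example value; units/semantics defined by cloud_init_aztools.py
```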


## connect to nodes
```
./ctl.py connect <role> <index> (e.g. ./ctl.py connect infra 0)
```

this subcommand would retire after we use configmap to configure parameters for
Some advanced tricks are possible if you are familiar with options.
In general, `-cnf` specifies which config files to use. We try our best to eliminate overlapping content, but if there is any conflict, the later-loaded configuration overrides the earlier ones. `-s` specifies sudo mode, which should be used when you want to copy a file to a subdirectory of `/etc/` etc. on remote machines. `-v` enables verbose mode. `-d` means `dryrun` -- az cli would not be executed, only some files are rendered. Dryrun mode is usually used together with the `-o` option, so you can dump the commands to a file without actually executing them. For instance:
```
./cloud_init_aztools.py -v -cnf config.yaml -cnf action.yaml -d -o scripts/addmachines.sh addmachines
```

# Details in deploy.sh

We will explain the operations behind `deploy.sh` in this section.

Clean up existing binaries/certificates etc. and action yaml files:
```
#!/bin/bash
shopt -s extglob
rm -rf deploy/!(bin) cloudinit* !(config).yaml
```
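The `!(pattern)` syntax above is bash's extglob negation: it matches every name except `pattern`, which is why `deploy/bin` and `config.yaml` survive the cleanup. A small self-contained demonstration in a scratch directory (requires bash):

```
# Demonstrates the extglob negation used by deploy.sh's cleanup line.
shopt -s extglob
mkdir -p scratch/deploy/bin scratch/deploy/ssl
touch scratch/config.yaml scratch/action.yaml scratch/cloudinit.txt
cd scratch
rm -rf deploy/!(bin) cloudinit* !(config).yaml
ls deploy    # -> bin
ls *.yaml    # -> config.yaml
```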

Generate the action yaml file `action.yaml` based on the given configuration file `config.yaml` of a cluster (machine names are generated if not specified):
```
# render
./cloud_init_deploy.py clusterID
./cloud_init_aztools.py prerender
```

Deploy the framework of a cluster, including everything but the VMs:
```
./cloud_init_aztools.py -v deployframework
```

Render templates, generate certificates and prepare binaries for cluster setup:
```
./cloud_init_deploy.py render
```

Push docker images that are required by services specified in configuration:
./cloud_init_deploy.py docker servicesprerequisite
```

Deploy VMs in the cluster:
```
./cloud_init_aztools.py -v addmachines
```

List the VMs of the cluster in `status.yaml` for cluster maintenance:
```
./cloud_init_aztools.py listcluster
```

Make sure that all VMs in the cluster are connected:
```
./cloud_init_aztools.py interconnect
```
41 changes: 3 additions & 38 deletions src/ClusterBootstrap/cloud_init_deploy.py
import yaml
import uuid
import utils
import time
import textwrap
import argparse

def load_node_list_by_role_from_config(config, roles, with_domain=True):
return Nodes, config



def load_default_config(config):
    apply_config_mapping(config, default_config_mapping)
    if ("mysql_node" not in config):
        config["mysql_node"] = config["infra_node"][0]
    if ("host" not in config["prometheus"]):
        config["prometheus"]["host"] = config["infra_node"][0]
    config = update_docker_image_config(config)
    config["admin_username"] = config.get("admin_username", config["cloud_config_nsg_rules"]["default_admin_username"])
    config["api_servers"] = "https://" + \