diff --git a/docs/source/install/k8s_ha.rst b/docs/source/install/k8s_ha.rst index 7c72cf59..003ad597 100644 --- a/docs/source/install/k8s_ha.rst +++ b/docs/source/install/k8s_ha.rst @@ -4,11 +4,13 @@ This document provides an installation blueprint for a Highly Available StackStorm cluster based on `Kubernetes `__, a container orchestration platform at planet scale. -The cluster deploys a minimum of 2 replicas for each component of StackStorm microservices for redundancy and reliability. It -also configures backends like MongoDB HA Replicaset, RabbitMQ HA and Redis Sentinel cluster that st2 relies on for database, -communication bus, and distributed coordination respectively. That raises a fleet of more than ``30`` pods total. +A StackStorm HA cluster consists of 2 replicas for most StackStorm microservices for redundancy and reliability. +The cluster must also have access to backend services like MongoDB HA Replicaset, RabbitMQ HA and a Redis Sentinel cluster +that st2 relies on for database, communication bus, and distributed coordination respectively. These services are +included in the default StackStorm HA cluster, but StackStorm can also use services provisioned separately. +By default, the StackStorm HA cluster consists of a fleet of more than ``30`` pods. -The source code for K8s resource templates is available as a GitHub repo: +The source code for K8s resource templates (part of our Helm chart) is available as a GitHub repo: `StackStorm/stackstorm-ha `_. .. warning:: @@ -23,13 +25,13 @@ The source code for K8s resource templates is available as a GitHub repo: Requirements ------------ * `Kubernetes `__ cluster -* `Helm `__, the K8s package manager and `Tiller `_ +* `Helm `__ 3, the K8s package manager (Helm 2 is not supported) * Enough computing resources for production use, respecting :doc:`/install/system_requirements` Usage ----- This document assumes some basic knowledge of Kubernetes and Helm. 
-Please refer to `K8s `__ and `Helm `__ +Please refer to `K8s `__ and `Helm `__ documentation if you find any difficulties using these tools. However, here are some minimal instructions to get started. @@ -52,16 +54,17 @@ or ``st2`` CLI client: .. figure :: /_static/images/helm-chart-notes.png :align: center +.. todo:: Update this screenshot. It is out of date. The installation uses some unsafe defaults which we recommend you change for production use via Helm ``values.yaml``. Helm Values ___________ Helm package ``stackstorm-ha`` comes with default settings (see `values.yaml `_). -Fine-tune them to achieve desired configuration for the StackStorm HA K8s cluster. +Fine-tune them to achieve desired configuration for your StackStorm HA K8s cluster. .. note:: - Keep custom values you want to override in a separated yaml file so they won't get lost. + Keep custom values you want to override in a separate YAML file so they won't get lost. Example: ``helm install -f custom_values.yaml`` or ``helm upgrade -f custom_values.yaml`` You can configure: @@ -71,13 +74,22 @@ You can configure: - st2.conf settings - RBAC roles, assignments and mappings (enterprise only for StackStorm v3.2 and before, open source for StackStorm v3.4 and later) -- custom st2 packs and its configs +- custom st2 packs (in persistent volumes or via custom docker images) and their configs - SSH private key -- K8s resources and settings to control pod/deployment placement -- Mongo, RabbitMQ clusters +- K8s resources, annotations, and settings to control pod/deployment placement +- Image tag and repository settings to select the ST2 version or use customized/private component images +- DNS and Ingress configuration +- Miscellaneous other ST2 cluster customizations +- Mongo, RabbitMQ, and Redis clusters + +If not defined, these values are auto-generated on install and preserved across upgrades: + +- SSH private key +- st2 auth secrets (ie: the password for the st2admin user) .. 
warning:: - It's highly recommended to set your own secrets as the file contains unsafe defaults like SSH keys, StackStorm access credentials and MongoDB/RabbitMQ passwords! + It's highly recommended to set your own secrets to replace the unsafe defaults for the MongoDB and RabbitMQ subcharts! + If you disable the subcharts, make sure to secure the services and add the relevant secrets to st2.conf. Upgrading _________ @@ -121,16 +133,34 @@ Grab all logs only for stackstorm backend services, excluding st2web and DB/MQ/r Custom st2 packs ---------------- -To follow the stateless model, shipping custom st2 packs is now part of the deployment process. -It means that ``st2 pack install`` won't work in a distributed environment and you have to bundle all the -required packs into a Docker image that you can codify, version, package and distribute in a repeatable way. -The responsibility of this Docker image is to hold pack content and their virtualenvs. -So the custom st2 pack docker image you have to build is essentially a couple of read-only directories that -are shared with the corresponding st2 services in the cluster. - -For your convenience, we created a new ``st2-pack-install `` utility +There are two ways to install st2 packs in the k8s cluster. + +1. The ``st2packs`` method is the default. This method will work for practically all clusters, but ``st2 pack install`` does not work. The packs are injected via ``st2packs`` images instead. + +2. The other method defines shared/writable ``volumes``. This method allows ``st2 pack install`` to work, but requires a persistent storage backend to be available in the cluster. This chart will not configure a storage backend for you. + +.. note:: + In general, we recommend using only one of these methods. See the NOTE under Method 2 below about how both methods can be used together with care. 
+ +Method 1: st2packs images (the default) +_______________________________________ + +This method strives to follow the stateless model, so shipping custom st2 packs is part of the deployment process. +Without persistent storage (i.e. without state), packs and their virtualenvs need to be installed in each pod. +``st2 pack install`` does not work in this distributed model because it assumes that nodes have a shared filesystem +(Method 2, below, uses a shared filesystem), so that only one node needs to download the pack files or set up the +virtualenv and all other nodes will see those files right away. + +In order to achieve this stateless model, you have to bundle all the required packs (and their virtualenvs) +into one or more Docker images that you can codify, version, package and distribute in a repeatable way. +The responsibility of these Docker images is to hold pack content and their virtualenvs. +Effectively, the st2packs Docker image(s) you have to build are a couple of read-only directories that +are shared with the corresponding st2 services in the cluster. When a new st2actionrunner +pod starts up, those directories get copied into the pod. + +For your convenience, we created an ``st2-pack-install `` utility and included it in a container `stackstorm/st2packs `_ -that will help to install custom packs during the Docker build process without relying on live DB and MQ connection. +that will help to install custom packs during the Docker build process without relying on live MongoDB and RabbitMQ connections. For more detailed instructions see `StackStorm/st2packs-dockerfiles `_ on how to build your custom `st2packs` image. @@ -139,9 +169,28 @@ Please refer to `StackStorm/stackstorm-ha#install-custom-st2-packs-in-the-cluste Helm chart repository with more information about how to reference custom st2pack Docker image in Helm values, providing packs configs, using private Docker registry and more. 
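+As a rough illustration of Method 1, referencing a custom ``st2packs`` image in Helm values might look like the sketch below.
+The key names under ``st2.packs.images`` are an assumption based on the chart's ``values.yaml`` and may differ between chart versions;
+the registry, image name and tag are hypothetical placeholders.
+
+.. code-block:: yaml
+
+    # custom_values.yaml (sketch only; verify key names against your chart version)
+    st2:
+      packs:
+        images:
+          - repository: registry.example.com/acme   # hypothetical private registry
+            name: st2packs-custom                   # hypothetical image name
+            tag: "1.0.0"                            # hypothetical tag; pin a version rather than "latest"
+            pullPolicy: IfNotPresent
+
+Pinning the image tag keeps the deployed pack content repeatable, in line with the stateless model described above.
+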
+Method 2: Shared Volumes +________________________ + +Pack content can also be shared via ReadWriteMany volumes such as NFS (Network File System) as :doc:`/reference/ha` recommends. +Using shared volumes sacrifices the stateless infrastructure model, but enables normal pack management features +such as ``st2 pack install``. + +Relying on shared volumes requires cluster-specific storage setup and configuration. As that storage setup varies +widely, managing that storage is out of scope for this Helm chart. For example, before you can install this chart to use NFS, +you would have to create the NFS exports, and you might need ``PersistentVolume`` and ``PersistentVolumeClaim`` k8s objects. +Then, you add some volume definitions to your ``values.yaml``, and install or upgrade StackStorm with Helm. +Not every cluster uses NFS or PV/PVCs to manage the storage, so the chart treats your volume definitions as opaque data, +merely including your volume definitions in the appropriate place in various ``Deployment`` and ``Job`` k8s objects. + .. note:: - There is an alternative approach, - sharing pack content via read-write-many NFS (Network File System) as :doc:`/reference/ha` recommends. - As beta is in progress and both methods have their pros and cons, we'd like to hear your feedback and which way would work better for you. + With care, ``st2packs`` images can be used with ``volumes``. Just make sure to keep the ``st2packs`` images up-to-date + with any changes made via ``st2 pack install``. If a pack is installed via an ``st2packs`` image and then it gets updated + with ``st2 pack install``, a subsequent ``helm upgrade`` will revert to the version in the ``st2packs`` image. + +Please refer to `StackStorm/stackstorm-ha#install-custom-st2-packs-in-the-cluster `_ +Helm chart repository with more information about how to pass custom volume definitions for ``packs``, ``virtualenvs`` +and pack ``configs`` in Helm values. 
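+For orientation, NFS-backed volume definitions for Method 2 might be sketched as follows.
+The ``st2.packs.volumes`` layout is an assumption based on the chart's README and should be checked against your chart version;
+the NFS server address and export paths are hypothetical, and the exports must already exist.
+
+.. code-block:: yaml
+
+    # custom_values.yaml (sketch only; server and paths are placeholders)
+    st2:
+      packs:
+        volumes:
+          enabled: true
+          packs:                  # pack content
+            nfs:
+              server: "10.12.34.56"
+              path: /var/nfsshare/packs
+          virtualenvs:            # pack virtualenvs
+            nfs:
+              server: "10.12.34.56"
+              path: /var/nfsshare/virtualenvs
+          configs:                # pack configs
+            nfs:
+              server: "10.12.34.56"
+              path: /var/nfsshare/configs
+
+Because the chart treats these definitions as opaque data, each block under ``packs``, ``virtualenvs`` and ``configs`` can be any
+volume source your cluster supports, for example a ``persistentVolumeClaim`` reference instead of ``nfs``.
+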
Ingress ------- @@ -185,7 +234,7 @@ st2web ______ st2web is a StackStorm Web UI admin dashboard. By default, st2web K8s config includes a Pod Deployment and a Service. ``2`` replicas (configurable) of st2web serve the web app and proxy requests to st2auth, st2api, st2stream. -By default, st2web uses HTTP instead of HTTPS. We recommend you rely on ``LoadBalancer`` or ``Ingress`` to add HTTPS layer on top of it. +By default, st2web uses HTTP instead of HTTPS. We recommend you rely on ``LoadBalancer`` (a ``Service`` type) or ``Ingress`` to add HTTPS layer on top of it. .. note:: By default, st2web is a NodePort Service and is not exposed to the public net. @@ -209,7 +258,7 @@ if you are planning a high-volume environment. st2stream _________ -StackStorm st2stream - exposes a server-sent event stream, used by the clients like WebUI and ChatOps to receive updates from the st2stream server. +The StackStorm ``st2stream`` service exposes a server-sent event stream, used by the clients like WebUI and ChatOps to receive updates from the st2stream server. Similar to st2auth and st2api, st2stream K8s configuration includes Pod Deployment with ``2`` replicas for HA (can be increased in ``values.yaml``) and ClusterIP Service listening on port ``9102``. @@ -263,8 +312,8 @@ st2actionrunner _______________ Stackstorm workers that actually execute actions. ``5`` replicas for K8s Deployment are configured by default to increase StackStorm ability to execute actions without excessive queuing. -Relies on ``redis`` for coordination. This is likely the first thing to lift if you have a lot of actions -to execute per time period in your StackStorm cluster. +Relies on ``redis`` for coordination. The ``st2actionrunner`` replicas count is likely the first thing to increase if you have +a lot of actions to execute per time period in your StackStorm cluster. 
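+As an illustration, raising the number of action runners could be expressed in a custom values file like the sketch below.
+The ``st2actionrunner.replicas`` key is assumed from the chart's ``values.yaml``, and the value ``10`` is an arbitrary example,
+not a recommendation.
+
+.. code-block:: yaml
+
+    # custom_values.yaml (sketch only; verify the key name against your chart version)
+    st2actionrunner:
+      replicas: 10   # default is 5; size this to your action throughput
+
+The change takes effect on the next ``helm upgrade -f custom_values.yaml``.
+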
st2scheduler ____________ @@ -294,6 +343,14 @@ By default ``3`` nodes (1 primary and 2 secondaries) of MongoDB are deployed via For more advanced MongoDB configuration, refer to official `mongodb-replicaset `_ Helm chart settings, which might be fine-tuned via ``values.yaml``. +The deployment of MongoDB to the k8s cluster can be disabled by setting the ``mongodb-ha.enabled`` key in ``values.yaml`` to ``false``. + +.. note:: + StackStorm relies heavily on connections to a MongoDB instance. If the in-cluster deployment of MongoDB is disabled, + a connection to an external instance of MongoDB must be configured. The ``st2.config`` key in ``values.yaml`` provides a way + to configure StackStorm. + See `Configure MongoDB `_ for configuration details. + `RabbitMQ HA Cluster `_ ______________________________________________________________________________________ RabbitMQ is a message bus StackStorm relies on for inter-process communication and load distribution. @@ -302,6 +359,14 @@ By default ``3`` nodes of RabbitMQ are deployed via K8s StatefulSet. For more advanced RabbitMQ configuration, please refer to official `rabbitmq-ha `_ Helm chart repository, - all settings could be overridden via ``values.yaml``. +The deployment of RabbitMQ to the k8s cluster can be disabled by setting the ``rabbitmq-ha.enabled`` key in ``values.yaml`` to ``false``. + +.. note:: + StackStorm relies heavily on connections to a RabbitMQ instance. If the in-cluster deployment of RabbitMQ is disabled, + a connection to an external instance of RabbitMQ must be configured. The ``st2.config`` key in ``values.yaml`` provides a way + to configure StackStorm. + See `Configure RabbitMQ `_ for configuration details. + redis _____ StackStorm employs redis as a distributed coordination backend, required for st2 cluster components to work properly in an HA scenario. @@ -311,8 +376,9 @@ As any other Helm dependency, it's possible to further configure it for specific Feedback Needed! 
---------------- As this deployment method new and beta is in progress, we ask you to try it and provide your feedback via + bug reports, ideas, feature or pull requests in `StackStorm/stackstorm-ha `_, -and ecourage discussions in `Slack `_ ``#docker`` channel or write us an email. +and encourage discussions in `Slack `_ ``#k8s`` channel. .. only:: community diff --git a/docs/source/reference/ha.rst b/docs/source/reference/ha.rst index ba0f338d..06d3c8a0 100644 --- a/docs/source/reference/ha.rst +++ b/docs/source/reference/ha.rst @@ -18,7 +18,7 @@ a reference to layer on some HA deployment-specific details. .. note:: - A reproducible blueprint of StackStorm HA cluster is available as a code based on Docker and Kubernetes, see :doc:`/install/k8s_ha`. + A reproducible blueprint of StackStorm HA cluster is available as a helm chart, which is based on Docker and Kubernetes. See :doc:`/install/k8s_ha`. Components @@ -122,9 +122,10 @@ You have to have exactly one active ``st2timersengine`` process running to sched Having more than one active ``st2timersengine`` will result in duplicate timer events and therefore duplicate rule evaluations leading to duplicate workflows or actions. -In HA deployments, external monitoring needs to setup and a new ``st2timersengine`` process needs -to be spun up to address failover. Losing the ``st2timersengine`` will mean no timer events will be -injected into |st2| and therefore no timer rules would be evaluated. +To address failover in HA deployments, use external monitoring of the ``st2timersengine`` process to ensure +one process is running, and to trigger spinning up a new ``st2timersengine`` process if it fails. +Losing the ``st2timersengine`` will mean no timer events will be injected into |st2| and therefore +no timer rules would be evaluated. st2workflowengine ^^^^^^^^^^^^^^^^^