From 9b3759925ab75228356389a49d7e98168e3153d7 Mon Sep 17 00:00:00 2001 From: Tomaz Muraus Date: Mon, 20 Aug 2018 10:41:54 +0200 Subject: [PATCH 01/14] Fix typo. --- docs/source/reference/action_output_streaming.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/source/reference/action_output_streaming.rst b/docs/source/reference/action_output_streaming.rst index 8a4cf7258..f4ed15f65 100644 --- a/docs/source/reference/action_output_streaming.rst +++ b/docs/source/reference/action_output_streaming.rst @@ -3,9 +3,9 @@ Real-time Action Output Streaming .. note:: - This feature was adding in |st2| v2.5.0 and above. Initially it was disabled by default. From v2.6.0, - it is enabled by default. This can be changed with the ``actionrunner.stream_output`` config option - in ``st2.conf``. + This feature was added and is available in |st2| v2.5.0 and above. Initially it was disabled by + default. From v2.6.0, it is enabled by default. This can be changed with the + ``actionrunner.stream_output`` config option in ``st2.conf``. How it Works ------------ From 00b7d59591874f1fea916f5a4a8c87ba151d8564 Mon Sep 17 00:00:00 2001 From: Tomaz Muraus Date: Mon, 20 Aug 2018 12:07:57 +0200 Subject: [PATCH 02/14] Add some WIP docs on metrics and instrumentation. --- docs/source/reference/metrics.rst | 82 +++++++++++++++++++++++++++++++ 1 file changed, 82 insertions(+) create mode 100644 docs/source/reference/metrics.rst diff --git a/docs/source/reference/metrics.rst b/docs/source/reference/metrics.rst new file mode 100644 index 000000000..14d0da90e --- /dev/null +++ b/docs/source/reference/metrics.rst @@ -0,0 +1,82 @@ +Metrics and Instrumentation +=========================== + +|st2| services and code base contain instrumentation with metrics in various critical places. +This provides better operational visibility and allows operators to detect various infrastructure +or deployment related issues (e.g. 
a long average duration for a particular action could indicate +an issue with that action or similar). + +Configuring and Enabling Metrics Collection +=========================================== + +.. note:: + + This feature was added and is available in |st2| v2.8.0 and above. + +By default, metrics collection is disabled. To enable it, you need to configure ``metrics.driver`` +and, depending on the driver, also the ``metrics.host`` and ``metrics.port`` options in +``/etc/st2/st2.conf``. + +Right now, the only supported driver is ``statsd``. To configure it, add the following entries to +``st2.conf``: + +.. code-block:: ini + + [metrics] + driver = statsd + host = 127.0.0.1 # statsd collection and aggregation server address + port = 8125 # statsd collection and aggregation server port + +After you have configured it, you need to restart all the services using ``st2ctl restart``. + +If your statsd daemon is running on a remote server and you have a firewall configured, you +also need to make sure that all the servers where |st2| components are running are allowed +outgoing access to the configured host and port. + +For debugging and troubleshooting purposes, you can also set the driver to ``echo``. This will cause +|st2| to log under ``DEBUG`` log level any metrics operation which would otherwise have been performed +(increasing a counter, timing an operation, etc.) without actually performing it. + +Exposed Metrics +=============== + +.. note:: + + Various metrics documented in this section are only available in |st2| v2.9.0 and above. + +This section describes which metrics are currently exposed by various |st2| services. 
+ ++------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------+ +| Name | Type | Service | Description | ++============================================================+============+==============================================================================================================================================+ +| st2.action.executions | counter | st2actionrunner | Number of action executions processed by st2actionrunner service. | ++--------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| st2.action.executions | timer | st2actionrunner | How long it took to process (run) a particular action execution inside st2actionrunner service. | ++--------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| st2.action. | counter | st2actionrunner | Number of action execution for a particular action processed by st2actionrunner. | ++--------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ +| st2.action. | timer | st2actionrunner | How long it took to process (run) action execution for a particular action inside st2actionrunner | ++------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ +| st2.action. 
| counter | st2actionrunner | Counter information for various final execution states (succeeded, failed, timeout). | ++------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ +| st2.rule.processed | counter | st2rulesengine | Numbers of rules processed by st2rulesengine service. | ++------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ +| st2.rule.processed | timer | st2rulesengine | How long it took to process a particular rule (trigger instance) inside st2rulesengine. | ++------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ +| st2.rule. | counter | st2rulesengine | Number of particular rules processed by st2rulesengine. | ++------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ +| st2.trigger. | counter | st2rulesengine | Number of particular trigger types processed by st2rulesengine. | ++------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ +| st2.{auth,api,stream}.requests | counter | st2auth, st2api, st2stream | Number of requests processed by st2auth / st2api / st2stream. 
| ++------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ +| st2.{auth,api,stream}.requests | timer | st2auth, st2api, st2stream | How long it took to process a particular HTTP request. | ++------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ +| st2.{auth,api,stream}.requests.method. | counter | st2auth, st2api, st2stream | Number of requests with particular HTTP method processed by st2auth / st2api / st2stream. | ++------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ +| st2.{auth,api,stream}.requests.path. | counter | st2auth, st2api, st2stream | Number of requests to a particular HTTP path (controller endpoint) processed by st2auth / st2api / st2stream. | ++------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ +| st2.{auth,api,stream}.responses.status. | counter | st2auth, st2api, st2stream | Number of requests which resulted in a response with a particular HTTP status code. | ++------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ + +Depending on the metric backend used and metric type, some of those metrics will also be averaged, +aggregated and converted into a rate (operations / seconds for ``counter`` metrics), etc. 
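For readers unfamiliar with the ``counter`` and ``timer`` types listed in the table, the sketch below shows roughly what such metrics look like on the wire in the plain-text statsd line format. It is an illustration only and not the |st2| driver code: the helper names ``format_counter``, ``format_timer`` and ``send_metric`` are hypothetical, and the default host and port simply mirror the example ``st2.conf`` values.

```python
import socket


def format_counter(name, value=1):
    """Build a statsd counter line, e.g. ``st2.action.executions:1|c``."""
    return "%s:%d|c" % (name, value)


def format_timer(name, millis):
    """Build a statsd timer line; the duration is given in milliseconds."""
    return "%s:%d|ms" % (name, millis)


def send_metric(payload, host="127.0.0.1", port=8125):
    """Send one metric line over UDP (fire-and-forget, no reply expected).

    The host and port defaults are assumptions taken from the example config.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload.encode("ascii"), (host, port))
    finally:
        sock.close()
```

Calling ``send_metric(format_counter("st2.action.executions"))`` would emit a single counter increment. Averaging, aggregation and rate conversion are then left to the statsd daemon, which is why the client side can stay this small.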
From e69094f3471d46abf8b1bee8c01161494e2ded0a Mon Sep 17 00:00:00 2001 From: Tomaz Muraus Date: Mon, 20 Aug 2018 15:13:25 +0200 Subject: [PATCH 03/14] Fix formatting, add new metrics, use consistent naming. --- docs/source/reference/index.rst | 1 + docs/source/reference/metrics.rst | 38 +++++++++++++++---------------- 2 files changed, 20 insertions(+), 19 deletions(-) diff --git a/docs/source/reference/index.rst b/docs/source/reference/index.rst index 027325415..ffa48d827 100644 --- a/docs/source/reference/index.rst +++ b/docs/source/reference/index.rst @@ -22,3 +22,4 @@ References and Guides sensor_partitioning history secrets_masking + metrics diff --git a/docs/source/reference/metrics.rst b/docs/source/reference/metrics.rst index 14d0da90e..b74cdb6d4 100644 --- a/docs/source/reference/metrics.rst +++ b/docs/source/reference/metrics.rst @@ -50,33 +50,33 @@ This section describes which metrics are currently exposed by various |st2| serv | Name | Type | Service | Description | +============================================================+============+==============================================================================================================================================+ | st2.action.executions | counter | st2actionrunner | Number of action executions processed by st2actionrunner service. | -+--------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ ++------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------+ | st2.action.executions | timer | st2actionrunner | How long it took to process (run) a particular action execution inside st2actionrunner service. 
| -+--------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| st2.action. | counter | st2actionrunner | Number of action execution for a particular action processed by st2actionrunner. | -+--------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ -| st2.action. | timer | st2actionrunner | How long it took to process (run) action execution for a particular action inside st2actionrunner | -+------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ -| st2.action. | counter | st2actionrunner | Counter information for various final execution states (succeeded, failed, timeout). | -+------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ ++-------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+ +| st2.action..executions | counter | st2actionrunner | Number of action execution for a particular action processed by st2actionrunner. 
| ++-------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+ +| st2.action..executions | timer | st2actionrunner | How long it took to process (run) action execution for a particular action inside st2actionrunner | ++------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------+ +| st2.action.executions. | counter | st2actionrunner | Counter information for various final execution states (succeeded, failed, timeout). | ++------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------+ | st2.rule.processed | counter | st2rulesengine | Numbers of rules processed by st2rulesengine service. | -+------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ ++------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------+ | st2.rule.processed | timer | st2rulesengine | How long it took to process a particular rule (trigger instance) inside st2rulesengine. | -+------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ -| st2.rule. | counter | st2rulesengine | Number of particular rules processed by st2rulesengine. 
| -+------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ -| st2.trigger. | counter | st2rulesengine | Number of particular trigger types processed by st2rulesengine. | -+------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ ++------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------+ +| st2.rule..processed | counter | st2rulesengine | Number of particular rules processed by st2rulesengine. | ++------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------+ +| st2.trigger..processed | counter | st2rulesengine | Number of particular trigger types processed by st2rulesengine. | ++------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------+ | st2.{auth,api,stream}.requests | counter | st2auth, st2api, st2stream | Number of requests processed by st2auth / st2api / st2stream. 
| -+------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ ++------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------+ | st2.{auth,api,stream}.requests | timer | st2auth, st2api, st2stream | How long it took to process a particular HTTP request. | -+------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ ++------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------+ | st2.{auth,api,stream}.requests.method. | counter | st2auth, st2api, st2stream | Number of requests with particular HTTP method processed by st2auth / st2api / st2stream. | -+------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ ++------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------+ | st2.{auth,api,stream}.requests.path. | counter | st2auth, st2api, st2stream | Number of requests to a particular HTTP path (controller endpoint) processed by st2auth / st2api / st2stream. 
| -+------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ ++------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------+ | st2.{auth,api,stream}.responses.status. | counter | st2auth, st2api, st2stream | Number of requests which resulted in a response with a particular HTTP status code. | -+------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+ ++------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------+ Depending on the metric backend used and metric type, some of those metrics will also be averaged, aggregated and converted into a rate (operations / seconds for ``counter`` metrics), etc. From 9a81dbec2b15fdf8dc18b16dd65af51f1e760570 Mon Sep 17 00:00:00 2001 From: Tomaz Muraus Date: Mon, 20 Aug 2018 15:14:44 +0200 Subject: [PATCH 04/14] Fix formatting. --- docs/source/reference/metrics.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/source/reference/metrics.rst b/docs/source/reference/metrics.rst index b74cdb6d4..830a228c2 100644 --- a/docs/source/reference/metrics.rst +++ b/docs/source/reference/metrics.rst @@ -52,9 +52,9 @@ This section describes which metrics are currently exposed by various |st2| serv | st2.action.executions | counter | st2actionrunner | Number of action executions processed by st2actionrunner service. 
| +------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------+ | st2.action.executions | timer | st2actionrunner | How long it took to process (run) a particular action execution inside st2actionrunner service. | -+-------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+ ++------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------+ | st2.action..executions | counter | st2actionrunner | Number of action execution for a particular action processed by st2actionrunner. | -+-------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------+ ++------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------+ | st2.action..executions | timer | st2actionrunner | How long it took to process (run) action execution for a particular action inside st2actionrunner | +------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------+ | st2.action.executions. | counter | st2actionrunner | Counter information for various final execution states (succeeded, failed, timeout). 
| From 9ee98949b3b95870ef3556e2435e921a930fabca Mon Sep 17 00:00:00 2001 From: Tomaz Muraus Date: Mon, 20 Aug 2018 15:16:26 +0200 Subject: [PATCH 05/14] Fix formatting. --- docs/source/reference/metrics.rst | 34 +++++++++++++++---------------- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/docs/source/reference/metrics.rst b/docs/source/reference/metrics.rst index 830a228c2..830364777 100644 --- a/docs/source/reference/metrics.rst +++ b/docs/source/reference/metrics.rst @@ -46,37 +46,37 @@ Exposed Metrics This section describes which metrics are currently exposed by various |st2| services. -+------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------+ -| Name | Type | Service | Description | -+============================================================+============+==============================================================================================================================================+ ++------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ +| Name | Type | Service | Description | ++============================================================+============+=============================+================================================================================================================+ | st2.action.executions | counter | st2actionrunner | Number of action executions processed by st2actionrunner service. 
| -+------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------+ ++------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ | st2.action.executions | timer | st2actionrunner | How long it took to process (run) a particular action execution inside st2actionrunner service. | -+------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------+ ++------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ | st2.action..executions | counter | st2actionrunner | Number of action execution for a particular action processed by st2actionrunner. 
| -+------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------+ ++------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ | st2.action..executions | timer | st2actionrunner | How long it took to process (run) action execution for a particular action inside st2actionrunner | -+------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------+ ++------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ | st2.action.executions. | counter | st2actionrunner | Counter information for various final execution states (succeeded, failed, timeout). | -+------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------+ ++------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ | st2.rule.processed | counter | st2rulesengine | Numbers of rules processed by st2rulesengine service. 
| -+------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------+ ++------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ | st2.rule.processed | timer | st2rulesengine | How long it took to process a particular rule (trigger instance) inside st2rulesengine. | -+------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------+ ++------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ | st2.rule..processed | counter | st2rulesengine | Number of particular rules processed by st2rulesengine. | -+------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------+ ++------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ | st2.trigger..processed | counter | st2rulesengine | Number of particular trigger types processed by st2rulesengine. 
| -+------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------+ ++------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ | st2.{auth,api,stream}.requests | counter | st2auth, st2api, st2stream | Number of requests processed by st2auth / st2api / st2stream. | -+------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------+ ++------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ | st2.{auth,api,stream}.requests | timer | st2auth, st2api, st2stream | How long it took to process a particular HTTP request. | -+------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------+ ++------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ | st2.{auth,api,stream}.requests.method. | counter | st2auth, st2api, st2stream | Number of requests with particular HTTP method processed by st2auth / st2api / st2stream. 
| -+------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------+ ++------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ | st2.{auth,api,stream}.requests.path. | counter | st2auth, st2api, st2stream | Number of requests to a particular HTTP path (controller endpoint) processed by st2auth / st2api / st2stream. | -+------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------+ ++------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ | st2.{auth,api,stream}.responses.status. | counter | st2auth, st2api, st2stream | Number of requests which resulted in a response with a particular HTTP status code. | -+------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------+ ++------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ Depending on the metric backend used and metric type, some of those metrics will also be averaged, aggregated and converted into a rate (operations / seconds for ``counter`` metrics), etc. 
From 3bd0b42062177be4d25aedf19f872227568d78a2 Mon Sep 17 00:00:00 2001 From: Tomaz Muraus Date: Mon, 20 Aug 2018 16:01:42 +0200 Subject: [PATCH 06/14] Update metric names and descriptions. --- docs/source/reference/metrics.rst | 28 ++++++++++++++++++---------- 1 file changed, 18 insertions(+), 10 deletions(-) diff --git a/docs/source/reference/metrics.rst b/docs/source/reference/metrics.rst index 830364777..45536425f 100644 --- a/docs/source/reference/metrics.rst +++ b/docs/source/reference/metrics.rst @@ -49,33 +49,41 @@ This section describes which metrics are currently exposed by various |st2| serv +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ | Name | Type | Service | Description | +============================================================+============+=============================+================================================================================================================+ -| st2.action.executions | counter | st2actionrunner | Number of action executions processed by st2actionrunner service. | +| st2.action.executions | counter | st2actionrunner | Current number of action executions being processed by st2actionrunner service. | +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ | st2.action.executions | timer | st2actionrunner | How long it took to process (run) a particular action execution inside st2actionrunner service. 
| +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ -| st2.action..executions | counter | st2actionrunner | Number of action execution for a particular action processed by st2actionrunner. | +| st2.action..executions | counter | st2actionrunner | Current number of action execution for a particular action being processed by st2actionrunner. | +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ -| st2.action..executions | timer | st2actionrunner | How long it took to process (run) action execution for a particular action inside st2actionrunner | +| st2.action..executions | timer | st2actionrunner | How long it took to process (run) action execution for a particular action inside st2actionrunner. | +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ -| st2.action.executions. | counter | st2actionrunner | Counter information for various final execution states (succeeded, failed, timeout). | +| st2.action.executions. | counter | st2actionrunner | Number of executions which are currently in a particular state (succeeded, failed, timeout, delayed, etc). | +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ -| st2.rule.processed | counter | st2rulesengine | Numbers of rules processed by st2rulesengine service. 
| +| st2.rule.processed | counter | st2rulesengine | Number of rules (trigger instances) currently being processed by st2rulesengine service. | +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ | st2.rule.processed | timer | st2rulesengine | How long it took to process a particular rule (trigger instance) inside st2rulesengine. | +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ | st2.rule..processed | counter | st2rulesengine | Number of particular rules processed by st2rulesengine. | +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ +| st2.rule.matched | counter | st2rulesengine | Number of trigger instances which matched a rule (criteria). | ++------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ +| st2.rule..matched | counter | st2rulesengine | Numbers of trigger instances which matched a particular rule (criteria). | ++------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ | st2.trigger..processed | counter | st2rulesengine | Number of particular trigger types processed by st2rulesengine. 
| +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ -| st2.{auth,api,stream}.requests | counter | st2auth, st2api, st2stream | Number of requests processed by st2auth / st2api / st2stream. | +| st2.trigger_instance..processed | timer | st2rulesengine | How long it took to process a particular trigger instance inside st2rulesengine. | ++------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ +| st2.{auth,api,stream}.request.total | counter | st2auth, st2api, st2stream | Total number of requests processed by st2auth / st2api / st2stream. | ++------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ +| st2.{auth,api,stream}.request | counter | st2auth, st2api, st2stream | Number of requests currently being processed by st2auth / st2api / st2stream. | +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ -| st2.{auth,api,stream}.requests | timer | st2auth, st2api, st2stream | How long it took to process a particular HTTP request. | +| st2.{auth,api,stream}.request | timer | st2auth, st2api, st2stream | How long it took to process a particular HTTP request. | +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ -| st2.{auth,api,stream}.requests.method. 
| counter | st2auth, st2api, st2stream | Number of requests with particular HTTP method processed by st2auth / st2api / st2stream. | +| st2.{auth,api,stream}.request.method. | counter | st2auth, st2api, st2stream | Number of requests with particular HTTP method processed by st2auth / st2api / st2stream. | +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ -| st2.{auth,api,stream}.requests.path. | counter | st2auth, st2api, st2stream | Number of requests to a particular HTTP path (controller endpoint) processed by st2auth / st2api / st2stream. | +| st2.{auth,api,stream}.request.path. | counter | st2auth, st2api, st2stream | Number of requests to a particular HTTP path (controller endpoint) processed by st2auth / st2api / st2stream. | +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ -| st2.{auth,api,stream}.responses.status. | counter | st2auth, st2api, st2stream | Number of requests which resulted in a response with a particular HTTP status code. | +| st2.{auth,api,stream}.response.status. | counter | st2auth, st2api, st2stream | Number of requests which resulted in a response with a particular HTTP status code. | +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ Depending on the metric backend used and metric type, some of those metrics will also be averaged, From a18f312af1dc5a9f453db81c079b2a0ab1b16bac Mon Sep 17 00:00:00 2001 From: Tomaz Muraus Date: Tue, 21 Aug 2018 12:16:06 +0200 Subject: [PATCH 07/14] Update descriptions. 
--- docs/source/reference/metrics.rst | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/docs/source/reference/metrics.rst b/docs/source/reference/metrics.rst index 45536425f..b53a98915 100644 --- a/docs/source/reference/metrics.rst +++ b/docs/source/reference/metrics.rst @@ -49,17 +49,17 @@ This section describes which metrics are currently exposed by various |st2| serv +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ | Name | Type | Service | Description | +============================================================+============+=============================+================================================================================================================+ -| st2.action.executions | counter | st2actionrunner | Current number of action executions being processed by st2actionrunner service. | +| st2.action.executions | counter | st2actionrunner | Number of action executions processed by st2actionrunner service. | +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ | st2.action.executions | timer | st2actionrunner | How long it took to process (run) a particular action execution inside st2actionrunner service. | +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ -| st2.action..executions | counter | st2actionrunner | Current number of action execution for a particular action being processed by st2actionrunner. 
| +| st2.action..executions | counter | st2actionrunner | Number of action execution for a particular action processed by st2actionrunner. | +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ | st2.action..executions | timer | st2actionrunner | How long it took to process (run) action execution for a particular action inside st2actionrunner. | +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ | st2.action.executions. | counter | st2actionrunner | Number of executions which are currently in a particular state (succeeded, failed, timeout, delayed, etc). | +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ -| st2.rule.processed | counter | st2rulesengine | Number of rules (trigger instances) currently being processed by st2rulesengine service. | +| st2.rule.processed | counter | st2rulesengine | Number of rules (trigger instances) processed by st2rulesengine service. | +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ | st2.rule.processed | timer | st2rulesengine | How long it took to process a particular rule (trigger instance) inside st2rulesengine. 
| +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ @@ -73,9 +73,9 @@ This section describes which metrics are currently exposed by various |st2| serv +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ | st2.trigger_instance..processed | timer | st2rulesengine | How long it took to process a particular trigger instance inside st2rulesengine. | +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ -| st2.{auth,api,stream}.request.total | counter | st2auth, st2api, st2stream | Total number of requests processed by st2auth / st2api / st2stream. | +| st2.{auth,api,stream}.request.total | counter | st2auth, st2api, st2stream | Number of requests processed by st2auth / st2api / st2stream. | +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ -| st2.{auth,api,stream}.request | counter | st2auth, st2api, st2stream | Number of requests currently being processed by st2auth / st2api / st2stream. | +| st2.{auth,api,stream}.request | counter | st2auth, st2api, st2stream | Number of requests processed by st2auth / st2api / st2stream. 
| +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ | st2.{auth,api,stream}.request | timer | st2auth, st2api, st2stream | How long it took to process a particular HTTP request. | +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ @@ -86,5 +86,5 @@ This section describes which metrics are currently exposed by various |st2| serv | st2.{auth,api,stream}.response.status. | counter | st2auth, st2api, st2stream | Number of requests which resulted in a response with a particular HTTP status code. | +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ -Depending on the metric backend used and metric type, some of those metrics will also be averaged, -aggregated and converted into a rate (operations / seconds for ``counter`` metrics), etc. +Depending on the metric backend used and metric type, some of those metrics will also be sampled, +averaged, aggregated and converted into a rate (operations / seconds for ``counter`` metrics), etc. From a291b13e8575b6a9e4a15c1743dbcd49bdd667f9 Mon Sep 17 00:00:00 2001 From: Tomaz Muraus Date: Tue, 21 Aug 2018 12:18:05 +0200 Subject: [PATCH 08/14] Link to metrics section in the monitoring doc. 
---
 docs/source/reference/metrics.rst    | 2 +-
 docs/source/reference/monitoring.rst | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/source/reference/metrics.rst b/docs/source/reference/metrics.rst
index b53a98915..d846dc891 100644
--- a/docs/source/reference/metrics.rst
+++ b/docs/source/reference/metrics.rst
@@ -11,7 +11,7 @@ Configuring and Enabling Metrics Collection
 .. note::
-    This feature was added and is available in |st2| v2.8.0 and above.
+    This feature was added and is available in |st2| v2.9.0 and above.
 By default metrics collection is disabled. To enable it, you need to configure ``metrics.driver``
 and depending on the driver, also ``metrics.host`` and ``metrics.port`` option in
diff --git a/docs/source/reference/monitoring.rst b/docs/source/reference/monitoring.rst
index 229b9de6e..3d5b2177f 100644
--- a/docs/source/reference/monitoring.rst
+++ b/docs/source/reference/monitoring.rst
@@ -92,7 +92,8 @@ Key metrics for |st2| administrators to watch are the number of running and sche
 the average execution time. Busy systems will need to scale out the number of ``st2actionrunner`` processes.
-We recommend storing metrics in a time-series database, such as `InfluxDB `_
+|st2| exposes some of those metrics via statsd using the metrics framework. For more information,
+please refer to the :doc:`/reference/metrics` section.
 MongoDB
 -------
From e3788f4f68fb54576f617c9c1019348f9e5327e9 Mon Sep 17 00:00:00 2001
From: Tomaz Muraus
Date: Tue, 21 Aug 2018 12:27:11 +0200
Subject: [PATCH 09/14] Fix metric name.
--- docs/source/reference/metrics.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/reference/metrics.rst b/docs/source/reference/metrics.rst index d846dc891..887f8ca61 100644 --- a/docs/source/reference/metrics.rst +++ b/docs/source/reference/metrics.rst @@ -71,7 +71,7 @@ This section describes which metrics are currently exposed by various |st2| serv +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ | st2.trigger..processed | counter | st2rulesengine | Number of particular trigger types processed by st2rulesengine. | +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ -| st2.trigger_instance..processed | timer | st2rulesengine | How long it took to process a particular trigger instance inside st2rulesengine. | +| st2.triggerinstance..processed | timer | st2rulesengine | How long it took to process a particular trigger instance inside st2rulesengine. | +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ | st2.{auth,api,stream}.request.total | counter | st2auth, st2api, st2stream | Number of requests processed by st2auth / st2api / st2stream. 
|
 +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+
From 09661dd0a0610ac0aa1bffd8b6b3591efabde434 Mon Sep 17 00:00:00 2001
From: Tomaz Muraus
Date: Wed, 22 Aug 2018 13:10:28 +0200
Subject: [PATCH 10/14] Add a section on configuring statsd and include some sample configs.
---
 docs/source/reference/metrics.rst | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/docs/source/reference/metrics.rst b/docs/source/reference/metrics.rst
index 887f8ca61..949558d24 100644
--- a/docs/source/reference/metrics.rst
+++ b/docs/source/reference/metrics.rst
@@ -37,6 +37,24 @@ For debugging and troubleshooting purposes, you can also set driver to ``echo``.
 |st2| to log under ``DEBUG`` log level any metrics operation which would have otherwise be performed
 (increasing a counter, timing an operation, etc.) without actually performing it.
+Configuring StatsD
+==================
+
+The |st2| ``statsd`` metrics driver is compatible with any service which exposes a statsd-compatible
+interface for receiving metrics via UDP.
+
+This includes the original statsd service written in Node.js, but also compatible projects such as
+Telegraf and others.
+
+This provides a lot of flexibility and allows the statsd service to submit those metrics to a self
+hosted or managed graphite instance or to other compatible projects and services such as InfluxDB
+and hostedgraphite.
+
+Configuring those services is out of scope of this documentation, because it's very environment
+specific (aggregation resolution, retention period, etc.), but some sample config which can help
+you get started with statsd and self hosted graphite and carbon cache instance
+can be found at https://github.com/StackStorm/st2/tree/master/conf/metrics.
+ Exposed Metrics =============== From 2f70e2d0e5c2bcb50f70d9b05959c27a506cb5df Mon Sep 17 00:00:00 2001 From: Tomaz Muraus Date: Wed, 22 Aug 2018 13:54:24 +0200 Subject: [PATCH 11/14] Document new prefix option. --- docs/source/reference/metrics.rst | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/docs/source/reference/metrics.rst b/docs/source/reference/metrics.rst index 949558d24..d603be78a 100644 --- a/docs/source/reference/metrics.rst +++ b/docs/source/reference/metrics.rst @@ -24,6 +24,10 @@ Right now, the only supported driver is ``statsd``. To configure it, add the fol [metrics] driver = statsd + # Optional prefix which is prepended to each metric key. E.g. if prefix is + # "production" and key is "st2.action.executions" actual key would be + # "production.st2.action.executions". This comes handy when you want to + # utilize the same backend instance for multiple environments or similar. host = 127.0.0.1 # statsd collection and aggregation server address port = 8125 # statsd collection and aggregation server port From d7f982204a7b490dd5bdb3805fa07bac9806e992 Mon Sep 17 00:00:00 2001 From: Tomaz Muraus Date: Wed, 22 Aug 2018 16:10:52 +0200 Subject: [PATCH 12/14] Update docs. --- docs/source/reference/metrics.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/source/reference/metrics.rst b/docs/source/reference/metrics.rst index d603be78a..e3ec57a9f 100644 --- a/docs/source/reference/metrics.rst +++ b/docs/source/reference/metrics.rst @@ -25,8 +25,8 @@ Right now, the only supported driver is ``statsd``. To configure it, add the fol [metrics] driver = statsd # Optional prefix which is prepended to each metric key. E.g. if prefix is - # "production" and key is "st2.action.executions" actual key would be - # "production.st2.action.executions". This comes handy when you want to + # "production" and key is "action.executions" actual key would be + # "st2.production.action.executions". 
This comes handy when you want to # utilize the same backend instance for multiple environments or similar. host = 127.0.0.1 # statsd collection and aggregation server address port = 8125 # statsd collection and aggregation server port @@ -79,7 +79,7 @@ This section describes which metrics are currently exposed by various |st2| serv +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ | st2.action..executions | timer | st2actionrunner | How long it took to process (run) action execution for a particular action inside st2actionrunner. | +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ -| st2.action.executions. | counter | st2actionrunner | Number of executions which are currently in a particular state (succeeded, failed, timeout, delayed, etc). | +| st2.action.executions. | counter | st2actionrunner | Number of executions in a particular state (succeeded, failed, timeout, delayed, etc). | +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ | st2.rule.processed | counter | st2rulesengine | Number of rules (trigger instances) processed by st2rulesengine service. | +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+ From d967e2f027880ce739b24088283d9a89f2b9642a Mon Sep 17 00:00:00 2001 From: Tomaz Muraus Date: Wed, 22 Aug 2018 16:18:58 +0200 Subject: [PATCH 13/14] Update docs. 
--- docs/source/reference/metrics.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/reference/metrics.rst b/docs/source/reference/metrics.rst index e3ec57a9f..b508258ed 100644 --- a/docs/source/reference/metrics.rst +++ b/docs/source/reference/metrics.rst @@ -55,7 +55,7 @@ hosted or managed graphite instance or to other compatible projects and services and hostedgraphite. Configuring those services is out of scope of this documentation, because it's very environment -specific (aggregation resolution, retention period, etc.), but some sample config which can help +specific (aggregation resolution, retention period, etc.), but some sample configs which can help you get started with statsd and self hosted graphite and carbon cache instance can be found at https://github.com/StackStorm/st2/tree/master/conf/metrics. From 1cdc7ba5f0b6876c62201ad971ef22649063cd43 Mon Sep 17 00:00:00 2001 From: Tomaz Muraus Date: Wed, 22 Aug 2018 16:24:25 +0200 Subject: [PATCH 14/14] Add some docs on derived metrics. --- docs/source/reference/metrics.rst | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/docs/source/reference/metrics.rst b/docs/source/reference/metrics.rst index b508258ed..730302f57 100644 --- a/docs/source/reference/metrics.rst +++ b/docs/source/reference/metrics.rst @@ -108,5 +108,14 @@ This section describes which metrics are currently exposed by various |st2| serv | st2.{auth,api,stream}.response.status. | counter | st2auth, st2api, st2stream | Number of requests which resulted in a response with a particular HTTP status code. 
|
 +------------------------------------------------------------+------------+-----------------------------+----------------------------------------------------------------------------------------------------------------+
-Depending on the metric backend used and metric type, some of those metrics will also be sampled,
+Depending on the metric backend and metric type, some of those metrics will also be sampled,
 averaged, aggregated and converted into a rate (operations / seconds for ``counter`` metrics), etc.
+
+Keep in mind that for counter metrics, statsd automatically calculates rates. If you are
+interested in more than a rate (events per second), you will need to derive those metrics from the
+raw "count" metric.
+
+For example, if you are interested in the total number of executions scheduled or the total number
+of API requests in a particular time frame, you would use the Graphite ``integral()`` function
+(e.g. ``integral(stats.counters.st2.action.executions.scheduled.count)`` and
+``integral(stats.counters.st2.api.request.total.count)``).
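
The running total described in the last paragraph can be illustrated with a short Python sketch. It only shows what a cumulative sum over per-interval "count" values looks like and is not Graphite's implementation. The function name ``running_total`` and the sample values are hypothetical, and leaving intervals without data as gaps is an assumption of this sketch.

```python
from typing import List, Optional


def running_total(counts: List[Optional[float]]) -> List[Optional[float]]:
    """Return the cumulative sum of per-interval counts.

    Intervals with no data (None) stay None in the output and do not
    change the running total.
    """
    total = 0.0
    result: List[Optional[float]] = []
    for value in counts:
        if value is None:
            # No datapoint for this interval: keep the gap, carry the total forward.
            result.append(None)
            continue
        total += value
        result.append(total)
    return result


if __name__ == "__main__":
    # Hypothetical per-interval values of a raw "count" series.
    per_interval = [3, 0, None, 5, 2]
    print(running_total(per_interval))  # [3.0, 3.0, None, 8.0, 10.0]
```

The last value of the result is the total number of events seen in the selected time frame, which is the figure a per-second rate alone does not give you.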