From dddc2b8f7657d36097061678d96ecd967d82f20a Mon Sep 17 00:00:00 2001
From: Matthew Elwell <matthew.elwell@flagsmith.com>
Date: Wed, 4 Sep 2024 12:29:04 +0100
Subject: [PATCH 1/6] Add docs on Edge proxy health check configuration

---
 .../deployment/hosting/locally-edge-proxy.md  | 62 +++++++++++++++----
 1 file changed, 50 insertions(+), 12 deletions(-)

diff --git a/docs/docs/deployment/hosting/locally-edge-proxy.md b/docs/docs/deployment/hosting/locally-edge-proxy.md
index 29fa55191166..f6fac682a752 100644
--- a/docs/docs/deployment/hosting/locally-edge-proxy.md
+++ b/docs/docs/deployment/hosting/locally-edge-proxy.md
@@ -10,7 +10,9 @@ The Edge Proxy can be configured using a json configuration file (named `config.
 
 You can set the following configuration in `config.json` to control the behaviour of the Edge Proxy:
 
-### `environment_key_pairs`
+### Basic Settings
+
+#### `environment_key_pairs`
 
 An array of environment key pair objects:
 
@@ -21,7 +23,7 @@ An array of environment key pair objects:
 }]
 ```
 
-### `api_poll_frequency`
+#### `api_poll_frequency`
 
 :::note
 
@@ -37,7 +39,7 @@ Control how often the Edge Proxy is going to ping the server for changes, in sec
 
 Defaults to `10`.
 
-### `api_poll_timeout`
+#### `api_poll_timeout`
 
 :::note
 
@@ -53,7 +55,7 @@ Specify the request timeout when trying to retrieve new changes, in seconds:
 
 Defaults to `5`.
 
-### `api_url`
+#### `api_url`
 
 :::note
 
@@ -69,7 +71,7 @@ Set if you are running a self hosted version of Flagsmith:
 
 If not set, defaults to Flagsmith's Edge API.
 
-### `allow_origins`
+#### `allow_origins`
 
 :::note
 
@@ -85,7 +87,9 @@ Set a value for the `Access-Control-Allow-Origin` header.
 
 If not set, defaults to `*`.
 
-### `endpoint_caches`
+### Endpoint Caches
+
+#### `endpoint_caches`
 
 :::note
 
@@ -109,7 +113,9 @@ Optionally, specify the LRU cache size with `cache_max_size` (defaults to 128):
 }
 ```
 
-### `logging.log_level`
+### Logging
+
+#### `logging.log_level`
 
 :::note
 
@@ -123,7 +129,7 @@ Choose a logging level from `"CRITICAL"`, `"ERROR"`, `"WARNING"`, `"INFO"`, `"DE
 "logging": {"log_level": "DEBUG"}
 ```
 
-### `logging.log_format`
+#### `logging.log_format`
 
 :::note
 
@@ -137,7 +143,7 @@ Choose a logging forman between `"generic"` and `"json"`. Defaults to `"generic"
 "logging": {"log_format": "json"}
 ```
 
-### `logging.log_event_field_name`
+#### `logging.log_event_field_name`
 
 :::note
 
@@ -151,7 +157,7 @@ Set a name used for human-readable log entry field when logging events in JSON.
 "logging": {"log_event_field_name": "event"}
 ```
 
-### `logging.colour`
+#### `logging.colour`
 
 :::note
 
@@ -162,7 +168,7 @@ Set a name used for human-readable log entry field when logging events in JSON.
 
 Set to `false` to disable coloured output. Useful when outputting the log to a file.
 
-### `logging.override`
+#### `logging.override`
 
 :::note
 
@@ -229,7 +235,39 @@ Or, log access logs to file in generic format while logging everything else to s
 When adding logger configurations, you can use the `"default"` handler which writes to stdout and uses formatter
 specified by the [`"logging.log_format"`](#logginglog_format) setting.
 
-### `config.json` example
+### Health Check
+
+The health check can be configured depending on the use case for the Edge Proxy by adding the `health_check` object to
+the root of the settings file.
+
+```json
+{
+  ...
+  "health_check": {
+    "count_stale_documents_as_failing": true,
+    "grace_period_seconds": 30
+  }
+}
+```
+
+#### `count_stale_documents_as_failing`
+
+Setting this to False will mean that the health check returns a 200 response with `{"status": "ok", ...}` if the time at
+which the edge proxy was last updated is earlier than the given threshold. Usually this is helpful in environments where
+you want the Edge Proxy to continue to serve traffic in the case where the Flagsmith API is offline.
+
+#### `grace_period_seconds`
+
+The number of seconds to allow per environment key pair before the environment data stored by the Edge Proxy is
+considered stale. The calculation to work out how long before the data is considered stale is as follows (written in
+pseudo-python-code):
+
+```python
+current_time = datetime.now()
+total_grace_period_seconds = api_poll_frequency + (health_check.grace_period_seconds * len(environment_key_pairs))
+```
+
+### Example
 
 Here's an example of a minimal working Edge Proxy configuration:
 

From a873629dc6b343bbfc86fe7a8d8dc7eb468320c6 Mon Sep 17 00:00:00 2001
From: Matthew Elwell <matthew.elwell@flagsmith.com>
Date: Wed, 4 Sep 2024 16:22:03 +0100
Subject: [PATCH 2/6] Rewording based on PR feedback

---
 .../deployment/hosting/locally-edge-proxy.md  | 27 +++++++++----------
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/docs/docs/deployment/hosting/locally-edge-proxy.md b/docs/docs/deployment/hosting/locally-edge-proxy.md
index f6fac682a752..c6ed24d99082 100644
--- a/docs/docs/deployment/hosting/locally-edge-proxy.md
+++ b/docs/docs/deployment/hosting/locally-edge-proxy.md
@@ -237,26 +237,23 @@ specified by the [`"logging.log_format"`](#logginglog_format) setting.
 
 ### Health Check
 
-The health check can be configured depending on the use case for the Edge Proxy by adding the `health_check` object to
-the root of the settings file.
+The Edge Proxy exposes a health check endpoint at `/proxy/health` that responds with a 200 status code if it was able to
+fetch all its configured environment documents. By default, if any environment document could not be fetched during the
+latest poll, it will respond with a 500 status code. In some cases, you may want the Edge Proxy to succeed its health
+checks even if it failed to fetch one or more environment documents, but only if it these documents were successfully
+fetched at some point in the past. You can achieve this using the settings defined below.
 
-```json
-{
-  ...
-  "health_check": {
-    "count_stale_documents_as_failing": true,
-    "grace_period_seconds": 30
-  }
-}
-```
+#### `health_check.count_stale_documents_as_failing`
 
-#### `count_stale_documents_as_failing`
+Default: `true`.
 
 Setting this to False will mean that the health check returns a 200 response with `{"status": "ok", ...}` if the time at
-which the edge proxy was last updated is earlier than the given threshold. Usually this is helpful in environments where
-you want the Edge Proxy to continue to serve traffic in the case where the Flagsmith API is offline.
+which the edge proxy was last updated is earlier than the allowed threshold. Usually this is helpful in environments
+where you want the Edge Proxy to continue to serve traffic in the case where the Flagsmith API is offline.
+
+#### `health_check.grace_period_seconds`
 
-#### `grace_period_seconds`
+Default: `30`.
 
 The number of seconds to allow per environment key pair before the environment data stored by the Edge Proxy is
 considered stale. The calculation to work out how long before the data is considered stale is as follows (written in

From d5cbb583c558273d6220e3da493d6e82e6f469c9 Mon Sep 17 00:00:00 2001
From: Matthew Elwell <matthew.elwell@flagsmith.com>
Date: Wed, 4 Sep 2024 17:03:11 +0100
Subject: [PATCH 3/6] Remove unnecessary response json

---
 docs/docs/deployment/hosting/locally-edge-proxy.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/docs/deployment/hosting/locally-edge-proxy.md b/docs/docs/deployment/hosting/locally-edge-proxy.md
index c6ed24d99082..063999673ee4 100644
--- a/docs/docs/deployment/hosting/locally-edge-proxy.md
+++ b/docs/docs/deployment/hosting/locally-edge-proxy.md
@@ -247,9 +247,9 @@ fetched at some point in the past. You can achieve this using the settings defin
 
 Default: `true`.
 
-Setting this to False will mean that the health check returns a 200 response with `{"status": "ok", ...}` if the time at
-which the edge proxy was last updated is earlier than the allowed threshold. Usually this is helpful in environments
-where you want the Edge Proxy to continue to serve traffic in the case where the Flagsmith API is offline.
+Setting this to False will mean that the health check returns a 200 response if the time at which the edge proxy was
+last updated is earlier than the allowed threshold. Usually this is helpful in environments where you want the Edge
+Proxy to continue to serve traffic in the case where the Flagsmith API is offline.
 
 #### `health_check.grace_period_seconds`
 

From a4dee27c78013217fcbf5e9edbaf573dcb1ba12c Mon Sep 17 00:00:00 2001
From: Matthew Elwell <matthew.elwell@flagsmith.com>
Date: Wed, 4 Sep 2024 18:08:24 +0100
Subject: [PATCH 4/6] Update to remove `count_stale_documents_as_failing`

---
 .../deployment/hosting/locally-edge-proxy.md     | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/docs/docs/deployment/hosting/locally-edge-proxy.md b/docs/docs/deployment/hosting/locally-edge-proxy.md
index 063999673ee4..9b7735e61b89 100644
--- a/docs/docs/deployment/hosting/locally-edge-proxy.md
+++ b/docs/docs/deployment/hosting/locally-edge-proxy.md
@@ -241,17 +241,10 @@ The Edge Proxy exposes a health check endpoint at `/proxy/health` that responds
 fetch all its configured environment documents. By default, if any environment document could not be fetched during the
 latest poll, it will respond with a 500 status code. In some cases, you may want the Edge Proxy to succeed its health
 checks even if it failed to fetch one or more environment documents, but only if it these documents were successfully
-fetched at some point in the past. You can achieve this using the settings defined below.
+fetched at some point in the past. You can achieve this using the `environment_update_grace_period_seconds` setting
+defined below.
 
-#### `health_check.count_stale_documents_as_failing`
-
-Default: `true`.
-
-Setting this to False will mean that the health check returns a 200 response if the time at which the edge proxy was
-last updated is earlier than the allowed threshold. Usually this is helpful in environments where you want the Edge
-Proxy to continue to serve traffic in the case where the Flagsmith API is offline.
-
-#### `health_check.grace_period_seconds`
+#### `health_check.environment_update_grace_period_seconds`
 
 Default: `30`.
 
@@ -264,6 +257,9 @@ current_time = datetime.now()
 total_grace_period_seconds = api_poll_frequency + (health_check.grace_period_seconds * len(environment_key_pairs))
 ```
 
+To disable this functionality, set the value to `null`. When set to `null`, the health check will only serve a 500 if
+the configured environments have never been retrieved.
+
 ### Example
 
 Here's an example of a minimal working Edge Proxy configuration:

From b53a6350f2fe1bba4101532bf3925afcf5e17573 Mon Sep 17 00:00:00 2001
From: Matthew Elwell <matthew.elwell@flagsmith.com>
Date: Wed, 4 Sep 2024 18:19:39 +0100
Subject: [PATCH 5/6] Further wording updates

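The staleness rule described in this commit can be expressed as a small runnable Python sketch. It is illustrative
only. It assumes a single "last updated" timestamp that is recorded once all environments have been fetched, and it
uses `None` to stand in for a `null` setting. The function `health_status` and its parameter names are hypothetical
and are not taken from the Edge Proxy codebase.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def health_status(
    last_updated_all_environments_at: Optional[datetime],
    api_poll_frequency: int,
    environment_update_grace_period_seconds: Optional[int],
    environment_key_pair_count: int,
    now: Optional[datetime] = None,
) -> int:
    """Return the HTTP status code the health check is expected to report.

    Hypothetical helper mirroring the documented behaviour; it is not the
    Edge Proxy's real implementation.
    """
    # The environment documents have never been fetched: always unhealthy.
    if last_updated_all_environments_at is None:
        return 500

    # A null (None) grace period means cached documents never go stale.
    if environment_update_grace_period_seconds is None:
        return 200

    total_grace_period_seconds = api_poll_frequency + (
        environment_update_grace_period_seconds * environment_key_pair_count
    )
    if now is None:
        now = datetime.now(timezone.utc)
    if last_updated_all_environments_at < now - timedelta(seconds=total_grace_period_seconds):
        return 500  # Data is stale.
    return 200  # Data is within the grace period.
```

With the defaults (a 10 second poll frequency and a 30 second per-environment grace period) and two environment key
pairs, the effective grace period is 70 seconds, so data last refreshed 71 seconds ago is reported as stale. The
sketch assumes timezone-aware UTC timestamps throughout.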
---
 .../deployment/hosting/locally-edge-proxy.md  | 29 +++++++++++--------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/docs/docs/deployment/hosting/locally-edge-proxy.md b/docs/docs/deployment/hosting/locally-edge-proxy.md
index 9b7735e61b89..97eb6be82a4e 100644
--- a/docs/docs/deployment/hosting/locally-edge-proxy.md
+++ b/docs/docs/deployment/hosting/locally-edge-proxy.md
@@ -238,28 +238,33 @@ specified by the [`"logging.log_format"`](#logginglog_format) setting.
 ### Health Check
 
 The Edge Proxy exposes a health check endpoint at `/proxy/health` that responds with a 200 status code if it was able to
-fetch all its configured environment documents. By default, if any environment document could not be fetched during the
-latest poll, it will respond with a 500 status code. In some cases, you may want the Edge Proxy to succeed its health
-checks even if it failed to fetch one or more environment documents, but only if it these documents were successfully
-fetched at some point in the past. You can achieve this using the `environment_update_grace_period_seconds` setting
-defined below.
+fetch all its configured environment documents. By default, if any update of the configured environment documents takes
+longer than the allowed grace period (see below), then the health check will return with a 500 status code. In some
+cases, you may want the Edge Proxy to succeed its health checks even if it failed to fetch one or more environment
+documents, but only if it these documents were successfully fetched at some point in the past. You can achieve this
+using the `environment_update_grace_period_seconds` setting defined below.
 
 #### `health_check.environment_update_grace_period_seconds`
 
 Default: `30`.
 
 The number of seconds to allow per environment key pair before the environment data stored by the Edge Proxy is
-considered stale. The calculation to work out how long before the data is considered stale is as follows (written in
-pseudo-python-code):
+considered stale. When set to `null`, the cached environment documents are never considered stale and the health check
+will only return 500 if the documents have never been updated.
+
+Since the Edge Proxy updates all environments at once on each polling interval, it only stores when it was last updated
+once it's updated all documents. Thus, the calculation to work out how long before the data is considered stale is as
+follows (written in pseudo-python-code):
 
 ```python
-current_time = datetime.now()
-total_grace_period_seconds = api_poll_frequency + (health_check.grace_period_seconds * len(environment_key_pairs))
+total_grace_period_seconds = api_poll_frequency + (environment_update_grace_period_seconds * len(environment_key_pairs))
+if last_updated_all_environments_at < datetime.now() - timedelta(seconds=total_grace_period_seconds):
+    # Data is stale
+    return 500
+# Data is not stale
+return 200
 ```
 
-To disable this functionality, set the value to `null`. When set to `null`, the health check will only serve a 500 if
-the configured environments have never been retrieved.
-
 ### Example
 
 Here's an example of a minimal working Edge Proxy configuration:

From 5771c34b2d11e714ed22c473c624273669e1f8a3 Mon Sep 17 00:00:00 2001
From: Matthew Elwell <matthew.elwell@flagsmith.com>
Date: Wed, 4 Sep 2024 20:15:45 +0100
Subject: [PATCH 6/6] Update with changes suggested by Rodrigo

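To make the effective grace period concrete, the following Python sketch evaluates the documented formula using the
documented defaults (`api_poll_frequency` of 10 and `environment_update_grace_period_seconds` of 30). The helper
`effective_grace_period_seconds` is hypothetical and exists only for illustration.

```python
def effective_grace_period_seconds(
    api_poll_frequency: int = 10,
    environment_update_grace_period_seconds: int = 30,
    environment_key_pair_count: int = 1,
) -> int:
    """Seconds before cached environment data is considered stale (hypothetical helper)."""
    return api_poll_frequency + environment_update_grace_period_seconds * environment_key_pair_count


# Evaluate the defaults for 1, 3 and 5 configured environment key pairs.
for count in (1, 3, 5):
    print(count, effective_grace_period_seconds(environment_key_pair_count=count))
```

This prints `1 40`, `3 100` and `5 160`: each additional environment key pair extends the window by 30 seconds.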
---
 .../deployment/hosting/locally-edge-proxy.md  | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/docs/docs/deployment/hosting/locally-edge-proxy.md b/docs/docs/deployment/hosting/locally-edge-proxy.md
index 97eb6be82a4e..45849e62d841 100644
--- a/docs/docs/deployment/hosting/locally-edge-proxy.md
+++ b/docs/docs/deployment/hosting/locally-edge-proxy.md
@@ -238,23 +238,22 @@ specified by the [`"logging.log_format"`](#logginglog_format) setting.
 ### Health Check
 
 The Edge Proxy exposes a health check endpoint at `/proxy/health` that responds with a 200 status code if it was able to
-fetch all its configured environment documents. By default, if any update of the configured environment documents takes
-longer than the allowed grace period (see below), then the health check will return with a 500 status code. In some
-cases, you may want the Edge Proxy to succeed its health checks even if it failed to fetch one or more environment
-documents, but only if it these documents were successfully fetched at some point in the past. You can achieve this
-using the `environment_update_grace_period_seconds` setting defined below.
+fetch all its configured environment documents. If the environment documents have not all been fetched successfully
+within a configurable grace period, the health check will fail with a 500 status code. This allows the Edge Proxy to
+continue reporting as healthy even if the Flagsmith API is temporarily unavailable.
 
 #### `health_check.environment_update_grace_period_seconds`
 
 Default: `30`.
 
 The number of seconds to allow per environment key pair before the environment data stored by the Edge Proxy is
-considered stale. When set to `null`, the cached environment documents are never considered stale and the health check
-will only return 500 if the documents have never been updated.
+considered stale.
 
-Since the Edge Proxy updates all environments at once on each polling interval, it only stores when it was last updated
-once it's updated all documents. Thus, the calculation to work out how long before the data is considered stale is as
-follows (written in pseudo-python-code):
+When set to `null`, cached environment documents are never considered stale, and health checks will succeed if all
+environments were successfully fetched at some point since the Edge Proxy started.
+
+The effective grace period depends on how many environments the Edge Proxy is configured to serve. It can be calculated
+using the following pseudo-Python code:
 
 ```python
 total_grace_period_seconds = api_poll_frequency + (environment_update_grace_period_seconds * len(environment_key_pairs))