[Drivers] Note about Adaptive Rate Limiting #155
dbx/adaptive-rate-limiting.rst
Outdated
@@ -0,0 +1,13 @@
.. note::

   You may find errors with ``SystemOverloadedError`` or ``RetryableError`` in
An important thing to note here is that even with newer drivers, users can/will still see these errors if the overload is sustained and strong enough. This will happen when a driver exhausts its retry count for a given request or, in the future pending design finalization, when the driver's retry budget/token bucket is emptied. They will see a lot fewer of these errors, and in the cases of transient or minor overload, they won't see them at all.
So while this guidance to upgrade is certainly still true and recommended, users of newer drivers should still be encouraged to make an informed decision on their end in response to overload (e.g. alerting their systems, inducing their own queueing/throttling, etc).
For the purposes of this PR, I think we can reword this slightly to still encourage upgrading without implying the errors are purely an old-driver problem. Separately, we can come up with further guidance on how all users can handle these errors, which may end up here or in the Atlas docs. We're currently discussing this with product.
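To make the retry budget idea above concrete, here is a minimal token bucket sketch in Python. It is an illustration only and not the driver design, which is still pending: the `RetryBudget` class, its `capacity` and `refill_per_second` parameters, and the `try_spend` method are all hypothetical names.

```python
import time


class RetryBudget:
    """Hypothetical token bucket that caps how many retries a client may issue."""

    def __init__(self, capacity: float, refill_per_second: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock          # injectable so the bucket can be tested
        self._tokens = capacity      # start with a full budget
        self._last = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last check, up to capacity.
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)

    def try_spend(self, cost: float = 1.0) -> bool:
        """Deduct `cost` tokens and return True if a retry is allowed, else False."""
        self._refill()
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

When `try_spend()` returns `False`, the budget is empty and the caller would surface the error to the application instead of retrying. That is the sustained-overload case in which users still see these errors on newer drivers.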
dbx/adaptive-rate-limiting.rst
Outdated
@@ -0,0 +1,14 @@
.. note::

   If you find errors with ``SystemOverloadedError`` or ``RetryableError``
This language reads a bit vague to me. Users will start getting application failures due to these errors; they won't just log errors and continue running.
We need to clearly communicate to users that these errors WILL cause application failures when they occur without either code changes to explicitly ignore them or upgrading their driver to a backpressure-supported release.
@NoahStapp could you take a look at my most recent changes?
dbx/adaptive-rate-limiting.rst
Outdated
   If errors with ``SystemOverloadedError`` or ``RetryableError``
   labels are causing application failures, or appearing in your application
   logs, you can consider changes to your retry settings. One option is to
   enable adaptive rate limiting. Adaptive rate limiting helps
   manage server load by dynamically adjusting request rates based on current
   conditions, while also managing client-side retry requests to mitigate
   errors. This feature is available on MongoDB 8.3 and supported by the
The cause and effect are a little backward here. Adaptive rate limiting is one of the few features that can cause SystemOverloadedErrors to be returned.
Yeah I think the old phrasing of this text was closer to what we want here, with the modifications I suggested before.
dbx/adaptive-rate-limiting.rst
Outdated
   driver may not be upgraded to a version that supports adaptive rate limiting.
   We recommend upgrading your |driver-name| to version |ivm-compatible-version|
   or later. If you continue to see these errors after upgrading, you may need
   to review your Intelligent Workload Management (IWM) configuration.
As Patrick mentioned in one of his earlier comments, users may also address continued errors through code or system changes on the application side (custom error handling, client-side request throttling, etc), independent of IWM configuration on their cluster. It's possible that the specific situation causing their overload requires (or prefers) these application changes rather than being solvable through server-side configuration.
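As a rough illustration of the application-side handling mentioned above, the sketch below retries an operation with exponential backoff and jitter when the raised error carries one of the two labels, and re-raises once attempts run out. It assumes the exception exposes a `has_error_label(label)` method, as PyMongo's `PyMongoError` does. The `run_with_backoff` and `is_overload_error` helpers and their default parameters are hypothetical and not recommended values.

```python
import random
import time

# Labels named in the note under review.
OVERLOAD_LABELS = ("SystemOverloadedError", "RetryableError")


def is_overload_error(exc: Exception) -> bool:
    """Return True if the exception carries one of the overload labels."""
    has_label = getattr(exc, "has_error_label", None)
    return callable(has_label) and any(has_label(label) for label in OVERLOAD_LABELS)


def run_with_backoff(operation, max_attempts=4, base_delay=0.1, max_delay=2.0,
                     sleep=time.sleep):
    """Call `operation`, backing off on overload errors (hypothetical helper)."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:
            # Unrelated errors, or the final failed attempt, go to the caller.
            if not is_overload_error(exc) or attempt == max_attempts - 1:
                raise
            # Exponential backoff capped at max_delay, with full jitter.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
```

The final `raise` is the point where an application could alert, shed load, or queue the request, since the error reaching it means the overload outlasted the retries.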
JIRA: https://jira.mongodb.org/browse/DOCSP-58635
Purpose: This note will be added to drivers that support the progressive back-off workflow.