Backoff retries in the activator #1814
Conversation
/test pull-knative-serving-integration-tests

1 similar comment

/test pull-knative-serving-integration-tests

/assign @josephburnett
```go
func (rrt *retryRoundTripper) CalculateDelay(retries int, minRetryInterval time.Duration) time.Duration {
	return time.Duration(int(minRetryInterval/time.Millisecond)*retries*retries) * time.Millisecond
}
```
I believe this is quadratic, not exponential. What we want is an aggressive retry during normal activation times, but a quickly growing retry interval thereafter, which is easier to achieve with exponential backoff because of its hockey-stick shape.
In my experience a small base like 1.3 is a good start, with the retry index as the exponent, multiplied by the minimum retry interval.
E.g. `return time.Duration(float64(minRetryInterval) * math.Pow(1.3, float64(retries)))` (note that `^` is XOR in Go, so `math.Pow` is needed for the exponentiation).
It would look something like this. (The actual numbers should be tuned, but the point is to keep the curve low and fast until we leave normal operating conditions.)
Doh. Of course it's quadratic... very much my bad. Thanks for pointing that out, I'll fix accordingly.
Force-pushed from 2c4c6b0 to 65be77e
Force-pushed from 65be77e to 52d5083
The following is the coverage report on pkg/.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: josephburnett, markusthoemmes

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Details: Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
/retest

2 similar comments

/retest

/retest

/retest

/retest

1 similar comment

/retest


Fixes #1229
Proposed Changes
Added an exponential backoff to the activator's retry logic. In the process, I lowered the initial timeout (we might need to adjust that a bit to hit a sweet spot), and the total time spent retrying is now bounded by the elapsed time of the retries plus the requests themselves.
To determine a good retry interval, the following table can help. Production data on how many retries are actually needed will help tune it further, though.
Regarding tests: I didn't find any for this specific file. I'd love to add some, but will need some guidance on how to do so, if necessary.
Release Note