Add retries in activator and envoy timeout to avoid 503's#1226
Add retries in activator and envoy timeout to avoid 503's#1226google-prow-robot merged 2 commits intoknative:masterfrom akyyy:activator_retry_envoy_timeout
Conversation
| metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" | ||
| ) | ||
|
|
||
| const sixtySecondsInMs = "60000" |
There was a problem hiding this comment.
change to a generic name like requestTimeoutMs ?
| Weight: 0, | ||
| }}, | ||
| }, getActivatorDestinationWeight(0), | ||
| { |
There was a problem hiding this comment.
if you move this to the previous line gofmt will indent better
josephburnett
left a comment
There was a problem hiding this comment.
Looks good.
/lgtm
/approve
|
/approve |
| } | ||
| ret = append(ret, activatorRoute) | ||
| } | ||
| activatorRoute := RevisionRoute{ |
There was a problem hiding this comment.
Please sync with the latest changes as I did the same thing in master branch.
There was a problem hiding this comment.
I copied your change over to avoid merge. :)
| }, | ||
| Weight: 100, | ||
| }}, | ||
| }, getActivatorDestinationWeight(0)}, |
There was a problem hiding this comment.
Please sync this file to the latest in master as well. Some of these changes are there as well.
| ) | ||
|
|
||
| const ( | ||
| maxRetry = 60 |
There was a problem hiding this comment.
Seems quite aggressive to retry up to 60 times per request. We should move this to be exponential backoff eventually. We should probably open a Github issue to tackle this later on and check this one in as is.
There was a problem hiding this comment.
Ah, Mustafa wondered the same thing as me. I didn't see this before I commented.
| type retryRoundTripper struct{} | ||
|
|
||
| func (rrt retryRoundTripper) RoundTrip(r *http.Request) (*http.Response, error) { | ||
| transport := http.DefaultTransport.(*http.Transport) |
There was a problem hiding this comment.
Why is there a need for this cast?
|
/lgtm Awesome job :) Just a question (doesn't change my approval), but should we be worried about any backoff to the retries? I don't think so, but I'm just wondering if you've considered it. |
|
I just resolved the conflict and removed my hold request. Let's check this in and we can address my comments in a later review as they are not blockers. |
|
The following is the coverage report on pkg/. Say
*TestCoverage feature is being tested, do not rely on any info here yet |
|
The following is the coverage report on pkg/. Say
*TestCoverage feature is being tested, do not rely on any info here yet |
|
/lgtm |
|
/lgtm |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: akyyy, josephburnett, nikkithurmond, tcnghia, vaikas-google The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Fixes #
Since activator uses the revision service name, which doesn't always have the pod ip, we saw 503's and 504's.
Proposed Changes