Add retries in activator and envoy timeout to avoid 503's #1226

Merged
google-prow-robot merged 2 commits into knative:master from akyyy:activator_retry_envoy_timeout on Jun 15, 2018

@akyyy akyyy commented Jun 15, 2018

Fixes #
Because the activator uses the revision service name, which doesn't always have a pod IP behind it, we saw 503s and 504s.

Proposed Changes

  • Add retries in activator
  • Specify envoy timeout in request header
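
A rough sketch of the two bullets above, assuming a wrapper around an `http.RoundTripper` that retries while the upstream answers 503. Only the `retryRoundTripper` name and the retry limit of 60 appear in this PR's diff context; the struct fields, the fixed retry interval, `flakyTransport`, `tryWith`, and the URL are hypothetical. The second bullet is assumed to refer to Envoy's `x-envoy-upstream-rq-timeout-ms` header, which the sketch simply sets on the outgoing request.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

const (
	// Assumption: this is the Envoy header the second bullet refers to.
	envoyTimeoutHeader = "x-envoy-upstream-rq-timeout-ms"
	requestTimeoutMs   = "60000"
)

// retryRoundTripper retries a request while the upstream answers 503.
// The fields are illustrative; the struct in the PR has none.
type retryRoundTripper struct {
	next     http.RoundTripper // underlying transport
	maxRetry int               // retries allowed after the first attempt
	interval time.Duration     // fixed pause between attempts
}

func (rrt retryRoundTripper) RoundTrip(r *http.Request) (*http.Response, error) {
	for attempt := 0; ; attempt++ {
		resp, err := rrt.next.RoundTrip(r)
		// Stop on transport errors, on any non-503 status, or when out of retries.
		if err != nil || resp.StatusCode != http.StatusServiceUnavailable || attempt >= rrt.maxRetry {
			return resp, err
		}
		resp.Body.Close() // discard the 503 before trying again
		time.Sleep(rrt.interval)
	}
}

// flakyTransport is a fake upstream that answers 503 for the first
// `failures` calls and 200 afterwards, so the sketch needs no network.
type flakyTransport struct {
	failures int
	calls    int
}

func (f *flakyTransport) RoundTrip(r *http.Request) (*http.Response, error) {
	f.calls++
	status := http.StatusOK
	if f.calls <= f.failures {
		status = http.StatusServiceUnavailable
	}
	return &http.Response{StatusCode: status, Body: http.NoBody, Request: r}, nil
}

// tryWith sends one bodiless GET through the retrying transport and reports
// the final status code and how many calls reached the fake upstream.
func tryWith(failures, maxRetry int) (status, calls int, err error) {
	upstream := &flakyTransport{failures: failures}
	rt := retryRoundTripper{next: upstream, maxRetry: maxRetry, interval: time.Millisecond}
	req, err := http.NewRequest(http.MethodGet, "http://revision.example/", nil)
	if err != nil {
		return 0, 0, err
	}
	req.Header.Set(envoyTimeoutHeader, requestTimeoutMs)
	resp, err := rt.RoundTrip(req)
	if err != nil {
		return 0, upstream.calls, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, upstream.calls, nil
}

func main() {
	status, calls, err := tryWith(2, 60)
	fmt.Println(status, calls, err) // prints "200 3 <nil>"
}
```

The sketch reuses the same `*http.Request` on every attempt, which is only safe for requests without a body; replaying a body would need it to be buffered first.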

@akyyy akyyy self-assigned this Jun 15, 2018
@google-prow-robot google-prow-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 15, 2018
Comment thread pkg/controller/route/istio_route.go Outdated
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

const sixtySecondsInMs = "60000"
Contributor

Change to a generic name like requestTimeoutMs?

Contributor Author

Done.

Comment thread pkg/controller/route/route_test.go Outdated
Weight: 0,
}},
}, getActivatorDestinationWeight(0),
{
Contributor

If you move this to the previous line, gofmt will indent it better.

Contributor Author

I see. Thanks!

Contributor

@josephburnett josephburnett left a comment

Looks good.

/lgtm
/approve

@google-prow-robot google-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 15, 2018
Contributor

tcnghia commented Jun 15, 2018

/approve

}
ret = append(ret, activatorRoute)
}
activatorRoute := RevisionRoute{
Contributor

Please sync with the latest changes, as I did the same thing in the master branch.

Contributor Author

I copied your change over to avoid merge. :)

},
Weight: 100,
}},
}, getActivatorDestinationWeight(0)},
Contributor

Please sync this file to the latest in master as well. Some of these changes are there as well.

Comment thread cmd/activator/main.go
)

const (
maxRetry = 60
Contributor

Retrying up to 60 times per request seems quite aggressive. We should eventually move this to exponential backoff. We should probably open a GitHub issue to tackle this later and check this one in as is.
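
For reference, a minimal sketch of what an exponential backoff schedule could look like. `backoffDelay`, its parameters, and the 100ms and 2s values are hypothetical and do not come from this PR, which retries without backoff.

```go
package main

import (
	"fmt"
	"time"
)

// backoffDelay returns the pause before retry number `attempt` (0-based):
// base doubled `attempt` times, capped at limit. Hypothetical helper.
func backoffDelay(attempt int, base, limit time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt; i++ {
		d *= 2
		if d >= limit {
			// Returning early also keeps d from overflowing on large attempts.
			return limit
		}
	}
	if d > limit {
		return limit
	}
	return d
}

func main() {
	// Show the first few delays for a 100ms base capped at 2s.
	for attempt := 0; attempt < 6; attempt++ {
		fmt.Println(attempt, backoffDelay(attempt, 100*time.Millisecond, 2*time.Second))
	}
}
```

With these example values the delay doubles from 100ms to 1.6s over the first five retries and stays at the 2s cap afterwards, so a fixed retry count spends most of its time waiting at the cap rather than hammering the revision.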

Contributor

Ah, Mustafa wondered the same thing as me. I didn't see this before I commented.

Comment thread cmd/activator/main.go Outdated
type retryRoundTripper struct{}

func (rrt retryRoundTripper) RoundTrip(r *http.Request) (*http.Response, error) {
transport := http.DefaultTransport.(*http.Transport)
Contributor

Why is there a need for this cast?

@nikkithurmond
Contributor

/lgtm
/approve

Awesome job :) Just a question (doesn't change my approval), but should we be worried about any backoff to the retries? I don't think so, but I'm just wondering if you've considered it.

@google-prow-robot google-prow-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 15, 2018
@mdemirhan
Contributor

I just resolved the conflict and removed my hold request. Let's check this in and we can address my comments in a later review as they are not blockers.

@knative-metrics-robot

The following is the coverage report on pkg/. Say /test pull-knative-serving-go-coverage to run the coverage report again

File                                  Old Coverage  New Coverage  Delta
pkg/activator/revision.go             79.5%         79.1%         -0.5
pkg/controller/route/route_test.go    78.5%         78.7%         0.2

*TestCoverage feature is being tested, do not rely on any info here yet

@mdemirhan
Contributor

/lgtm

@google-prow-robot google-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Jun 15, 2018
Contributor

vaikas commented Jun 15, 2018

/lgtm
/approve

@google-prow-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: akyyy, josephburnett, nikkithurmond, tcnghia, vaikas-google

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-prow-robot google-prow-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 15, 2018
@google-prow-robot google-prow-robot merged commit 9d5a63a into knative:master Jun 15, 2018
Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • lgtm: Indicates that a PR is ready to be merged.
  • size/M: Denotes a PR that changes 30-99 lines, ignoring generated files.

8 participants