Skip to content

Add activator metrics and dashboard#1567

Closed
akyyy wants to merge 11 commits intoknative:masterfrom
akyyy:activator-metrics1
Closed

Add activator metrics and dashboard#1567
akyyy wants to merge 11 commits intoknative:masterfrom
akyyy:activator-metrics1

Conversation

@akyyy
Copy link
Copy Markdown
Contributor

@akyyy akyyy commented Jul 10, 2018

Fixes #743

Proposed Changes

  • Add activator metrics
  • Add activator dashboard

@google-prow-robot
Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: akyyy
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: mattmoor

If they are not already assigned, you can assign the PR to them by writing /assign @mattmoor in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-prow-robot google-prow-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jul 10, 2018
@steuhs
Copy link
Copy Markdown
Contributor

steuhs commented Jul 10, 2018

/test pull-knative-serving-go-coverage-dev

@steuhs
Copy link
Copy Markdown
Contributor

steuhs commented Jul 10, 2018

/test pull-knative-serving-go-coverage

@steuhs
Copy link
Copy Markdown
Contributor

steuhs commented Jul 10, 2018

/test pull-knative-serving-go-coverage-dev

4 similar comments
@steuhs
Copy link
Copy Markdown
Contributor

steuhs commented Jul 10, 2018

/test pull-knative-serving-go-coverage-dev

@steuhs
Copy link
Copy Markdown
Contributor

steuhs commented Jul 10, 2018

/test pull-knative-serving-go-coverage-dev

@steuhs
Copy link
Copy Markdown
Contributor

steuhs commented Jul 10, 2018

/test pull-knative-serving-go-coverage-dev

@steuhs
Copy link
Copy Markdown
Contributor

steuhs commented Jul 11, 2018

/test pull-knative-serving-go-coverage-dev

@steuhs
Copy link
Copy Markdown
Contributor

steuhs commented Jul 11, 2018

/test pull-knative-serving-go-coverage

1 similar comment
@steuhs
Copy link
Copy Markdown
Contributor

steuhs commented Jul 11, 2018

/test pull-knative-serving-go-coverage

@steuhs
Copy link
Copy Markdown
Contributor

steuhs commented Jul 11, 2018

/test pull-knative-serving-go-coverage-dev

2 similar comments
@steuhs
Copy link
Copy Markdown
Contributor

steuhs commented Jul 11, 2018

/test pull-knative-serving-go-coverage-dev

@steuhs
Copy link
Copy Markdown
Contributor

steuhs commented Jul 11, 2018

/test pull-knative-serving-go-coverage-dev

@steuhs
Copy link
Copy Markdown
Contributor

steuhs commented Jul 11, 2018

/test pull-knative-serving-go-coverage-dev

1 similar comment
@steuhs
Copy link
Copy Markdown
Contributor

steuhs commented Jul 11, 2018

/test pull-knative-serving-go-coverage-dev

@steuhs
Copy link
Copy Markdown
Contributor

steuhs commented Jul 11, 2018

/test pull-knative-serving-go-coverage

@akyyy
Copy link
Copy Markdown
Contributor Author

akyyy commented Jul 12, 2018

@mdemirhan @josephburnett gentle ping. It would be nice if I could get this in by the end of this week. Thanks!

@steuhs
Copy link
Copy Markdown
Contributor

steuhs commented Jul 12, 2018

/test pull-knative-serving-go-coverage

@steuhs
Copy link
Copy Markdown
Contributor

steuhs commented Jul 12, 2018

/test pull-knative-serving-go-coverage-dev

@mattmoor
Copy link
Copy Markdown
Member

/hold

Wait until after we cut first release.

@google-prow-robot google-prow-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 12, 2018
@steuhs
Copy link
Copy Markdown
Contributor

steuhs commented Jul 13, 2018

/test pull-knative-serving-go-coverage-dev --skipped

@steuhs
Copy link
Copy Markdown
Contributor

steuhs commented Jul 13, 2018

/test pull-knative-serving-go-coverage-dev
/skip

@knative-metrics-robot
Copy link
Copy Markdown

The following is the coverage report on pkg/.
Say /test pull-knative-serving-go-coverage-dev to run the coverage report again

File Old Coverage New Coverage Delta
pkg/activator/dedupe.go 93.3% 92.3% -1.0
pkg/activator/revision.go 80.0% 78.7% -1.3
pkg/activator/stats_reporter.go Do not exist 78.4%

@steuhs
Copy link
Copy Markdown
Contributor

steuhs commented Jul 13, 2018

/skip

Comment thread cmd/activator/main.go

if resp != nil {
rrt.logger.Infof("It took %d tries to get response code %d", i, resp.StatusCode)
namespace := r.Header.Get(controller.GetRevisionHeaderNamespace())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is these are empty? Should we still send metrics?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/s/is/if/

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's empty or doesn't exist, we'll get 500 errors based on

revision, err := revisionClient.Get(rev.name, metav1.GetOptions{})

I think we should report error for the given headers -- all requests routed to activator should have the headers set. If not, we can use the metrics to debug.

Comment thread cmd/activator/main.go
start time.Time
}

func (rrt retryRoundTripper) RoundTrip(r *http.Request) (*http.Response, error) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interface implementation should be on the pointer, not the value type.

func (rrt *retryRoundTripper) RoundTrip(...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Comment thread cmd/activator/main.go
rrt.reporter.ReportResponseCount(namespace, config, name, resp.StatusCode, i, 1.0)
rrt.reporter.ReportResponseTime(namespace, config, name, resp.StatusCode, time.Now().Sub(rrt.start))
}
return resp, err
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not report anything when all retries fail with an error (i.e. no response)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can do that. In that case, what status code dimension you would suggest to use?

Comment thread cmd/activator/main.go
}

func (a *activationHandler) handler(w http.ResponseWriter, r *http.Request) {
namespace := r.Header.Get(controller.GetRevisionHeaderNamespace())
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of this new functionality in main.go have any unit tests - see #1585 that refactoring and adding unit tests. I would prefer waiting for that to get in and adding this functionality such that it is tested properly.

All of these should be easy to test - inject headers, test edge cases, see if the reporter is getting called, ...etc.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that pull request #1585 takes long time, I think we can get activator dashboard in first and then add unit tests based on #1585 later.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note, I moved #1585 to #1689

// status code indicating why it could not.
type Activator interface {
ActiveEndpoint(namespace, name string) (Endpoint, Status, error)
ActiveEndpoint(namespace, configuration, name string) (Endpoint, Status, error)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is configuration needed for activation? If not, it doesn't look right to add it here. Is there no better way to get this information for the stats?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The configuration parameter is needed for activator to report metrics.

Comment thread pkg/activator/dedupe.go
}
if reqs, ok := a.pendingRequests[id]; ok {
a.reporter.ReportRequest(id.namespace, id.configuration, id.name, state, float64(len(reqs)))
logger.Infof("Wrote request_count metric for revision %s for namespace %s with value %d", id.name, id.namespace, len(reqs))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not write logs to say that we wrong metrics. Seems extremely verbose. If you absolutely need this, I recommend using "Debug" rather than "Info"

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted the log.

var (
measurements = []*stats.Float64Measure{
RequestCountM: stats.Float64(
"revision_request_count",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it intentional to use the same name as revision request count from Istio? If so, do we have the same matching labels.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we do.

@akyyy
Copy link
Copy Markdown
Contributor Author

akyyy commented Jul 23, 2018

@mdemirhan I added "start" field in retryRoundTripper struct to calculate the response time in function
func (rrt retryRoundTripper) RoundTrip

If we reuse one retryRoundTripper, we lost the request start time?

@akyyy
Copy link
Copy Markdown
Contributor Author

akyyy commented Jul 26, 2018

This is moved to #1726 after creating a new repo.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants