Add metrics to the catalog server #156
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,40 @@ | ||
| package metrics | ||
|
|
||
| import ( | ||
| "net/http" | ||
|
|
||
| "github.com/prometheus/client_golang/prometheus" | ||
| "github.com/prometheus/client_golang/prometheus/promhttp" | ||
| ) | ||
|
|
||
| const ( | ||
| RequestDurationMetricName = "catalogd_http_request_duration_seconds" | ||
| ) | ||
|
|
||
| // Sets up the metrics needed to calculate the Apdex Score. | ||
| // If you use Grafana for visualization, connected to a Prometheus data | ||
| // source that scrapes these metrics, you can create a panel that uses the | ||
| // following queries and expression to calculate the Apdex Score for T = 0.5 | ||
| // (so the tolerated threshold is 4T = 2): | ||
| // Query A: sum(catalogd_http_request_duration_seconds_bucket{code!~"5..",le="0.5"}) | ||
| // Query B: sum(catalogd_http_request_duration_seconds_bucket{code!~"5..",le="2"}) | ||
| // Query C: sum(catalogd_http_request_duration_seconds_count) | ||
| // Expression for Apdex Score: ($A + (($B - $A) / 2)) / $C | ||
| var ( | ||
| RequestDurationMetric = prometheus.NewHistogramVec( | ||
| prometheus.HistogramOpts{ | ||
| Name: RequestDurationMetricName, | ||
| Help: "Histogram of request duration in seconds", | ||
| // Create a bucket for every 100 ms up to 1s and ensure that each of those values multiplied by 4 | ||
| // also exists as a bucket. Include a 10s bucket to capture very long-running requests. This lets us | ||
| // easily calculate Apdex Scores up to a T of 1 second, and with various mathematical formulas we | ||
| // should be able to estimate Apdex Scores up to a T of 2.5. Having a larger range of buckets | ||
| // will also make it easier to calculate health indicators other than the Apdex Score. | ||
| Buckets: []float64{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.2, 1.6, 2, 2.4, 2.8, 3.2, 3.6, 4, 10}, | ||
|
Member
Does it matter that our write timeout is 10s and the max bucket duration is 10s? Seems like we'll only ever get whatever error code maps to that timeout in the 10s bucket.
Collaborator
Author
I don't think so. If anything, I think it means that we now have buckets that capture all possible response times, which allows us to calculate more metrics on the fly. This is all based on #156 (comment). If no request takes more than 10s, we will never have anything in the "Inf" bucket. That being said, I could be wrong. I don't have enough experience in this area to truly know and am making an assumption based on what I currently know.
Collaborator
Author
Any response time > 4s and <= 10s will fall in that 10s bucket
Member
Ah, got it. sgtm. |
||
| }, | ||
| []string{"code"}, | ||
| ) | ||
| ) | ||
|
|
||
| func AddMetricsToHandler(handler http.Handler) http.Handler { | ||
| return promhttp.InstrumentHandlerDuration(RequestDurationMetric, handler) | ||
| } | ||
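To make the Apdex expression from the code comment concrete, here is a minimal, standard-library-only sketch of the same arithmetic. It is not part of this PR: the helper names `cumulativeCounts` and `apdex` and the sample durations are hypothetical, and the reduced bucket list (0.5, 2, 10) is only an illustration. It assumes Prometheus-style cumulative buckets, where an observation is counted in every bucket whose upper bound is greater than or equal to it.

```go
package main

import "fmt"

// cumulativeCounts mimics how a cumulative histogram counts observations:
// each observation increments every bucket whose upper bound (le) is >= it.
// Hypothetical helper for illustration only.
func cumulativeCounts(bounds, observations []float64) []uint64 {
	counts := make([]uint64, len(bounds))
	for _, o := range observations {
		for i, b := range bounds {
			if o <= b {
				counts[i]++
			}
		}
	}
	return counts
}

// apdex applies the expression from the code comment:
// ($A + (($B - $A) / 2)) / $C, where
//   satisfied = cumulative count of the le="T" bucket  (Query A)
//   tolerated = cumulative count of the le="4T" bucket (Query B)
//   total     = count of all requests                  (Query C)
// Returning 0 when there is no traffic is an assumption of this sketch.
func apdex(satisfied, tolerated, total float64) float64 {
	if total == 0 {
		return 0
	}
	return (satisfied + (tolerated-satisfied)/2) / total
}

func main() {
	// Reduced bucket list for T = 0.5: le="0.5", le="2", le="10".
	bounds := []float64{0.5, 2, 10}
	// Made-up request durations in seconds.
	durations := []float64{0.1, 0.3, 0.7, 1.5, 7}

	counts := cumulativeCounts(bounds, durations)
	fmt.Println("cumulative counts:", counts)

	score := apdex(float64(counts[0]), float64(counts[1]), float64(len(durations)))
	fmt.Printf("apdex: %.2f\n", score)
}
```

In this sample, the 7s request is counted only in the 10s bucket, which matches the point made in the review thread about response times between 4s and 10s. Note that the queries in the code comment exclude 5xx responses from A and B but not from C, so failed requests lower the score; the sketch above does not model status codes.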