Refactor and fix metrics export tests. #1957
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: evankanderson
Codecov Report
@@ Coverage Diff @@
## master #1957 +/- ##
==========================================
+ Coverage 68.96% 69.08% +0.12%
==========================================
Files 209 209
Lines 8790 8786 -4
==========================================
+ Hits 6062 6070 +8
+ Misses 2453 2447 -6
+ Partials 275 269 -6
yanweiguo left a comment
Is the failing downstream test related?
| // We unregister the views because this is one of two ways to flush | ||
| // the internal aggregation buffers; the other is to have the | ||
| // internal reporting period duration tick, which is at least | ||
| // [new duration] in the future. |
Are the comments here still correct?
Whoops, no. We have a function for that now.
| Value: m.Timeseries[0].Points[0].GetInt64Value(), | ||
| } | ||
| records = append(records, metric) | ||
| keys[metric.Key()] = struct{}{} |
Could you add a comment explaining why using a set here fixes the problem?
Done. I was convinced for a long time that the RPCs weren't actually going to the right place, but I finally figured out that we simply weren't reading enough off the channel to find them.
vagababov left a comment
LG in general
Left some stylistic comments.
| return fmt.Sprintf("%s:%d", m.Key(), m.Value) | ||
| } | ||
|
|
||
| func initSdFake(sdFake *stackDriverFake) error { |
I guess?
| func initSdFake(sdFake *stackDriverFake) error { | |
| func initSDFake(sdFake *stackDriverFake) error { |
Done. I have no idea; the product name is "Stackdriver", so I just spelled it out in the 3 places it is used.
| resources := []*resource.Resource{ | ||
| { | ||
| Type: "revision", | ||
| Labels: map[string]string{ | ||
| "project": "p1", | ||
| "revision": "r1", | ||
| }, | ||
| }, | ||
| { | ||
| Type: "revision", | ||
| Labels: map[string]string{ | ||
| "project": "p1", | ||
| "revision": "r2", | ||
| }, | ||
| }, | ||
| } |
| resources := []*resource.Resource{ | |
| { | |
| Type: "revision", | |
| Labels: map[string]string{ | |
| "project": "p1", | |
| "revision": "r1", | |
| }, | |
| }, | |
| { | |
| Type: "revision", | |
| Labels: map[string]string{ | |
| "project": "p1", | |
| "revision": "r2", | |
| }, | |
| }, | |
| } | |
| resources := []*resource.Resource{{ | |
| Type: "revision", | |
| Labels: map[string]string{ | |
| "project": "p1", | |
| "revision": "r1", | |
| }, | |
| }, { | |
| Type: "revision", | |
| Labels: map[string]string{ | |
| "project": "p1", | |
| "revision": "r2", | |
| }, | |
| }} |
| if err != nil { | ||
| t.Fatalf("failed to read prometheus response: %+v", err) | ||
| } | ||
| want := `# HELP testComponent_global_export_counts Count of exports via standard OpenCensus view. |
| want := `# HELP testComponent_global_export_counts Count of exports via standard OpenCensus view. | |
| const want = `# HELP testComponent_global_export_counts Count of exports via standard OpenCensus view. |
| expected: []metricExtract{ | ||
| { | ||
| "knative.dev/serving/autoscaler/actual_pods", | ||
| label1, | ||
| 1, | ||
| }, | ||
| { | ||
| "knative.dev/serving/autoscaler/desired_pods", | ||
| label2, | ||
| 2, | ||
| }, | ||
| { | ||
| "custom.googleapis.com/knative.dev/autoscaler/not_ready_pods", | ||
| batchLabels, | ||
| 3, | ||
| }, | ||
| }, | ||
| }, { |
| expected: []metricExtract{ | |
| { | |
| "knative.dev/serving/autoscaler/actual_pods", | |
| label1, | |
| 1, | |
| }, | |
| { | |
| "knative.dev/serving/autoscaler/desired_pods", | |
| label2, | |
| 2, | |
| }, | |
| { | |
| "custom.googleapis.com/knative.dev/autoscaler/not_ready_pods", | |
| batchLabels, | |
| 3, | |
| }, | |
| }, | |
| }, { | |
| expected: []metricExtract{{ | |
| "knative.dev/serving/autoscaler/actual_pods", | |
| label1, | |
| 1, | |
| }, { | |
| "knative.dev/serving/autoscaler/desired_pods", | |
| label2, | |
| 2, | |
| }, { | |
| "custom.googleapis.com/knative.dev/autoscaler/not_ready_pods", | |
| batchLabels, | |
| 3, | |
| }}, | |
| }, { |
| }, { | ||
| name: "Don't allow custom metrics", | ||
| allowCustomMetrics: "false", | ||
| expected: []metricExtract{ |
LG but the failed unit test indicates there may be another problem.
I've seen Prometheus fail to listen on the port; I can choose a random port to see if that helps. Probably the other fix would be to set SO_REUSEADDR in the Prometheus server setup; let me see if that helps.
I'm not sure that
Regardless, the Prometheus code has not been changed, so I don't think that should be a barrier on this PR. (Though I'd love to figure out why the server sometimes won't respond for > 10s).
| e, f, err := newMetricsExporter(newConfig, logger) | ||
| if err != nil { | ||
| logger.Errorw("Failed to update a new metrics exporter based on metric config", newConfig, zap.Error(err)) | ||
| logger.Errorw("Failed to update a new metrics exporter based on metric config", "config", newConfig, "error", err) |
zap.String("config", newConfig), zap.Error(err)? Same below?
Ugh... I want to replace this with a non-sugared logger. It looks like With takes both Field objects and key-value pairs. Unfortunately, this ended up with the key-value format where the value was a zap.Field.
WDYT?
Well you do use sugared.
You don't need With; just call logger.Error("...", zap.Error(err)) on the desugared one.
I switched this to put all the Field objects first so it's not possible to log things as "string" => Field, which was what the old code was doing.
Using logger.Error() means taking the error out of a separate JSON field, which makes it harder (for example) to filter all the logs for just the ones with errors in them.
Yeah, I know. That's why when I joined I was surprised we're using this crap (zap that is) :)
But that's the zap philosophy — use jq to parse :)
Changes
resource_view_testfiles.TestFlushExporter. Fixed that, too.
/kind bug
/kind cleanup
Fixes #1672
/assign @vagababov
It took embarrassingly long to find this... I had to walk away twice for at least a week after bashing my head on the export code to figure out what was really going on. (The bug was inside the test all along!)