Do not modify config-map in autoscale test. #2144
Conversation
/assign @adrcunha
bbrowning
left a comment
I have a couple of questions and noticed an erroneous log message, but nothing that would block this from being merged if it's a priority to get the flakes fixed asap.
  isDeploymentScaledUp(),
- "DeploymentIsScaledUp")
+ "DeploymentIsScaledUp",
+ 2*time.Minute)
This is reducing the time we wait for initial scaleup from the old timeout of 6 minutes down to 2 minutes. Do we know from previous test runs that this always happens within 2 minutes, so that we're not introducing another potential flake?
I'm pretty certain that if scale up doesn't happen within 2 minutes, it never will. I saw this while developing these changes.
I poked through several previous test runs and the scale-up tends to happen in less than one second. I didn't see any cases where it even approached 2 minutes, so this seems fine.
Yeah, what makes it okay is that we aren't starting the timer until we have 200 responses for all the requests we sent. So processing time can't affect this; it's purely metrics-pipeline and autoscaler latency.
@@ -249,7 +227,8 @@ func TestAutoscaleUpDownUp(t *testing.T) {
  clients.KubeClient,
The log message about "Manually setting ScaleToZeroThreshold" a few lines above this is no longer relevant.
  isDeploymentScaledUp(),
- "DeploymentScaledUp")
+ "DeploymentScaledUp",
+ 2*time.Minute)
Same comment with regards to the 2 minute timeout as the previous scaleup block. Lower is better, as long as the tests don't flake with the lower value.
Yeah, same deal. All the requests have succeeded; now we're just giving the autoscaler time to respond. The metrics pipeline has a 60-second window, so 2 minutes is definitely enough to scale up.
/lgtm
  cause the test to time out. Failing fast instead. %v`, err)
  t.Fatalf("Unable to parse scale-to-zero-threshold as duration: %v", err)
  }
  scaleToZeroThreshold = threshold
[nit] No need for another var. It can be:

scaleToZeroThreshold, err := time.ParseDuration(configMap.Data["scale-to-zero-threshold"])
Oh yeah, good catch. Removed.
/lgtm

/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: josephburnett, srinivashegde86. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Details: Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
This is a step in removing the cluster-level flakiness that the autoscale test seems to introduce. One theory is that reducing the scale-to-zero-threshold is causing the blue-green test to scale down and flake (either not becoming ready or getting the wrong split from the activator).
Proposed Changes
Release Note