Do not modify config-map in autoscale test. #2144
Conversation
/assign @adrcunha
bbrowning
left a comment
I have a couple of questions and noticed an erroneous log message, but nothing that would block this from being merged if it's a priority to get the flakes fixed asap.
  isDeploymentScaledUp(),
- "DeploymentIsScaledUp")
+ "DeploymentIsScaledUp",
+ 2*time.Minute)
This is reducing the time we wait for initial scaleup from the old timeout of 6 minutes down to 2 minutes. Do we know from previous test runs that this always happens within 2 minutes, so that we're not introducing another potential flake?
I'm pretty certain that if scale up doesn't happen within 2 minutes, it never will. I saw this while developing these changes.
I poked through several previous test runs and the scale-up tends to happen in less than one second. I didn't see any cases where it even approached 2 minutes, so this seems fine.
Yeah, what makes it okay is that we aren't starting the timer until we have 200 responses for all the requests we sent. So processing time can't affect this; it's purely metrics-pipeline and autoscaler latency.
@@ -249,7 +227,8 @@ func TestAutoscaleUpDownUp(t *testing.T) {
  clients.KubeClient,
The log message about "Manually setting ScaleToZeroThreshold" a few lines above this is no longer relevant.
  isDeploymentScaledUp(),
- "DeploymentScaledUp")
+ "DeploymentScaledUp",
+ 2*time.Minute)
Same comment with regards to the 2 minute timeout as the previous scaleup block. Lower is better, as long as the tests don't flake with the lower value.
Yeah, same deal. All the requests have succeeded; now we're just giving the autoscaler time to respond. The metrics pipeline has a 60-second window, so 2 minutes is definitely enough to scale up.
/lgtm
  cause the test to time out. Failing fast instead. %v`, err)
  t.Fatalf("Unable to parse scale-to-zero-threshold as duration: %v", err)
  }
  scaleToZeroThreshold = threshold
[nit] No need for another var. It can be:

scaleToZeroThreshold, err := time.ParseDuration(configMap.Data["scale-to-zero-threshold"])
Oh yeah, good catch. Removed.
/lgtm

/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: josephburnett, srinivashegde86. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Details: Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
This is a step in removing the cluster-level flakiness that the autoscale test seems to introduce. One theory is that reducing the scale-to-zero-threshold is causing the blue-green test to scale down and flake (either not becoming ready or getting the wrong split from the activator).
Proposed Changes
Release Note