Revisions should wait for minScale replicas to report ready by joshrider · Pull Request #3493 · knative/serving

joshrider · 2019-03-21T21:29:08Z

Proposed Changes

Revisions with a minScale annotation are only marked as "Ready" when they have at least minScale number of ready replicas
Fixes a bug that occurred when the addEndpoint helper was called repeatedly

knative-prow-robot

@pivotal-joshua-rider: 0 warnings.

Details

In response to this:

Fixes #3077

Proposed Changes

Revisions with a minScale annotation are marked Ready when they have at least minScale number of ready replicas

Revisions with a minScale annotation that are marked as Ready are marked as "Deploying" when they have fewer than minScale ready replicas.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

joshrider · 2019-03-21T21:32:01Z

/assign @grantr

joshrider · 2019-03-21T21:38:44Z

/ok-to-test
👨‍💻

knative-prow-robot · 2019-03-21T21:39:12Z

@pivotal-joshua-rider: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

Details

In response to this:

/ok-to-test
👨‍💻

andrew-su · 2019-03-21T21:53:59Z

/ok-to-test

grantr · 2019-03-21T21:59:43Z

/unassign

vagababov · 2019-03-21T22:16:24Z

+	return ms > 0
+}
+
+func getMinScale(rev *v1alpha1.Revision) (int, error) {


This is not an error. Just return 0.
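
One reading of this suggestion, as a minimal sketch: treat a missing or unparseable minScale annotation as "no minimum" and return 0 instead of propagating an error. The helper name `minScaleFrom` and the use of a plain annotation map are assumptions made for illustration; the PR's `getMinScale` takes a `*v1alpha1.Revision`, and the annotation key is assumed to be `autoscaling.knative.dev/minScale`.

```go
package main

import (
	"fmt"
	"strconv"
)

// minScaleAnnotationKey is assumed to be the key of the minScale annotation.
const minScaleAnnotationKey = "autoscaling.knative.dev/minScale"

// minScaleFrom is a hypothetical helper. It returns the requested minScale,
// or 0 when the annotation is absent, not an integer, or negative.
func minScaleFrom(annotations map[string]string) int {
	raw, ok := annotations[minScaleAnnotationKey]
	if !ok {
		return 0
	}
	n, err := strconv.Atoi(raw)
	if err != nil || n < 0 {
		return 0
	}
	return n
}

func main() {
	fmt.Println(minScaleFrom(map[string]string{minScaleAnnotationKey: "3"}))
	fmt.Println(minScaleFrom(nil))
}
```

With a single int return, the caller's check reduces to `minScaleFrom(...) > 0`, which matches the `return ms > 0` line in the diff context above.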

vagababov · 2019-03-21T22:17:28Z

+}
+
+func hasMinimumEndpoints(e *corev1.Endpoints, minimum int) bool {
+	count := 0


To be in the vein with the following method

for es in range:
    min -= len(es.Adds)
    if min <= 0:
        return True
return False
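
The early-exit loop suggested here could look roughly like this in Go. This is a sketch only: `endpoints` and `endpointSubset` are simplified stand-ins for `corev1.Endpoints` and `corev1.EndpointSubset`, defined locally so the example runs without Kubernetes dependencies, and addresses are plain strings.

```go
package main

import "fmt"

// endpointSubset is a simplified stand-in for corev1.EndpointSubset.
type endpointSubset struct {
	Addresses []string
}

// endpoints is a simplified stand-in for corev1.Endpoints.
type endpoints struct {
	Subsets []endpointSubset
}

// hasMinimumEndpoints counts down from minimum and stops as soon as enough
// ready addresses have been seen, instead of totalling every subset first.
func hasMinimumEndpoints(e *endpoints, minimum int) bool {
	remaining := minimum
	for _, es := range e.Subsets {
		remaining -= len(es.Addresses)
		if remaining <= 0 {
			return true
		}
	}
	return false
}

func main() {
	e := &endpoints{Subsets: []endpointSubset{
		{Addresses: []string{"10.0.0.1", "10.0.0.2"}},
		{Addresses: []string{"10.0.0.3"}},
	}}
	fmt.Println(hasMinimumEndpoints(e, 3))
	fmt.Println(hasMinimumEndpoints(e, 4))
}
```

Note that, written this way, a minimum of 0 with no subsets at all returns false, so a caller that treats 0 as "no minimum" would need its own guard.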

mattmoor · 2019-03-21T22:44:23Z

/assign
/hold

tl;dr I generally don't like this approach.

The minScale annotation isn't part of our API or Conformance, and implementing it shouldn't be a requirement for autoscaler implementations to plug-in. As such, I really don't want to introduce logic at the Revision-level that's semantically aware of this annotation.

I think that I'd like to see a way for the PodAutoscaler to gate Revision readiness, which we don't have today. One way would be to have the PodAutoscaler take over handling this aspect of readiness and surfacing it here.
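
To illustrate the separation being proposed, here is a rough sketch in which only the autoscaler implementation knows about minScale, and the Revision level consumes an opaque readiness signal. All names (`autoscalerStatus`, `kpaStatus`, `revisionReady`) are hypothetical and not taken from the Knative codebase; the rule that at least one ready pod is required even without minScale is also an assumption.

```go
package main

import "fmt"

// autoscalerStatus is a hypothetical stand-in for whatever readiness signal
// a PodAutoscaler would surface to the Revision.
type autoscalerStatus struct {
	Ready bool
}

// kpaStatus models one autoscaler implementation that understands minScale.
// Assumption: at least one ready pod is required even when minScale is 0.
func kpaStatus(readyPods, minScale int) autoscalerStatus {
	required := minScale
	if required < 1 {
		required = 1
	}
	return autoscalerStatus{Ready: readyPods >= required}
}

// revisionReady has no knowledge of minScale. It only combines the
// Revision's own resource readiness with the autoscaler's reported status.
func revisionReady(resourcesReady bool, pa autoscalerStatus) bool {
	return resourcesReady && pa.Ready
}

func main() {
	fmt.Println(revisionReady(true, kpaStatus(2, 3)))
	fmt.Println(revisionReady(true, kpaStatus(3, 3)))
}
```

Under this split, a different autoscaler implementation could ignore minScale entirely and still plug in, since the Revision only reads the surfaced status.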

joshrider · 2019-03-22T13:25:32Z

The minScale annotation isn't part of our API or Conformance, and implementing it shouldn't be a requirement for autoscaler implementations to plug-in. As such, I really don't want to introduce logic at the Revision-level that's semantically aware of this annotation.

This makes a lot of sense. I'll dig in a bit further and come back with some questions. 👍

mattmoor · 2019-03-25T14:22:39Z

Thanks @pivotal-joshua-rider !

markusthoemmes · 2019-05-06T06:11:58Z

@joshrider Any news on this PR?

joshrider · 2019-05-06T14:31:56Z

@markusthoemmes I should be able to get something up shortly!

joshrider · 2019-05-08T14:19:49Z

/retest

joshrider · 2019-05-08T14:34:32Z

@markusthoemmes @mattmoor Feedback is appreciated!

Things generally go as expected, but I have occasionally noticed a lag between the last needed Replica becoming ready and the KPA becoming ready.

There is also lots of room for cleanup in the scaler, but I figure that's the job of a different PR.

~~edit: Also, does adding the guard to handleScaleToZero introduce too much unnecessary churn?~~ handled by #4036

vagababov

A few more test related items. :-)
But
/lgtm

vagababov

/lgtm

knative-metrics-robot · 2019-05-08T19:43:06Z

The following is the coverage report on pkg/.
Say /test pull-knative-serving-go-coverage to re-run this coverage report

File	Old Coverage	New Coverage	Delta
pkg/reconciler/autoscaling/kpa/kpa.go	91.4%	92.0%	0.6
pkg/reconciler/autoscaling/kpa/scaler.go	84.4%	86.0%	1.6

joshrider · 2019-05-09T20:26:29Z

Currently, the system brings 1 Pod to 'ready' before it can start scaling up to the minimum. Should we be scaling to the minimum from the start?

lvjing2 · 2019-05-10T02:13:31Z

This PR will make cold start take much longer, because requests cannot be proxied from the activator to the user pod until minScale pods are ready. I really don't want to sacrifice this.

lvjing2 · 2019-05-10T02:29:11Z

Currently, the system brings 1 Pod to 'ready' before it can start scaling up to the minimum. Should we be scaling to the minimum from the start?

In fact, bringing 1 Pod to 'ready' before scaling up to the minimum is designed to detect whether the revision is healthy, from reconciling through to setting up a ready pod. If it can't set up a ready pod, the autoscaler will not scale up, even to minScale. That is to say, if the revision is bad, it will set up at most 1 crashing pod. If we scale to minScale from the start, we will always set up minScale crashing pods.
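
The trade-off described above can be sketched as a small decision function. This is an illustration of the reasoning only, not the autoscaler's actual logic; `initialTarget` is a hypothetical name.

```go
package main

import "fmt"

// initialTarget is a hypothetical helper: it holds the target at a single
// "probe" pod until one pod has become ready, and only then raises the
// target to minScale.
func initialTarget(readyPods, minScale int) int {
	if readyPods == 0 {
		// Nothing has become ready yet, so a broken revision costs one pod.
		return 1
	}
	if minScale > 1 {
		return minScale
	}
	return 1
}

func main() {
	// A revision whose pods never become ready stays at one pod.
	fmt.Println(initialTarget(0, 5))
	// Once the first pod is ready, the target jumps to minScale.
	fmt.Println(initialTarget(1, 5))
}
```

Scaling straight to minScale would amount to returning `minScale` in the first branch as well, which is where the cost of minScale crashing pods for a bad revision comes from.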

joshrider · 2019-05-10T14:23:04Z

That all makes sense.

What are the cases where the activator would be waiting with requests? Given that minScale keeps some number of pods warm, would it just be during the initial launch? Are we also worried about cases where a configuration is updated and we take longer getting the new revision up?

lvjing2 · 2019-05-11T01:28:18Z

What are the cases where the activator would be waiting with requests? Given that minScale keeps some number of pods warm, would it just be during the initial launch? Are we also worried about cases where a configuration is updated and we take longer getting the new revision up?

Yeah, you are right: this would only happen during the initial launch of a revision, so it is less of a worry for me. I still think someone with more authority needs to weigh in on this, though. @markusthoemmes WDYT?

markusthoemmes · 2019-05-14T07:41:12Z

By the very design of this, it will take longer to even get to Ready. As you already mentioned, the activator isn't a concern in this case, as minScale prevents the activator from being hooked in in the first place. The activator will not be hooked in during initial launch either, we don't do that currently.

vagababov · 2019-05-14T16:48:16Z

The activator will not be hooked in during initial launch either, we don't do that currently.

The way SKS is written, until there is at least one ready pod, the revision will be backed by the activator.

mattmoor

oops, a couple unsent comments...

mattmoor · 2019-05-16T15:20:07Z

 		// Don't scale-to-zero during activation
-		desiredScale = scaleUnknown
+		if min, _ := pa.ScaleBounds(); min == 0 {
+			return scaleUnknown, false


why not desiredScale = scaleUnknown still?

Note: I'm not asking about the condition, but the early return.

joshrider · 2019-05-21

It was put into place to accommodate the moved desiredScale < 0 guard in the caller in order to maintain the existing functionality of returning scaleUnknown and not "applying" the scale.

Happy to move the desiredScale < 0 check back and switch this over if it's preferred.

mattmoor · 2019-05-27T00:00:15Z

/lgtm
/approve

knative-prow-robot · 2019-05-27T00:00:27Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: joshrider, mattmoor, vagababov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~pkg/reconciler/autoscaling/OWNERS~~ [mattmoor,vagababov]
~~test/OWNERS~~ [mattmoor,vagababov]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

joshrider · 2019-05-27T14:14:51Z

@mattmoor are we good to remove the hold on this?

mattmoor · 2019-05-27T20:14:04Z

/hold cancel

Yeah, sorry I didn't realize that was still there! thanks for the reminder.

…3493)
* revisions become ready with minScale reached
* remove redundant condition
* revert accidental spacing changes
* cleanup test helpers
* cleanup tests
* remove copypasted comment

knative-prow-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 21, 2019

knative-prow-robot requested review from mattmoor and tcnghia March 21, 2019 21:29

knative-prow-robot reviewed Mar 21, 2019

knative-prow-robot added area/API API objects and controllers needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 21, 2019

knative-prow-robot assigned grantr Mar 21, 2019

knative-prow-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 21, 2019

knative-prow-robot unassigned grantr Mar 21, 2019

vagababov reviewed Mar 21, 2019

knative-prow-robot assigned mattmoor Mar 21, 2019

knative-prow-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 21, 2019

lvjing2 mentioned this pull request Mar 29, 2019

Consider to report stats in Queue-proxy only when it's user container is ready #3581

Closed

googlebot added the cla: yes Indicates the PR's author has signed the CLA. label May 6, 2019

joshrider force-pushed the min-scale-readiness branch 2 times, most recently from 43461e8 to 905c47c Compare May 8, 2019 14:12

knative-prow-robot added area/autoscale area/test-and-release It flags unit/e2e/conformance/perf test issues for product features labels May 8, 2019

cleanup test helpers

f60addb

vagababov reviewed May 8, 2019

Comment thread test/e2e/minscale_readiness_test.go Outdated

Comment thread test/e2e/minscale_readiness_test.go Outdated

Comment thread pkg/reconciler/autoscaling/kpa/kpa_test.go Outdated

knative-prow-robot assigned vagababov May 8, 2019

knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label May 8, 2019

cleanup tests

2965aa0

knative-prow-robot removed the lgtm Indicates that a PR is ready to be merged. label May 8, 2019

vagababov approved these changes May 8, 2019

knative-prow-robot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels May 8, 2019

remove copypasted comment

946e45c

knative-prow-robot removed the lgtm Indicates that a PR is ready to be merged. label May 8, 2019

mattmoor reviewed May 17, 2019

knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label May 27, 2019

knative-prow-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 27, 2019

knative-prow-robot merged commit 8f5d48a into knative:master May 27, 2019

joshrider deleted the min-scale-readiness branch May 27, 2019 20:47
