Skip to content

Revisions should wait for minScale replicas to report ready#3493

Merged
knative-prow-robot merged 6 commits intoknative:masterfrom
joshrider:min-scale-readiness
May 27, 2019
Merged

Revisions should wait for minScale replicas to report ready#3493
knative-prow-robot merged 6 commits intoknative:masterfrom
joshrider:min-scale-readiness

Conversation

@joshrider
Copy link
Copy Markdown
Contributor

@joshrider joshrider commented Mar 21, 2019

Fixes #3077

Proposed Changes

  • Revisions with a minScale annotation are only marked as "Ready" when they have at least minScale number of ready replicas
  • Fixes a bug when repeated calls were made to the addEndpoint helper

@knative-prow-robot knative-prow-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 21, 2019
Copy link
Copy Markdown
Contributor

@knative-prow-robot knative-prow-robot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pivotal-joshua-rider: 0 warnings.

Details

In response to this:

Fixes #3077

Proposed Changes

  • Revisions with a minScale annotation are marked Ready when they have at least minScale number of ready replicas
  • Revisions with a minScale annotation that are marked as Ready are marked as "Deploying" when they have fewer than minScale ready replicas.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@knative-prow-robot knative-prow-robot added area/API API objects and controllers needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 21, 2019
@joshrider
Copy link
Copy Markdown
Contributor Author

/assign @grantr

@joshrider
Copy link
Copy Markdown
Contributor Author

/ok-to-test
👨‍💻

@knative-prow-robot
Copy link
Copy Markdown
Contributor

@pivotal-joshua-rider: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

Details

In response to this:

/ok-to-test
👨‍💻

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@andrew-su
Copy link
Copy Markdown
Contributor

/ok-to-test

@knative-prow-robot knative-prow-robot removed the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 21, 2019
@grantr
Copy link
Copy Markdown
Contributor

grantr commented Mar 21, 2019

/unassign

return ms > 0
}

func getMinScale(rev *v1alpha1.Revision) (int, error) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not an error. Just return 0.

}

func hasMinimumEndpoints(e *corev1.Endpoints, minimum int) bool {
count := 0
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be in the vein with the following method

for es in range:
  min -= len(es.Adds)
  if min <= 0:
    return True
return False

@mattmoor
Copy link
Copy Markdown
Member

/assign
/hold

tl;dr I generally don't like this approach.

The minScale annotation isn't part of our API or Conformance, and implementing it shouldn't be a requirement for autoscaler implementations to plug-in. As such, I really don't want to introduce logic at the Revision-level that's semantically aware of this annotation.

I think that I'd like to see a way for the PodAutoscaler to gate Revision readiness, which we don't today. One way would be to have the PodAutoscaler take over handing this aspect of readiness and surfacing it here.

@knative-prow-robot knative-prow-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 21, 2019
@joshrider
Copy link
Copy Markdown
Contributor Author

The minScale annotation isn't part of our API or Conformance, and implementing it shouldn't be a requirement for autoscaler implementations to plug-in. As such, I really don't want to introduce logic at the Revision-level that's semantically aware of this annotation.

This makes a lot of sense. I'll dig in a bit further and come back with some questions. 👍

@mattmoor
Copy link
Copy Markdown
Member

Thanks @pivotal-joshua-rider !

@markusthoemmes
Copy link
Copy Markdown
Contributor

@joshrider Any news on this PR?

@joshrider
Copy link
Copy Markdown
Contributor Author

@markusthoemmes I should be able to get something up shortly!

@googlebot googlebot added the cla: yes Indicates the PR's author has signed the CLA. label May 6, 2019
@joshrider joshrider force-pushed the min-scale-readiness branch 2 times, most recently from 43461e8 to 905c47c Compare May 8, 2019 14:12
@knative-prow-robot knative-prow-robot added area/autoscale area/test-and-release It flags unit/e2e/conformance/perf test issues for product features labels May 8, 2019
@joshrider
Copy link
Copy Markdown
Contributor Author

/retest

@joshrider
Copy link
Copy Markdown
Contributor Author

joshrider commented May 8, 2019

@markusthoemmes @mattmoor Feedback is appreciated!

Things generally go as expected, but I have occasionally noticed a lag between the last needed Replica becoming ready and the KPA becoming ready.

There is also lots of room for cleanup in the scaler, but I figure that's the job of a different PR.

edit: Also, does adding the guard to handleScaleToZero introduce too much unnecessary churn? handled by #4036

Copy link
Copy Markdown
Contributor

@vagababov vagababov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more test related items. :-)
But
/lgtm

Comment thread test/e2e/minscale_readiness_test.go Outdated
Comment thread test/e2e/minscale_readiness_test.go Outdated
Comment thread pkg/reconciler/autoscaling/kpa/kpa_test.go Outdated
@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label May 8, 2019
@knative-prow-robot knative-prow-robot removed the lgtm Indicates that a PR is ready to be merged. label May 8, 2019
Copy link
Copy Markdown
Contributor

@vagababov vagababov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@knative-prow-robot knative-prow-robot added lgtm Indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels May 8, 2019
@knative-prow-robot knative-prow-robot removed the lgtm Indicates that a PR is ready to be merged. label May 8, 2019
@knative-metrics-robot
Copy link
Copy Markdown

The following is the coverage report on pkg/.
Say /test pull-knative-serving-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/autoscaling/kpa/kpa.go 91.4% 92.0% 0.6
pkg/reconciler/autoscaling/kpa/scaler.go 84.4% 86.0% 1.6

@joshrider
Copy link
Copy Markdown
Contributor Author

Currently, the system brings 1 Pod to 'ready' before it can start scaling up to the minimum. Should we be scaling to the minimum from the start?

@lvjing2
Copy link
Copy Markdown
Contributor

lvjing2 commented May 10, 2019

This pr will make the code start much longer, cause it will need to wait to proxy request from activator to user pod until the minscale pod turn ready. I really don’t want to sacrifice this.

@lvjing2
Copy link
Copy Markdown
Contributor

lvjing2 commented May 10, 2019

Currently, the system brings 1 Pod to 'ready' before it can start scaling up to the minimum. Should we be scaling to the minimum from the start?

In fact, 1 Pod to 'ready' before it can start scaling up to the minimum is designed to detect whether the revision is ok from reconciling to setting up a ready pod, if it can't set up a ready pod, then the autoscaler would not scaling up even to the minScale. That is to say, if the revision is bad, then it will set up at most 1 crashing pod. If we scaling to the minScale from the start, then we will alway set up minScale crashing pods.

@joshrider
Copy link
Copy Markdown
Contributor Author

That all makes sense.

What are the cases where the activator would be waiting with requests? Given that minScale keeps some number of pods warm, would it just be during the initial launch? Are we also worried about cases where a configuration is updated and we take longer getting the new revision up?

@lvjing2
Copy link
Copy Markdown
Contributor

lvjing2 commented May 11, 2019

What are the cases where the activator would be waiting with requests? Given that minScale keeps some number of pods warm, would it just be during the initial launch? Are we also worried about cases where a configuration is updated and we take longer getting the new revision up?

Yeah, you are right, this would only happen in the initial launch of a revision, then it would be fewer worry for me, but I think it still need superpower to consider this. @markusthoemmes WDYT

@markusthoemmes
Copy link
Copy Markdown
Contributor

By the very design of this, it will take longer to even get to Ready. As you already mentioned, the activator isn't a concern in this case, as minScale prevents the activator from being hooked in in the first place. The activator will not be hooked in during initial launch either, we don't do that currently.

@vagababov
Copy link
Copy Markdown
Contributor

The activator will not be hooked in during initial launch either, we don't do that currently.
The way SKS is written, until there is at least one ready pod, the revision will be backed by activator.

Copy link
Copy Markdown
Member

@mattmoor mattmoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops, a couple unsent comments...

// Don't scale-to-zero during activation
desiredScale = scaleUnknown
if min, _ := pa.ScaleBounds(); min == 0 {
return scaleUnknown, false
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not desiredScale = scaleUnknown still?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I'm not asking about the condition, but the early return.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was put into place to accommodate the moved desiredScale < 0 guard in the caller in order to maintain the existing functionality of returning scaleUnknown and not "applying" the scale.

Happy to move the desiredScale < 0 check back and switch this over if it's preferred.

@mattmoor
Copy link
Copy Markdown
Member

/lgtm
/approve

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label May 27, 2019
@knative-prow-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: joshrider, mattmoor, vagababov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@joshrider
Copy link
Copy Markdown
Contributor Author

@mattmoor are we good to remove the hold on this?

@mattmoor
Copy link
Copy Markdown
Member

/hold cancel

Yeah, sorry I didn't realize that was still there! thanks for the reminder.

@knative-prow-robot knative-prow-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 27, 2019
@knative-prow-robot knative-prow-robot merged commit 8f5d48a into knative:master May 27, 2019
@joshrider joshrider deleted the min-scale-readiness branch May 27, 2019 20:47
JRBANCEL pushed a commit to JRBANCEL/serving that referenced this pull request May 29, 2019
…3493)

* revisions become ready with minScale reached

* remove redundant condition

* revert accidental spacing changes

* cleanup test helpers

* cleanup tests

* remove copypasted comment
joshrider added a commit to joshrider/serving that referenced this pull request Jun 10, 2019
…3493)

* revisions become ready with minScale reached

* remove redundant condition

* revert accidental spacing changes

* cleanup test helpers

* cleanup tests

* remove copypasted comment
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/API API objects and controllers area/autoscale area/test-and-release It flags unit/e2e/conformance/perf test issues for product features cla: yes Indicates the PR's author has signed the CLA. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Revisions should wait for minScale replicas to report ready

10 participants