Skip to content

Scale zero activating timeout#4232

Closed
lvjing2 wants to merge 3 commits intoknative:masterfrom
lvjing2:scale-zeor-activating-timeout
Closed

Scale zero activating timeout#4232
lvjing2 wants to merge 3 commits intoknative:masterfrom
lvjing2:scale-zeor-activating-timeout

Conversation

@lvjing2
Copy link
Copy Markdown
Contributor

@lvjing2 lvjing2 commented Jun 4, 2019

Fixes #
problematic pods can't be scale to zero forever.

Proposed Changes

check whether activating is timeout or not, if timeout, then scale to zero if min==0, and EnableScaleToZero=true

Release Note

enable to scale to zero when activating timeout when the pod can't be ready like crashlooping or some other status

@googlebot googlebot added the cla: yes Indicates the PR's author has signed the CLA. label Jun 4, 2019
@knative-prow-robot knative-prow-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jun 4, 2019
@knative-prow-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: lvjing2
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: greghaynes

If they are not already assigned, you can assign the PR to them by writing /assign @greghaynes in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 4, 2019
@knative-prow-robot
Copy link
Copy Markdown
Contributor

Hi @lvjing2. Thanks for your PR.

I'm waiting for a knative member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link
Copy Markdown
Contributor

@vagababov vagababov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a related issue?

Comment thread pkg/reconciler/autoscaling/kpa/scaler.go Outdated
Comment thread pkg/reconciler/autoscaling/kpa/scaler.go Outdated
Comment thread pkg/reconciler/autoscaling/kpa/scaler.go Outdated
Comment thread pkg/reconciler/autoscaling/kpa/scaler.go Outdated
@vagababov
Copy link
Copy Markdown
Contributor

/ok-to-test

I would like to see an integration test for this as well.

@knative-prow-robot knative-prow-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 4, 2019
@knative-metrics-robot
Copy link
Copy Markdown

The following is the coverage report on pkg/.
Say /test pull-knative-serving-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/apis/autoscaling/v1alpha1/pa_lifecycle.go 98.2% 96.5% -1.7
pkg/reconciler/autoscaling/kpa/kpa.go 91.3% 91.4% 0.2
pkg/reconciler/autoscaling/kpa/scaler.go 83.7% 80.4% -3.3

@vagababov
Copy link
Copy Markdown
Contributor

Also, please run unit tests locally before posting PRs...

@knative-metrics-robot
Copy link
Copy Markdown

The following is the coverage report on pkg/.
Say /test pull-knative-serving-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/apis/autoscaling/v1alpha1/pa_lifecycle.go 98.2% 96.5% -1.7
pkg/reconciler/autoscaling/kpa/kpa.go 91.3% 91.4% 0.2
pkg/reconciler/autoscaling/kpa/scaler.go 83.7% 83.2% -0.5

@lvjing2
Copy link
Copy Markdown
Contributor Author

lvjing2 commented Jun 4, 2019

Is there a related issue?
I found #3456 is

@vagababov
Copy link
Copy Markdown
Contributor

Then it should be attached to the issue, so it can get closed and tracked :) You can edit the first message.

@knative-prow-robot
Copy link
Copy Markdown
Contributor

@lvjing2: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-knative-serving-integration-tests faa3dd5 link /test pull-knative-serving-integration-tests

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

logger.Debug("Metrics are not yet being collected.")
return desiredScale, nil
// check is activating timeout
if pa.Status.IsActivatingTimeout(autoscalerConfig.ActivatingTimeout) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that this is correct. It should not really matter in the negative case at all.

if pa.Status.IsActivating() { // Active=Unknown
// Don't scale-to-zero during activation
if min, _ := pa.ScaleBounds(); min == 0 {
if pa.Status.IsActivatingTimeout(config.ActivatingTimeout) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code was the main code that did scale to zero, so it should've remained somewhere, right?


// IsActivatingTimeout checks whether the pod autoscaler has been in activating
// for at least the specified timeout period.
func (pas *PodAutoscalerStatus) IsActivatingTimeout(activatingTimeout time.Duration) bool {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably here and in the config-map, etc "Activation Timeout" is a more correct name.


if pa.Status.IsActivating() {
timeout := config.FromContext(ctx).Autoscaler.ActivatingTimeout
logger.Infof("enqueue delay for check activating timeout after %v.", timeout)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logging statement.

Suggested change
logger.Infof("enqueue delay for check activating timeout after %v.", timeout)
logger.Infof("Enqueue with delay to check activation timeout after %v.", timeout)

Copy link
Copy Markdown
Contributor

@markusthoemmes markusthoemmes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to add to the pile but there is #4094 trying to fix this as well.

There have been discussions on how to best attack this and the kind of agreed upon solution is to surface the revision's status in the PodAutoscaler so it can act on that. We already have deployment timeouts and handling of defunct pods in the revision handling so that makes sure we're not duplicating things. The ActivatingTimeout you're proposing essentially duplicates the timeout logic that already lives on the revision level.

/hold

To make sure this gets discussed.

@knative-prow-robot knative-prow-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 4, 2019
@lvjing2
Copy link
Copy Markdown
Contributor Author

lvjing2 commented Jun 5, 2019

@markusthoemmes Hi, I didn't recognized that there is already timeout for revision. if so, I think it would be better if we can unify the timeout status of revision and kpa by #4094 .

@markusthoemmes
Copy link
Copy Markdown
Contributor

/close

Per last comment.

@knative-prow-robot
Copy link
Copy Markdown
Contributor

@markusthoemmes: Closed this PR.

Details

In response to this:

/close

Per last comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/API API objects and controllers area/autoscale cla: yes Indicates the PR's author has signed the CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants