Skip to content

Skip GC proces when revision is still used by route#4235

Closed
nak3 wants to merge 2 commits into
knative:masterfrom
nak3:keep-revision
Closed

Skip GC proces when revision is still used by route#4235
nak3 wants to merge 2 commits into
knative:masterfrom
nak3:keep-revision

Conversation

@nak3
Copy link
Copy Markdown
Contributor

@nak3 nak3 commented Jun 4, 2019

Currently GC process only skips LatestReadyRevision. Hence, even if
route uses non-LatestReadyRevision, GC removes the revision. It causes
service outage with RevisionMissing error suddenly.

This patch changes to stop GC when revision is still used by route.

/lint

Fixes #4234

(And I am wondering if #4208 (comment) original report was caused by this.)

Proposed Changes

  • Skip GC proces when revision is still used by route

Release Note

Revision is not GC'd when it is used by route

Currently GC process only skips `LatestReadyRevision`. Hence, even if
route uses non-LatestReadyRevision, GC removes the revision. It causes
service outage with `RevisionMissing` error suddenly.

This patch changes to stop GC when revision is still used by route.
@googlebot googlebot added the cla: yes Indicates the PR's author has signed the CLA. label Jun 4, 2019
@knative-prow-robot knative-prow-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jun 4, 2019
Copy link
Copy Markdown
Contributor

@knative-prow-robot knative-prow-robot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nak3: 0 warnings.

Details

In response to this:

Currently GC process only skips LatestReadyRevision. Hence, even if
route uses non-LatestReadyRevision, GC removes the revision. It causes
service outage with RevisionMissing error suddenly.

This patch changes to stop GC when revision is still used by route.

/lint

Fixes #4234

(And I am wondering if #4208 (comment) original report was caused by this.)

Proposed Changes

  • Skip GC proces when revision is still used by route

Release Note

Revision is not GC'd when it is used by route

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@knative-prow-robot
Copy link
Copy Markdown
Contributor

Hi @nak3. Thanks for your PR.

I'm waiting for a knative member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@knative-prow-robot knative-prow-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jun 4, 2019
@knative-prow-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: nak3
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: mattmoor

If they are not already assigned, you can assign the PR to them by writing /assign @mattmoor in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot added the area/API API objects and controllers label Jun 4, 2019
return nil
}

route, err := c.routeLister.Routes(config.Namespace).Get(config.Name)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is based on the assumption that the route name equals the configuration name, right? I think that might only coincidently be true for KnServices but is certainly not something we can depend on generally.

Copy link
Copy Markdown
Contributor Author

@nak3 nak3 Jun 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are correct. I assumed that config and route are always same. But now, updated to get the route by label's (serving.knative.dev/route) value.

@markusthoemmes
Copy link
Copy Markdown
Contributor

/ok-to-test

@knative-prow-robot knative-prow-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jun 4, 2019
@knative-metrics-robot
Copy link
Copy Markdown

The following is the coverage report on pkg/.
Say /test pull-knative-serving-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/configuration/configuration.go 88.7% 88.5% -0.1

@knative-metrics-robot
Copy link
Copy Markdown

The following is the coverage report on pkg/.
Say /test pull-knative-serving-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/reconciler/configuration/configuration.go 88.7% 87.6% -1.1

@duglin
Copy link
Copy Markdown

duglin commented Jun 4, 2019

I think this might address #4208 as well, but I'm pretty sure I saw the issue even with the longer/default GC times, but I'm not able to reliably reproduce it. So let's be optimistic and hope this got it all.

@mattmoor
Copy link
Copy Markdown
Member

mattmoor commented Jun 4, 2019

/hold

Currently GC process only skips LatestReadyRevision

cc @greghaynes

This is inaccurate. The GC process skips this, yes, but it also skips:

  • The last N generations of Revision
  • Any revision that has received a "heartbeat" in the last M seconds/minutes/hours

The contract with the route controller is that any routable revision receives heartbeats every resync period (currently 10h).

We should (and probably don't) reject GC settings that drop the period below the resync period, which won't end well.

@knative-prow-robot knative-prow-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 4, 2019
@vagababov
Copy link
Copy Markdown
Contributor

Any revision that has received a "heartbeat" in the last M seconds/minutes/hours is that code elsewhere?

@greghaynes
Copy link
Copy Markdown
Contributor

greghaynes commented Jun 4, 2019

Correct, as is any route to a revision (even with 0% traffic) should prevent a revision from being GC'd, if we've seen behavior otherwise I'd really like to know the scenario to cause this bug.

re: GC time - we've talked about decoupling the GC controller from config and IMO this is one good reason to do that: Right now our min GC time is a function of config controller resync period which seems like a odd coupling for a user.

@greghaynes
Copy link
Copy Markdown
Contributor

greghaynes commented Jun 4, 2019

@vagababov https://github.com/knative/serving/blob/master/pkg/reconciler/route/reconcile_resources.go#L256 is the heavy lifting bit of code. The gc revisions code in the config controller uses annotation to determine what to GC

@greghaynes
Copy link
Copy Markdown
Contributor

oh I see the issue now - It's what @mattmoor mentions about setting the GC period low. Since the heartbeat can only be guaranteed to happen as frequent as the route resync period (12h) if you lower your gc period below this then you'll GC revisions before the heartbeat is updated. An easy fix of sanity checking GC period seems fine to me

@greghaynes
Copy link
Copy Markdown
Contributor

Pushed #4245

@nak3
Copy link
Copy Markdown
Contributor Author

nak3 commented Jun 5, 2019

I agree and I am thinking #4245 is better.

Anyway, thank you all taking look at this PR.

@mattmoor
Copy link
Copy Markdown
Member

mattmoor commented Jun 5, 2019

Thanks @nak3

@mattmoor mattmoor closed this Jun 5, 2019
@nak3 nak3 deleted the keep-revision branch June 23, 2019 06:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/API API objects and controllers cla: yes Indicates the PR's author has signed the CLA. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Vanishing revisions even when they are still referred from route

9 participants