Skip GC process when revision is still used by route #4235
Currently the GC process only skips `LatestReadyRevision`. Hence, even if a route uses a revision other than `LatestReadyRevision`, GC removes that revision, which causes a sudden service outage with a `RevisionMissing` error. This patch stops GC from removing a revision that is still used by a route.
knative-prow-robot
left a comment
@nak3: 0 warnings.
In response to this:
Currently the GC process only skips `LatestReadyRevision`. Hence, even if a route uses a revision other than `LatestReadyRevision`, GC removes that revision, which causes a sudden service outage with a `RevisionMissing` error.
This patch stops GC from removing a revision that is still used by a route.
/lint
Fixes #4234
(And I am wondering if #4208 (comment) original report was caused by this.)
Proposed Changes
- Skip GC process when revision is still used by route
Release Note
A revision is not GC'd while it is used by a route.
|
Hi @nak3. Thanks for your PR. I'm waiting for a knative member to verify that this patch is reasonable to test.
|
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: nak3.
        return nil
    }

    route, err := c.routeLister.Routes(config.Namespace).Get(config.Name)
This is based on the assumption that the route name equals the configuration name, right? I think that might only coincidentally be true for KnServices, but it is certainly not something we can depend on generally.
Yes, you are correct. I assumed that the config and the route always have the same name. I have now updated the patch to get the route by the value of the `serving.knative.dev/route` label.
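For readers following the thread, the label-based lookup described in this reply can be sketched roughly as below. This is only an illustration, not the code from the patch: the `Config` type and the `routeNameFor` helper are hypothetical stand-ins, and only the label key `serving.knative.dev/route` comes from the comment itself.

```go
package main

// routeLabelKey is the label named in the reply above.
const routeLabelKey = "serving.knative.dev/route"

// Config is a hypothetical, pared-down stand-in for a Configuration object.
type Config struct {
	Namespace string
	Name      string
	Labels    map[string]string
}

// routeNameFor returns the name of the route recorded in cfg's labels,
// instead of assuming the route shares cfg.Name. The boolean is false when
// the label is absent or empty, meaning no route is recorded for cfg.
func routeNameFor(cfg Config) (string, bool) {
	name, ok := cfg.Labels[routeLabelKey]
	if !ok || name == "" {
		return "", false
	}
	return name, true
}
```

A caller would then pass the returned name to the route lister in place of `config.Name`, and skip the route check entirely when the boolean is false.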
|
/ok-to-test
|
The following is the coverage report on pkg/.
|
|
I think this might address #4208 as well. I'm pretty sure I saw the issue even with the longer/default GC times, though I'm not able to reliably reproduce it. So let's be optimistic and hope this got it all.
|
/hold
cc @greghaynes This is inaccurate. The GC process skips this, yes, but it also skips:
The contract with the route controller is that any routable revision receives heartbeats every resync period (currently 10h). We should reject GC settings that drop the period below the resync period, since that won't end well, but we probably don't do so today.
|
Correct. As is, any route to a revision (even with 0% traffic) should prevent that revision from being GC'd. If we've seen behavior otherwise, I'd really like to know the scenario that causes this bug. Re: GC time, we've talked about decoupling the GC controller from config, and IMO this is one good reason to do that: right now our minimum GC time is a function of the config controller resync period, which seems like an odd coupling for a user.
|
@vagababov https://github.com/knative/serving/blob/master/pkg/reconciler/route/reconcile_resources.go#L256 is the heavy-lifting bit of code. The GC revisions code in the config controller uses an annotation to determine what to GC.
|
Oh, I see the issue now: it's what @mattmoor mentions about setting the GC period low. Since the heartbeat is only guaranteed to happen as frequently as the route resync period (10h), lowering the GC period below this means revisions get GC'd before the heartbeat is updated. An easy fix that sanity-checks the GC period seems fine to me.
|
Pushed #4245 |
|
I agree, and I think #4245 is the better fix. Anyway, thank you all for taking a look at this PR. |
|
Thanks @nak3 |