Refactor garbage collection from configuration reconciler#4876
Conversation
knative-prow-robot
left a comment
There was a problem hiding this comment.
@taragu: 7 warnings.
Details
In response to this:
/lint
Fixes #3910
Proposed Changes
- Separate GC into it's own Reconciler
Release Note
NONE
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
|
|
||
| type cfgKey struct{} | ||
|
|
||
| // +k8s:deepcopy-gen=false |
There was a problem hiding this comment.
Golint comments: comment on exported type Config should be of the form "Config ..." (with optional leading article). More info.
| RevisionGC *gc.Config | ||
| } | ||
|
|
||
| func FromContext(ctx context.Context) *Config { |
There was a problem hiding this comment.
Golint comments: exported function FromContext should have comment or be unexported. More info.
| return ctx.Value(cfgKey{}).(*Config) | ||
| } | ||
|
|
||
| func ToContext(ctx context.Context, c *Config) context.Context { |
There was a problem hiding this comment.
Golint comments: exported function ToContext should have comment or be unexported. More info.
| return context.WithValue(ctx, cfgKey{}, c) | ||
| } | ||
|
|
||
| // +k8s:deepcopy-gen=false |
There was a problem hiding this comment.
Golint comments: comment on exported type Store should be of the form "Store ..." (with optional leading article). More info.
| *configmap.UntypedStore | ||
| } | ||
|
|
||
| func (s *Store) ToContext(ctx context.Context) context.Context { |
There was a problem hiding this comment.
Golint comments: exported method Store.ToContext should have comment or be unexported. More info.
| return ToContext(ctx, s.Load()) | ||
| } | ||
|
|
||
| func (s *Store) Load() *Config { |
There was a problem hiding this comment.
Golint comments: exported method Store.Load should have comment or be unexported. More info.
| } | ||
| } | ||
|
|
||
| func NewStore(logger configmap.Logger, minRevisionTimeout time.Duration) *Store { |
There was a problem hiding this comment.
Golint comments: exported function NewStore should have comment or be unexported. More info.
|
Hi @taragu. Thanks for your PR. I'm waiting for a knative member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
|
/assign @markusthoemmes |
|
/ok-to-test |
vagababov
left a comment
There was a problem hiding this comment.
Please fix all the linter errors as well.
| // Reconcile this copy of the configuration and then write back any status | ||
| // updates regardless of whether the reconciliation errored out. | ||
| reconcileErr := c.reconcile(ctx, config) | ||
| if equality.Semantic.DeepEqual(original.Status, config.Status) { |
There was a problem hiding this comment.
you are not really updating any statuses, so this whole block is redundant.
| c.Logger.Info("Setting up event handlers") | ||
| configurationInformer.Informer().AddEventHandler(controller.HandleAll(impl.Enqueue)) | ||
|
|
||
| revisionInformer.Informer().AddEventHandler(cache.FilteringResourceEventHandler{ |
There was a problem hiding this comment.
I am not sure that this is how you want to set the handlers.
You don't really care about changes to the revisions or configs (I guess creation of a revision will trigger a change to the configuration.latestCreatedRevisionName)...
Probably just global resync every N hours would suffice.
| // If the spec has changed, then assume we need an upgrade and issue a patch to trigger | ||
| // the webhook to upgrade via defaulting. Status updates do not trigger this due to the | ||
| // use of the /status resource. | ||
| if !equality.Semantic.DeepEqual(original.Spec, config.Spec) { |
There was a problem hiding this comment.
nor you're changing any specs.
|
/lint |
knative-prow-robot
left a comment
There was a problem hiding this comment.
@taragu: 0 warnings.
Details
In response to this:
/lint
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
907fedd to
4f4739b
Compare
|
@vagababov I've updated the resync period to every 4 hours. @dgerd WDYT? |
| cmw configmap.Watcher, | ||
| ) *controller.Impl { | ||
|
|
||
| configurationInformer := configurationinformer.Get(ctx) |
There was a problem hiding this comment.
You should use
to get actually resync behaviour you're looking for.| ) *controller.Impl { | ||
|
|
||
| // Globally resync every couple of hours | ||
| controller.WithResyncPeriod(ctx, resyncInterval) |
There was a problem hiding this comment.
The way this works is:
ctx := controller.With...
The old context remains, in theory, immutable.
| ) | ||
|
|
||
| // Reconciler implements controller.Reconciler for Garbage Collection resources. | ||
| type Reconciler struct { |
There was a problem hiding this comment.
There is no reason for this to be a public type.
| original, err := c.configurationLister.Configurations(namespace).Get(name) | ||
| if errors.IsNotFound(err) { | ||
| // The resource no longer exists, in which case we stop processing. | ||
| logger.Errorf("configuration %q in work queue no longer exists", key) |
There was a problem hiding this comment.
| logger.Errorf("configuration %q in work queue no longer exists", key) | |
| logger.Errorf("Configuration %q in work queue no longer exists", key) |
| } | ||
|
|
||
| // Don't modify the informer's copy. | ||
| config := original.DeepCopy() |
There was a problem hiding this comment.
you don't really modify configs, so this is redundant.
| // Don't modify the informer's copy. | ||
| config := original.DeepCopy() | ||
|
|
||
| // Reconcile this copy of the configuration and then write back any status |
There was a problem hiding this comment.
Comment does not reflect the reality.
| return revs[j].CreationTimestamp.Before(&revs[i].CreationTimestamp) | ||
| }) | ||
|
|
||
| for _, rev := range revs[gcSkipOffset:] { |
There was a problem hiding this comment.
I am not a fan of this logic, since let's presume I have
r1, r2,..., r10 and gcskipoffset is 8 and traffic uses r8 and r9 -- we'll never collect anything.
But I guess it depends on the point of view
/rant
@dgerd
There was a problem hiding this comment.
I think its mainly about letting users have so many latest revisions and not worrying too much about potentially overshooting revision count (they arent that expensive, especially when scaled down).
Either way though - IMO we should try and leave behavior as is for this change and do any modifications to behavior afterward.
There was a problem hiding this comment.
Well they aren't cheap. It's at least 3 K8s services/endpoints pairs and additional work in the autoscaler. It's not very expensive, but not cheap either.
It's more of a rant, than a call for action :-)
| err := c.ServingClientSet.ServingV1alpha1().Revisions(rev.Namespace).Delete(rev.Name, &metav1.DeleteOptions{}) | ||
| if err != nil { | ||
| logger.Errorf("Failed to delete stale revision: %v", err) | ||
| return err |
There was a problem hiding this comment.
Do we want perhaps to continue collecting the rest of them?
Also error.Wrapf() would be nice to include the actual revision name being deleted
| cfg := configns.FromContext(ctx).RevisionGC | ||
| logger := logging.FromContext(ctx) | ||
|
|
||
| if config.Status.LatestReadyRevisionName == rev.Name { |
There was a problem hiding this comment.
Well, I guess you can move this to the top, since it does not use the first two variables, so let's save a few cycles.
greghaynes
left a comment
There was a problem hiding this comment.
Just minor comment, overall I think this looks great!
| // The set of controllers this controller process runs. | ||
| "knative.dev/pkg/injection/sharedmain" | ||
| "knative.dev/serving/pkg/reconciler/configuration" | ||
| "knative.dev/serving/pkg/reconciler/gc" // This defines the shared main for injected controllers. |
There was a problem hiding this comment.
I think the comment here can be removed (its no longer valid)
|
|
||
| const ( | ||
| controllerAgentName = "gc-controller" | ||
| resyncInterval = 4 * time.Hour |
There was a problem hiding this comment.
Nit: I think its fine for us to decrease this value but with a change like this I prefer to keep our existing behavior first and then update configuration like this in a later change - makes it easier to notice if we broke anything with this change. Currently I believe this value is 10 * time.Hour.
There was a problem hiding this comment.
Actually, rather than statically set this to 10, can we keep using controller.GetResyncPeriod(ctx)?
There was a problem hiding this comment.
@greghaynes I was talking to @vagababov, he mentioned he had a discussion with @dgerd and decided to resync every hour here. @vagababov what's the reasoning behind it?
There was a problem hiding this comment.
Dan's point was that if users set their GC collection time to a short window, e.g. 1 hour, keeping them around for 10 hours is not very nice.
There was a problem hiding this comment.
Taking a look at config-gc.yaml it looks like we prevent (or maybe just highly encourage) stale-revision-timeout from going lower than the resync period.
# Duration since a route has been pointed at a revision before it should be GC'd
# This minus lastpinned-debounce be longer than the controller resync period (10 hours)
stale-revision-timeout: "15h"
We likely will want to make this at least twice as fast as the stale-revision-timeout to avoid having it just miss the resync, but I am going to agree with greg on this to keep it the same for this change and follow up with lowering it.
| return revs[j].CreationTimestamp.Before(&revs[i].CreationTimestamp) | ||
| }) | ||
|
|
||
| for _, rev := range revs[gcSkipOffset:] { |
There was a problem hiding this comment.
I think its mainly about letting users have so many latest revisions and not worrying too much about potentially overshooting revision count (they arent that expensive, especially when scaled down).
Either way though - IMO we should try and leave behavior as is for this change and do any modifications to behavior afterward.
|
@vagababov @dgerd @greghaynes I've updated the PR to keep the existing resync period. Would you please review this PR again? |
|
/test pull-knative-serving-go-coverage |
mattmoor
left a comment
There was a problem hiding this comment.
/hold
Couple of comments inline, one that's quite important...
Given that it takes bake time to uncover GC bugs (like those found here) I am inclined to hold this until after the 0.8 cut and land it right after.
-M
| "knative.dev/serving/pkg/reconciler/service" | ||
|
|
||
| // This defines the shared main for injected controllers. | ||
| "knative.dev/pkg/injection/sharedmain" |
There was a problem hiding this comment.
Can you leave this where it is? It is now grouped under an irrelevant comment.
| } | ||
|
|
||
| return c.gcRevisions(ctx, config) | ||
| return nil |
There was a problem hiding this comment.
I think there's a bunch more code you can blast from orbit because config-gc is the only configmap consumed by this controller:
If you do it here, then the same logic for the GC controller will show up as moves and be easier to review
| "knative.dev/serving/pkg/apis/serving/v1beta1" | ||
| listers "knative.dev/serving/pkg/client/listers/serving/v1alpha1" | ||
| pkgreconciler "knative.dev/serving/pkg/reconciler" | ||
| configns "knative.dev/serving/pkg/reconciler/configuration/config" |
There was a problem hiding this comment.
As above, this should move to this controller's own directory.
| ) | ||
|
|
||
| const ( | ||
| controllerAgentName = "gc-controller" |
| revisionLister: revisionInformer.Lister(), | ||
| } | ||
| impl := controller.NewImpl(c, c.Logger, "Garbage Collection") | ||
|
|
There was a problem hiding this comment.
You aren't enqueuing any events, so this controller won't do anything :(
There was a problem hiding this comment.
Something like:
serving/pkg/reconciler/configuration/controller.go
Lines 51 to 57 in 52a6e2c
There was a problem hiding this comment.
@mattmoor I see. In this case, I don't think this controller needs to respond to any specific configuration or revision events, because we just want to sync globally every few hours. Does the following do that?
configStore := configns.NewStore(c.Logger.Named("config-store"), controller.GetResyncPeriod(ctx))
configStore.WatchConfigs(c.ConfigMapWatcher)
There was a problem hiding this comment.
No, that just registers callbacks when the configmap changes to update the store.
You need something like this:
serving/pkg/reconciler/revision/controller.go
Lines 111 to 117 in 74e366a
It looks like to add global resync will require a change to the store to expose onAfterStore like this:
| c.Logger.Info("Setting up ConfigMap receivers") | ||
| configStore := configns.NewStore(c.Logger.Named("config-store"), controller.GetResyncPeriod(ctx)) | ||
| configStore.WatchConfigs(c.ConfigMapWatcher) | ||
| c.configStore = configStore |
There was a problem hiding this comment.
We should probably have a global resync on changes to the configmap
|
It seems we need to add handler, but just process everything. 🤦♂️
…On Tuesday, July 30, 2019, Tara Gu ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In pkg/reconciler/gc/controller.go
<#4876 (comment)>:
> +// NewController creates a new Garbage Collection controller
+func NewController(
+ ctx context.Context,
+ cmw configmap.Watcher,
+) *controller.Impl {
+
+ configurationInformer := configurationinformer.Get(ctx)
+ revisionInformer := revisioninformer.Get(ctx)
+
+ c := &reconciler{
+ Base: pkgreconciler.NewBase(ctx, controllerAgentName, cmw),
+ configurationLister: configurationInformer.Lister(),
+ revisionLister: revisionInformer.Lister(),
+ }
+ impl := controller.NewImpl(c, c.Logger, "Garbage Collection")
+
@mattmoor <https://github.com/mattmoor> I see. In this case, I don't
think this controller needs to respond to any specific configuration or
revision events, because we just want to sync globally every few hours.
Does the following do that?
configStore := configns.NewStore(c.Logger.Named("config-store"), controller.GetResyncPeriod(ctx))
configStore.WatchConfigs(c.ConfigMapWatcher)
—
You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub
<#4876?email_source=notifications&email_token=AAF2WX5WMUSUQPJN54EJT23QCA3YZA5CNFSM4IFZKTLKYY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOB77Q7WY#discussion_r308707334>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAF2WXYXESLOQFQA5PR4GT3QCA3YZANCNFSM4IFZKTLA>
.
|
2131906 to
01a059c
Compare
|
@vagababov I've talked to @mattmoor on slack, and agreed that we should match the event handlers of revision gc controller and the existing configuration controller to minimize risk as we head into v1.0. I've updated the gc controller accordingly. |
|
|
||
| // Inject the fake informers we need. | ||
| _ "knative.dev/serving/pkg/client/injection/informers/serving/v1alpha1/configuration/fake" | ||
| _ "knative.dev/serving/pkg/client/injection/informers/serving/v1alpha1/revision/fake" |
There was a problem hiding this comment.
Can you leave these where they are (with comment)?
| revisioninformer "knative.dev/serving/pkg/client/injection/informers/serving/v1alpha1/revision" | ||
| "knative.dev/serving/pkg/reconciler" | ||
| configns "knative.dev/serving/pkg/reconciler/configuration/config" | ||
| configns "knative.dev/serving/pkg/reconciler/gc/config" |
There was a problem hiding this comment.
This should not be needed anymore.
bdc9ec8 to
5c6af10
Compare
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: mattmoor, taragu The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/test pull-knative-serving-integration-tests |
|
The following tests are currently flaky. Running them again to verify...
Automatically retrying... |
|
/test pull-knative-serving-integration-tests |
1 similar comment
|
/test pull-knative-serving-integration-tests |
|
/test pull-knative-serving-build-tests |
/lint
Fixes #3910
Proposed Changes
Release Note