Refactor garbage collection from configuration reconciler #4876

Merged
knative-prow-robot merged 1 commit into knative:master from taragu:gc-reconciler
Aug 7, 2019

Conversation

@taragu
Contributor

@taragu taragu commented Jul 22, 2019

/lint

Fixes #3910

Proposed Changes

  • Separate GC into its own Reconciler

Release Note

NONE

@googlebot googlebot added the cla: yes Indicates the PR's author has signed the CLA. label Jul 22, 2019
@knative-prow-robot knative-prow-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jul 22, 2019
@knative-prow-robot knative-prow-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Jul 22, 2019
Contributor

@knative-prow-robot knative-prow-robot left a comment

@taragu: 7 warnings.

In response to this:

/lint

Fixes #3910

Proposed Changes

  • Separate GC into its own Reconciler

Release Note

NONE

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


type cfgKey struct{}

// +k8s:deepcopy-gen=false
Contributor

Golint comments: comment on exported type Config should be of the form "Config ..." (with optional leading article). More info.

RevisionGC *gc.Config
}

func FromContext(ctx context.Context) *Config {
Contributor

Golint comments: exported function FromContext should have comment or be unexported. More info.

return ctx.Value(cfgKey{}).(*Config)
}

func ToContext(ctx context.Context, c *Config) context.Context {
Contributor

Golint comments: exported function ToContext should have comment or be unexported. More info.

return context.WithValue(ctx, cfgKey{}, c)
}

// +k8s:deepcopy-gen=false
Contributor

Golint comments: comment on exported type Store should be of the form "Store ..." (with optional leading article). More info.

*configmap.UntypedStore
}

func (s *Store) ToContext(ctx context.Context) context.Context {
Contributor

Golint comments: exported method Store.ToContext should have comment or be unexported. More info.

return ToContext(ctx, s.Load())
}

func (s *Store) Load() *Config {
Contributor

Golint comments: exported method Store.Load should have comment or be unexported. More info.

Comment thread pkg/reconciler/gc/config/store.go Outdated
}
}

func NewStore(logger configmap.Logger, minRevisionTimeout time.Duration) *Store {
Contributor

Golint comments: exported function NewStore should have comment or be unexported. More info.

@knative-prow-robot
Contributor

Hi @taragu. Thanks for your PR.

I'm waiting for a knative member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

@knative-prow-robot knative-prow-robot added the area/API API objects and controllers label Jul 22, 2019
@taragu
Contributor Author

taragu commented Jul 22, 2019

/assign @markusthoemmes

@greghaynes
Contributor

/ok-to-test

@knative-prow-robot knative-prow-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jul 22, 2019
Contributor

@vagababov vagababov left a comment

Please fix all the linter errors as well.

Comment thread pkg/reconciler/gc/gc.go Outdated
// Reconcile this copy of the configuration and then write back any status
// updates regardless of whether the reconciliation errored out.
reconcileErr := c.reconcile(ctx, config)
if equality.Semantic.DeepEqual(original.Status, config.Status) {
Contributor

you are not really updating any statuses, so this whole block is redundant.

c.Logger.Info("Setting up event handlers")
configurationInformer.Informer().AddEventHandler(controller.HandleAll(impl.Enqueue))

revisionInformer.Informer().AddEventHandler(cache.FilteringResourceEventHandler{
Contributor

I am not sure that this is how you want to set the handlers.
You don't really care about changes to the revisions or configs (I guess creation of a revision will trigger a change to the configuration.latestCreatedRevisionName)...
Probably just global resync every N hours would suffice.

Comment thread pkg/reconciler/gc/gc.go Outdated
// If the spec has changed, then assume we need an upgrade and issue a patch to trigger
// the webhook to upgrade via defaulting. Status updates do not trigger this due to the
// use of the /status resource.
if !equality.Semantic.DeepEqual(original.Spec, config.Spec) {
Contributor

Nor are you changing any specs.

@knative-prow-robot knative-prow-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Jul 22, 2019
@taragu
Contributor Author

taragu commented Jul 22, 2019

/lint

Contributor

@knative-prow-robot knative-prow-robot left a comment

@taragu: 0 warnings.

In response to this:

/lint

@taragu taragu force-pushed the gc-reconciler branch 2 times, most recently from 907fedd to 4f4739b Compare July 24, 2019 17:29
@taragu
Contributor Author

taragu commented Jul 24, 2019

@vagababov I've updated the resync period to every 4 hours. @dgerd WDYT?

cmw configmap.Watcher,
) *controller.Impl {

configurationInformer := configurationinformer.Get(ctx)
Contributor

You should use

func WithResyncPeriod(ctx context.Context, resync time.Duration) context.Context {
to actually get the resync behaviour you're looking for.

Comment thread pkg/reconciler/gc/controller.go Outdated
) *controller.Impl {

// Globally resync every couple of hours
controller.WithResyncPeriod(ctx, resyncInterval)
Contributor

The way this works is:
ctx := controller.With...
The old context remains, in theory, immutable.

Comment thread pkg/reconciler/gc/gc.go Outdated
)

// Reconciler implements controller.Reconciler for Garbage Collection resources.
type Reconciler struct {
Contributor

There is no reason for this to be a public type.

Comment thread pkg/reconciler/gc/gc.go Outdated
original, err := c.configurationLister.Configurations(namespace).Get(name)
if errors.IsNotFound(err) {
// The resource no longer exists, in which case we stop processing.
logger.Errorf("configuration %q in work queue no longer exists", key)
Contributor

Suggested change
logger.Errorf("configuration %q in work queue no longer exists", key)
logger.Errorf("Configuration %q in work queue no longer exists", key)

Comment thread pkg/reconciler/gc/gc.go Outdated
}

// Don't modify the informer's copy.
config := original.DeepCopy()
Contributor

you don't really modify configs, so this is redundant.

Comment thread pkg/reconciler/gc/gc.go Outdated
// Don't modify the informer's copy.
config := original.DeepCopy()

// Reconcile this copy of the configuration and then write back any status
Contributor

Comment does not reflect the reality.

Comment thread pkg/reconciler/gc/gc.go
return revs[j].CreationTimestamp.Before(&revs[i].CreationTimestamp)
})

for _, rev := range revs[gcSkipOffset:] {
Contributor

I am not a fan of this logic, since let's presume I have
r1, r2,..., r10 and gcskipoffset is 8 and traffic uses r8 and r9 -- we'll never collect anything.
But I guess it depends on the point of view

/rant
@dgerd

Contributor

I think it's mainly about letting users keep a fixed number of their latest revisions and not worrying too much about potentially overshooting the revision count (they aren't that expensive, especially when scaled down).

Either way though - IMO we should try and leave behavior as is for this change and do any modifications to behavior afterward.

Contributor

Well they aren't cheap. It's at least 3 K8s services/endpoints pairs and additional work in the autoscaler. It's not very expensive, but not cheap either.
It's more of a rant, than a call for action :-)

Comment thread pkg/reconciler/gc/gc.go Outdated
err := c.ServingClientSet.ServingV1alpha1().Revisions(rev.Namespace).Delete(rev.Name, &metav1.DeleteOptions{})
if err != nil {
logger.Errorf("Failed to delete stale revision: %v", err)
return err
Contributor

Do we want perhaps to continue collecting the rest of them?
Also, errors.Wrapf() would be nice here, to include the name of the revision being deleted.

Comment thread pkg/reconciler/gc/gc.go
cfg := configns.FromContext(ctx).RevisionGC
logger := logging.FromContext(ctx)

if config.Status.LatestReadyRevisionName == rev.Name {
Contributor

Well, I guess you can move this to the top, since it does not use the first two variables, so let's save a few cycles.

Contributor

@greghaynes greghaynes left a comment

Just minor comment, overall I think this looks great!

Comment thread cmd/controller/main.go Outdated
// The set of controllers this controller process runs.
"knative.dev/pkg/injection/sharedmain"
"knative.dev/serving/pkg/reconciler/configuration"
"knative.dev/serving/pkg/reconciler/gc" // This defines the shared main for injected controllers.
Contributor

I think the comment here can be removed (it's no longer valid).

Comment thread pkg/reconciler/gc/controller.go Outdated

const (
controllerAgentName = "gc-controller"
resyncInterval = 4 * time.Hour
Contributor

Nit: I think it's fine for us to decrease this value, but with a change like this I prefer to keep our existing behavior first and then update configuration like this in a later change. That makes it easier to notice if we broke anything with this change. Currently I believe this value is 10 * time.Hour.

Contributor

Actually, rather than statically set this to 10, can we keep using controller.GetResyncPeriod(ctx)?

Contributor Author

@greghaynes I was talking to @vagababov, he mentioned he had a discussion with @dgerd and decided to resync every hour here. @vagababov what's the reasoning behind it?

Contributor

Dan's point was that if users set their GC collection time to a short window, e.g. 1 hour, keeping them around for 10 hours is not very nice.

Taking a look at config-gc.yaml it looks like we prevent (or maybe just highly encourage) stale-revision-timeout from going lower than the resync period.

    # Duration since a route has been pointed at a revision before it should be GC'd
    # This minus lastpinned-debounce be longer than the controller resync period (10 hours)
    stale-revision-timeout: "15h"

We likely will want to make this at least twice as fast as the stale-revision-timeout to avoid having it just miss the resync, but I am going to agree with greg on this to keep it the same for this change and follow up with lowering it.

@taragu
Contributor Author

taragu commented Jul 26, 2019

@vagababov @dgerd @greghaynes I've updated the PR to keep the existing resync period. Would you please review this PR again?

@vagababov
Contributor

/test pull-knative-serving-go-coverage

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 29, 2019
Member

@mattmoor mattmoor left a comment

/hold

Couple of comments inline, one that's quite important...

Given that it takes bake time to uncover GC bugs (like those found here) I am inclined to hold this until after the 0.8 cut and land it right after.
-M

Comment thread cmd/controller/main.go
"knative.dev/serving/pkg/reconciler/service"

// This defines the shared main for injected controllers.
"knative.dev/pkg/injection/sharedmain"
Member

Can you leave this where it is? It is now grouped under an irrelevant comment.

}

return c.gcRevisions(ctx, config)
return nil
Member

I think there's a bunch more code you can blast from orbit because config-gc is the only configmap consumed by this controller:

gc.ConfigName: gc.NewConfigFromConfigMapFunc(logger, minRevisionTimeout),

If you do it here, then the same logic for the GC controller will show up as moves and be easier to review

Comment thread pkg/reconciler/gc/gc.go Outdated
"knative.dev/serving/pkg/apis/serving/v1beta1"
listers "knative.dev/serving/pkg/client/listers/serving/v1alpha1"
pkgreconciler "knative.dev/serving/pkg/reconciler"
configns "knative.dev/serving/pkg/reconciler/configuration/config"
Member

As above, this should move to this controller's own directory.

Comment thread pkg/reconciler/gc/controller.go Outdated
)

const (
controllerAgentName = "gc-controller"
Member

nit: revision-gc-controller

revisionLister: revisionInformer.Lister(),
}
impl := controller.NewImpl(c, c.Logger, "Garbage Collection")

Member

You aren't enqueuing any events, so this controller won't do anything :(

Member

Something like:

c.Logger.Info("Setting up event handlers")
configurationInformer.Informer().AddEventHandler(controller.HandleAll(impl.Enqueue))
revisionInformer.Informer().AddEventHandler(cache.FilteringResourceEventHandler{
FilterFunc: controller.Filter(v1alpha1.SchemeGroupVersion.WithKind("Configuration")),
Handler: controller.HandleAll(impl.EnqueueControllerOf),
})

Contributor Author

@mattmoor I see. In this case, I don't think this controller needs to respond to any specific configuration or revision events, because we just want to sync globally every few hours. Does the following do that?

	configStore := configns.NewStore(c.Logger.Named("config-store"), controller.GetResyncPeriod(ctx))
	configStore.WatchConfigs(c.ConfigMapWatcher)

Member

No, that just registers callbacks when the configmap changes to update the store.

You need something like this:

resync := configmap.TypeFilter(configsToResync...)(func(string, interface{}) {
// Triggers syncs on all revisions when configuration
// changes
impl.GlobalResync(revisionInformer.Informer())
})
configStore := config.NewStore(c.Logger.Named("config-store"), resync)

It looks like to add global resync will require a change to the store to expose onAfterStore like this:

func NewStore(logger configmap.Logger, onAfterStore ...func(name string, value interface{})) *Store {
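
The callback mechanism described here can be sketched without any Knative dependencies. Everything in this block is hypothetical: `store`, `newStore`, and `onConfigChanged` only mimic the shape of a config store that accepts variadic `onAfterStore` callbacks, and the callback body marks where a controller would trigger its global resync.

```go
package main

import "fmt"

// store is a hypothetical, simplified config store that notifies callbacks
// after every update.
type store struct {
	values       map[string]interface{}
	onAfterStore []func(name string, value interface{})
}

// newStore builds a store; each onAfterStore callback runs after a value is saved.
func newStore(onAfterStore ...func(name string, value interface{})) *store {
	return &store{
		values:       map[string]interface{}{},
		onAfterStore: onAfterStore,
	}
}

// onConfigChanged records the new value and then invokes every callback.
func (s *store) onConfigChanged(name string, value interface{}) {
	s.values[name] = value
	for _, f := range s.onAfterStore {
		f(name, value)
	}
}

func main() {
	resyncs := 0
	s := newStore(func(name string, _ interface{}) {
		// A controller would request its global resync here.
		resyncs++
		fmt.Printf("config %q changed, requesting global resync\n", name)
	})
	s.onConfigChanged("config-gc", "stale-revision-timeout=15h")
	fmt.Println(resyncs)
}
```

Watching the ConfigMap alone only refreshes the stored value; it is the callback that turns a config change into reconcile work.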

c.Logger.Info("Setting up ConfigMap receivers")
configStore := configns.NewStore(c.Logger.Named("config-store"), controller.GetResyncPeriod(ctx))
configStore.WatchConfigs(c.ConfigMapWatcher)
c.configStore = configStore
Member

We should probably have a global resync on changes to the configmap

@knative-prow-robot knative-prow-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 29, 2019
@vagababov
Contributor

vagababov commented Jul 30, 2019 via email

@knative-prow-robot knative-prow-robot added area/test-and-release It flags unit/e2e/conformance/perf test issues for product features and removed lgtm Indicates that a PR is ready to be merged. labels Jul 30, 2019
@taragu taragu force-pushed the gc-reconciler branch 2 times, most recently from 2131906 to 01a059c Compare July 30, 2019 19:00
@taragu
Contributor Author

taragu commented Jul 30, 2019

@vagababov I've talked to @mattmoor on slack, and agreed that we should match the event handlers of revision gc controller and the existing configuration controller to minimize risk as we head into v1.0. I've updated the gc controller accordingly.

Contributor

@vagababov vagababov left a comment

Looks fine by me.
I'll let @mattmoor do the final review.


// Inject the fake informers we need.
_ "knative.dev/serving/pkg/client/injection/informers/serving/v1alpha1/configuration/fake"
_ "knative.dev/serving/pkg/client/injection/informers/serving/v1alpha1/revision/fake"
Member

Can you leave these where they are (with comment)?

revisioninformer "knative.dev/serving/pkg/client/injection/informers/serving/v1alpha1/revision"
"knative.dev/serving/pkg/reconciler"
configns "knative.dev/serving/pkg/reconciler/configuration/config"
configns "knative.dev/serving/pkg/reconciler/gc/config"
Member

This should not be needed anymore.

@taragu taragu force-pushed the gc-reconciler branch 2 times, most recently from bdc9ec8 to 5c6af10 Compare August 6, 2019 17:30
Comment thread pkg/reconciler/configuration/configuration_test.go Outdated
Member

@mattmoor mattmoor left a comment

/hold cancel
/lgtm
/approve

@knative-prow-robot knative-prow-robot added lgtm Indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Aug 7, 2019
@knative-prow-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mattmoor, taragu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 7, 2019
@taragu
Contributor Author

taragu commented Aug 7, 2019

/test pull-knative-serving-integration-tests

@knative-test-reporter-robot

The following tests are currently flaky. Running them again to verify...

Test name Retries
pull-knative-serving-integration-tests 2/3

Automatically retrying...
/test pull-knative-serving-integration-tests

@taragu
Contributor Author

taragu commented Aug 7, 2019

/test pull-knative-serving-integration-tests

@taragu
Contributor Author

taragu commented Aug 7, 2019

/test pull-knative-serving-integration-tests

@taragu
Contributor Author

taragu commented Aug 7, 2019

/test pull-knative-serving-build-tests

@knative-prow-robot knative-prow-robot merged commit a6a69f0 into knative:master Aug 7, 2019
@taragu taragu deleted the gc-reconciler branch November 12, 2019 16:03
Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • area/API: API objects and controllers
  • area/test-and-release: It flags unit/e2e/conformance/perf test issues for product features
  • cla: yes: Indicates the PR's author has signed the CLA.
  • lgtm: Indicates that a PR is ready to be merged.
  • ok-to-test: Indicates a non-member PR verified by an org member that is safe to test.
  • size/XL: Denotes a PR that changes 500-999 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

Separate GC into it's own Reconciler

9 participants