Skip to content

Add leader election#6683

Merged
knative-prow-robot merged 13 commits into
knative:masterfrom
pmorie:leader-election
Feb 27, 2020
Merged

Add leader election#6683
knative-prow-robot merged 13 commits into
knative:masterfrom
pmorie:leader-election

Conversation

@pmorie
Copy link
Copy Markdown
Contributor

@pmorie pmorie commented Jan 29, 2020

Fixes knative/pkg#1007

Proposed Changes

  • depends on add leader election support to sharedmain pkg#1019 for changes to sharedmain, which is used by:
    • controller
    • autoscaler-hpa
    • networking-certmanager
    • networking-istio
    • networking-ns-cert
    • webhook
  • /config is changed so that it is simple to enable leader election, but it is not enabled by default

Release Note

Add support for leader election

@knative-prow-robot knative-prow-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 29, 2020
@googlebot googlebot added the cla: yes Indicates the PR's author has signed the CLA. label Jan 29, 2020
@knative-prow-robot knative-prow-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jan 29, 2020
Comment thread config/config-leader-election.yaml Outdated
Comment thread config/config-leader-election.yaml Outdated
Copy link
Copy Markdown
Member

@mattmoor mattmoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll TAL at the pkg PR next.

Comment thread Gopkg.toml Outdated
[[override]]
name = "knative.dev/pkg"
branch = "master"
source = "github.com/pmorie/pkg"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pro tip: I usually throw a // TODO(mattmoor): DO NOT SUBMIT around this, so that the bot catches it and leaves a comment.

Comment thread cmd/autoscaler/main.go Outdated
if !leConfig.LeaderElect {
log.Printf("autoscaler will not run in leader-elected mode")
run(ctx)
panic("unreachable")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this not hit when the context is cancelled?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(which happens when K8s sends SIGTERM)

Comment thread cmd/autoscaler/main.go
return err == nil, nil
}); perr != nil {
log.Fatal("Timed out attempting to get k8s version: ", err)
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you'd left this as is, then you could save some kubeclient.Get calls below.

Comment thread cmd/autoscaler/main.go Outdated
if err != nil {
logger.Fatalw("Failed to get hostname for leader election", zap.Error(err))
}
id = id + "_" + string(uuid.NewUUID())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think hoisting this little bit into a helper would document it in the most straightforward way. At first I read the comment above and was like: "That's not what os.Hostname() does...?" :)

# giving up; 10 seconds is the value used by core kubernetes controllers.
renewDeadline: "10s"
# retryPeriod is how long the leader election client waits between tries of
# actions; 2 seconds is the value used by core kuberntes controllers.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# actions; 2 seconds is the value used by core kuberntes controllers.
# actions; 2 seconds is the value used by core kubernetes controllers.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still seems to be a problem

Comment thread config/core/configmaps/leader-election.yaml
Comment thread cmd/autoscaler/main.go Outdated
logger.Fatalw("leaderelection lost")
},
},
// WatchDog: electionChecker,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this?

Comment thread cmd/autoscaler/main.go Outdated
if err := eg.Wait(); err != nil && err != http.ErrServerClosed {
logger.Errorw("Error while running server", zap.Error(err))
}
panic("unreachable")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use logger.Fatal in place of panic

Comment thread cmd/autoscaler/main.go Outdated
eg, egCtx := errgroup.WithContext(ctx)
eg.Go(func() error {
return customMetricsAdapter.Run(ctx.Done())
leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC we get all of this for free in the other binaries due to their use of sharedmain?

})
if err != nil {
logger.Fatalw("Failed to create admission controller", zap.Error(err))
logger.Fatalw("error creating lock: %v", err)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we leaking resources on uninstall?

@knative-prow-robot knative-prow-robot added the area/API API objects and controllers label Feb 6, 2020
@pmorie
Copy link
Copy Markdown
Contributor Author

pmorie commented Feb 24, 2020

/retest

@pmorie pmorie changed the title WIP: Add leader election Add leader election Feb 24, 2020
@knative-prow-robot knative-prow-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 24, 2020
@pmorie
Copy link
Copy Markdown
Contributor Author

pmorie commented Feb 25, 2020

/retest

@pmorie pmorie closed this Feb 25, 2020
@pmorie pmorie reopened this Feb 25, 2020
Copy link
Copy Markdown
Contributor

@vagababov vagababov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lint

Copy link
Copy Markdown
Contributor

@knative-prow-robot knative-prow-robot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vagababov: 1 warning.

Details

In response to this:

/lint

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Comment thread pkg/apis/config/validation/leader_election.go Outdated
Comment thread pkg/apis/config/validation/leader_election.go
Copy link
Copy Markdown
Contributor

@vagababov vagababov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 27, 2020
Copy link
Copy Markdown
Member

@mattmoor mattmoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve
/hold

Mostly nits. If we want to get this baking, feel free to /unhold and follow-up, or I can restamp if you want to knock these off now.

Comment thread cmd/webhook/main.go Outdated
// config validation constructors
tracingconfig "knative.dev/pkg/tracing/config"
defaultconfig "knative.dev/serving/pkg/apis/config"
configvalidation "knative.dev/serving/pkg/apis/config/validation"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is sort of an odd place to put this. Can we make it knative.dev/serving/pkg/leaderelection?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: the defaulting config is here because defaulting happens as part of the pkg/apis stuff

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair. I had something similar in eventing that already went in, mind if we do this in a follow-up?

@@ -0,0 +1,69 @@
# Copyright 2018 The Knative Authors
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Copyright 2018 The Knative Authors
# Copyright 2020 The Knative Authors

resourceLock: "leases"
leaseDuration: "15s"
renewDeadline: "10s"
retryPeriod: "2s"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of the above should come from defaults, right? If so we generally let the defaulting provide them so that folks can safely tweak the knobs via kubectl edit and not upset the three-way merge.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, I will do that in a follow-up.

# giving up; 10 seconds is the value used by core kubernetes controllers.
renewDeadline: "10s"
# retryPeriod is how long the leader election client waits between tries of
# actions; 2 seconds is the value used by core kuberntes controllers.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still seems to be a problem


// ValidateLeaderElectionConfig enriches the leader election config validation
// with extra validations specific to serving.
func ValidateLeaderElectionConfig(configMap *corev1.ConfigMap) (*kle.Config, error) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go with pkg/leaderelection (comment above) then we should rename ValidateConfig to avoid stutter.

@knative-prow-robot knative-prow-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 27, 2020
@knative-prow-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mattmoor, pmorie

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@knative-prow-robot knative-prow-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 27, 2020
@pmorie
Copy link
Copy Markdown
Contributor Author

pmorie commented Feb 27, 2020

/unhold

@knative-prow-robot knative-prow-robot removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged. labels Feb 27, 2020
@pmorie
Copy link
Copy Markdown
Contributor Author

pmorie commented Feb 27, 2020

/retest

Looks like the failure in auto tls tests may be related to #7018

Comment thread config/config-leader-election.yaml
@knative-prow-robot
Copy link
Copy Markdown
Contributor

knative-prow-robot commented Feb 27, 2020

@pmorie: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-knative-serving-integration-tests-go114 9799bb1 link /test pull-knative-serving-integration-tests-go114

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@knative-metrics-robot
Copy link
Copy Markdown

The following is the coverage report on the affected files.
Say /test pull-knative-serving-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/leaderelection/config.go Do not exist 85.7%

Copy link
Copy Markdown
Member

@mattmoor mattmoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@knative-prow-robot knative-prow-robot added the lgtm Indicates that a PR is ready to be merged. label Feb 27, 2020
@pmorie
Copy link
Copy Markdown
Contributor Author

pmorie commented Feb 27, 2020

/retest

@knative-prow-robot knative-prow-robot merged commit 37852e6 into knative:master Feb 27, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/API API objects and controllers area/autoscale cla: yes Indicates the PR's author has signed the CLA. lgtm Indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature Track] Active/passive HA

7 participants