Merged
17 changes: 15 additions & 2 deletions pkg/controller/kubelet-config/kubelet_config_controller.go
@@ -49,8 +49,8 @@ const (
// With the current rate-limiter in use (5ms*2^(maxRetries-1)) the following numbers represent the times
// a machineconfig pool is going to be requeued:
//
- // 5ms, 10ms, 20ms, 40ms, 80ms, 160ms, 320ms, 640ms, 1.3s, 2.6s, 5.1s, 10.2s, 20.4s, 41s, 82s
- maxRetries = 15
+ // 18 allows for retries up to about 10 minutes to allow for slower machines to catchup.
+ maxRetries = 18
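For reference, a small self-contained sketch of the back-off arithmetic cited in the comment above (5ms doubling per retry): with maxRetries = 18 the final requeue delay comes out to roughly 11 minutes, which lines up with the "about 10 minutes" figure in the new comment. This assumes a plain exponential back-off with no per-item cap, so treat the numbers as illustrative.

```go
package main

import (
	"fmt"
	"time"
)

// Illustrative only: reproduce the 5ms*2^(n-1) requeue delays mentioned
// in the controller comment to show how far 18 retries stretches.
func main() {
	const maxRetries = 18
	base := 5 * time.Millisecond

	var total time.Duration
	for n := 1; n <= maxRetries; n++ {
		delay := base << uint(n-1) // 5ms * 2^(n-1)
		total += delay
		fmt.Printf("retry %2d: delay %v (cumulative %v)\n", n, delay, total)
	}
}
```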
Contributor:
Could you update the commit message to mention why this is being bumped?

Contributor:
The kubelet config is dependent on the controller config here: https://github.com/harche/machine-config-operator/blob/f28b2c2a4102228af4b6754522e13384711969e7/pkg/controller/kubelet-config/kubelet_config_controller.go#L321-L341

Ideally, this would be applied at initial rollout, but that isn't how the feature works today. Changes to kubelet configurations require a reboot.

Contributor:
Ah, my bad, I missed that connection. I think this code does make sense in that it makes the wait explicit. That said, I'm not sure it will achieve the desired effect of waiting for the initial configuration of the pool to settle (the linked Bugzilla), since

return ctrl.syncStatusOnly(cfg, err, "could not get kubelet config key: %v", err)

was already blocking it before now anyway. I think the race between the KCC generating MCs from kubeletconfigs and the in-cluster MCC generating the initial node configuration still exists.

If we want to go this route, maybe we should instead check whether the pool has a rendered config set in its spec (see the sketch after this comment). Then the kubeletconfig would always be generated after an initial rendered config has been generated (we want that, since it won't contain kubeletconfigs, like bootstrap), and thus bypass the "not found" issue.

Alternatively, I still think it might be a cleaner solution to just add a bootstrap reader mode to the kubeletconfigcontroller and have the bootstrap MCC call it (achieving day 1); if we want to do that in the future, we'd have to revert the fix proposed here.

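A minimal sketch of the check suggested above: refuse to generate kubeletconfig MachineConfigs until every targeted pool has a rendered config in its spec. The helper name, package placement, error wording, and import path are assumptions for illustration, not part of this PR.

```go
package kubeletconfig // hypothetical placement next to the controller code

import (
	"fmt"

	mcfgv1 "github.com/openshift/machine-config-operator/pkg/apis/machineconfiguration.openshift.io/v1"
)

// poolsHaveRenderedConfig is an illustrative gate: returning an error here
// would requeue the kubeletconfig until the render controller has produced
// an initial rendered config for each pool.
func poolsHaveRenderedConfig(pools []*mcfgv1.MachineConfigPool) error {
	for _, pool := range pools {
		if pool.Spec.Configuration.Name == "" {
			return fmt.Errorf("pool %s has no rendered config yet", pool.Name)
		}
	}
	return nil
}
```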
Contributor:
Ideally, the code should be updated to run at bootstrap. Perhaps @QiWang19 can look into updating the KubeletConfig as a second pass. I would like to unblock Omer.

The node_controller does something similar to see if the pool is ready...
https://github.com/harche/machine-config-operator/blob/5527f842eda1fd51538590a592d3c95a3cd679d9/pkg/controller/node/node_controller.go#L720-L727

)

var (
@@ -394,6 +394,11 @@ func (ctrl *Controller) syncKubeletConfig(key string) error {
glog.V(4).Infof("Finished syncing kubeletconfig %q (%v)", key, time.Since(startTime))
}()

// Wait to apply a kubelet config if the controller config is not completed
if err := mcfgv1.IsControllerConfigCompleted(ctrlcommon.ControllerConfigName, ctrl.ccLister.Get); err != nil {
return err
}

_, name, err := cache.SplitMetaNamespaceKey(key)
if err != nil {
return err
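For context on the gate above: the completeness check consults the ControllerConfig's status conditions (the same TemplateControllerCompleted/Running/Failing conditions the test fixture below now populates). A rough, assumption-labelled sketch of that kind of check follows; it is not the actual mcfgv1.IsControllerConfigCompleted implementation.

```go
package kubeletconfig // illustrative placement only

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"

	mcfgv1 "github.com/openshift/machine-config-operator/pkg/apis/machineconfiguration.openshift.io/v1"
)

// controllerConfigLooksCompleted is a sketch of a condition-based
// completeness check; the real helper may inspect additional fields
// (e.g. observed generation), so treat this as illustrative only.
func controllerConfigLooksCompleted(cc *mcfgv1.ControllerConfig) error {
	for _, cond := range cc.Status.Conditions {
		if cond.Type == mcfgv1.TemplateControllerFailing && cond.Status == corev1.ConditionTrue {
			return fmt.Errorf("controller config %s is failing: %s", cc.Name, cond.Message)
		}
	}
	for _, cond := range cc.Status.Conditions {
		if cond.Type == mcfgv1.TemplateControllerCompleted && cond.Status == corev1.ConditionTrue {
			return nil
		}
	}
	return fmt.Errorf("controller config %s has not completed yet", cc.Name)
}
```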
@@ -458,6 +463,14 @@ func (ctrl *Controller) syncKubeletConfig(key string) error {
}

for _, pool := range mcpPools {
if pool.Spec.Configuration.Name == "" {
updateDelay := 5 * time.Second
// Previously we spammed the logs about empty pools.
// Let's just pause for a bit here to let the renderer
// initialize them.
time.Sleep(updateDelay)
return fmt.Errorf("Pool %s is unconfigured, pausing %v for renderer to initialize", pool.Name, updateDelay)
}
role := pool.Name
// Get MachineConfig
managedKey, err := getManagedKubeletConfigKey(pool, ctrl.client, cfg)
20 changes: 20 additions & 0 deletions pkg/controller/kubelet-config/kubelet_config_controller_test.go
@@ -11,6 +11,7 @@ import (
osev1 "github.com/openshift/api/config/v1"
oseconfigfake "github.com/openshift/client-go/config/clientset/versioned/fake"
oseinformersv1 "github.com/openshift/client-go/config/informers/externalversions"
corev1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/api/equality"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
@@ -111,6 +112,25 @@ func newControllerConfig(name string, platform osev1.PlatformType) *mcfgv1.ControllerConfig {
},
},
},
Status: mcfgv1.ControllerConfigStatus{
Conditions: []mcfgv1.ControllerConfigStatusCondition{
{
Type: mcfgv1.TemplateControllerCompleted,
Status: corev1.ConditionTrue,
Message: "",
},
{
Type: mcfgv1.TemplateControllerRunning,
Status: corev1.ConditionFalse,
Message: "",
},
{
Type: mcfgv1.TemplateControllerFailing,
Status: corev1.ConditionFalse,
Message: "",
},
},
},
}
return cc
}