Skip to content

MCO-821: Introduce MachineConfigNode types#1596

Merged
openshift-ci[bot] merged 1 commit intoopenshift:masterfrom
cdoern:machine-state
Nov 1, 2023
Merged

MCO-821: Introduce MachineConfigNode types#1596
openshift-ci[bot] merged 1 commit intoopenshift:masterfrom
cdoern:machine-state

Conversation

@cdoern
Copy link
Copy Markdown
Contributor

@cdoern cdoern commented Sep 21, 2023

For Openshift 4.15, the MCO is working on a new state reporting mechanism: The MachineConfigStateController. The main type behind this architecture is the MachineConfigNode. A MachineConfigNode tracks an upgrade progression that relates behaviors between a node, its machineconfigpool, and the MCD processes within the MCO.

Each of these will have its own CR in the MCO which necessitates a CRD that I have also included in this PR.

where doing oc describe machineconfignode/<name> will of course display more in depth information of the type metav1.Condition. These conditions are composed of a state, phase, reason, and time. The state refers to what part of the upgrade we are in (UpdatePreparing, Completing etc), Phase is a in depth description of what part of a State we are in, and the Reason is a quick one or two word statement describing what "Stopping Point" we are at: ReconcilingConfigs, StoppingCfgDrift. The following accepted Condition types are:

	// MachineConfigNodeUpdatePrepared describes a machine that is preparing in the daemon to trigger an update
	MachineConfigNodeUpdatePrepared StateProgress = "UpdatePrepared"
	// MachineConfigNodeUpdateExecuted describes a machine that has executed the body of the upgrade
	MachineConfigNodeUpdateExecuted StateProgress = "UpdateExecuted"
	// MachineConfigNodeUpdatePostActionComplete describes a machine that has executed its post update action
	MachineConfigNodeUpdatePostActionComplete StateProgress = "UpdatePostActionComplete"
	// MachineConfigNodeUpdateComplete describes a machine that has completed the core parts of an upgrade.
	MachineConfigNodeUpdateComplete StateProgress = "UpdateComplete"
	// MachineConfigNodeUpdated describes a machine that has a matching desired and current config after executing an update
	MachineConfigNodeUpdated StateProgress = "Updated"
	// MachineConfigNodeUpdateResumed describes a machine that has resumed normal processes
	MachineConfigNodeResumed StateProgress = "Resumed"
	// MachineConfigNodeUpdateCompatible the part of the preparing phase where the mco decides whether it can update
	MachineConfigNodeUpdateCompatible StateProgress = "UpdateCompatible"
	// MachineConfigNodeUpdateDrained describes the part of the inprogress phase where the node drains
	MachineConfigNodeUpdateDrained StateProgress = "Drained"
	// MachineConfigNodeUpdateFilesAndOS describes the part of the inprogress phase where the nodes file and OS config change
	MachineConfigNodeUpdateFilesAndOS StateProgress = "AppliedFilesAndOS"
	// MachineConfigNodeUpdateCordoned describes the part of the completing phase where the node cordons
	MachineConfigNodeUpdateCordoned StateProgress = "Cordoned"
	// MachineConfigNodeUpdateRebooted describes the part of the post action phase where the node reboots itself
	MachineConfigNodeUpdateRebooted StateProgress = "RebootedNode"
	// MachineConfigNodeUpdateReloaded describes the part of the post action phase where the node reloads its CRIO service
	MachineConfigNodeUpdateReloaded StateProgress = "ReloadedCRIO"

The standard oc get machineconfignode/... will display only Updated UpdatePreparing UpdateInProgress UpdatePostAction UpdateCompleting and Resuming. The rest of these granular states will be shown via -o wide or oc describe

An example of oc get machineconfignodes would look like:

mcnodes

The other information accessible via oc describe will be:

  1. the last instance of each Condition type
  2. the generation of this MachineConfigNode
  3. the pool that this node is tied to
  4. the desired and current machineconfigs for the node

this is an example of the above oc describe command that is not fully operational yet.

mcnodedescribe

As well as this information located at the bottom of oc describe

extra_info

This feature is linked to https://issues.redhat.com/browse/MCO-452 and more detail can be found there.

These types are basically an attempt to aggregate and display the state of a cluster's machines in a user friendly or at least digestible way. Each Condition is also associated closely with a StateProgress. All of these StateProgresses are phases that I have broken down key MCO operations into.

The structure and types of this PR are being used in: openshift/machine-config-operator#3970 which is working quite well.

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Sep 21, 2023

Hello @cdoern! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@openshift-ci openshift-ci Bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Sep 21, 2023
@openshift-ci openshift-ci Bot requested review from JoelSpeed and deads2k September 21, 2023 20:31
@cdoern cdoern force-pushed the machine-state branch 11 times, most recently from 000e721 to d5de927 Compare September 25, 2023 20:25
Comment thread machineconfiguration/v1/types.go Outdated
Comment thread machineconfiguration/v1/types.go
Comment thread machineconfiguration/v1/types.go Outdated
Comment thread machineconfiguration/v1/types.go Outdated
Comment thread machineconfiguration/v1/types.go
Comment thread machineconfiguration/v1/types.go Outdated

// MachineState describes the health of the Machines on the system
//
// Compatibility level 1: Stable within a major release for a minimum of 12 months or 3 minor releases (whichever is longer).
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

born into compat-level 1 is very confident. Are we already that confident that we are getting these types right, and won't need tough follow-up changes?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure what the protocol is here. I was under the impression that anything user facing should be level 1?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are going to use a feature gate for this, which I think we should try, then you want to make this a v1alpha1 API first, at compatibility level 4

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am planning on featuregating, do you think I need to introduce that in this PR?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do the feature gate as a separate PR, that's a pretty easy thing to add so can probably get that in and merged before the API

Comment thread machineconfiguration/v1/types.go Outdated
Comment thread machineconfiguration/v1/types.go Outdated
Comment thread machineconfiguration/v1/types.go Outdated
Comment thread machineconfiguration/v1/types.go Outdated
Comment thread machineconfiguration/v1/types.go Outdated
Comment thread machineconfiguration/v1/types.go Outdated
Comment thread machineconfiguration/v1/types.go Outdated
Comment thread machineconfiguration/v1/types.go Outdated
@petr-muller
Copy link
Copy Markdown
Member

/cc

@cdoern cdoern force-pushed the machine-state branch 3 times, most recently from 9785364 to fdc7328 Compare October 24, 2023 14:11
Comment thread machineconfiguration/v1alpha1/Makefile Outdated
Comment thread machineconfiguration/v1alpha1/techpreview.machineconfignode.testsuite.yaml Outdated
Comment thread machineconfiguration/v1alpha1/techpreview.machineconfignode.testsuite.yaml Outdated
Comment thread machineconfiguration/v1alpha1/techpreview.machineconfignode.testsuite.yaml Outdated
Comment thread machineconfiguration/v1alpha1/types_machineconfignode.go Outdated
Comment thread machineconfiguration/v1alpha1/types_machineconfignode.go
Comment thread machineconfiguration/v1alpha1/types_machineconfignode.go Outdated
Comment thread machineconfiguration/v1alpha1/types_machineconfignode.go Outdated
Comment thread machineconfiguration/v1alpha1/types_machineconfignode.go Outdated
Comment thread machineconfiguration/v1alpha1/types_machineconfignode.go Outdated
@cdoern cdoern force-pushed the machine-state branch 3 times, most recently from 63502ed to 85ca700 Compare October 26, 2023 15:54
Comment thread machineconfiguration/v1alpha1/types_machineconfignode.go
Comment thread machineconfiguration/v1alpha1/types_machineconfignode.go Outdated
Comment thread machineconfiguration/v1alpha1/types_machineconfignode.go Outdated
Comment thread machineconfiguration/v1alpha1/types_machineconfignode.go Outdated
Comment thread machineconfiguration/v1alpha1/types_machineconfignode.go Outdated
Comment thread machineconfiguration/v1alpha1/types_machineconfignode.go Outdated
Comment thread machineconfiguration/v1alpha1/types_machineconfignode.go Outdated
Comment thread machineconfiguration/v1alpha1/types_machineconfignode.go Outdated
Comment thread machineconfiguration/v1alpha1/types_machineconfignode.go Outdated
Comment thread machineconfiguration/v1alpha1/types_machineconfignode.go Outdated
Comment thread machineconfiguration/v1alpha1/types_machineconfignode.go Outdated
Comment thread machineconfiguration/v1alpha1/types_machineconfignode.go Outdated
Comment thread machineconfiguration/v1alpha1/types_machineconfignode.go Outdated
Comment thread machineconfiguration/v1alpha1/types_machineconfignode.go Outdated
Comment thread machineconfiguration/v1alpha1/types_machineconfignode.go Outdated
Comment thread machineconfiguration/v1alpha1/types_machineconfignode.go
Comment thread machineconfiguration/v1alpha1/types_machineconfignode.go Outdated
Comment thread machineconfiguration/v1alpha1/types_machineconfignode.go Outdated
@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented Oct 30, 2023

@cdoern: This pull request references MCO-821 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.15.0" version, but no target version was set.

Details

In response to this:

For Openshift 4.15, the MCO is working on a new state reporting mechanism: The MachineConfigStateController. The main type behind this architecture is the MachineConfigNode. A MachineConfigNode tracks an upgrade progression that relates behaviors between a node, its machineconfigpool, and the MCD processes within the MCO.

Each of these will have its own CR in the MCO which necessitates a CRD that I have also included in this PR.

where doing oc describe machineconfignode/<name> will of course display more in depth information of the type metav1.Condition. These conditions are composed of a state, phase, reason, and time. The state refers to what part of the upgrade we are in (UpdatePreparing, Completing etc), Phase is a in depth description of what part of a State we are in, and the Reason is a quick one or two word statement describing what "Stopping Point" we are at: ReconcilingConfigs, StoppingCfgDrift. The following accepted Condition types are:

  // MachineConfigNodeUpdatePrepared describes a machine that is preparing in the daemon to trigger an update
  MachineConfigNodeUpdatePrepared StateProgress = "UpdatePrepared"
  // MachineConfigNodeUpdateExecuted describes a machine that has executed the body of the upgrade
  MachineConfigNodeUpdateExecuted StateProgress = "UpdateExecuted"
  // MachineConfigNodeUpdatePostActionComplete describes a machine that has executed its post update action
  MachineConfigNodeUpdatePostActionComplete StateProgress = "UpdatePostActionComplete"
  // MachineConfigNodeUpdateComplete describes a machine that has completed the core parts of an upgrade.
  MachineConfigNodeUpdateComplete StateProgress = "UpdateComplete"
  // MachineConfigNodeUpdated describes a machine that has a matching desired and current config after executing an update
  MachineConfigNodeUpdated StateProgress = "Updated"
  // MachineConfigNodeUpdateResumed describes a machine that has resumed normal processes
  MachineConfigNodeResumed StateProgress = "Resumed"
  // MachineConfigNodeUpdateCompatible the part of the preparing phase where the mco decides whether it can update
  MachineConfigNodeUpdateCompatible StateProgress = "UpdateCompatible"
  // MachineConfigNodeUpdateDrained describes the part of the inprogress phase where the node drains
  MachineConfigNodeUpdateDrained StateProgress = "Drained"
  // MachineConfigNodeUpdateFilesAndOS describes the part of the inprogress phase where the nodes file and OS config change
  MachineConfigNodeUpdateFilesAndOS StateProgress = "AppliedFilesAndOS"
  // MachineConfigNodeUpdateCordoned describes the part of the completing phase where the node cordons
  MachineConfigNodeUpdateCordoned StateProgress = "Cordoned"
  // MachineConfigNodeUpdateRebooted describes the part of the post action phase where the node reboots itself
  MachineConfigNodeUpdateRebooted StateProgress = "RebootedNode"
  // MachineConfigNodeUpdateReloaded describes the part of the post action phase where the node reloads its CRIO service
  MachineConfigNodeUpdateReloaded StateProgress = "ReloadedCRIO"

The standard oc get machineconfignode/... will display only Updated UpdatePreparing UpdateInProgress UpdatePostAction UpdateCompleting and Resuming. The rest of these granular states will be shown via -o wide or oc describe

An example of oc get machineconfignodes would look like:

mcnodes

The other information accessible via oc describe will be:

  1. the last instance of each Condition type
  2. the generation of this MachineConfigNode
  3. the pool that this node is tied to
  4. the desired and current machineconfigs for the node

this is an example of the above oc describe command that is not fully operational yet.

mcnodedescribe

As well as this information located at the bottom of oc describe

extra_info

This feature is linked to https://issues.redhat.com/browse/MCO-452 and more detail can be found there.

These types are basically an attempt to aggregate and display the state of a cluster's machines in a user friendly or at least digestible way. Each Condition is also associated closely with a StateProgress. All of these StateProgresses are phases that I have broken down key MCO operations into.

The structure and types of this PR are being used in: openshift/machine-config-operator#3970 which is working quite well.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cdoern
Copy link
Copy Markdown
Contributor Author

cdoern commented Oct 30, 2023

/jira refresh

@openshift-ci-robot
Copy link
Copy Markdown

openshift-ci-robot commented Oct 30, 2023

@cdoern: This pull request references MCO-821 which is a valid jira issue.

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Comment thread machineconfiguration/v1alpha1/types_machineconfignode.go Outdated
Comment thread machineconfiguration/v1alpha1/types_machineconfignode.go Outdated
Comment thread machineconfiguration/v1alpha1/types_machineconfignode.go Outdated
@JoelSpeed
Copy link
Copy Markdown
Contributor

/lgtm
/hold

I would like some acknowledgement from others on the EP that they are happy for the API to merge even though the EP still has some details to iron out, have posted a comment over on the EP openshift/enhancements#1490 (comment)

@cdoern
Copy link
Copy Markdown
Contributor Author

cdoern commented Oct 31, 2023

/test e2e-upgrade

Comment thread machineconfiguration/v1alpha1/types_machineconfignode.go
@JoelSpeed
Copy link
Copy Markdown
Contributor

/approve cancel

I think the description of the spec and status desired versions are the wrong way around, can we check

Comment thread machineconfiguration/v1alpha1/types_machineconfignode.go Outdated
Signed-off-by: Charlie Doern <cdoern@redhat.com>
@JoelSpeed
Copy link
Copy Markdown
Contributor

/lgtm

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Oct 31, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cdoern, JoelSpeed, yuqi-zhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@JoelSpeed
Copy link
Copy Markdown
Contributor

/test all

@JoelSpeed
Copy link
Copy Markdown
Contributor

/hold cancel

We have enough consensus to get this in, and iterate with what's left of 4.15

@cdoern
Copy link
Copy Markdown
Contributor Author

cdoern commented Oct 31, 2023

/test integration

@cdoern
Copy link
Copy Markdown
Contributor Author

cdoern commented Nov 1, 2023

/test e2e-aws-serial-techpreview

@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci Bot commented Nov 1, 2023

@cdoern: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants