OCPNODE-1515 : Support Evented PLEG feature in Openshift #1458

Closed
sairameshv wants to merge 2 commits into openshift:master from sairameshv:evented_pleg

Conversation

@sairameshv
Member

@sairameshv sairameshv commented May 10, 2023

  1. With Evented PLEG, CRI-O sends container events to the Kubelet so that the pod cache can be updated from the received events.
    More about Evented PLEG: KEP Reference
  2. This feature can be enabled in OCP through a new field in the node config custom resource.
    The MCO monitors that field and updates both the Kubelet and CRI-O configurations.
    Enhancement PR: OCPNODE-1525: Support Evented PLEG in Openshift (openshift/enhancements#1368)
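
To make the first point of the description concrete, here is a minimal sketch of an event-driven cache update, as opposed to periodically relisting every container. All type and function names (`ContainerEvent`, `PodCache`, `Consume`) are hypothetical and chosen for illustration; they are not the Kubelet's or CRI-O's actual types.

```go
package main

import "fmt"

// ContainerEventType is a hypothetical stand-in for the kinds of
// lifecycle events a container runtime might report.
type ContainerEventType string

const (
	ContainerStarted ContainerEventType = "Started"
	ContainerStopped ContainerEventType = "Stopped"
)

// ContainerEvent is a hypothetical event sent from the runtime to the kubelet.
type ContainerEvent struct {
	PodUID      string
	ContainerID string
	Type        ContainerEventType
}

// PodCache maps pod UID -> container ID -> last observed event type.
type PodCache map[string]map[string]ContainerEventType

// Apply updates the cache from one event, touching only the affected entry.
func (c PodCache) Apply(ev ContainerEvent) {
	if c[ev.PodUID] == nil {
		c[ev.PodUID] = map[string]ContainerEventType{}
	}
	c[ev.PodUID][ev.ContainerID] = ev.Type
}

// Consume drains the channel until it is closed and returns the resulting cache.
func Consume(events <-chan ContainerEvent) PodCache {
	cache := PodCache{}
	for ev := range events {
		cache.Apply(ev)
	}
	return cache
}

func main() {
	events := make(chan ContainerEvent, 3)
	events <- ContainerEvent{PodUID: "pod-a", ContainerID: "c1", Type: ContainerStarted}
	events <- ContainerEvent{PodUID: "pod-a", ContainerID: "c2", Type: ContainerStarted}
	events <- ContainerEvent{PodUID: "pod-a", ContainerID: "c1", Type: ContainerStopped}
	close(events)

	cache := Consume(events)
	fmt.Println(cache["pod-a"]["c1"], cache["pod-a"]["c2"])
}
```

The point of the sketch is only that each event updates a single cache entry; the real design, including fallbacks when events are missed, is described in the KEP.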

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress (Indicates that a PR should not merge because it is a work in progress.) label May 10, 2023
@openshift-ci
Contributor

openshift-ci Bot commented May 10, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci
Contributor

openshift-ci Bot commented May 10, 2023

Hello @sairameshv! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@openshift-ci openshift-ci Bot added the size/S (Denotes a PR that changes 10-29 lines, ignoring generated files.) label May 10, 2023
@sairameshv sairameshv changed the title from "Support Evented PLEG feature in Openshift" to "OCPNODE-1515 : Support Evented PLEG feature in Openshift" May 10, 2023
@openshift-merge-robot openshift-merge-robot added the needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) label May 22, 2023
Comment thread config/v1/0000_10_config-operator_01_node.crd.yaml Outdated
@openshift-merge-robot openshift-merge-robot removed the needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) label May 24, 2023
@sairameshv sairameshv marked this pull request as ready for review May 24, 2023 16:11
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress (Indicates that a PR should not merge because it is a work in progress.) label May 24, 2023
@openshift-ci openshift-ci Bot requested review from bparees and sjenning May 24, 2023 16:12
@sairameshv
Member Author

/test verify

Comment thread config/v1/types_node.go Outdated
@sairameshv sairameshv marked this pull request as draft May 25, 2023 15:55
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress (Indicates that a PR should not merge because it is a work in progress.) label May 25, 2023
@openshift-ci openshift-ci Bot added size/M (Denotes a PR that changes 30-99 lines, ignoring generated files.) and removed size/S (Denotes a PR that changes 10-29 lines, ignoring generated files.) labels May 26, 2023
@sairameshv sairameshv marked this pull request as ready for review May 26, 2023 11:08
@openshift-ci openshift-ci Bot removed the do-not-merge/work-in-progress (Indicates that a PR should not merge because it is a work in progress.) label May 26, 2023
@openshift-ci openshift-ci Bot requested a review from JoelSpeed May 26, 2023 11:09
@sairameshv sairameshv force-pushed the evented_pleg branch 2 times, most recently from 7661898 to 6c1e4f0 May 29, 2023 12:07
@openshift-ci openshift-ci Bot added size/L (Denotes a PR that changes 100-499 lines, ignoring generated files.) and removed size/M (Denotes a PR that changes 30-99 lines, ignoring generated files.) labels May 29, 2023
Comment thread config/v1/types_node.go Outdated
// +optional
WorkerLatencyProfile WorkerLatencyProfileType `json:"workerLatencyProfile,omitempty"`

// EventedPleg enables event based PLEG between the kubelet and the CRI-O
Contributor

I think we want to expand this comment to provide more detail.

Questions I would be asking would be:

  • What are the valid values?
  • What happens if I don't specify any value? Is this behaviour likely to change over time?

Nit: the comment should start with the JSON tag version of the name, not the Go field version.
So I'd be expecting something more along the lines of:

Suggested change
- // EventedPleg enables event based PLEG between the kubelet and the CRI-O
+ // eventedPLEG enables event based PLEG between the kubelet and the CRI-O.
+ // Valid values are `Enabled`, `Disabled` and omitted.
+ // When omitted, this means no opinion and the platform is left to choose a reasonable default
+ // which is subject to change over time.
+ // The current default is `Disabled`.

Also, is it worth spelling out CRI-O in human-readable prose rather than just saying CRI-O?

I assume a user will know what PLEG is if they are turning this on?

Member Author

Updated as suggested, along with a reference to the KEP!
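
The valid values named in the suggested comment (`Enabled`, `Disabled`, and omitted) can be illustrated with a small validation sketch. The type name `EventedPLEGMode` and the `Validate` function are hypothetical; in the actual API the check is expressed through the `+kubebuilder:validation:Enum` marker rather than hand-written Go.

```go
package main

import "fmt"

// EventedPLEGMode is a hypothetical name for the field's string type.
type EventedPLEGMode string

const (
	EventedPLEGEnabled  EventedPLEGMode = "Enabled"
	EventedPLEGDisabled EventedPLEGMode = "Disabled"
	// EventedPLEGOmitted represents an unset field, meaning "no opinion".
	EventedPLEGOmitted EventedPLEGMode = ""
)

// Validate returns an error for any value outside the documented set.
// Matching is case-sensitive, as enum validation on a CRD would be.
func Validate(m EventedPLEGMode) error {
	switch m {
	case EventedPLEGEnabled, EventedPLEGDisabled, EventedPLEGOmitted:
		return nil
	default:
		return fmt.Errorf("unsupported value %q: must be %q, %q, or omitted",
			m, EventedPLEGEnabled, EventedPLEGDisabled)
	}
}

func main() {
	for _, m := range []EventedPLEGMode{"Enabled", "", "enabled"} {
		fmt.Printf("%q -> %v\n", m, Validate(m))
	}
}
```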

Comment thread config/v1/types_node.go
DefaultUpdateDefaultReaction WorkerLatencyProfileType = "Default"
)

// +kubebuilder:validation:Enum=Enabled;Disabled;""
Contributor

If you are using omitempty, which I think you are doing because this is a workload API rather than a configuration API, you don't need to have "" in place: when the field is omitted, the validation for the enum will not execute.

Member Author

Yeah, removed the "" option. I think the nodes.config API is a configuration API, as this resource is unique cluster-wide.

Contributor

Is it a singleton resource or is there one per Node in the cluster? If it's a singleton then yep, it's a configuration API, in which case I would suggest including "" in the enum and dropping omitempty. This makes the API more discoverable for the user configuring it.

Member Author

It is a singleton resource named "cluster" that holds node-related configs such as cgroupMode and workerLatencyProfile.
I agree that removing omitempty improves the discoverability of the API for the user. At the same time, I want to maintain consistency with the other API fields already present, which are also optional.
WDYT?

Contributor

> At the same time, I want to maintain consistency with the other API fields already present which are again optional

We tend to say we don't repeat the sins of the past here. We have conventions that evolve over time so new fields should follow current conventions even when they make the field look inconsistent with older fields.

IMO this should be no omitempty, but allow empty string on the enum please.

Member Author

Agreed with your point, updated!
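
The discoverability argument in this thread comes down to how Go's `encoding/json` treats `omitempty`. The sketch below compares the two tag styles; the struct names are hypothetical, and the JSON key `eventedPLEG` is taken from the reviewer's earlier suggestion rather than from the merged API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// EventedPLEGMode is a hypothetical name for the field's string type.
type EventedPLEGMode string

// withOmitEmpty mirrors the older optional-field style: an unset field
// disappears from the serialized object.
type withOmitEmpty struct {
	EventedPLEG EventedPLEGMode `json:"eventedPLEG,omitempty"`
}

// withoutOmitEmpty mirrors the style agreed on in this thread: an unset
// field is still serialized, as an empty string.
type withoutOmitEmpty struct {
	EventedPLEG EventedPLEGMode `json:"eventedPLEG"`
}

// mustJSON marshals v and panics on error; fine for a demo.
func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	fmt.Println(mustJSON(withOmitEmpty{}))    // prints {}
	fmt.Println(mustJSON(withoutOmitEmpty{})) // prints {"eventedPLEG":""}
}
```

Because the second form always emits the key, a user reading the singleton resource can see that the field exists even before setting it. That is also why "" then has to be an allowed enum value.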

@JoelSpeed
Contributor

How does this interact with #1471? What's the goal in 4.14?

Looks like 1471 enabled the feature by default in tech preview clusters, so is the intention here that a user will have the choice? It will be on by default but can be disabled by using the fields added here?

@sairameshv
Member Author

> How does this interact with #1471? What's the goal in 4.14?
>
> Looks like 1471 enabled the feature by default in tech preview clusters, so is the intention here that a user will have the choice? It will be on by default but can be disabled by using the fields added here?

#1471 just provides a way to enable this feature via a feature gate. I don't think it enables EventedPLEG unless this PR and the related MCO PR are merged.
By default, EventedPLEG is disabled in the cluster.

@JoelSpeed
Contributor

> I don't think it enables the EventedPLEG without getting this PR & the related MCO PR merged.

It does enable it for TechPreviewNoUpgrade clusters. The MCO passes the feature gate through to the kubelet, so it gets enabled by default on any TechPreviewNoUpgrade cluster. I'm guessing that wasn't the intention?

Note: this was noticed while debugging issues in openshift/machine-config-operator#3688.

Comment thread config/v1/types_node.go
// eventedPleg enables the event based PLEG between the kubelet and CRI-O
// Reference: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/3386-kubelet-evented-pleg/README.md
// Valid values are `Enabled`, `Disabled` and ""
// By default, the evented pleg feature is not enabled in the cluster
Contributor

We typically use particular prose for this; can you update it, please?

Suggested change
- // By default, the evented pleg feature is not enabled in the cluster
+ // When omitted, this means no opinion and the platform is left to choose a reasonable default, which is subject to change over time.
+ // The current default value is Disabled.
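
The "no opinion" wording implies that a consumer resolves an omitted value to a platform default at read time, instead of storing the default in the resource. A minimal sketch of that resolution follows; `Effective` and `currentDefault` are hypothetical names, and the default of `Disabled` is the one stated in the suggested comment.

```go
package main

import "fmt"

// EventedPLEGMode is a hypothetical name for the field's string type.
type EventedPLEGMode string

const (
	Enabled  EventedPLEGMode = "Enabled"
	Disabled EventedPLEGMode = "Disabled"
)

// currentDefault is what the platform picks when the user expresses no
// opinion. Keeping it in code (not in the stored resource) is what lets
// the default change in a later release.
const currentDefault = Disabled

// Effective resolves an omitted (empty) value to the platform default
// and returns explicit values unchanged.
func Effective(m EventedPLEGMode) EventedPLEGMode {
	if m == "" {
		return currentDefault
	}
	return m
}

func main() {
	fmt.Println(Effective(""))        // resolves to the platform default
	fmt.Println(Effective("Enabled")) // explicit choice is respected
}
```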

Comment thread config/v1/types_node.go
WorkerLatencyProfile WorkerLatencyProfileType `json:"workerLatencyProfile,omitempty"`

// eventedPleg enables the event based PLEG between the kubelet and CRI-O
// Reference: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/3386-kubelet-evented-pleg/README.md
Contributor

Is there a user-friendly explanation we can include? A link to a KEP is quite a lot of context.

Is the intention that, when a user enables this feature, both CRI-O and the Kubelet are configured? Is explicit configuration required for both?

EventedPLEG is an upstream feature gate; what happens when that is enabled by default?

@sairameshv sairameshv marked this pull request as draft January 24, 2024 11:12
@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress (Indicates that a PR should not merge because it is a work in progress.) label Jan 24, 2024
@openshift-ci
Contributor

openshift-ci Bot commented Jan 24, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sairameshv
Once this PR has been reviewed and has the lgtm label, please assign mfojtik for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

1. In case of Evented PLEG, CRI-O sends the container events to the Kubelet so that the pod cache can be updated based on the received events.
KEP Reference: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/3386-kubelet-evented-pleg/README.md
2. This feature can be enabled in OCP by adding a new field in the node config custom resource,
which the MCO monitors in order to update both the required Kubelet and CRI-O configurations.
Enhancement PR: openshift/enhancements#1368

Signed-off-by: Sai Ramesh Vanka <svanka@redhat.com>
@openshift-merge-robot openshift-merge-robot added the needs-rebase (Indicates a PR cannot be merged because it has merge conflicts with HEAD.) label Mar 19, 2024
@openshift-merge-robot
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci Bot added the lifecycle/stale (Denotes an issue or PR has remained open with no activity and has become stale.) label Jun 18, 2024
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci Bot added lifecycle/rotten (Denotes an issue or PR that has aged beyond stale and will be auto-closed.) and removed lifecycle/stale (Denotes an issue or PR has remained open with no activity and has become stale.) labels Aug 11, 2024
@openshift-ci
Contributor

openshift-ci Bot commented Aug 15, 2024

@sairameshv: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-aws-ovn | 8b0fb6b | link | true | /test e2e-aws-ovn |
| ci/prow/e2e-aws-ovn-techpreview | 8b0fb6b | link | true | /test e2e-aws-ovn-techpreview |
| ci/prow/e2e-aws-serial-techpreview | 8b0fb6b | link | true | /test e2e-aws-serial-techpreview |
| ci/prow/e2e-aws-serial | 8b0fb6b | link | true | /test e2e-aws-serial |
| ci/prow/e2e-aws-ovn-hypershift | 8b0fb6b | link | true | /test e2e-aws-ovn-hypershift |
| ci/prow/e2e-upgrade-minor | 8b0fb6b | link | true | /test e2e-upgrade-minor |
| ci/prow/e2e-upgrade | 8b0fb6b | link | true | /test e2e-upgrade |
| ci/prow/minor-e2e-upgrade-minor | 8b0fb6b | link | true | /test minor-e2e-upgrade-minor |
| ci/prow/minor-images | 8b0fb6b | link | true | /test minor-images |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@openshift-bot

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci Bot closed this Sep 15, 2024
@openshift-ci
Contributor

openshift-ci Bot commented Sep 15, 2024

@openshift-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Labels

do-not-merge/work-in-progress: Indicates that a PR should not merge because it is a work in progress.
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
needs-rebase: Indicates a PR cannot be merged because it has merge conflicts with HEAD.
size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.

6 participants