e2e/kdump_test: Add kdump e2e test using mco#3186

Closed
gursewak1997 wants to merge 1 commit into openshift:master from gursewak1997:kdump-e2e-test

Conversation

@gursewak1997

Add an e2e test for OCP CI that validates that kdump can be enabled via a MachineConfig and that a kernel core dump is generated successfully. This is also one of the steps needed to take the kdump feature out of Tech Preview.

- What I did
Added e2e test to test kdump feature
- How to verify it
Run the e2e test locally or verify it on CI job.
- Description for the changelog
Add e2e test for kdump

Add e2e test for OCP CI that validates enabling kdump and
generating kernel core via machine config successfully
@openshift-ci
Contributor

openshift-ci Bot commented Jun 14, 2022

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: gursewak1997
To complete the pull request process, please assign yuqi-zhang after the PR has been reviewed.
You can assign the PR to them by writing /assign @yuqi-zhang in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Comment thread test/e2e/kdump_test.go
}
t.Logf("Node %s has expected crashkernel karg", infraNode.Name)

helpers.ExecCmdOnNode(t, cs, infraNode, "/bin/sh", "-c", string("chroot /rootfs systemctl reboot"))
Contributor

Why are we manually rebooting the node instead of using a MachineConfig? Is this how users are supposed to enable kdump?

Comment thread test/e2e/kdump_test.go

helpers.ExecCmdOnNode(t, cs, infraNode, "/bin/sh", "-c", string("chroot /rootfs systemctl reboot"))
// Waiting for the node to come back up after reboot
time.Sleep(time.Minute)
Contributor

This is not a reliable way to know that the node is up and accessible via the cluster. It would be better to check the node annotation for the Ready state.

@kikisdeliveryservice
Contributor

@gursewak1997 Can you link to something that explains what kdump is?

@gursewak1997
Author

@gursewak1997 Can you link to something that explains what kdump is?

Sure, this is the latest kdump doc: https://docs.openshift.com/container-platform/4.10/support/troubleshooting/troubleshooting-operating-system-issues.html
Another article which is a bit older but was helpful for me to understand kdump was this.

@cgwalters
Member

So we debated places for testing over here recently: openshift/os#746

This one in particular is quite unlikely to be broken by any changes to the MCO. The bits of the MCO that are involved here boil down to passing kernel arguments, which is already covered by other tests.

The thing most likely to break kdump is changes to RHCOS...but we already have a test for that right?
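
To make the "passing kernel arguments" part concrete, here is a minimal sketch of how a test might confirm that a karg such as `crashkernel` reached the node's kernel command line. The helper `kargValue` is hypothetical and not part of this repository, the sample command line is made up, and the sketch assumes that a later occurrence of a key overrides an earlier one. It needs Go 1.18 or newer for `strings.Cut`.

```go
package main

import (
	"fmt"
	"strings"
)

// kargValue scans a kernel command line (for example the contents of
// /proc/cmdline) and returns the value of the last "key=value" entry for
// key. A bare "key" with no "=" is reported as found with an empty value.
func kargValue(cmdline, key string) (string, bool) {
	value, found := "", false
	for _, field := range strings.Fields(cmdline) {
		k, v, _ := strings.Cut(field, "=")
		if k == key {
			value, found = v, true
		}
	}
	return value, found
}

func main() {
	// Made-up command line for illustration only.
	cmdline := "BOOT_IMAGE=/vmlinuz rw crashkernel=256M ignition.platform.id=metal"
	if v, ok := kargValue(cmdline, "crashkernel"); ok {
		fmt.Printf("crashkernel=%s\n", v)
	} else {
		fmt.Println("crashkernel karg not set")
	}
}
```
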

@gursewak1997
Author

The thing most likely to break kdump is changes to RHCOS...but we already have a test for that right?

True, we do have kernel args test already in this repo and we do test kdump wrt changes in RHCOS in fedora-coreos-config. So overall, it should be good.

@sinnykumari
Contributor

The thing most likely to break kdump is changes to RHCOS...but we already have a test for that right?

True, we do have kernel args test already in this repo and we do test kdump wrt changes in RHCOS in fedora-coreos-config. So overall, it should be good.

It seems to me, then, that we don't need to add this to our e2e tests, right? If kdump is not going to break with MCO PRs and the test doesn't add value, it would be better not to add it, as it would use CI resources unnecessarily.

@travier
Member

travier commented Jun 17, 2022

We have decided that this is not strictly required to get kdump out of Tech Preview (https://issues.redhat.com/browse/COS-158).

I think that testing kdump support in the MCO will make sense once we have direct support for it in MachineConfigs: https://issues.redhat.com/browse/MCO-42

@kikisdeliveryservice added the do-not-merge/hold label Jun 17, 2022
@cgwalters
Member

To reiterate, I think we should push for having more OS tests in openshift/origin: openshift/os#746 (comment)

In fact, we could probably take some of the MCO tests here and put them in origin...for example TestExtensions.

One thing related to this is that openshift/origin already has code which does machineset scaling tests; and we could use that to transiently spin up a new worker node for some of this OS-targeted testing.

@cgwalters
Member

openshift/origin@dc64c6f is another example of a test case there

@sinnykumari
Contributor

To reiterate, I think we should push for having more OS tests in openshift/origin: openshift/os#746 (comment)

In fact, we could probably take some of the MCO tests here and put them in origin...for example TestExtensions.

+1

One thing related to this is that openshift/origin already has code which does machineset scaling tests; and we could use that to transiently spin up a new worker node for some of this OS-targeted testing.

Nice, MCO could definitely make use of these tests for Day1 workflow.

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci Bot added the lifecycle/stale label (denotes an issue or PR that has remained open with no activity and has become stale) Sep 26, 2022
@openshift-ci
Contributor

openshift-ci Bot commented Oct 6, 2022

@gursewak1997: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-agnostic-upgrade | 961d11a | link | true | /test e2e-agnostic-upgrade |
| ci/prow/e2e-gcp-op | 961d11a | link | true | /test e2e-gcp-op |
| ci/prow/okd-scos-images | 961d11a | link | true | /test okd-scos-images |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci Bot added the lifecycle/rotten label and removed the lifecycle/stale label Nov 6, 2022
@openshift-bot
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci Bot closed this Dec 6, 2022
@openshift-ci
Contributor

openshift-ci Bot commented Dec 6, 2022

@openshift-bot: Closed this PR.

In response to this:

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels

do-not-merge/hold: Indicates that a PR should not merge because someone has issued a /hold command.
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.

6 participants