Bug 2086728: Improves Config Drift Monitor e2e tests #3146
Conversation
yuqi-zhang left a comment: Generally looks good, some comments inline.
I think this is problematic if the logs are empty. We can probably work around it with i <= 0, though.
Good catch, thanks!
We do something similar in mcd_test and sno_mcd_test. It doesn't have to be part of this PR, but we should maybe share the implementation.
100% agreed. I wrote these helpers with that in mind. Once this lands, I'll open a separate PR to fix those.
(Force-pushed from 04d7507 to c039d7a.)
Looks like an infra issue occurred. /test e2e-gcp-op

This should also run against an SNO installation. /test e2e-gcp-op-single-node
yuqi-zhang left a comment: More infra issues.
/test e2e-gcp-op
Also, soft approval pending some test data.
There's a GCP issue; no need to retest that job until it's resolved.

Also, this PR sounds like a fix, so we'll get it in regardless ;)
Code under review:

```go
splitLogs := strings.Split(string(logs), "\n")

// Scan the logs from the bottom up, looking for either a shutdown or startup message.
for i := len(splitLogs) - 1; i >= 0; i-- {
```
(I might be missing context here...)
Q: Is there no other way to check whether the configdriftmonitor is running other than checking logs? Can we not use IsRunning(), since that's the canonical way to test whether it's up, and it's what we use in other tests and in the daemon itself?
This is dependent on the logs/other parts of the cluster working correctly, as opposed to directly checking the thing we're interested in.
Hmm, if I understand correctly, this is from the perspective of the main testing pod, which spun up a cluster and is now interacting with it to run the e2e tests. We're acting from the perspective of a user, not the actual MCD pod itself.
Or do you mean that we can debug into the MCD pod and check the running processes to see if the config drift monitor is running?
@yuqi-zhang re: perspective, you're right. 👍
The second point is more what I'm generally asking: is there any way to actually just check whether the configdriftmonitor is running, other than checking for a pod log saying that it started? Reading the func comments, it seems that this may run after the MCD is done, so we can't rely on MCD state; overall, I'm wondering whether there is a more direct (vs. indirect) way to check if it's running.
Presently, there isn't a more direct way. That said, I wish we had one, such as a node annotation or a /livez endpoint.
Ah, OK, that settles that for now then. Thanks @cheesesashimi
Retesting to see if GCP is cooperating. /e2e-gcp-op

I think it needs the /test prefix: /test e2e-gcp-op

Looks like infra is still unhappy.

Yeah, that was my typo 😅

Still waiting for the GCP issues to resolve... I think we need openshift/installer#5898 to merge. =/

GCP is still having issues, but since this is a fix, the deadline isn't relevant.

Since the gcp-op test is green now, retrying.

/retest
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: cgwalters, cheesesashimi, yuqi-zhang.
@cheesesashimi can you link this to the related bug so it can merge?

/retest-required Please review the full test history for this PR and help us cut down flakes.
(Multiple similar /retest-required comments followed.)
Let's also make sure that the single-node test is green as well.

/hold

/bugzilla refresh
@cgwalters: No Bugzilla bug is referenced in the title of this pull request.
@cheesesashimi: This pull request references Bugzilla bug 2086728, which is valid. The bug has been moved to the POST state and has been updated to refer to the pull request using the external bug tracker. 3 validation(s) were run on this bug.

No GitHub users were found matching the public email listed for the QA contact in Bugzilla (rioliu@redhat.com); skipping review request.
Cluster failed during bootstrap; retrying.
I'm seeing two classes of failures for the single-node test:

For 1, there's not much we can do. For 2, I observed that by the time the Config Drift tests run, we've used 80 minutes of our 90-minute timeout. Looking at other runs of the same job, this has been an issue for a while. However, there are two mitigations:
1) Bump our e2e single-node timeout from 90 minutes to 120 minutes. While a simple fix, I'm reluctant to increase that timeout.
2) Split the e2e-single-node tests across multiple SNO instances, each with its own 90-minute timeout.
Agreed, we should improve the e2e-gcp-op test failures on SNO if it has been hitting the timeout for a while.
@cgwalters The SNO gcp-op test failed again due to the timeout. All tests ran successfully until TestRunShared (which is last in the queue). Since the regular gcp-op tests have passed successfully, should we merge this PR and handle the timeout issue separately?
Can we run the tests with clusterbot and link the must-gather to the PR? Alternatively, @cheesesashimi ran the tests on his own cluster, so I'm fine with this testimony 😆
Code under review:

```go
// system has been up.
// See: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/deployment_guide/s2-proc-uptime
func GetNodeUptime(t *testing.T, cs *framework.ClientSet, node corev1.Node) float64 {
	output := ExecCmdOnNode(t, cs, node, "cat", "/rootfs/proc/uptime")
```
Not a blocker, but something to address in a follow-up: parsing /proc/sys/kernel/random/boot_id is a better version of this; it's already used by the MCD (see getBootID()). That approach will handle the case where, e.g., the node has been up longer than the first uptime reading by the time we manage to log back in.
I agree. That is a much better way to accomplish this and I'll get that fixed in a subsequent PR.
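As a rough illustration of the suggested approach, here is a minimal sketch built on the ExecCmdOnNode helper shown above (GetNodeBootID is a hypothetical name for this discussion, not the MCD's actual getBootID()):

```go
// GetNodeBootID returns the node's current boot ID. The kernel generates a
// fresh random boot_id on every boot, so comparing values sampled before and
// after an action detects a reboot regardless of how much time has elapsed,
// unlike the uptime comparison above.
func GetNodeBootID(t *testing.T, cs *framework.ClientSet, node corev1.Node) string {
	output := ExecCmdOnNode(t, cs, node, "cat", "/rootfs/proc/sys/kernel/random/boot_id")
	return strings.TrimSpace(output)
}
```

A test would then record the boot ID before the action under test and treat any change in the value as evidence of a reboot.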
I think we should land all this stuff sooner rather than later to have more time to deal with other fallout, which may or may not include SNO. If anyone else agrees, then please cancel the hold and let's get to testing #3135.
I agree, let's get this merged. /unhold
@cheesesashimi: The following test failed, say /retest to rerun all failed tests.
@cheesesashimi: Some pull requests linked via external trackers have merged, but the following pull requests linked via external trackers have not merged. These pull requests must merge or be unlinked from the Bugzilla bug in order for it to move to the next state. Once unlinked, request a bug refresh with /bugzilla refresh. Bugzilla bug 2086728 has not been moved to the MODIFIED state.
- What I did
The Config Drift Monitor tests were broken by #3141. The breakage came down to how we determine whether the Config Drift Monitor has started: we were grabbing the logs and indiscriminately searching for the startup text. Instead, we should grab the logs and scan them from the bottom up, searching for the startup message in the absence of a later shutdown message. This PR was tested against master as well as the aforementioned PR.
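A minimal sketch of that bottom-up scan, consistent with the snippet in the review thread (the function name and the message parameters are illustrative assumptions, not the PR's exact helpers):

```go
package helpers

import "strings"

// isConfigDriftMonitorStarted reports whether the most recent Config Drift
// Monitor lifecycle message in the given logs is a startup message. Scanning
// from the bottom up ensures that an earlier startup followed by a shutdown
// is not mistaken for a currently running monitor.
func isConfigDriftMonitorStarted(logs []byte, startedMsg, shutdownMsg string) bool {
	splitLogs := strings.Split(string(logs), "\n")

	// Scan the logs from the bottom up, looking for either a shutdown or
	// startup message; the first one encountered is the most recent.
	for i := len(splitLogs) - 1; i >= 0; i-- {
		if strings.Contains(splitLogs[i], shutdownMsg) {
			return false
		}
		if strings.Contains(splitLogs[i], startedMsg) {
			return true
		}
	}

	// Neither message was found (this also covers the empty-log case raised
	// in review): the monitor never started.
	return false
}
```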
Additionally, to provide better signal around the Config Drift Monitor, we should check whether the node reboots as a result of the test. Use of the Forcefile should cause a reboot, whereas file content reversion should not.
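As an illustration, a reboot assertion built on the GetNodeUptime helper from the review thread might look like the following (assertRebootBehavior is a hypothetical wrapper, not the PR's code; as noted in the review, a boot-ID comparison would be more robust):

```go
// assertRebootBehavior runs an action and checks whether the node rebooted,
// relying on the fact that a reboot resets /proc/uptime, so the second
// sample should be smaller than the first shortly after a reboot.
func assertRebootBehavior(t *testing.T, cs *framework.ClientSet, node corev1.Node, wantReboot bool, action func()) {
	t.Helper()

	before := GetNodeUptime(t, cs, node)
	action()
	after := GetNodeUptime(t, cs, node)

	rebooted := after < before
	if rebooted != wantReboot {
		t.Fatalf("expected reboot=%v, got reboot=%v (uptime before: %.2f, after: %.2f)",
			wantReboot, rebooted, before, after)
	}
}
```

With this, the Forcefile case would be exercised with wantReboot=true and the file-content-reversion case with wantReboot=false.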
- How to verify it
Run the attached e2e test suite.
- Description for the changelog
Get better signal from Config Drift Monitor tests