refactor how tests are run #27516
Conversation
Force-pushed from b0c07b9 to a4383d6.
The test numbers aren't matching up exactly, but they are close. I'll have to open a no-op PR on the same base here and write something to compare which tests are missing. But I think we're ready for a review.

/test all

/hold until we verify against #27519
pkg/test/ginkgo/cmd_runsuite.go (Outdated)
// Run kube, storage, openshift, and must-gather tests. If user specified a count of -1,
// RunTestInNewProcess kube, storage, openshift, and must-gather tests. If user specified a count of -1,
Should this have been renamed?
> Should this have been renamed?
No, I should fix.
pkg/test/ginkgo/queue.go (Outdated)
// could be running at the same time. While these are technically [Serial], ginkgo
// parallel mode provides this guarantee. Doing this for all suites would be too
I understand we're using this to mark some vendored tests from certain packages as always serial, but I don't get this comment:

> While these are technically [Serial], ginkgo parallel mode provides this guarantee.

What guarantee?
> I don't get this comment, it's very confusing.
> While these are technically [Serial], ginkgo parallel mode provides this guarantee.
> What guarantee?
TBH, I don't actually know. It came from clayton and I preserved it.
Continuing this thread, according to maciej, we don't even need this block because it has been broken for four releases.
/retest-required
CI jobs didn't get scheduled

/retest
testOutputLock := &sync.Mutex{}
testOutputConfig := newTestOutputConfig(testOutputLock, opt.Out, monitorEventRecorder, includeSuccess)
Now that there's a real lock here, do we know yet how much, if any, this slows things down?
> Now that there's a real lock here, do we know yet how much, if any, this slows things down?
Looks like at most 10 minutes on parallel runs. Most runs appear to be about the same.
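For readers following along, here is a minimal sketch of the lock-per-write pattern being discussed. The `lockedOutput` type and its method are assumptions for illustration only; the real `testOutputConfig` has different fields and helpers.

```go
package ginkgo

import (
	"io"
	"sync"
)

// lockedOutput is a hypothetical stand-in for the real output config.
type lockedOutput struct {
	lock *sync.Mutex
	out  io.Writer
}

func (l *lockedOutput) Write(p []byte) (int, error) {
	// Every test goroutine takes the same mutex before writing, so output
	// from parallel tests cannot interleave; the cost is brief contention
	// on each write, which the timings above suggest is small.
	l.lock.Lock()
	defer l.lock.Unlock()
	return l.out.Write(p)
}
```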
The counts are checking out OK to me.
/hold cancel

The following tests are no longer run: [bz-[sig-apps][Feature:OpenShiftControllerManager]] clusteroperator/[sig-apps][Feature:OpenShiftControllerManager] should not change condition/Available
}

timeout := opt.Timeout
if timeout == 0 {
Can someone double-check my understanding of this double timeout == 0 check? It seems like the logic is essentially:

if opt.Timeout == 0:
    if suite.TestTimeout == 0:
        timeout = 15*time.Minute
    else:
        timeout = suite.TestTimeout
else:
    timeout = opt.Timeout
I think so.
We're picking the first non-zero value from this:
- opt.Timeout
- suite.TestTimeout
- 15 minutes
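That fallback can be restated as a small Go sketch. The helper function below is invented for illustration; only the field names `opt.Timeout` and `suite.TestTimeout` and the 15-minute default come from the discussion above.

```go
package ginkgo

import "time"

// effectiveTimeout is a hypothetical helper restating the fallback:
// opt.Timeout wins if set, then suite.TestTimeout, then a 15-minute default.
func effectiveTimeout(optTimeout, suiteTestTimeout time.Duration) time.Duration {
	timeout := optTimeout
	if timeout == 0 {
		timeout = suiteTestTimeout
	}
	if timeout == 0 {
		timeout = 15 * time.Minute
	}
	return timeout
}
```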
r = r.Next()
}
q.queue = r
remainingParallelTests := make(chan *testCase, 100)
My assumption is 100 here is the max number of tests that will run in parallel -- do I have that right?
You could run any number of tests in parallel. This is replacing the ring list it used earlier. The channel holds up to 100 test cases, but it constantly gets refilled by the goroutine below (queueAllTests). Then if our parallelism is, say, 30, it launches 30 goroutines in the for loop below that each consume test cases from the channel.
Ah, so I thought the for loop below was launching n (n = parallelism) parallel tests at a time, but actually it launches n goroutines that each pull one test at a time from the channel, which results in running up to n tests in parallel.
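For reference, a hedged sketch of that producer/worker pattern. The `testCase` type and `runParallel` function below are stand-ins, not the PR's actual code; the point is that the 100-slot buffer only bounds how many queued tests sit in the channel, while `parallelism` bounds how many execute at once.

```go
package ginkgo

import "sync"

// testCase stands in for the real type; runParallel is illustrative only.
type testCase struct {
	name string
}

func runParallel(tests []*testCase, parallelism int, run func(*testCase)) {
	remaining := make(chan *testCase, 100)

	// Producer goroutine keeps the channel fed (analogous to queueAllTests),
	// then closes it so the workers can drain and exit.
	go func() {
		for _, t := range tests {
			remaining <- t
		}
		close(remaining)
	}()

	// Each worker pulls one test at a time, so at most `parallelism`
	// tests run concurrently.
	var wg sync.WaitGroup
	for i := 0; i < parallelism; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for t := range remaining {
				run(t)
			}
		}()
	}
	wg.Wait()
}
```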
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: deads2k, stbenjam

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
@deads2k: The following tests failed, say

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
In PR openshift#27516 we suspect reporting of flakes broke due to a missed assumption that test.flake accompanied test.success. Our new goal is to more clearly have just one status set, so we're going to lean into the new approach and properly break out the flake state into its own case.
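To make the "one status per test" idea concrete, here is a hedged sketch; the status type, constants, and reporting function are invented for illustration and do not reflect how the repository actually models test state.

```go
package ginkgo

import "fmt"

// testStatus and the constants below are illustrative assumptions only.
type testStatus int

const (
	statusSuccess testStatus = iota
	statusFlake // previously signalled by setting flake alongside success
	statusFailure
)

// reportStatus shows the intent of the change: flakes get their own case
// instead of piggybacking on the success path.
func reportStatus(name string, s testStatus) {
	switch s {
	case statusSuccess:
		fmt.Printf("passed: %s\n", name)
	case statusFlake:
		fmt.Printf("flaked: %s\n", name)
	case statusFailure:
		fmt.Printf("failed: %s\n", name)
	}
}
```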
This eliminates flexibility in how tests are started, inlines anonymous functions, and attempts to build a pipeline of data-in to data-out.
It got pretty big and I haven't run it locally yet. I still need to prove that the correct number of tests run and that certain tests are run serially.
TBH, I don't know if it's easier to review as a diff or easier to review as new code for running tests.