
[TEST] Splitting the python tests to test_reduction_single #20

Merged
lohiaj merged 1 commit into amd-integration from test/jets/split_reduction_tests on Apr 28, 2026

Conversation

@jamesETsmith (Collaborator) commented Apr 28, 2026

Summary

This PR splits the python tests into two invocations; all the same tests are still run. The test_reduction_single* tests now run separately because they time out when other workers are using the GPU at the same time. Splitting the tests reduces the test time from 50 to 8 min.

The savings come from avoiding timeouts rather than from more parallelism.

@jamesETsmith self-assigned this Apr 28, 2026
@jamesETsmith added the enhancement label Apr 28, 2026
@jamesETsmith requested a review from yaoliu13 Apr 28, 2026 00:18
@yaoliu13 (Collaborator) left a comment
LGTM

# The test_reduction_single* tests put the GPU under heavy load and can time out
# when many other GPU workers are running, so we run them separately.
python tests/run_tests.py -v -r 3 -a amdgpu -t 16 -k "not test_reduction_single"
python tests/run_tests.py -v -r 3 -a amdgpu -t 16 -k "test_reduction_single"
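
One way to sanity-check that the two -k expressions partition the full suite (a hypothetical check, not part of this PR; it assumes pytest is on the PATH and the suite lives under tests/):

import subprocess

def collected(kexpr=None):
    # Count the test node ids pytest would collect for a given -k expression.
    cmd = ["pytest", "tests/", "--collect-only", "-q"]
    if kexpr:
        cmd += ["-k", kexpr]
    out = subprocess.run(cmd, capture_output=True, text=True).stdout
    return sum(1 for line in out.splitlines() if "::" in line)

total = collected()
split = collected("not test_reduction_single") + collected("test_reduction_single")
assert split == total, f"filters don't cover the full suite: {split} != {total}"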

I'm a little confused about how this helps. The test_reduction_single tests are still in contention with the other tests, since worksteal initially distributes work at the test-item level. Do we just get lucky that splitting the tests pulls the test_reduction_single tests back from the timeout cliff?
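
If run_tests.py maps -t 16 onto pytest-xdist (an assumption; its internals aren't shown in this thread), the contention described above would come from an invocation along these lines:

import subprocess

# With --dist worksteal, idle workers steal individual test items, so the
# reduction tests share the GPU with whatever other items are still queued.
subprocess.call(["pytest", "tests/", "-n", "16", "--dist", "worksteal"])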

Collaborator

On the current branch, 2 tests keep failing; pre-submit pipelines showed that this PR fixes the issue.

Collaborator Author

There are only 10 reduction tests. We were running the tests with 16 threads, so even if all 10 reduction tests were running, there were still 6 other tests hammering the GPU. I didn't dig in to find out exactly which tests didn't play nicely with test_reduction_single, but my guess is that it was more than a couple.

@lohiaj commented Apr 28, 2026

Hey @jamesETsmith, was the 50 to 8 min improvement purely from removing the two timeouts × the -r 3 retries? If so, that's totally fine, but it's worth saying so explicitly in the PR description so future readers (or someone tempted to revert this) understand the savings come from "avoiding timeouts" rather than from "more parallelism."
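
Back-of-the-envelope, assuming a per-test timeout on the order of 7 min (the actual value isn't stated in this thread):

timeouts_hit = 2   # tests that were timing out
retries = 3        # from -r 3
timeout_min = 7    # assumed per-test timeout; not stated in the thread
print(timeouts_hit * retries * timeout_min)  # -> 42, roughly the 50 - 8 gap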

@lohiaj left a comment

Reviewed offline with @jamesETsmith. Diff is +4/-1 in 4_test.sh: same suite, just split into two pytest invocations via -k so the heavy test_reduction_single_* tests don't contend with the other 15 GPU workers. Coverage preserved (union of -k 'not X' and -k 'X' = full suite), and run_tests.py already treats pytest exit code 5 as success so an empty -k match is safe. Trusting Yao's pre-submit validation. Merging.
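
For reference, pytest exits with code 5 when it collects no tests, so an empty -k match would otherwise fail the run. A minimal sketch of the behavior described above (run_tests.py's actual handling isn't shown here, so this is illustrative only):

import subprocess
import sys

PYTEST_NO_TESTS_COLLECTED = 5  # pytest's exit code when -k matches nothing

def run_pytest(args):
    # Treat "no tests collected" as success so an empty -k filter is safe.
    rc = subprocess.call(["pytest", *args])
    return 0 if rc == PYTEST_NO_TESTS_COLLECTED else rc

if __name__ == "__main__":
    sys.exit(run_pytest(sys.argv[1:]))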

@lohiaj lohiaj merged commit 2a990fd into amd-integration Apr 28, 2026
33 of 42 checks passed
@jamesETsmith jamesETsmith deleted the test/jets/split_reduction_tests branch April 28, 2026 13:17