
Conversation

@mikemhenry (Contributor) commented Sep 24, 2025

Checklist

  • Added a news entry

The -m flag, from pytest's help text: "-m MARKEXPR: Only run tests matching given mark expression. For example: -m 'mark1 and not mark2'."

Developers certificate of origin

Fixes #1527

@mikemhenry marked this pull request as draft September 24, 2025 20:19
@mikemhenry (Contributor, Author) commented on lines +12 to +13:

- ".github/workflows/cpu-long-tests.yaml"
- ".github/workflows/gpu-integration-tests.yaml"

Adding skips here so that when we make some small change to our action that runs on AWS, we don't fire off the whole testing matrix.

@mikemhenry (Contributor, Author) commented:

I actually forgot that we don't run the integration tests on the CPU runner -- do we only want to run the slow tests on the CPU runner @IAlibay @atravitz? Right now we run all the "normal" tests + the slow tests on the CPU AWS runner.



codecov bot commented Sep 24, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.88%. Comparing base (fe2d11c) to head (a2782fe).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1538      +/-   ##
==========================================
- Coverage   95.33%   92.88%   -2.45%     
==========================================
  Files         183      183              
  Lines       15763    15763              
==========================================
- Hits        15028    14642     -386     
- Misses        735     1121     +386     
Flag         Coverage Δ
fast-tests   92.88% <ø> (?)
slow-tests   ?

Flags with carried forward coverage won't be shown.


@mikemhenry (Contributor, Author) commented:

GPU test here: https://github.com/OpenFreeEnergy/openfe/actions/runs/17991765097/job/51183190807

Looks like one failed -- should we add some retry logic to the test?

FAILED openfe/tests/protocols/openmm_ahfe/test_ahfe_slow.py::test_openmm_run_engine[CPU] - openmmtools.multistate.utils.SimulationNaNError: Propagating replica 0 at state 0 resulted in a NaN!
The state of the system and integrator before the error were saved in /tmp/pytest-of-root/pytest-0/popen-gw3/test_openmm_run_engine_CPU_3/shared_AbsoluteSolvationSolventUnit-

@mikemhenry (Contributor, Author) commented:

Took 40 minutes:

======= 1 failed, 7 passed, 158 warnings, 3 rerun in 2346.37s (0:39:06) ========

@mikemhenry marked this pull request as ready for review September 26, 2025 15:51
@mikemhenry (Contributor, Author) commented:

@atravitz @IAlibay Ready for review! The goal here was to run just the integration tests, and not our other tests, when using the GPU and CPU runners.

@atravitz (Contributor) commented Oct 9, 2025

> GPU test here: https://github.com/OpenFreeEnergy/openfe/actions/runs/17991765097/job/51183190807
>
> Looks like one failed -- should we add some retry logic to the test?
>
> FAILED openfe/tests/protocols/openmm_ahfe/test_ahfe_slow.py::test_openmm_run_engine[CPU] - openmmtools.multistate.utils.SimulationNaNError: Propagating replica 0 at state 0 resulted in a NaN!
> The state of the system and integrator before the error were saved in /tmp/pytest-of-root/pytest-0/popen-gw3/test_openmm_run_engine_CPU_3/shared_AbsoluteSolvationSolventUnit-

tagging @IAlibay to ask how expected this is.

@atravitz (Contributor) commented Oct 9, 2025

> I actually forgot that we don't run the integration tests on the CPU runner -- do we only want to run the slow tests on the CPU runner @IAlibay @atravitz? Right now we run all the "normal" tests + the slow tests on the CPU AWS runner.

Hmm, I think the CPU runner can just be slow tests - I know @IAlibay specifically wanted this for when he was doing development away from his workstation. I think unit tests locally and then slow tests on the runner should still meet his need?

@mikemhenry (Contributor, Author) commented:

Okay, the CPU runner is just doing the slow tests now; testing that here:
https://github.com/OpenFreeEnergy/openfe/actions/runs/18412521347

@IAlibay (Member) commented Oct 10, 2025

>> GPU test here: https://github.com/OpenFreeEnergy/openfe/actions/runs/17991765097/job/51183190807
>> Looks like one failed -- should we add some retry logic to the test?
>>
>> FAILED openfe/tests/protocols/openmm_ahfe/test_ahfe_slow.py::test_openmm_run_engine[CPU] - openmmtools.multistate.utils.SimulationNaNError: Propagating replica 0 at state 0 resulted in a NaN!
>> The state of the system and integrator before the error were saved in /tmp/pytest-of-root/pytest-0/popen-gw3/test_openmm_run_engine_CPU_3/shared_AbsoluteSolvationSolventUnit-
>
> tagging @IAlibay to ask how expected this is.

Shouldn't be happening very often, we can throw a retry in there though, wouldn't hurt.
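
For what it's worth, a rough sketch of what a retry could look like, assuming the pytest-rerunfailures plugin (the "3 rerun" in the earlier test summary suggests it is already in the environment); the rerun counts and test body here are placeholders:

import pytest

# Hypothetical sketch: rerun this occasionally NaN-ing simulation test a
# couple of times before reporting a failure (requires pytest-rerunfailures).
@pytest.mark.flaky(reruns=2, reruns_delay=10)
@pytest.mark.integration
def test_openmm_run_engine():
    ...

Alternatively, passing --reruns 2 on the pytest command line reruns every failing test rather than just the marked ones.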

@IAlibay (Member) commented Oct 10, 2025

>> I actually forgot that we don't run the integration tests on the CPU runner -- do we only want to run the slow tests on the CPU runner @IAlibay @atravitz? Right now we run all the "normal" tests + the slow tests on the CPU AWS runner.
>
> Hmm, I think the CPU runner can just be slow tests - I know @IAlibay specifically wanted this for when he was doing development away from his workstation. I think unit tests locally and then slow tests on the runner should still meet his need?

Here's what my view is on what we want:

AWS CPU runners

  • Normal tests: yes
  • Slow tests: yes
  • Integration tests: ~ no (see below)

AWS GPU runners

  • Normal tests: if we have to
  • Slow tests: no
  • Integration tests: ~ yes (see below)

What's wrong with our integration tests?

The most important thing is that we don't use GPU runners in places we don't need them, i.e. we don't want to use GPU dollars running a bunch of CPU-only tests.

What we would want is an additional flag that ONLY runs GPU tests, and then we can run any integration tests that need CPU on a specific CPU integration runner.

@IAlibay (Member) left a comment:

The main issue I have with this is that we're going to spend lots of expensive GPU time number crunching on the CPU. It may be on the order of a few hundred dollars, but it's going to add up real quick as we add more slow tests.

Ideally what we need to do is decouple integration from slow, or add a special "integration only" flag here:

def pytest_collection_modifyitems(self, items, config):
    if (config.getoption('--integration') or
            os.getenv("OFE_INTEGRATION_TESTS", default="false").lower() == 'true'):
        return
    elif (config.getoption('--runslow') or
            os.getenv("OFE_SLOW_TESTS", default="false").lower() == 'true'):
        self._modify_integration(items, config)
    else:
        self._modify_integration(items, config)
        self._modify_slow(items, config)
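
As a rough sketch of what an "integration only" switch could look like in conftest.py (the flag and env-var names are assumptions, not something this PR adds, and the option would still need registering via pytest_addoption):

import os

import pytest

def pytest_collection_modifyitems(config, items):
    # Hypothetical "integration only" mode: on GPU runners, skip everything
    # that does not carry the integration mark, so no CPU-only tests run there.
    integration_only = (
        config.getoption("--integration-only", default=False)
        or os.getenv("OFE_INTEGRATION_ONLY_TESTS", "false").lower() == "true"
    )
    if not integration_only:
        return
    skip_non_integration = pytest.mark.skip(reason="only running integration tests")
    for item in items:
        if "integration" not in item.keywords:
            item.add_marker(skip_non_integration)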

@IAlibay (Member) commented on this workflow snippet:

  OFE_INTEGRATION_TESTS: FALSE
  run: |
    pytest -n logical -vv --durations=10 openfecli/tests/ openfe/tests/
    pytest -n logical -vv --durations=10 -m slow openfecli/tests/ openfe/tests/

Why do we need -m slow? I would have thought the env variable would be enough.

@mikemhenry (Contributor, Author) commented Oct 10, 2025

I had thought we only wanted to run the slow tests (and not the normal ones) on the CPU runner, will fix this!

@IAlibay (Member) replied:

That could be ok too.

@mikemhenry (Contributor, Author) commented Oct 10, 2025

Okay to expand on what I wrote here: #1527 (comment)

The -m flag

Only run tests matching given mark expression. For example: -m 'mark1 and not mark2'.

Which means -m integration ONLY runs tests with the @pytest.mark.integration mark (since normal and slow tests do not have that mark, they do not get run).

This will satisfy what you want for the GPU runners.

For the CPU runners, since we want slow + normal, that is easy to do with flags/args.

> What we would want is an additional flag that ONLY runs GPU tests, and then we can run any integration tests that need CPU on a specific CPU integration runner.

The -m integration flag does exactly that! So we should be good on the GPU runner 😎
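
For illustration, a minimal sketch of how the marks play together (these test names are made up, not tests from the repo):

import pytest

def test_quick_unit():
    # "normal" test: no mark, always collected
    assert 1 + 1 == 2

@pytest.mark.slow
def test_long_setup():
    # only run when the slow tests are enabled
    assert sum(range(1000)) == 499500

@pytest.mark.integration
def test_gpu_end_to_end():
    # the only test selected by: pytest -m integration
    assert True

So pytest -m integration picks out only the last test for the GPU runner, while the CPU runner keeps its normal + slow selection by not passing -m at all.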

@IAlibay (Member) commented Oct 10, 2025

Ah ok, thanks for the explanation @mikemhenry, I hadn't caught that.

@mikemhenry (Contributor, Author) commented:

No worries! I could have sworn I explained the -m flag, but I didn't, and I didn't actually explain it in the issue I linked either. I will add a comment to the workflow, since I'm sure I will forget about the flag.

@mikemhenry (Contributor, Author) commented:

I guess it depends on whether all integration tests are GPU tests, or whether all GPU tests will be integration tests. If we don't think there is anything interesting there, then we can rename integration -> gpu.

Thoughts @IAlibay?

@IAlibay (Member) commented Oct 17, 2025

> I guess it depends on whether all integration tests are GPU tests, or whether all GPU tests will be integration tests. If we don't think there is anything interesting there, then we can rename integration -> gpu.
>
> Thoughts @IAlibay?

All integration tests are GPU, but not all integration tests are ONLY GPU.

Very roughly, if we can control the OpenMM platform with pytest marks, then that's the answer. We stick on a gpu and cpu mark and then use the name of the active mark to pick the platform (or skip the test).

If you gimme some pseudo code I'm happy to show a demo of what I mean.

@IAlibay (Member) commented Oct 17, 2025

This! https://stackoverflow.com/a/74804492

We just need to create a gpu and cpu mark and then check for the mark's presence and pick from a dictionary, with a preference for GPU over CPU if both are available.
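
For illustration, a rough sketch of that idea (the mark names, fixture name, and platform strings are assumptions, following the Stack Overflow pattern above):

import pytest

@pytest.fixture
def openmm_platform(request):
    # Pick an OpenMM platform name from the test's own marks,
    # preferring GPU (CUDA) over CPU when both marks are present.
    platform_by_mark = {"gpu": "CUDA", "cpu": "CPU"}
    for mark_name in ("gpu", "cpu"):  # order encodes the GPU-first preference
        if request.node.get_closest_marker(mark_name) is not None:
            return platform_by_mark[mark_name]
    pytest.skip("test carries neither a gpu nor a cpu mark")

@pytest.mark.gpu
@pytest.mark.cpu
def test_runs_on_best_available(openmm_platform):
    assert openmm_platform == "CUDA"  # the gpu mark wins over cpu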

@mikemhenry (Contributor, Author) commented:

I think the three of us are either kinda talking past each other or I don't fully understand the scope of what we are trying to do. As this PR currently stands:

  • Only tests marked integration run on the GPU runner
  • "normal" tests and tests marked slow run on the CPU runner.

If this isn't quite what we are trying to do, let me know!

It sounds like there are some integration tests that we want to run on a CPU, which ones are those (do you have some in mind or is there some heuristic to choosing these)?

@IAlibay (Member) commented Oct 17, 2025

> I think the three of us are either kinda talking past each other or I don't fully understand the scope of what we are trying to do. As this PR currently stands:
>
>   • Only tests marked integration run on the GPU runner
>   • "normal" tests and tests marked slow run on the CPU runner.
>
> If this isn't quite what we are trying to do, let me know!
>
> It sounds like there are some integration tests that we want to run on a CPU, which ones are those (do you have some in mind or is there some heuristic to choosing these)?

I agree, we're going out of scope; the untangling of CPU & GPU tests for the integration tests can be done in a separate PR. @atravitz, are you happy with that?

@mikemhenry (Contributor, Author) commented:

I think @atravitz's main point was to rename the integration mark to gpu, since it now functionally marks tests we want to run on the GPU. I think we should keep them labeled as integration and then we can untangle them (maybe integration-gpu and integration-cpu) in another PR.

@atravitz (Contributor) commented:

> I think we should keep them labeled as integration and then we can untangle them (maybe integration-gpu and integration-cpu) in another PR.

I agree!

@github-actions bot commented:

No API break detected ✅

@mikemhenry enabled auto-merge (squash) October 27, 2025 17:50
@mikemhenry merged commit 1152edd into main Oct 27, 2025
13 checks passed
@mikemhenry deleted the feat/add_gpu_test_labels branch October 27, 2025 18:36
@atravitz mentioned this pull request Dec 4, 2025


Successfully merging this pull request may close these issues.

add gpu test labels
