
Conversation

@mikemhenry (Contributor) commented Sep 24, 2025

Checklist

  • Added a news entry

The -m flag, from pytest's help text: "-m MARKEXPR: Only run tests matching given mark expression. For example: -m 'mark1 and not mark2'."

Developers certificate of origin

Fixes #1527

@mikemhenry marked this pull request as draft September 24, 2025 20:19
@mikemhenry (Contributor, Author) commented on lines +12 to +13:

- ".github/workflows/cpu-long-tests.yaml"
- ".github/workflows/gpu-integration-tests.yaml"

Adding skips here so that when we make some small change to our action that runs on AWS, we don't fire off the whole testing matrix.

@mikemhenry (Contributor, Author) commented:

I actually forgot that we don't run the integration tests on the CPU runner -- do we only want to run the slow tests on the CPU runner @IAlibay @atravitz? Right now we run all the "normal" tests + the slow tests on the CPU AWS runner.



codecov bot commented Sep 24, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.88%. Comparing base (fe2d11c) to head (a2782fe).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1538      +/-   ##
==========================================
- Coverage   95.33%   92.88%   -2.45%     
==========================================
  Files         183      183              
  Lines       15763    15763              
==========================================
- Hits        15028    14642     -386     
- Misses        735     1121     +386     
Flag         Coverage Δ
fast-tests   92.88% <ø> (?)
slow-tests   ?

Flags with carried forward coverage won't be shown.


@mikemhenry (Contributor, Author) commented:

GPU test here: https://github.com/OpenFreeEnergy/openfe/actions/runs/17991765097/job/51183190807

Looks like one failed -- should we add some retry logic to the test?

FAILED openfe/tests/protocols/openmm_ahfe/test_ahfe_slow.py::test_openmm_run_engine[CPU] - openmmtools.multistate.utils.SimulationNaNError: Propagating replica 0 at state 0 resulted in a NaN!
The state of the system and integrator before the error were saved in /tmp/pytest-of-root/pytest-0/popen-gw3/test_openmm_run_engine_CPU_3/shared_AbsoluteSolvationSolventUnit-

@mikemhenry (Contributor, Author) commented:

Took 40 minutes:

======= 1 failed, 7 passed, 158 warnings, 3 rerun in 2346.37s (0:39:06) ========

@mikemhenry marked this pull request as ready for review September 26, 2025 15:51
@mikemhenry (Contributor, Author) commented:

@atravitz @IAlibay Ready for review! The goal here was to run just the integration tests, and not our other tests, when using the GPU and CPU runners.

@atravitz (Contributor) commented Oct 9, 2025

> GPU test here: https://github.com/OpenFreeEnergy/openfe/actions/runs/17991765097/job/51183190807
>
> Looks like one failed -- should we add some retry logic to the test?
>
> FAILED openfe/tests/protocols/openmm_ahfe/test_ahfe_slow.py::test_openmm_run_engine[CPU] - openmmtools.multistate.utils.SimulationNaNError: Propagating replica 0 at state 0 resulted in a NaN!
> The state of the system and integrator before the error were saved in /tmp/pytest-of-root/pytest-0/popen-gw3/test_openmm_run_engine_CPU_3/shared_AbsoluteSolvationSolventUnit-

tagging @IAlibay to ask how expected this is.

@atravitz (Contributor) commented Oct 9, 2025

> I actually forgot that we don't run the integration tests on the CPU runner -- do we only want to run the slow tests on the CPU runner @IAlibay @atravitz? Right now we run all the "normal" tests + the slow tests on the CPU AWS runner.

Hmm, I think the CPU runner can just be slow tests - I know @IAlibay specifically wanted this for when he was doing development away from his workstation. I think unit tests locally and then slow tests on the runner should still meet his need?

@mikemhenry (Contributor, Author) commented:

Okay, the CPU runner is just doing the slow tests now; testing that here:
https://github.com/OpenFreeEnergy/openfe/actions/runs/18412521347

@IAlibay (Member) commented Oct 10, 2025

>> GPU test here: https://github.com/OpenFreeEnergy/openfe/actions/runs/17991765097/job/51183190807
>> Looks like one failed -- should we add some retry logic to the test?
>>
>> FAILED openfe/tests/protocols/openmm_ahfe/test_ahfe_slow.py::test_openmm_run_engine[CPU] - openmmtools.multistate.utils.SimulationNaNError: Propagating replica 0 at state 0 resulted in a NaN!
>> The state of the system and integrator before the error were saved in /tmp/pytest-of-root/pytest-0/popen-gw3/test_openmm_run_engine_CPU_3/shared_AbsoluteSolvationSolventUnit-
>
> tagging @IAlibay to ask how expected this is.

Shouldn't be happening very often, we can throw a retry in there though, wouldn't hurt.
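
For what it's worth, a rough sketch of what a retry could look like, assuming the pytest-rerunfailures plugin (the "3 rerun" in the earlier test summary suggests it is already in the environment); the rerun counts and test body here are placeholders:

import pytest

# Hypothetical sketch: rerun this occasionally NaN-ing simulation test a
# couple of times before reporting a failure (requires pytest-rerunfailures).
@pytest.mark.flaky(reruns=2, reruns_delay=10)
@pytest.mark.integration
def test_openmm_run_engine():
    ...

Alternatively, passing --reruns 2 on the pytest command line reruns every failing test rather than just the marked ones.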

@IAlibay (Member) commented Oct 10, 2025

>> I actually forgot that we don't run the integration tests on the CPU runner -- do we only want to run the slow tests on the CPU runner @IAlibay @atravitz? Right now we run all the "normal" tests + the slow tests on the CPU AWS runner.
>
> Hmm, I think the CPU runner can just be slow tests - I know @IAlibay specifically wanted this for when he was doing development away from his workstation. I think unit tests locally and then slow tests on the runner should still meet his need?

Here's what my view is on what we want:

AWS CPU runners

  • Normal tests: yes
  • Slow tests: yes
  • Integration tests: ~ no (see below)

AWS GPU runners

  • Normal tests: if we have to
  • Slow tests: no
  • Integration tests: ~ yes (see below)

What's wrong with our integration tests?

The most important thing is that we don't use GPU runners in places we don't need them, i.e. we don't want to use GPU dollars running a bunch of CPU-only tests.

What we would want is an additional flag that ONLY runs GPU tests, and then we can run any integration tests that need CPU on a specific CPU integration runner.

@IAlibay (Member) left a comment:

The main issue I have with this is that we're going to spend lots of expensive GPU time number crunching on the CPU. It may be on the order of a few hundred dollars, but it's going to add up real quick as we add more slow tests.

Ideally what we need to do is decouple integration from slow, or add a special "integration only" flag here:

def pytest_collection_modifyitems(self, items, config):
    if (config.getoption('--integration') or
            os.getenv("OFE_INTEGRATION_TESTS", default="false").lower() == 'true'):
        return
    elif (config.getoption('--runslow') or
            os.getenv("OFE_SLOW_TESTS", default="false").lower() == 'true'):
        self._modify_integration(items, config)
    else:
        self._modify_integration(items, config)
        self._modify_slow(items, config)
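
As a rough sketch of what an "integration only" switch could look like in conftest.py (the flag and env-var names are assumptions, not something this PR adds, and the option would still need registering via pytest_addoption):

import os

import pytest

def pytest_collection_modifyitems(config, items):
    # Hypothetical "integration only" mode: on GPU runners, skip everything
    # that does not carry the integration mark, so no CPU-only tests run there.
    integration_only = (
        config.getoption("--integration-only", default=False)
        or os.getenv("OFE_INTEGRATION_ONLY_TESTS", "false").lower() == "true"
    )
    if not integration_only:
        return
    skip_non_integration = pytest.mark.skip(reason="only running integration tests")
    for item in items:
        if "integration" not in item.keywords:
            item.add_marker(skip_non_integration)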

@IAlibay (Member) commented on this workflow snippet:

  OFE_INTEGRATION_TESTS: FALSE
  run: |
    pytest -n logical -vv --durations=10 openfecli/tests/ openfe/tests/
    pytest -n logical -vv --durations=10 -m slow openfecli/tests/ openfe/tests/

Why do we need -m slow? I would have thought the env variable would be enough.

@mikemhenry (Contributor, Author) commented Oct 10, 2025

I had thought we only wanted to run the slow tests (and not the normal ones) on the CPU runner, will fix this!

@IAlibay (Member) replied:

That could be ok too.

@mikemhenry (Contributor, Author) commented Oct 10, 2025

Okay to expand on what I wrote here: #1527 (comment)

The -m flag

Only run tests matching given mark expression. For example: -m 'mark1 and not mark2'.

Which means -m integration ONLY runs tests with the @pytest.mark.integration mark (since normal and slow tests do not have that mark, they do not get run).

This will satisfy what you want for the GPU runners.

For the CPU runners, since we want slow + normal, that is easy to do with flags/args.

> What we would want is an additional flag that ONLY runs GPU tests, and then we can run any integration tests that need CPU on a specific CPU integration runner.

The -m integration flag does exactly that! So we should be good on the GPU runner 😎
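
For illustration, a minimal sketch of how the marks play together (these test names are made up, not tests from the repo):

import pytest

def test_quick_unit():
    # "normal" test: no mark, always collected
    assert 1 + 1 == 2

@pytest.mark.slow
def test_long_setup():
    # only run when the slow tests are enabled
    assert sum(range(1000)) == 499500

@pytest.mark.integration
def test_gpu_end_to_end():
    # the only test selected by: pytest -m integration
    assert True

So pytest -m integration picks out only the last test for the GPU runner, while the CPU runner keeps its normal + slow selection by not passing -m at all.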

@IAlibay (Member) commented Oct 10, 2025

Ah ok, thanks for the explanation @mikemhenry, I hadn't caught that.

@mikemhenry (Contributor, Author) commented:

No worries! I could have sworn I explained the -m flag, but I didn't, and I didn't actually explain it in the issue I linked either. I will add a comment to the workflow, since I'm sure I will forget about the flag.

@mikemhenry (Contributor, Author) commented:

I guess it depends on whether all integration tests are GPU tests, or whether all GPU tests will be integration tests. If we don't think there is anything interesting there, then we can rename integration -> gpu.

Thoughts @IAlibay?

@IAlibay (Member) commented Oct 17, 2025

> I guess it depends on whether all integration tests are GPU tests, or whether all GPU tests will be integration tests. If we don't think there is anything interesting there, then we can rename integration -> gpu.
>
> Thoughts @IAlibay?

All integration tests are GPU, but not all integration tests are ONLY GPU.

Very roughly, if we can control the OpenMM platform with pytest marks, then that's the answer. We stick on a gpu and cpu mark and then use the name of the active mark to pick the platform (or skip the test).

If you gimme some pseudo code I'm happy to show a demo of what I mean.

@IAlibay (Member) commented Oct 17, 2025

This! https://stackoverflow.com/a/74804492

We just need to create a gpu and cpu mark and then check for the mark's presence and pick from a dictionary, with a preference for GPU over CPU if both are available.
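
For illustration, a rough sketch of that idea (the mark names, fixture name, and platform strings are assumptions, following the Stack Overflow pattern above):

import pytest

@pytest.fixture
def openmm_platform(request):
    # Pick an OpenMM platform name from the test's own marks,
    # preferring GPU (CUDA) over CPU when both marks are present.
    platform_by_mark = {"gpu": "CUDA", "cpu": "CPU"}
    for mark_name in ("gpu", "cpu"):  # order encodes the GPU-first preference
        if request.node.get_closest_marker(mark_name) is not None:
            return platform_by_mark[mark_name]
    pytest.skip("test carries neither a gpu nor a cpu mark")

@pytest.mark.gpu
@pytest.mark.cpu
def test_runs_on_best_available(openmm_platform):
    assert openmm_platform == "CUDA"  # the gpu mark wins over cpu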

@mikemhenry (Contributor, Author) commented:

I think the three of us are either kinda talking past each other or I don't fully understand the scope of what we are trying to do. As this PR currently stands:

  • Only tests marked integration run on the GPU runner
  • "normal" tests and tests marked slow run on the CPU runner.

If this isn't quite what we are trying to do, let me know!

It sounds like there are some integration tests that we want to run on a CPU, which ones are those (do you have some in mind or is there some heuristic to choosing these)?

@IAlibay (Member) commented Oct 17, 2025

> I think the three of us are either kinda talking past each other or I don't fully understand the scope of what we are trying to do. As this PR currently stands:
>
>   • Only tests marked integration run on the GPU runner
>   • "normal" tests and tests marked slow run on the CPU runner.
>
> If this isn't quite what we are trying to do, let me know!
>
> It sounds like there are some integration tests that we want to run on a CPU, which ones are those (do you have some in mind or is there some heuristic to choosing these)?

I agree, we're going out of scope; the untangling of CPU & GPU tests for the integration tests can be done in a separate PR. @atravitz, are you happy with that?

@mikemhenry (Contributor, Author) commented:

I think @atravitz's main point was to rename the integration mark to gpu, since it now functionally marks tests we want to run on the GPU. I think we should keep them labeled as integration and then we can untangle them (maybe integration-gpu and integration-cpu) in another PR.

@atravitz (Contributor) commented:

> I think we should keep them labeled as integration and then we can untangle them (maybe integration-gpu and integration-cpu) in another PR.

I agree!

@github-actions bot commented:

No API break detected ✅

@mikemhenry enabled auto-merge (squash) October 27, 2025 17:50
@mikemhenry merged commit 1152edd into main Oct 27, 2025
13 checks passed
@mikemhenry deleted the feat/add_gpu_test_labels branch October 27, 2025 18:36
@atravitz mentioned this pull request Dec 4, 2025


Successfully merging this pull request may close these issues.

add gpu test labels
