Compute selection: deviceIndex & enforce 1 thread in vacuum #752
Conversation
Hello @IAlibay! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2024-07-04 00:06:38 UTC
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #752      +/-   ##
==========================================
- Coverage   94.59%   92.83%    -1.77%
==========================================
  Files         134      134
  Lines        9940     9961       +21
==========================================
- Hits         9403     9247      -156
- Misses        537      714      +177

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
@mikemhenry when you get a chance, please do have a look at this - I suspect it'll make life a bit easier in some cases.
    String with the platform name. If None, it will use the fastest
    platform supporting mixed precision.
    Default ``None``.
gpu_device_index : Optional[list[str]]
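To make the new parameter concrete, here is a minimal sketch (the helper name and wiring are hypothetical, not the PR's code) of how a `gpu_device_index` list could map onto OpenMM's `DeviceIndex` platform property:

```python
from openmm import Platform

def _apply_device_index(platform: Platform, gpu_device_index: list[str] | None) -> None:
    """Hypothetical helper: apply a device-index selection to a GPU platform."""
    if gpu_device_index is not None and platform.getName() in ("CUDA", "OpenCL"):
        # OpenMM expects a comma-separated string of device indices, e.g. "0,1"
        platform.setPropertyDefaultValue("DeviceIndex", ",".join(gpu_device_index))
```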
Actually we should probably have a chat about how we handle this long term - this is a bit like MPI settings, where technically we shouldn't make this immutable but maybe something we pick up at run time?
How can we go about handling this properly?
Something I don't think we abstracted well is "run time arguments". We have the split for settings that change the thermodynamics, but we didn't consider a category of non-thermo settings that make the most sense to pick at runtime. I haven't looked at the code yet and will update this comment, but I suspect what we should do is (see the sketch after this list):
- have some default
- read this setting
- read an environment variable

If we do things in that order, we don't break anything old; when configuring your system you can make some choices, but when running on HPC you can still set things if needed and override the settings.
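A minimal sketch of that precedence order, assuming a hypothetical `OPENFE_DEVICE_INDEX` environment variable (none of these names come from the PR):

```python
import os

def resolve_device_index(setting_value: str | None) -> str:
    """Resolve a value as: default -> stored setting -> environment variable."""
    value = "0"                    # 1. have some default
    if setting_value is not None:  # 2. read this setting
        value = setting_value
    # 3. an environment variable set at run time (e.g. on HPC) overrides both
    return os.environ.get("OPENFE_DEVICE_INDEX", value)
```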
**Changed:**

* `openfe.protocols.openmm_rfe._rfe_utils.compute` has been moved
Good example of a nice changelog entry, but since this was a private API, no semver major bump is needed.
**Changed:**

* `openfe.protocols.openmm_rfe._rfe_utils.compute` has been moved
  to `openfe.protocols.openmm_utils.omm_compute`.
Do we want a private namespace or is this now in our public API?
Suggested change:
- to `openfe.protocols.openmm_utils.omm_compute`.
+ to `openfe.protocols.openmm_utils._omm_compute`.
I'd say public developer API is fine; private was because we were directly vendoring from perses.
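For reference, a sketch of the public import under the agreed layout (the `get_openmm_platform` name is taken from the diff further down; the call itself is illustrative):

```python
# Import from the new public developer API location
from openfe.protocols.openmm_utils import omm_compute

platform = omm_compute.get_openmm_platform("CUDA")
```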
mikemhenry left a comment:
This one is good, have a few notes but nothing blocking.
platform = compute.get_openmm_platform(
    settings['engine_settings'].compute_platform
# Restrict CPU count if running vacuum simulation
restrict_cpu = settings['forcefield_settings'].nonbonded_method.lower() == 'nocutoff'
Another argument that we really should have an explicit way of saying "this is a vacuum simulation". I propose we add a setting somewhere for that: #904
In the meantime, this seems like a pretty good heuristic.
We could do more logging; it would be nice to do a hackathon on it, but in the meantime I will just suggest things as I see them. It would be good to log what is going on here. It could be more verbose than what I suggest, but this seems like a spot where, if someone asks "why is this running on the CPU and not the GPU?", a log message could help.
Suggested change (add a log line after the heuristic):
restrict_cpu = settings['forcefield_settings'].nonbonded_method.lower() == 'nocutoff'
logging.info(f"{restrict_cpu=}")
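Putting the heuristic and the suggested log line together, a sketch (the wrapping function and settings shape are illustrative, not the PR's exact code):

```python
import logging

logger = logging.getLogger(__name__)

def should_restrict_cpu(settings: dict) -> bool:
    """Decide whether to restrict CPU threads for a vacuum simulation."""
    # a 'nocutoff' nonbonded method is the heuristic for a vacuum simulation
    restrict_cpu = (
        settings['forcefield_settings'].nonbonded_method.lower() == 'nocutoff'
    )
    # answers "why is this running on the CPU and not the GPU?" from the logs
    logger.info(f"{restrict_cpu=}")
    return restrict_cpu
```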
mikemhenry left a comment:
Okay, so I think that the device index selection is good, but I am on the fence about whether we should be doing anything other than warning users when they are running a vacuum simulation on a GPU or with more than 1 thread.
I think there is an argument for this being a "sane default" that we provide, but we need some user docs explaining that, by default, this is how a vacuum transformation works for these protocol(s).
Co-authored-by: Mike Henry <11765982+mikemhenry@users.noreply.github.com>
Are you saying that you don't want to go with this enforced 1 thread approach? I'm happy to reconsider this change (and just stick to the deviceIndex stuff) - what do you think?
mikemhenry left a comment:
Only one bit needed -- we should respect whatever the user has set for OPENMM_CPU_THREADS, if they have set it.
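A sketch of the requested behaviour, assuming the thread restriction is passed as CPU-platform properties (the helper name and return shape are illustrative):

```python
import os

def cpu_platform_properties(restrict_cpu: bool) -> dict[str, str]:
    """Only force a single CPU thread when the user hasn't chosen a count."""
    if restrict_cpu and 'OPENMM_CPU_THREADS' not in os.environ:
        # OpenMM's CPU platform reads the "Threads" property
        return {'Threads': '1'}
    # respect the user's OPENMM_CPU_THREADS (or OpenMM's own default)
    return {}
```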
@mikemhenry could you have a look and check that the latest change is what you meant?
mikemhenry left a comment:
Yes that is exactly what I was thinking!
      contained in MultiState reporter generated NetCDF file.
      """
-     ncfile = nc.Dataset(filename, 'w', format='NETCDF3_64BIT')
+     ncfile = nc.Dataset(filename, 'w', format='NETCDF3_64BIT_OFFSET')
This should fix the mypy error, see here: https://github.com/Unidata/netcdf4-python/blob/9426a684d3bd7b6c0c9b4d3c3f8531f30583d1e2/README.md?plain=1#L202C37-L202C57
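A minimal demonstration of the fix (the filename here is illustrative): `NETCDF3_64BIT_OFFSET` is the format literal listed in the netcdf4-python docs linked above, while `NETCDF3_64BIT` is an older alias that the type stubs reject.

```python
import netCDF4 as nc

# open with the documented format literal that satisfies mypy
ncfile = nc.Dataset('reporter.nc', 'w', format='NETCDF3_64BIT_OFFSET')
ncfile.close()
```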
Running CI on main here https://github.com/OpenFreeEnergy/openfe/actions/runs/12162626721 to see if we are seeing these errors. I kinda remember this happening before and it was a weird thing...
Cycling, but it looks like we're hitting an rdkit or networkx error.
Environment diff: Possibilities are either:
Have opened #1033 - this should be a simple fix if it's what I think it is.
@mikemhenry looks like all the failures are related to #1033. Are you ok with merging this as-is (knowing CI is failing due to another issue), or do you want to wait until we fix #1033?
Happy to merge in now that we have #1033 triaged.
Fixes #739, #704
Checklist

- news entry
- Developers certificate of origin