
Conversation

@IAlibay
Member

@IAlibay IAlibay commented Mar 4, 2024

Fixes #739 #704

Checklist

  • Added a news entry

Developers certificate of origin

@pep8speaks

pep8speaks commented Mar 4, 2024

Hello @IAlibay! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 195:80: E501 line too long (93 > 79 characters)
Line 739:80: E501 line too long (81 > 79 characters)

Line 620:80: E501 line too long (81 > 79 characters)
Line 623:80: E501 line too long (80 > 79 characters)

Line 935:80: E501 line too long (81 > 79 characters)
Line 938:80: E501 line too long (80 > 79 characters)

Line 30:80: E501 line too long (136 > 79 characters)

Line 178:80: E501 line too long (132 > 79 characters)

Comment last updated at 2024-07-04 00:06:38 UTC

@IAlibay IAlibay linked an issue Mar 4, 2024 that may be closed by this pull request
@codecov

codecov bot commented Mar 4, 2024

Codecov Report

Attention: Patch coverage is 89.74359% with 4 lines in your changes missing coverage. Please review.

Project coverage is 92.83%. Comparing base (be3433c) to head (567ef30).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| openfe/protocols/openmm_utils/omm_compute.py | 63.63% | 4 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #752      +/-   ##
==========================================
- Coverage   94.59%   92.83%   -1.77%     
==========================================
  Files         134      134              
  Lines        9940     9961      +21     
==========================================
- Hits         9403     9247     -156     
- Misses        537      714     +177     
| Flag | Coverage Δ |
|---|---|
| fast-tests | 92.83% <89.74%> (?) |
| slow-tests | ? |

Flags with carried forward coverage won't be shown.


@IAlibay IAlibay requested a review from mikemhenry July 4, 2024 00:08
@IAlibay
Member Author

IAlibay commented Jul 4, 2024

@mikemhenry when you get a chance please do have a look at this - I suspect it'll make life a bit easier in some cases.

String with the platform name. If None, it will use the fastest
platform supporting mixed precision.
Default ``None``.
gpu_device_index : Optional[list[str]]
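For illustration, a minimal sketch of how a list of GPU device indices could be applied to an OpenMM platform, assuming the generic `DeviceIndex` platform property; the helper below is hypothetical and not the code in this PR:

```python
import openmm


def apply_gpu_device_index(platform: openmm.Platform,
                           gpu_device_index: list[str]) -> None:
    """Set the device index on a GPU platform, e.g. ['0', '1'] -> '0,1'."""
    if platform.getName() in ('CUDA', 'OpenCL'):
        # OpenMM expects a comma-separated string of device indices.
        platform.setPropertyDefaultValue('DeviceIndex', ','.join(gpu_device_index))
```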
Member Author:
Actually, we should probably have a chat about how we handle this long term. This is a bit like MPI settings: technically we shouldn't make this immutable, but rather something we pick up at run time?

How can we go about handling this properly?

Contributor:

Something I don't think we abstracted well is "run time arguments". We have the split for settings that change the thermodynamics, but we didn't consider a category of non-thermo settings that make the most sense to pick at runtime. I haven't looked at the code yet and will update this comment, but I suspect what we should do is:

  1. have some default
  2. read this setting
  3. read in an environment variable

If we do things in that order we don't break anything old: when configuring your system you can make some choices, and when running on HPC you can still override the settings if needed (see the sketch after this comment).
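As a minimal sketch of that precedence order (the helper name, setting value, and environment variable below are hypothetical, not part of the current openfe API):

```python
import os
from typing import Optional


def resolve_gpu_device_index(
    settings_value: Optional[list[str]] = None,
    env_var: str = "OFE_GPU_DEVICE_INDEX",  # hypothetical variable name
) -> Optional[list[str]]:
    """Resolve a runtime option in the order discussed above:
    default, then the settings object, then an environment variable."""
    value: Optional[list[str]] = None      # 1. some default
    if settings_value is not None:
        value = settings_value             # 2. value stored in the settings object
    if env_var in os.environ:
        # 3. runtime override set on the HPC node, e.g. "0,1" -> ["0", "1"]
        value = os.environ[env_var].split(",")
    return value
```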


**Changed:**

* `openfe.protocols.openmm_rfe._rfe_utils.compute` has been moved
Contributor:
Good example of a nice changelog entry, but since this was a private API no semver major bump is needed.

**Changed:**

* `openfe.protocols.openmm_rfe._rfe_utils.compute` has been moved
to `openfe.protocols.openmm_utils.omm_compute`.
Contributor:
Do we want a private namespace or is this now in our public API?

Suggested change
to `openfe.protocols.openmm_utils.omm_compute`.
to `openfe.protocols.openmm_utils._omm_compute`.

Member Author:
I'd say a public developer API is fine; it was private because we were directly vendoring from perses.
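For illustration, the import change implied by that move might look like the following (a sketch based only on the paths given in the changelog entry above):

```python
# Old, private vendored location (pre-PR):
# from openfe.protocols.openmm_rfe._rfe_utils import compute

# New location after this PR:
from openfe.protocols.openmm_utils import omm_compute
```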

@mikemhenry (Contributor) left a comment:

This one is good, have a few notes but nothing blocking.

platform = compute.get_openmm_platform(
settings['engine_settings'].compute_platform
# Restrict CPU count if running vacuum simulation
restrict_cpu = settings['forcefield_settings'].nonbonded_method.lower() == 'nocutoff'
Contributor:
Another argument for why we really should have an explicit way of saying this is a vacuum simulation. I propose we add a setting somewhere for that: #904

In the meantime, this seems like a pretty good heuristic.

We could do more logging; it would be nice to do a hackathon on it, but in the meantime I will just suggest things as I see them. It would be good to log what is going on here. It could be more verbose than what I suggest, but this seems like a spot where, if someone asks "why is this running on the CPU and not the GPU?", a log message could help.

Suggested change
restrict_cpu = settings['forcefield_settings'].nonbonded_method.lower() == 'nocutoff'
restrict_cpu = settings['forcefield_settings'].nonbonded_method.lower() == 'nocutoff'
logging.info(f"{restrict_cpu=}")

@mikemhenry (Contributor) left a comment:

Okay, so I think that the device index selection is good, but I am on the fence about whether we should be doing anything other than warning users when they are running a vacuum simulation on a GPU or with more than one thread.

I think there is an argument for this being a "sane default" that we provide, but we need some user docs explaining that, by default, this is how a vacuum transformation works for these protocol(s).

@mikemhenry mikemhenry self-assigned this Nov 14, 2024
IAlibay and others added 2 commits November 20, 2024 14:35
Co-authored-by: Mike Henry <11765982+mikemhenry@users.noreply.github.com>
@IAlibay IAlibay requested a review from mikemhenry November 20, 2024 14:38
@IAlibay
Member Author

IAlibay commented Nov 20, 2024

> Okay, so I think that the device index selection is good, but I am on the fence about whether we should be doing anything other than warning users when they are running a vacuum simulation on a GPU or with more than one thread.
>
> I think there is an argument for this being a "sane default" that we provide, but we need some user docs explaining that, by default, this is how a vacuum transformation works for these protocol(s).

Are you saying that you don't want to go with this enforced 1 thread approach?
The main thing is that we don't have a way to control CPU count either at run time or via our settings, so we're relying on folks knowing they should use OPENMM_CPU_THREADS to set this to 1 (which isn't super well documented).

I'm happy to reconsider this change (and just stick to the deviceIndex stuff) - what do you think?

@mikemhenry (Contributor) left a comment:

Only one bit needed: we should respect whatever the user has set for `OPENMM_CPU_THREADS`, if they have set it.
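A minimal sketch of that behaviour with the OpenMM Python API; the helper name and structure are illustrative rather than the exact code added in this PR:

```python
import os

import openmm


def get_cpu_platform(restrict_cpu: bool) -> openmm.Platform:
    """Return the CPU platform, limiting threads for vacuum (NoCutoff) runs.

    If the user has exported OPENMM_CPU_THREADS, that value takes
    precedence; otherwise, when restrict_cpu is True, default to one thread.
    """
    platform = openmm.Platform.getPlatformByName("CPU")
    if restrict_cpu and "OPENMM_CPU_THREADS" not in os.environ:
        # 'Threads' is the CPU platform's thread-count property.
        platform.setPropertyDefaultValue("Threads", "1")
    return platform
```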

@IAlibay
Member Author

IAlibay commented Nov 21, 2024

@mikemhenry could you have a look and check that the latest change is what you meant?

@mikemhenry mikemhenry self-requested a review December 3, 2024 21:56
@mikemhenry (Contributor) left a comment:

Yes that is exactly what I was thinking!

@mikemhenry mikemhenry enabled auto-merge (squash) December 3, 2024 21:58
contained in MultiState reporter generated NetCDF file.
"""
ncfile = nc.Dataset(filename, 'w', format='NETCDF3_64BIT')
ncfile = nc.Dataset(filename, 'w', format='NETCDF3_64BIT_OFFSET')
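For context, a minimal sketch with the netCDF4 Python bindings; as far as I know `NETCDF3_64BIT` is just an older alias for the 64-bit-offset classic format, so spelling it out as `NETCDF3_64BIT_OFFSET` makes the intent explicit (the filename below is hypothetical):

```python
import netCDF4 as nc

filename = "multistate_reporter.nc"  # hypothetical example file

# Explicit format name used after this change:
ncfile = nc.Dataset(filename, "w", format="NETCDF3_64BIT_OFFSET")
ncfile.close()
```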
@mikemhenry
Contributor

Running CI on main here https://github.com/OpenFreeEnergy/openfe/actions/runs/12162626721 to see if we are seeing these errors. I kinda remember this happening before and it was a weird thing...

@IAlibay IAlibay disabled auto-merge December 4, 2024 15:54
@IAlibay IAlibay closed this Dec 4, 2024
@IAlibay IAlibay reopened this Dec 4, 2024
@IAlibay
Member Author

IAlibay commented Dec 4, 2024

Cycling, but it looks like we're hitting an rdkit or networkx error.

@IAlibay
Member Author

IAlibay commented Dec 4, 2024

Environment diff:

16c16
<   argon2-cffi                       23.1.0          pyhd8ed1ab_1                       conda-forge
---
>   argon2-cffi                       23.1.0          pyhd8ed1ab_0                       conda-forge
50c50
<   bleach                            6.2.0           pyhd8ed1ab_1                       conda-forge
---
>   bleach                            6.2.0           pyhd8ed1ab_0                       conda-forge
54c54
<   botocore                          1.35.74         pyge310_1234567_0                  conda-forge
---
>   botocore                          1.35.73         pyge310_1234567_1                  conda-forge
91,93c91,93
<   dask                              2024.12.0       pyhd8ed1ab_1                       conda-forge
<   dask-core                         2024.12.0       pyhd8ed1ab_1                       conda-forge
<   dask-expr                         1.1.20          pyhd8ed1ab_0                       conda-forge
---
>   dask                              2024.11.2       pyhff2d567_1                       conda-forge
>   dask-core                         2024.11.2       pyhff2d567_1                       conda-forge
>   dask-expr                         1.1.19          pyhd8ed1ab_0                       conda-forge
100c100
<   distributed                       2024.12.0       pyhd8ed1ab_1                       conda-forge
---
>   distributed                       2024.11.2       pyhff2d567_1                       conda-forge
146c146
<   h2                                4.1.0           pyhd8ed1ab_1                       conda-forge
---
>   h2                                4.1.0           pyhd8ed1ab_0                       conda-forge
150c150
<   hpack                             4.0.0           pyhd8ed1ab_1                       conda-forge
---
>   hpack                             4.0.0           pyh9f0ad1d_0                       conda-forge
153c153
<   hyperframe                        6.0.1           pyhd8ed1ab_1                       conda-forge
---
>   hyperframe                        6.0.1           pyhd8ed1ab_0                       conda-forge
165c165
<   jedi                              0.19.2          pyhd8ed1ab_1                       conda-forge
---
>   jedi                              0.19.2          pyhff2d567_0                       conda-forge
169c169
<   json5                             0.10.0          pyhd8ed1ab_1                       conda-forge
---
>   json5                             0.10.0          pyhd8ed1ab_0                       conda-forge
280c280
<   mpmath                            1.3.0           pyhd8ed1ab_1                       conda-forge
---
>   mpmath                            1.3.0           pyhd8ed1ab_0                       conda-forge
290c290
<   netcdf-fortran                    4.6.1           nompi_ha5d1325_108                 conda-forge
---
>   netcdf-fortran                    4.6.1           nompi_ha5d1325_107                 conda-forge
327c327
<   parso                             0.8.4           pyhd8ed1ab_1                       conda-forge
---
>   parso                             0.8.4           pyhd8ed1ab_0                       conda-forge
334c334
<   pexpect                           4.9.0           pyhd8ed1ab_1                       conda-forge
---
>   pexpect                           4.9.0           pyhd8ed1ab_0                       conda-forge
346,347c346,347
<   prometheus_client                 0.21.0          pyhd8ed1ab_1                       conda-forge
<   prompt-toolkit                    3.0.48          pyha770c72_1                       conda-forge
---
>   prometheus_client                 0.21.0          pyhd8ed1ab_0                       conda-forge
>   prompt-toolkit                    3.0.48          pyha770c72_0                       conda-forge
353c353
<   ptyprocess                        0.7.0           pyhd8ed1ab_1                       conda-forge
---
>   ptyprocess                        0.7.0           pyhd3deb0d_0                       conda-forge
363c363
<   pydantic                          2.10.3          pyh3cfb1c2_0                       conda-forge
---
>   pydantic                          2.10.2          pyh3cfb1c2_1                       conda-forge
373c373
<   pytables                          3.10.1          py310h431dcdc_4                    conda-forge
---
>   pytables                          3.10.1          py310h431dcdc_3                    conda-forge
402c402
<   rpds-py                           0.22.1          py310h505e2c1_0                    conda-forge
---
>   rpds-py                           0.22.0          py310h505e2c1_0                    conda-forge
450c450
<   types-python-dateutil             2.9.0.20241003  pyhd8ed1ab_1                       conda-forge
---
>   types-python-dateutil             2.9.0.20241003  pyhff2d567_0                       conda-forge
460c460
<   webcolors                         24.8.0          pyhd8ed1ab_1                       conda-forge
---
>   webcolors                         24.8.0          pyhd8ed1ab_0                       conda-forge
517c517
< botocore                      1.35.74
---
> botocore                      1.35.73
542,543c542,543
< dask                          2024.12.0
< dask-expr                     1.1.20
---
> dask                          2024.11.2
> dask-expr                     1.1.19
551c551
< distributed                   2024.12.0
---
> distributed                   2024.11.2
580c580
< gufe                          1.1.0+32.g5dd22ef
---
> gufe                          1.1.0+31.gf8c49d5
656c656
< openfe                        1.2.0+88.g4c0b2c28           /home/runner/work/openfe/openfe
---
> openfe                        1.2.0+87.g3bb014e4           /home/runner/work/openfe/openfe
710c710
< pydantic                      2.10.3
---
> pydantic                      2.10.2
742c742
< rpds-py                       0.22.1
---
> rpds-py                       0.22.0

Possibilities are either:

  1. Something changed in gufe (hybridization things)
  2. Something changed in dask

@IAlibay
Member Author

IAlibay commented Dec 4, 2024

Have opened #1033 - this should be a simple fix if it's what I think it is.

@IAlibay IAlibay mentioned this pull request Dec 4, 2024
2 tasks
@IAlibay
Member Author

IAlibay commented Dec 4, 2024

@mikemhenry looks like all the failures are related to #1033

Are you ok with merging this as-is (knowing CI is failing due to another issue) or do you want to wait until we fix #1033?

@mikemhenry
Contributor

Happy to merge in now that we have #1033 triaged.

@mikemhenry mikemhenry merged commit 72d623a into main Dec 5, 2024
7 of 21 checks passed
@mikemhenry mikemhenry deleted the fix-compute branch December 5, 2024 15:18


Development

Successfully merging this pull request may close these issues.

  • add cuda DeviceIndex in engine_settings
  • Make get_openmm_platform set threads to 1 if using NoCutoff

4 participants