From 3e2584812c80b0677f458c3612197d79528788df Mon Sep 17 00:00:00 2001 From: Josh Horton Date: Thu, 6 Feb 2025 17:03:42 +0000 Subject: [PATCH 1/3] first pass at charge tutorial updates --- cli_tutorials/cli_charge_molecules.md | 103 ++++++++++++++++++++++++++ rbfe_tutorial/cli_tutorial.md | 37 +++++++-- rbfe_tutorial/python_tutorial.ipynb | 38 +++++++++- 3 files changed, 171 insertions(+), 7 deletions(-) create mode 100644 cli_tutorials/cli_charge_molecules.md diff --git a/cli_tutorials/cli_charge_molecules.md b/cli_tutorials/cli_charge_molecules.md new file mode 100644 index 0000000..863aafc --- /dev/null +++ b/cli_tutorials/cli_charge_molecules.md @@ -0,0 +1,103 @@ +# Generating Partial Charges with the OpenFE CLI + +This tutorial will show you how to use the OpenFE CLI to generate and store partial charges for a series of ligands +which can be used with OpenFE protocols. It is recommended to use a single set of charges for each ligand to ensure reproducibility +between repeats or consistent charges between different legs of a calculation involving the same ligand, like a relative binding +affinity calculation for example. As such both the `plan-rbfe-network` and `plan-rhfe-network` commands will calculate +partial charges for ligands making it expensive to run multiple network mappings while finding the optimal one for the resources +available. Hence, the `charge-molecules` command offers a way to generate and store the charges into an SDF file which +can be used with the rest of the OpenFE CLI and python API. 
+ +## Charging Molecules + +The `charge-molecules` command allows you to generate partial charges for a series of small molecules saved in SDF or MOL2 +format using the `am1bcc` method calculated using `ambertools`: + +```bash +openfe charge-molecules -M tyk2_ligands.sdf -o charged_tyk2_ligands.sdf +``` + +This will result in a new SDF file `charged_tyk2_ligands.sdf` which contains the same ligands and their partial charges +stored in a new SD tag like so: + +```text +lig_ejm_42 + RDKit 3D + + 35 36 0 0 0 0 0 0 0 0999 V2000 + -4.7651 -2.8327 -16.5085 H 0 0 0 0 0 0 0 0 0 0 0 0 + -5.3566 -3.6931 -16.2274 C 0 0 0 0 0 0 0 0 0 0 0 0 + -4.7703 -4.9699 -16.2000 C 0 0 0 0 0 0 0 0 0 0 0 0 +[continues] + 28 31 1 0 +M END + +> +lig_ejm_42 + +> +0.14794282857142857 -0.096057171428571425 -0.12905717142857143 ... +[continues] +``` + +Generating partial charges with the `am1bcc` method can be slow as they require a semi-empirical quantum chemical calculation, +we can however take advantage of multiprocessing to calculate the charges in parallel for each ligand which offers a +significant speed-up. The number of processors available for the workflow can be specified using the `-n` flag: + +```bash +openfe charge-molecules -M tyk2_ligands.sdf -o charged_tyk2_ligands.sdf -n 4 +``` + +## Customizing the Charge Method + +There is a wide range of partial charge generation methods available with `am1bcc` based schemes being most commonly +used with OpenFF force fields. The choice of charge scheme can be easily customised by providing a settings file in `.yaml` format. +For example, to recreate the current default settings in the workflow you would do the following: + +1. Provide a file like `settings.yaml` with the desired settings: + +```yaml +partial_charge: + method: am1bcc + settings: + off_toolkit_backend: ambertools +``` + +2. 
Charge the ligands with an additional `-s` flag for passing the settings: + +```bash +openfe charge-molecules -M tyk2_ligands.sdf -o charged_tyk2_ligands.sdf -n 4 -s settings.yaml +``` + +3. The output of the CLI program will now reflect the changes made: + +```text +SMALL MOLECULE PARTIAL CHARGE GENERATOR +_________________________________________ + +Parsing in Files: + Got input: + Small Molecules: SmallMoleculeComponent(name=lig_ejm_31) SmallMoleculeComponent(name=lig_ejm_42) SmallMoleculeComponent(name=lig_ejm_43) SmallMoleculeComponent(name=lig_ejm_46) SmallMoleculeComponent(name=lig_ejm_47) SmallMoleculeComponent(name=lig_ejm_48) SmallMoleculeComponent(name=lig_ejm_50) SmallMoleculeComponent(name=lig_jmc_23) SmallMoleculeComponent(name=lig_jmc_27) SmallMoleculeComponent(name=lig_jmc_28) +Using Options: + Partial Charge Generation: am1bcc +``` + +The full range of partial charge settings can be found in the snipit bellow, note that some may require installing extra packages. + +```yaml +partial_charge: + method: am1bcc + # method: am1bccelf10 + # method: espaloma + # method: nagl + settings: + off_toolkit_backend: ambertools + # off_toolkit_backend: openeye # required for the am1bccelf10 method + number_of_conformers: null # null specifies the use of the input conformer, a value requests that a new conformer be generated + # nagl_model: null # null specifies the use of the latest nagl model +``` + +## Overwriting Charges + +By default, the `charge-molecules` command will only assign partial charges to ligands which **do not** already have charges, +this behaviour can be changed via the `--overwrite-charges` flag which will assign new charges using the specified settings. 
\ No newline at end of file diff --git a/rbfe_tutorial/cli_tutorial.md b/rbfe_tutorial/cli_tutorial.md index 4bfdff3..a579bf2 100644 --- a/rbfe_tutorial/cli_tutorial.md +++ b/rbfe_tutorial/cli_tutorial.md @@ -50,9 +50,21 @@ we do the following: - Instruct `openfe` to output files into a directory called `network_setup` with the `-o network_setup` option. -Planning the campaign may take some time, as it tries to find the best -network from all possible transformations. This will create a directory called -`network_setup/`, which is structured like this: +Planning the campaign may take some time due to the complex series of tasks involved: + +- partial charges are generated for each of the ligands to ensure reproducibility, by default this requires a semi-empirical quantum +chemical calculation to calculate `am1bcc` charges +- atom mappings are created and scored based on the perceived difficulty for all possible ligand pairs +- an optimal network is extracted from all possible pairwise transformations which balances edge redundancy and the total difficulty score of the network + +The partial charge generation can take advantage of multiprocessing which offers a significant speed-up, you can specify +the number of processors available using the `-n` flag: + +```bash +openfe plan-rbfe-network -M tyk2_ligands.sdf -p tyk2_protein.pdb -o network_setup -n 4 +``` + +This will result in a directory called `network_setup/`, which is structured like this: @@ -87,7 +99,7 @@ The files that describe each individual simulation we will run are located withi leg to run and contains all the necessary information to run that leg. Filenames indicate ligand names as taken from the SDF; for example, the file `easy_rbfe_lig_ejm_31_complex_lig_ejm_42_complex.json` is the leg -associated with the tranformation of the ligand `lig_ejm_31` into `lig_ejm_42` +associated with the transformation of the ligand `lig_ejm_31` into `lig_ejm_42` while in complex with the protein. 
A single RBFE between a pair of ligands requires running two legs of an alchemical cycle (JSON files): @@ -112,8 +124,9 @@ OpenFE contains many different options and methods for setting up a simulation c The options can be easily accessed and modified by providing a settings file in the `.yaml` format. Let's assume you want to exchange the LOMAP atom mapper with the Kartograf -atom mapper and the Minimal Spanning Tree -Network Planner with the Maximal Network Planner, then you could do the following: +atom mapper, the Minimal Spanning Tree +Network Planner with the Maximal Network Planner and the am1bcc charge method with the am1bccelf10 version from openeye, +then you could do the following: 1. provide a file like `settings.yaml` with the desired changes: @@ -123,6 +136,11 @@ mapper: network: method: generate_maximal_network + +partial_charge: + method: am1bccelf10 + settings: + off_toolkit_backend: openeye ``` 2. Plan your rbfe network with an additional `-s` flag for passing the settings: @@ -148,6 +166,7 @@ Using Options: Mapper: Mapping Scorer: Networker: functools.partial() + Partial Charge Generation: am1bccelf10 ``` That concludes the straightforward process of tailoring your OpenFE setup to your specifications. 
@@ -166,6 +185,12 @@ network: # method: generate_radial_network # method: generate_maximal_network # method: generate_minimal_redundant_network + +partial_charge: + method: am1bcc + # method: am1bccelf10 + # settings: + # off_toolkit_backend: openeye # required for the am1bccelf10 method ``` **Customize away!** diff --git a/rbfe_tutorial/python_tutorial.ipynb b/rbfe_tutorial/python_tutorial.ipynb index 67140c8..97d1eea 100644 --- a/rbfe_tutorial/python_tutorial.ipynb +++ b/rbfe_tutorial/python_tutorial.ipynb @@ -47,6 +47,42 @@ "ligands = [openfe.SmallMoleculeComponent.from_rdkit(mol) for mol in supp]" ] }, + { + "cell_type": "markdown", + "id": "8e5de19a", + "metadata": {}, + "source": [ + "## Charging the ligands\n", + "\n", + "It is recommended to use a single set of charges for each ligand to ensure reproducibility between repeats or consistent charges between different legs of a calculation involving the same ligand, like a relative binding affinity calculation for example. \n", + "\n", + "Here we will use some utility functions from OpenFE which can assign partial charges to a series of molecules with a variety of methods which can be configured via the `OpenFFPartialChargeSettings` class. In this example \n", + "we will charge the ligands using the `am1bcc` method from `ambertools` which is the default charge scheme used by OpenFE." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "5219106c", + "metadata": {}, + "outputs": [], + "source": [ + "from openfe.protocols.openmm_utils.omm_settings import OpenFFPartialChargeSettings\n", + "from openfe.protocols.openmm_utils.charge_generation import bulk_assign_partial_charges\n", + "\n", + "charge_settings = OpenFFPartialChargeSettings(partial_charge_method=\"am1bcc\", off_toolkit_backend=\"ambertools\")\n", + "\n", + "charged_ligands = bulk_assign_partial_charges(\n", + " molecules=ligands,\n", + " overwrite=False, \n", + " method=charge_settings.partial_charge_method,\n", + " toolkit_backend=charge_settings.off_toolkit_backend,\n", + " generate_n_conformers=charge_settings.number_of_conformers,\n", + " nagl_model=charge_settings.nagl_model,\n", + " processors=1\n", + ")" + ] + }, { "cell_type": "markdown", "id": "6963be83", @@ -93,7 +129,7 @@ "outputs": [], "source": [ "ligand_network = network_planner(\n", - " ligands=ligands,\n", + " ligands=charged_ligands,\n", " mappers=[mapper],\n", " scorer=scorer\n", ")" From 1ba32d54998043fd3b09a9e2766ac566df43f541 Mon Sep 17 00:00:00 2001 From: Josh Horton Date: Fri, 7 Feb 2025 10:17:01 +0000 Subject: [PATCH 2/3] add feedback --- cli_tutorials/cli_charge_molecules.md | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/cli_tutorials/cli_charge_molecules.md b/cli_tutorials/cli_charge_molecules.md index 863aafc..62833cf 100644 --- a/cli_tutorials/cli_charge_molecules.md +++ b/cli_tutorials/cli_charge_molecules.md @@ -1,12 +1,13 @@ # Generating Partial Charges with the OpenFE CLI -This tutorial will show you how to use the OpenFE CLI to generate and store partial charges for a series of ligands -which can be used with OpenFE protocols. 
It is recommended to use a single set of charges for each ligand to ensure reproducibility -between repeats or consistent charges between different legs of a calculation involving the same ligand, like a relative binding -affinity calculation for example. As such both the `plan-rbfe-network` and `plan-rhfe-network` commands will calculate -partial charges for ligands making it expensive to run multiple network mappings while finding the optimal one for the resources -available. Hence, the `charge-molecules` command offers a way to generate and store the charges into an SDF file which -can be used with the rest of the OpenFE CLI and python API. +It is recommended to use a single set of charges for each ligand to ensure reproducibility between repeats or consistent +charges between different legs of a calculation involving the same ligand, like a relative binding affinity calculation for example. +As such both the `plan-rbfe-network` and `plan-rhfe-network` commands will calculate partial charges for ligands making it expensive +to run multiple network mappings while finding the optimal one for the resources available. + +Here we present a CLI tool to allow you to do this ahead of time, reducing overheads and further improving reproducibility. +This tutorial will show you how to use the OpenFE CLI command `charge-molecules` to generate and store partial charges for a series of ligands +into an SDF file which can be used with OpenFE protocols. ## Charging Molecules @@ -42,7 +43,8 @@ lig_ejm_42 Generating partial charges with the `am1bcc` method can be slow as they require a semi-empirical quantum chemical calculation, we can however take advantage of multiprocessing to calculate the charges in parallel for each ligand which offers a -significant speed-up. The number of processors available for the workflow can be specified using the `-n` flag: +significant speed-up. The number of processors available for the workflow can be specified using the `-n` flag. 
For +example to spread out the calculation over 4 cores: ```bash openfe charge-molecules -M tyk2_ligands.sdf -o charged_tyk2_ligands.sdf -n 4 @@ -82,7 +84,7 @@ Using Options: Partial Charge Generation: am1bcc ``` -The full range of partial charge settings can be found in the snipit bellow, note that some may require installing extra packages. +The full range of partial charge settings can be found in the snippet below; note that some may require installing extra packages. ```yaml partial_charge: From 7f7e0458bc5c57893f787a79c115380f7950a746 Mon Sep 17 00:00:00 2001 From: Josh Horton Date: Tue, 11 Feb 2025 11:56:48 +0000 Subject: [PATCH 3/3] add pre-print link update wording --- cli_tutorials/cli_charge_molecules.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/cli_tutorials/cli_charge_molecules.md b/cli_tutorials/cli_charge_molecules.md index 62833cf..52bc991 100644 --- a/cli_tutorials/cli_charge_molecules.md +++ b/cli_tutorials/cli_charge_molecules.md @@ -1,11 +1,11 @@ # Generating Partial Charges with the OpenFE CLI It is recommended to use a single set of charges for each ligand to ensure reproducibility between repeats or consistent -charges between different legs of a calculation involving the same ligand, like a relative binding affinity calculation for example. -As such both the `plan-rbfe-network` and `plan-rhfe-network` commands will calculate partial charges for ligands making it expensive +charges between different legs of a calculation involving the same ligand, like a relative binding affinity calculation for example (see [Osato et al.](https://chemrxiv.org/engage/chemrxiv/article-details/67579833085116a133e39d86)). + As such both the `plan-rbfe-network` and `plan-rhfe-network` commands will calculate partial charges for ligands making it expensive to run multiple network mappings while finding the optimal one for the resources available. 
-Here we present a CLI tool to allow you to do this ahead of time, reducing overheads and further improving reproducibility. +Here we present a CLI tool to do this ahead of time, reducing overheads and further improving reproducibility. This tutorial will show you how to use the OpenFE CLI command `charge-molecules` to generate and store partial charges for a series of ligands into an SDF file which can be used with OpenFE protocols.
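
A minimal sketch for spot-checking a file written by `charge-molecules`, assuming the per-atom charges are stored as a whitespace-separated list in a single SD data field, as the tutorial excerpt suggests. It uses only the Python standard library rather than the OpenFE API. The helper names and the tag name `atom.dprop.PartialCharge` are assumptions for illustration (the tag name is not visible in the excerpt), and the sample record is made up, not real output.

```python
# Hypothetical helpers; not part of OpenFE.
TAG = "atom.dprop.PartialCharge"  # assumed tag name, adjust to match your file


def read_sd_fields(sdf_text):
    """Return one {tag: value} dict per record in an SDF string."""
    records = []
    for block in sdf_text.split("$$$$"):
        if not block.strip():
            continue  # skip the empty chunk after the final record delimiter
        fields, tag, value = {}, None, []
        for line in block.splitlines():
            if line.startswith(">") and "<" in line:
                # A new data header closes the previous field, if any.
                if tag is not None:
                    fields[tag] = " ".join(value).strip()
                tag = line[line.index("<") + 1 : line.rindex(">")]
                value = []
            elif tag is not None:
                value.append(line.strip())
        if tag is not None:
            fields[tag] = " ".join(value).strip()
        records.append(fields)
    return records


def net_charge(fields, tag=TAG):
    """Sum the per-atom charges stored under `tag`."""
    return sum(float(x) for x in fields[tag].split())


# Made-up record: only the data field matters for this check.
SAMPLE = """lig_a
  made-up record

  3  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <atom.dprop.PartialCharge>
0.25 -0.5 0.25

$$$$
"""

records = read_sd_fields(SAMPLE)
print(len(records), net_charge(records[0]))
```

On a real file, the summed charges of each ligand should be close to its integer formal charge; a missing tag raises `KeyError`, which flags a ligand that was not charged.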