Merged
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,4 +1,4 @@
Copyright 2017, David L. Mobley and Michael K. Gilson
Copyright 2017, David L. Mobley, Germano Heinzelmann, Niel M. Henriksen and Michael K. Gilson

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

18 changes: 13 additions & 5 deletions README.md
@@ -1,6 +1,6 @@
# Benchmark sets for free energy calculations

This repository relates to the *perpetual review* ([definition](https://arxiv.org/abs/1502.01329)) paper called "[Predicting binding free energies: Frontiers and benchmarks](https://github.com/MobleyLab/benchmarksets/blob/master/paper/benchmarkset.pdf)" by David L. Mobley and Michael K. Gilson.
This repository relates to the *perpetual review* ([definition](https://arxiv.org/abs/1502.01329)) paper called "[Predicting binding free energies: Frontiers and benchmarks](https://github.com/MobleyLab/benchmarksets/blob/master/paper/benchmarkset.pdf)" by David L. Mobley, Germano Heinzelmann, Niel M. Henriksen, and Michael K. Gilson.
The repository's focus is benchmark sets for binding free energy calculations, including the perpetual review paper, but also supporting files and other materials relating to free energy benchmarks.
Thus, the repository includes not only the perpetual review paper but also further discussion, datasets, and (hopefully ultimately) standards for datasets and data deposition.

@@ -45,13 +45,14 @@ Currently proposed benchmark sets are detailed in [the paper](https://github.com
* Host guest systems
* CB7
* Gibb deep cavity cavitands (GDCCs) OA and TEMOA
* Cyclodextrins (alpha and beta)
* Lysozyme model binding sites
* apolar L99A
* polar L99A/M102Q
* Bromodomain BRD4-1

Other near-term candidates include:
* Thrombin
* Bromodomains
* Suggest and vote on your favorites via a feature request below

Community involvement is needed to pick and advance the best benchmarks.
@@ -77,6 +78,7 @@ We also welcome contributions to the material which is already here to extend it
## Authors
- David L. Mobley (UCI)
- Germano Heinzelmann (Universidade Federal de Santa Catarina)
- Niel M. Henriksen (UCSD)
- Michael K. Gilson (UCSD)

Your name, too, can go here if you help us substantially revise/extend the paper.
@@ -88,7 +90,7 @@ We want to thank the following people who contributed to this repository and the

- David Slochower (UCSD, Gilson lab): Grammar corrections and improved table formatting
- Nascimento (in a comment on biorxiv): Highlighted PDB code error for n-phenylglycinonitrile
- Jian Yin (UCSD, Gilson lab): Provided host-guest structures and input files for the host-guest sets described in the paper
- Jian Yin (UCSD, Gilson lab): Provided host-guest structures and input files for the CB7 and GDCC host-guest sets described in the paper

Please note that GitHub's automatic "contributors" list does not provide a full accounting of everyone contributing to this work, as some contributions have been received by e-mail or other mechanisms.

@@ -105,8 +107,14 @@ Please note that GitHub's automatic "contributors" list does not provide a full
- v1.2 ([10.5281/zenodo.839047](http://doi.org/10.5281/zenodo.839047)): Addition of bromodomain BRD4(1) test cases as a new ``soft'' benchmark, with help from Germano Heinzelmann. Addition of Heinzelmann as an author. Addition of files for BRD4(1) benchmark. Removed bromodomain material from future benchmarks in view of its presence now as a benchmark system.

## Changes not yet in a release
- Addition of cyclodextrin benchmarks to the data and to the paper; removal of most of the cyclodextrin material from future benchmarks. Addition of Niel Henriksen as an author based on his work on this.

## Manifest

* paper: Provides LaTeX source files and final PDF for the current version of the manuscript (reformatted from the version submitted to Ann. Rev. and with 2D structures added to the tables); images, etc. are also available in sub-directories, as is the supporting information.
* input_files: Host-guest structures and simulation input files for the host-guest benchmark sets proposed in the paper (provided by Jian Yin from the Gilson lab)
* paper: Provides LaTeX source files and final PDF for the current version of the manuscript (reformatted and expanded from the version submitted to Ann. Rev. and with 2D structures added to the tables); images, etc. are also available in sub-directories, as is the supporting information.
* input_files: Ultimately to include structures and simulation input files for all of the benchmark systems present as well as (we hope) gold standard calculated values for these files. Currently this includes:
* README.md: A more extensive document describing the files present
* BRD4 structures and simulation input files from Germano Heinzelmann
* CB7 structures and simulation input files from Jian Yin (Gilson lab)
* GDCC structures and simulation input files from Jian Yin (Gilson lab)
* Cyclodextrin structures and simulation input files from Niel Henriksen (Gilson lab)
52 changes: 47 additions & 5 deletions input_files/README.md
@@ -1,18 +1,60 @@
# Benchmark Set Input Files
# Benchmark Set Input Files and Supporting Data

This directory (and its subdirectories) provides structure and simulation input files for the benchmark sets proposed in the associated perpetual review paper and, in some cases, data on additional supplementary compounds.

This file documents what files are present here and how they were generated.

## Manifest
- `BRD4`: BRD4-1 benchmarks as proposed in two Tables in the paper; provides its own `README.md` detailing its organization/contents/provenance.
- `cb7-set1` and `cb7-set2`: Proposed CB7 benchmark sets, with organization/contents/provenance information below.
- `cd-set1` and `cd-set2`: Proposed cyclodextrin benchmark sets (the first of which is on alpha cyclodextrin and the second on beta cyclodextrin), with organization/contents/provenance information below. **Additional, supplementary guests are also provided.** These also contain a machine-parseable `README.md` file with experimental binding free energies and enthalpies, with references.
- `gdcc-set1` and `gdcc-set2`: Proposed Gibb deep cavity cavitand benchmark sets, with organization/contents/provenance information below.

## File Descriptions
This set of files comprises PDB, sdf, and mol2 files for the free hosts and guests, as well as AMBER prmtop/rst7 format input files for the solvated and equilibrated host-guest complexes. The guest compounds in each set are named according to the compound ID listed in Tables 1-6 in the associated paper [1]. For instance, compound p-toluidine is located in the cb7-set2 subdirectory and named guest-9 because it is included in CB7 Set 2, and its ID in the paper is 9. The prmtop/rst7 files are named in the same way, except that both the host identifier name and guest ID are used for the filename. For cd-set1 and cd-set2, there are two sets of prmtop/rst7 files, one for each possible orientation of the guest in the cyclodextrin cavity. In addition, the cyclodextrin datasets also include files with an 's' character prior to the guest ID number (e.g., bcd-s15.pdb), which indicates that these are supplemental guests not listed in the associated paper which could be of additional interest.
This set of files comprises PDB, sdf, and mol2 files for the free hosts and guests, as well as AMBER prmtop/rst7 format input files for the solvated and equilibrated host-guest complexes.
The guest compounds in each set are named according to the compound ID listed in the corresponding Tables in the associated paper [1].
For instance, compound p-toluidine is located in the cb7-set2 subdirectory and named guest-9 because it is included in CB7 Set 2, and its ID in the paper is 9.
The prmtop/rst7 files are named in the same way, except that both the host identifier name and guest ID are used for the filename.
For cd-set1 and cd-set2, there are two sets of prmtop/rst7 files, one for each possible orientation of the guest in the cyclodextrin cavity.
In addition, the cyclodextrin datasets also include files with an 's' character prior to the guest ID number (e.g., bcd-s15.pdb), which indicates that these are supplemental guests not listed in the associated paper which could be of additional interest.
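
As a rough illustration of the naming convention described above, the following Python sketch splits a file name into its parts. The regular expression is an assumption inferred from the examples in the text (`guest-9`, `guest-18b`, `bcd-s15.pdb`, and the `-p`/`-s` orientation suffixes); names such as `bcd-1-p.prmtop` and the `GuestFile`/`parse_name` helpers are hypothetical and should be checked against the actual directory contents.

```python
import re
from typing import NamedTuple, Optional


class GuestFile(NamedTuple):
    prefix: str                 # "guest" or a host identifier such as "bcd"
    guest_id: int               # compound ID as listed in the paper's tables
    supplemental: bool          # True when the ID carries an 's' prefix
    variant: str                # trailing letter such as "b" (e.g. guest-18b), else ""
    orientation: Optional[str]  # "p" or "s" for cyclodextrin complexes, else None
    extension: str              # e.g. "pdb", "mol2", "prmtop"


# Assumed pattern: <prefix>-[s]<id>[variant][-p|-s].<ext>
_NAME = re.compile(
    r"^(?P<prefix>[A-Za-z0-9]+)-(?P<supp>s?)(?P<id>\d+)(?P<variant>[a-z]?)"
    r"(?:-(?P<orient>[ps]))?\.(?P<ext>\w+)$"
)


def parse_name(filename: str) -> GuestFile:
    """Split a benchmark file name into its components (assumed layout)."""
    match = _NAME.match(filename)
    if match is None:
        raise ValueError(f"unrecognized file name: {filename!r}")
    return GuestFile(
        prefix=match["prefix"],
        guest_id=int(match["id"]),
        supplemental=match["supp"] == "s",
        variant=match["variant"],
        orientation=match["orient"],
        extension=match["ext"],
    )
```

For example, `parse_name("bcd-s15.pdb")` reports a supplemental guest with ID 15, while a name that does not fit the assumed layout raises `ValueError` rather than being silently misread.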

## CB7 Methods
The structures of the free CB7 host were initially obtained from the crystal structure [2] while all other unbound guest structures were built manually. The structues were then QM energy minimized in vacuo using the HF/6-31G(d) method in Gaussin09. The CB7 molecule has zero net charge, while the protonation states of the guests were predicted with the pKa plugin in the Marvin suite of programs [3]. Guest guest-18 in cb7-set1 is a special case, because it was predicted to have the protonated and unprotonated forms coexisting at the experimental pH value (4.74) [4] with a nearly 1:1 ratio. Therefore, files of both forms are provided, with guest-18 as the protonated form of the guest and guest-18b as the unprotonated form. For the simulation files, bonded and Lennard-Jones parameters were obtained from the general AMBER force field (GAFF v1.7) [5]. Partial charges for each atom were generated using the RESP procedure [6], as implemented in the Antechamber program [7], by fitting to electrostatic potentials grids generated during the QM minimization. The starting bound configuration of each host-guest pair was generated by docking the guests into the hosts with MOE [8]. The binding pose was then solvated in a cubic box with 1500 TIP3P water molecules with sodium or chloride counterions added only for neutralization. Counterions were modeled with the TIP3P-specific sodium parameters of Joung and Cheatham [9]. After an equilibration phase, an NVT simulation of 2 ns was carried out, and the frame with the most populated configuration, determined via clustering, was selected as the simulation input file.
The structures of the free CB7 host were initially obtained from the crystal structure [2] while all other unbound guest structures were built manually.
The structures were then QM energy minimized in vacuo using the HF/6-31G(d) method in Gaussian09.
The CB7 molecule has zero net charge, while the protonation states of the guests were predicted with the pKa plugin in the Marvin suite of programs [3].
Guest guest-18 in cb7-set1 is a special case, because it was predicted to have the protonated and unprotonated forms coexisting at the experimental pH value (4.74) [4] with a nearly 1:1 ratio.
Therefore, files of both forms are provided, with guest-18 as the protonated form of the guest and guest-18b as the unprotonated form.
For the simulation files, bonded and Lennard-Jones parameters were obtained from the general AMBER force field (GAFF v1.7) [5].
Partial charges for each atom were generated using the RESP procedure [6], as implemented in the Antechamber program [7], by fitting to electrostatic potentials grids generated during the QM minimization.
The starting bound configuration of each host-guest pair was generated by docking the guests into the hosts with MOE [8].
The binding pose was then solvated in a cubic box with 1500 TIP3P water molecules with sodium or chloride counterions added only for neutralization.
Counterions were modeled with the TIP3P-specific sodium parameters of Joung and Cheatham [9].
After an equilibration phase, an NVT simulation of 2 ns was carried out, and the frame with the most populated configuration, determined via clustering, was selected as the simulation input file.
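
The near 1:1 ratio reported for guest-18 is what the Henderson-Hasselbalch relation gives when a site's pKa is close to the solution pH. The sketch below illustrates that relation only; it is not the Marvin pKa plugin's method, and the pKa values used are hypothetical placeholders, not predicted values from this work.

```python
def protonated_fraction(pka: float, ph: float) -> float:
    """Fraction of a single titratable site that is protonated at a given pH.

    Follows from Henderson-Hasselbalch: [A]/[HA] = 10 ** (pH - pKa).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


EXPERIMENTAL_PH = 4.74  # pH quoted in the text for the cb7-set1 experiments

# Hypothetical pKa equal to the pH: both forms are equally populated.
print(protonated_fraction(pka=4.74, ph=EXPERIMENTAL_PH))

# Hypothetical pKa one unit above the pH: the protonated form dominates.
print(protonated_fraction(pka=5.74, ph=EXPERIMENTAL_PH))
```

When the two values differ by much more than one unit, a single protonation state dominates, which is presumably why only guest-18 is provided in both forms.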


## OA/TEMOA (GDCC) Methods
The structures of the free hosts OA and TEMOA, as well as of all unbound guests, were built manually and then QM energy minimized in vacuo using the HF/6-31G(d) method in Gaussin09. The OA and TEMOA hosts both were assigned net charges of -8au, based on the pH at which the experiments were conducted (9.2 and 11.5) [10, 11]. The protonation states of guests were predicted with the pKa plugin in the Marvin suite of programs [3]. For the simulation files, bonded and LJ force field parameters were taken from GAFF v1.7 and partial charges were determined using the RESP approach, in identical fashion to the CB7 method. The starting bound configuration of each host-guest pair was generated by docking the guests into the hosts with MOE [8]. The binding pose was then solvated in a cubic box with 2100 TIP3P water molecules with sodium or chloride counterions added only for neutralization. Counterions were modeled with the TIP3P-specific sodium parameters of Joung and Cheatham [9]. After an equilibration phase, an NVT simulation of 2 ns was carried out, and the frame with the most populated configuration, determined via clustering, was selected as the simulation input file.
The structures of the free hosts OA and TEMOA, as well as of all unbound guests, were built manually and then QM energy minimized in vacuo using the HF/6-31G(d) method in Gaussian09.
The OA and TEMOA hosts both were assigned net charges of -8au, based on the pH at which the experiments were conducted (9.2 and 11.5) [10, 11].
The protonation states of guests were predicted with the pKa plugin in the Marvin suite of programs [3].
For the simulation files, bonded and LJ force field parameters were taken from GAFF v1.7 and partial charges were determined using the RESP approach, in identical fashion to the CB7 method.
The starting bound configuration of each host-guest pair was generated by docking the guests into the hosts with MOE [8].
The binding pose was then solvated in a cubic box with 2100 TIP3P water molecules with sodium or chloride counterions added only for neutralization.
Counterions were modeled with the TIP3P-specific sodium parameters of Joung and Cheatham [9].
After an equilibration phase, an NVT simulation of 2 ns was carried out, and the frame with the most populated configuration, determined via clustering, was selected as the simulation input file.


## CD Methods
The stuctures of unbound alpha-cyclodextrin (aCD) and beta-cyclodextrin (bCD), as well as all guests, were built manually. Protonation states followed what was reported by Rekharsky et al. [12] at pH 6.9. The guest molecules were QM energy minimized in vacuo using the HF/6-31G(d) method in Gaussian09. Partial charges, LJ paramters, and bonded parameters for the CD molecules were taken from the q4md-CD force field by Cézard et al. [13]. Guest partial charges were derived using the RESP method implemented in the R.E.D. Server tool [14], while LJ and bonded parameters were taken from GAFF v1.7. The AMBER simulation files consist of the host-guest complex, 1500 TIP3P waters, and three Na+ and Cl- ions in addition to any counterions required for neutralization. This roughly corresponds to the ionic strength of the 50 mmol phosphate buffer used in experiment. The solvated systems were equilibrated in the NPT ensemble with light (0.1 kcal/mol) positional restraints on the host and guest atoms. The final conformation of this equilibration step is provided here. Unrestrained equilibration and clustering was not performed for the cyclodextrin sets, in contrast to the CB7 and GDCC sets, because in some cases the guest binds weakly enough that it could leave the binding cavity for significant periods of time. To account for the two possible orientations of the guest within the CD cavity, simulation files with the '-p' suffix indicate that the guest is bound with the polar functional group oriented out of the primary (narrow) face of the CD, whereas the '-s' suffix indicates the guest polar functional group is oriented out of the secondary (wider) face of the CD.
The structures of unbound alpha-cyclodextrin (aCD) and beta-cyclodextrin (bCD), as well as all guests, were built manually.
Protonation states followed what was reported by Rekharsky et al. [12] at pH 6.9.
The guest molecules were QM energy minimized in vacuo using the HF/6-31G(d) method in Gaussian09.
Partial charges, LJ parameters, and bonded parameters for the CD molecules were taken from the q4md-CD force field by Cézard et al. [13].
Guest partial charges were derived using the RESP method implemented in the R.E.D. Server tool [14], while LJ and bonded parameters were taken from GAFF v1.7.
The AMBER simulation files consist of the host-guest complex, 1500 TIP3P waters, and three Na+ and Cl- ions in addition to any counterions required for neutralization.
This roughly corresponds to the ionic strength of the 50 mmol phosphate buffer used in experiment.
The solvated systems were equilibrated in the NPT ensemble with light (0.1 kcal/mol) positional restraints on the host and guest atoms.
The final conformation of this equilibration step is provided here.
Unrestrained equilibration and clustering were not performed for the cyclodextrin sets, in contrast to the CB7 and GDCC sets, because in some cases the guest binds weakly enough that it could leave the binding cavity for significant periods of time.
To account for the two possible orientations of the guest within the CD cavity, simulation files with the '-p' suffix indicate that the guest is bound with the polar functional group oriented out of the primary (narrow) face of the CD, whereas the '-s' suffix indicates the guest polar functional group is oriented out of the secondary (wider) face of the CD.
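
The salt concentration implied by three added Na+/Cl- pairs in 1500 waters can be estimated with a back-of-the-envelope calculation. The sketch below assumes a bulk water molarity of about 55.5 mol/L and ignores the volume of the host, guest, and ions, so it is only an order-of-magnitude check, not a description of how the files were built.

```python
WATER_MOLARITY = 55.5  # mol/L, approximate value for liquid water (assumption)


def salt_molarity(n_ion_pairs: int, n_waters: int) -> float:
    """Approximate molarity of a 1:1 salt from ion-pair and water counts."""
    return n_ion_pairs / n_waters * WATER_MOLARITY


# Three Na+/Cl- pairs in 1500 TIP3P waters, as described for the CD sets.
concentration = salt_molarity(n_ion_pairs=3, n_waters=1500)
print(f"{concentration:.3f} mol/L")  # about 0.111 mol/L
```

For a 1:1 salt the ionic strength equals the concentration, so this estimate gives roughly 0.11 mol/L, which is the quantity the text compares with the ionic strength of the experimental phosphate buffer.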

## BRD4
For information on the BRD4 benchmark, see the associated `README.md` file in the BRD4 subdirectory.
Binary file modified paper/benchmarkset.pdf
Binary file not shown.