diff --git a/LICENSE b/LICENSE index f7a6d74..4b02958 100644 --- a/LICENSE +++ b/LICENSE @@ -1,4 +1,4 @@ -Copyright 2017, David L. Mobley and Michael K. Gilson +Copyright 2017, David L. Mobley, Germano Heinzelmann, Niel M. Henriksen and Michael K. Gilson Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: diff --git a/README.md b/README.md index 96aee39..886f111 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@ # Benchmark sets for free energy calculations -This repository relates to the *perpetual review* ([definition](https://arxiv.org/abs/1502.01329)) paper called "[Predicting binding free energies: Frontiers and benchmarks](https://github.com/MobleyLab/benchmarksets/blob/master/paper/benchmarkset.pdf)" by David L. Mobley and Michael K. Gilson. +This repository relates to the *perpetual review* ([definition](https://arxiv.org/abs/1502.01329)) paper called "[Predicting binding free energies: Frontiers and benchmarks](https://github.com/MobleyLab/benchmarksets/blob/master/paper/benchmarkset.pdf)" by David L. Mobley, Germano Heinzelmann, Niel M. Henriksen, and Michael K. Gilson. The repository's focus is benchmark sets for binding free energy calculations, including the perpetual review paper, but also supporting files and other materials relating to free energy benchmarks. Thus, the repository includes not only the perpetual review paper but also further discussion, datasets, and (hopefully ultimately) standards for datasets and data deposition. @@ -45,13 +45,14 @@ Currently proposed benchmark sets are detailed in [the paper](https://github.com * Host guest systems * CB7 * Gibb deep cavity cavitands (GDCCs) OA and TEMOA + * Cyclodextrins (alpha and beta) * Lysozyme model binding sites * apolar L99A * polar L99A/M102Q +* Bromodomain BRD4-1 Other near-term candidates include: * Thrombin -* Bromodomains * Suggest and vote on your favorites via a feature request below Community involvement is needed to pick and advance the best benchmarks. @@ -77,6 +78,7 @@ We also welcome contributions to the material which is already here to extend it ## Authors - David L. Mobley (UCI) - Germano Heinzelmann (Universidade Federal de Santa Catarina) +- Niel M. Henriksen (UCSD) - Michael K. Gilson (UCSD) Your name, too, can go here if you help us substantially revise/extend the paper. @@ -88,7 +90,7 @@ We want to thank the following people who contributed to this repository and the - David Slochower (UCSD, Gilson lab): Grammar corrections and improved table formatting - Nascimento (in a comment on biorxiv): Highlighted PDB code error for n-phenylglycinonitrile -- Jian Yin (UCSD, Gilson lab): Provided host-guest structures and input files for the host-guest sets described in the paper +- Jian Yin (UCSD, Gilson lab): Provided host-guest structures and input files for the CB7 and GDCC host-guest sets described in the paper Please note that GitHub's automatic "contributors" list does not provide a full accounting of everyone contributing to this work, as some contributions have been received by e-mail or other mechanisms. @@ -105,8 +107,14 @@ Please note that GitHub's automatic "contributors" list does not provide a full - v1.2 ([10.5281/zenodo.839047](http://doi.org/10.5281/zenodo.839047)): Addition of bromodomain BRD4(1) test cases as a new ``soft'' benchmark, with help from Germano Heinzelmann. Addition of Heinzelmann as an author. Addition of files for BRD4(1) benchmark. Removed bromodomain material from future benchmarks in view of its presence now as a benchmark system. ## Changes not yet in a release +- Include cyclodextrin benchmarks to data and to paper; removal of most of cyclodextrin material from future benchmarks. Addition of Niel Henriksen as an author based on his work on this. ## Manifest -* paper: Provides LaTeX source files and final PDF for the current version of the manuscript (reformatted from the version submitted to Ann. Rev. and with 2D structures added to the tables); images, etc. are also available in sub-directories, as is the supporting information. -* input_files: Host-guest structures and simulation input files for the host-guest benchmark sets proposed in the paper (provided by Jian Yin from the Gilson lab) +* paper: Provides LaTeX source files and final PDF for the current version of the manuscript (reformatted and expanded from the version submitted to Ann. Rev. and with 2D structures added to the tables); images, etc. are also available in sub-directories, as is the supporting information. +* input_files: Ultimately to include structures and simulation input files for all of the benchmark systems present as well as (we hope) gold standard calculated values for these files. Currently this includes: + * README.md: A more extensive document describing the files present + * BRD4 structures and simulation input files from Germano Heinzelmann + * CB7 structures and simulation input files from Jian Yin (Gilson lab) + * GDCC structures and simulation input files from Jian Yin (Gilson lab) + * Cyclodextrin structures and simulation input files from Niel Henriksen (Gilson lab) diff --git a/input_files/README.md b/input_files/README.md index d276d69..b15ef6e 100644 --- a/input_files/README.md +++ b/input_files/README.md @@ -1,18 +1,60 @@ -# Benchmark Set Input Files +# Benchmark Set Input Files and Supporting Data + +This directory (and its subdirectories) provides structure and simulation input files for benchmark sets proposed in the associated perpetual review paper as well as (in some cases) data on additional supplementary compounds as well. + +This file documents what files are present here and how they were generated. + +## Manifest +- `BRD4`: BRD4-1 benchmarks as proposed in two Tables in the paper; provides its own `README.md` detailing its organization/contents/provenance. +- `cb7-set1` and `cb7-set2`: Proposed CB7 benchmark sets, with organization/contents/provenance information below. +- `cd-set1` and `cd-set2`: Proposed cyclodextrin benchmark sets (the first of which is on alpha cyclodextrin and the second on beta cyclodextrin), with organization/contents/provenance information below. **Additional, supplementary guests are also provided.** These also contain a machine-parseable `README.md` file with experimental binding free energies and enthalpies, with references. +- `gdcc-set1` and `gdcc-set2`: Proposed Gibb deep cavity cavitand benchmark sets, with organization/contents/provenance information below. ## File Descriptions -This set of files comprises PDB, sdf, and mol2 files for the free hosts and guests, as well as AMBER prmtop/rst7 format input files for the solvated and equilibrated host-guest complexes. The guest compounds in each set are named according to the compound ID listed in Tables 1-6 in the associated paper [1]. For instance, compound p-toluidine is located in the cb7-set2 subdirectory and named guest-9 because it is included in CB7 Set 2, and its ID in the paper is 9. The prmtop/rst7 files are named in the same way, except that both the host identifier name and guest ID are used for the filename. For cd-set1 and cd-set2, there are two sets of prmtop/rst7 files, one for each possible orientation of the guest in the cyclodextrin cavity. In addition, the cyclodextrin datasets also include files with an 's' character prior to the guest ID number (e.g., bcd-s15.pdb), which indicates that these are supplemental guests not listed in the associated paper which could be of additional interest. +This set of files comprises PDB, sdf, and mol2 files for the free hosts and guests, as well as AMBER prmtop/rst7 format input files for the solvated and equilibrated host-guest complexes. +The guest compounds in each set are named according to the compound ID listed in the corresponding Tables in the associated paper [1]. +For instance, compound p-toluidine is located in the cb7-set2 subdirectory and named guest-9 because it is included in CB7 Set 2, and its ID in the paper is 9. +The prmtop/rst7 files are named in the same way, except that both the host identifier name and guest ID are used for the filename. +For cd-set1 and cd-set2, there are two sets of prmtop/rst7 files, one for each possible orientation of the guest in the cyclodextrin cavity. +In addition, the cyclodextrin datasets also include files with an 's' character prior to the guest ID number (e.g., bcd-s15.pdb), which indicates that these are supplemental guests not listed in the associated paper which could be of additional interest. ## CB7 Methods -The structures of the free CB7 host were initially obtained from the crystal structure [2] while all other unbound guest structures were built manually. The structues were then QM energy minimized in vacuo using the HF/6-31G(d) method in Gaussin09. The CB7 molecule has zero net charge, while the protonation states of the guests were predicted with the pKa plugin in the Marvin suite of programs [3]. Guest guest-18 in cb7-set1 is a special case, because it was predicted to have the protonated and unprotonated forms coexisting at the experimental pH value (4.74) [4] with a nearly 1:1 ratio. Therefore, files of both forms are provided, with guest-18 as the protonated form of the guest and guest-18b as the unprotonated form. For the simulation files, bonded and Lennard-Jones parameters were obtained from the general AMBER force field (GAFF v1.7) [5]. Partial charges for each atom were generated using the RESP procedure [6], as implemented in the Antechamber program [7], by fitting to electrostatic potentials grids generated during the QM minimization. The starting bound configuration of each host-guest pair was generated by docking the guests into the hosts with MOE [8]. The binding pose was then solvated in a cubic box with 1500 TIP3P water molecules with sodium or chloride counterions added only for neutralization. Counterions were modeled with the TIP3P-specific sodium parameters of Joung and Cheatham [9]. After an equilibration phase, an NVT simulation of 2 ns was carried out, and the frame with the most populated configuration, determined via clustering, was selected as the simulation input file. +The structures of the free CB7 host were initially obtained from the crystal structure [2] while all other unbound guest structures were built manually. +The structures were then QM energy minimized in vacuo using the HF/6-31G(d) method in Gaussian09. +The CB7 molecule has zero net charge, while the protonation states of the guests were predicted with the pKa plugin in the Marvin suite of programs [3]. +Guest guest-18 in cb7-set1 is a special case, because it was predicted to have the protonated and unprotonated forms coexisting at the experimental pH value (4.74) [4] with a nearly 1:1 ratio. +Therefore, files of both forms are provided, with guest-18 as the protonated form of the guest and guest-18b as the unprotonated form. +For the simulation files, bonded and Lennard-Jones parameters were obtained from the general AMBER force field (GAFF v1.7) [5]. +Partial charges for each atom were generated using the RESP procedure [6], as implemented in the Antechamber program [7], by fitting to electrostatic potentials grids generated during the QM minimization. +The starting bound configuration of each host-guest pair was generated by docking the guests into the hosts with MOE [8]. +The binding pose was then solvated in a cubic box with 1500 TIP3P water molecules with sodium or chloride counterions added only for neutralization. +Counterions were modeled with the TIP3P-specific sodium parameters of Joung and Cheatham [9]. +After an equilibration phase, an NVT simulation of 2 ns was carried out, and the frame with the most populated configuration, determined via clustering, was selected as the simulation input file. ## OA/TEMOA (GDCC) Methods -The structures of the free hosts OA and TEMOA, as well as of all unbound guests, were built manually and then QM energy minimized in vacuo using the HF/6-31G(d) method in Gaussin09. The OA and TEMOA hosts both were assigned net charges of -8au, based on the pH at which the experiments were conducted (9.2 and 11.5) [10, 11]. The protonation states of guests were predicted with the pKa plugin in the Marvin suite of programs [3]. For the simulation files, bonded and LJ force field parameters were taken from GAFF v1.7 and partial charges were determined using the RESP approach, in identical fashion to the CB7 method. The starting bound configuration of each host-guest pair was generated by docking the guests into the hosts with MOE [8]. The binding pose was then solvated in a cubic box with 2100 TIP3P water molecules with sodium or chloride counterions added only for neutralization. Counterions were modeled with the TIP3P-specific sodium parameters of Joung and Cheatham [9]. After an equilibration phase, an NVT simulation of 2 ns was carried out, and the frame with the most populated configuration, determined via clustering, was selected as the simulation input file. +The structures of the free hosts OA and TEMOA, as well as of all unbound guests, were built manually and then QM energy minimized in vacuo using the HF/6-31G(d) method in Gaussin09. +The OA and TEMOA hosts both were assigned net charges of -8au, based on the pH at which the experiments were conducted (9.2 and 11.5) [10, 11]. +The protonation states of guests were predicted with the pKa plugin in the Marvin suite of programs [3]. +For the simulation files, bonded and LJ force field parameters were taken from GAFF v1.7 and partial charges were determined using the RESP approach, in identical fashion to the CB7 method. +The starting bound configuration of each host-guest pair was generated by docking the guests into the hosts with MOE [8]. +The binding pose was then solvated in a cubic box with 2100 TIP3P water molecules with sodium or chloride counterions added only for neutralization. +Counterions were modeled with the TIP3P-specific sodium parameters of Joung and Cheatham [9]. +After an equilibration phase, an NVT simulation of 2 ns was carried out, and the frame with the most populated configuration, determined via clustering, was selected as the simulation input file. ## CD Methods -The stuctures of unbound alpha-cyclodextrin (aCD) and beta-cyclodextrin (bCD), as well as all guests, were built manually. Protonation states followed what was reported by Rekharsky et al. [12] at pH 6.9. The guest molecules were QM energy minimized in vacuo using the HF/6-31G(d) method in Gaussian09. Partial charges, LJ paramters, and bonded parameters for the CD molecules were taken from the q4md-CD force field by Cézard et al. [13]. Guest partial charges were derived using the RESP method implemented in the R.E.D. Server tool [14], while LJ and bonded parameters were taken from GAFF v1.7. The AMBER simulation files consist of the host-guest complex, 1500 TIP3P waters, and three Na+ and Cl- ions in addition to any counterions required for neutralization. This roughly corresponds to the ionic strength of the 50 mmol phosphate buffer used in experiment. The solvated systems were equilibrated in the NPT ensemble with light (0.1 kcal/mol) positional restraints on the host and guest atoms. The final conformation of this equilibration step is provided here. Unrestrained equilibration and clustering was not performed for the cyclodextrin sets, in contrast to the CB7 and GDCC sets, because in some cases the guest binds weakly enough that it could leave the binding cavity for significant periods of time. To account for the two possible orientations of the guest within the CD cavity, simulation files with the '-p' suffix indicate that the guest is bound with the polar functional group oriented out of the primary (narrow) face of the CD, whereas the '-s' suffix indicates the guest polar functional group is oriented out of the secondary (wider) face of the CD. +The stuctures of unbound alpha-cyclodextrin (aCD) and beta-cyclodextrin (bCD), as well as all guests, were built manually. +Protonation states followed what was reported by Rekharsky et al. [12] at pH 6.9. +The guest molecules were QM energy minimized in vacuo using the HF/6-31G(d) method in Gaussian09. +Partial charges, LJ paramters, and bonded parameters for the CD molecules were taken from the q4md-CD force field by Cézard et al. [13]. +Guest partial charges were derived using the RESP method implemented in the R.E.D. Server tool [14], while LJ and bonded parameters were taken from GAFF v1.7. +The AMBER simulation files consist of the host-guest complex, 1500 TIP3P waters, and three Na+ and Cl- ions in addition to any counterions required for neutralization. +This roughly corresponds to the ionic strength of the 50 mmol phosphate buffer used in experiment. +The solvated systems were equilibrated in the NPT ensemble with light (0.1 kcal/mol) positional restraints on the host and guest atoms. +The final conformation of this equilibration step is provided here. +Unrestrained equilibration and clustering was not performed for the cyclodextrin sets, in contrast to the CB7 and GDCC sets, because in some cases the guest binds weakly enough that it could leave the binding cavity for significant periods of time. +To account for the two possible orientations of the guest within the CD cavity, simulation files with the '-p' suffix indicate that the guest is bound with the polar functional group oriented out of the primary (narrow) face of the CD, whereas the '-s' suffix indicates the guest polar functional group is oriented out of the secondary (wider) face of the CD. ## BRD4 For information on the BRD4 benchmark, see the associated `README.md` file in the BRD4 subdirectory. diff --git a/paper/benchmarkset.pdf b/paper/benchmarkset.pdf index 9db2599..cdb8bfc 100644 Binary files a/paper/benchmarkset.pdf and b/paper/benchmarkset.pdf differ diff --git a/paper/benchmarkset.tex b/paper/benchmarkset.tex index 2014796..ff4842f 100644 --- a/paper/benchmarkset.tex +++ b/paper/benchmarkset.tex @@ -55,6 +55,10 @@ \author{Germano Heinzelmann} \email{germano.heinzelmann@ufsc.br} \affiliation{Departamento de F\'isica, Universidade Federal de Santa Catarina, Florian\'opolis, Santa Catarina, Brazil 88040-900} +\author{Niel M. Henriksen} +\email{nhenriksen@ucsd.edu} +\affiliation{Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, CA, USA, 92092} +\affiliation{Departamento de F\'isica, Universidade Federal de Santa Catarina, Florian\'opolis, Santa Catarina, Brazil 88040-900} \author{Michael K. Gilson} \email{mgilson@ucsd.edu} \affiliation{Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, CA, USA, 92092} @@ -79,7 +83,7 @@ %PUT MANUSCRIPT VERSION HERE % Use this style for ongoing development: "Version 1.0.5 pre-release" % Use this style for releases: "Version 2.0" -{\bf Manuscript version 1.2.1 pre-release.} See \url{https://github.com/mobleylab/benchmarksets} for all versions. +{\bf Manuscript version 1.3 pre-release.} See \url{https://github.com/mobleylab/benchmarksets} for all versions. %{\bf Manuscript version 1.2.} See \url{https://github.com/mobleylab/benchmarksets} for all versions. \\ \\ @@ -326,8 +330,8 @@ \subsection{Host-guest benchmarks} For example, all members of the cyclodextrin family are chiral rings of glucose monomers; family members then differ in the number of monomers and in the presence or absence of various chemical substituents. For tests of computational methods ultimately aimed at predicting protein-ligand binding affinities in aqueous solution, water soluble hosts are, arguably, most relevant. On the other hand, host-guest systems in organic solvents may usefully test how well force fields work in the nonaqueous environment within a lipid membrane. -Here, we focus on two host families, the cucurbiturils \cite{freeman_cucurbituril_1981,mock_host-guest_1983} and the octa-acids (more generally, Gibb deep cavity cavitands)~\cite{gibb_well-defined_2004, hillyer_synthesis_2016}. -These have already been the subject of concerted attention from the simulation community, due in part to their use in the SAMPL blinded prediction challenges~\cite{muddana_sampl3_2012, muddana_sampl4_2014, yin_overview_2016}. +Here, we focus on three host families, the cucurbiturils \cite{freeman_cucurbituril_1981,mock_host-guest_1983}, the octa-acids (more generally, Gibb deep cavity cavitands (GDCCs))~\cite{gibb_well-defined_2004, hillyer_synthesis_2016}, and the cyclodextrins~\cite{rekharsky_complexation_1998}. +These have already been the subject of concerted attention from the simulation community; in the case of cucubiturils GDCCs due in part to their use in the SAMPL blinded prediction challenges~\cite{muddana_sampl3_2012, muddana_sampl4_2014, yin_overview_2016}, and in the case of cyclodextrins, due in part to the wealth of high quality experimental data available~\cite{rekharsky_complexation_1998, Wickstrom:2013:J.Chem.TheoryComput., henriksen_evaluating_2017}. \begin{figure*} \includegraphics[width=\textwidth]{figures/hosts-CB7-OA-bCD.pdf} @@ -366,7 +370,6 @@ \subsubsection{Cucurbiturils} \item{{\bf Salt concentration and buffer conditions}: Binding thermodynamics are sensitive to the composition of dissolved salts, both experimentally~\cite{moghaddam_new_2011, moghaddam_hostguest_2009, muddana_sampl4_2014} and computationally~\cite{muddana_blind_2014,hsiao_prediction_2014}. As a consequence, to be valid, a comparison of calculation with experiment must adequately model the experimental salt conditions.} \item{{\bf Finite-size artifacts due to charge modification}: Because many guest molecules carry net charge, it should be ascertained that calculations in which guests are decoupled from the system do not generate artifacts related to the treatment of long-ranged Coulombic interactions~\cite{rocklin_calculating_2013, lin_overview_2014, reif_net_2014, simonson_concepts_2016}.} -%Re-order these based on generality? i.e. #1 is not that relevant to some. \end{enumerate} \paragraph{The proposed CB7 benchmark sets comprise two compound series} @@ -607,22 +610,22 @@ \subsubsection{Cyclodextrins} We propose a benchmark series which includes both $\alpha$- and $\beta$-CD binding to a set of neutral and charged guests. The CD host has many experimental upsides, including adequate solubility, at least for these common variants, and they are available in gram quantities at low cost from multiple suppliers. -Additionally, CDs are familiar host-guest systems to many computational researchers and are known to tractable in simulation \cite{Mark:1994:J.Am.Chem.Soc., Luzhkov:1999:ChemicalPhysicsLetters, Bea:2002:TheorChemAcc, Chen:2004:BiophysicalJournal, Sellner:2008:J.Phys.Chem.B, Cai:2009:J.Phys.Chem.B, Wickstrom:2013:J.Chem.TheoryComput., Shi:2014:TheorChemAcc, henriksen_computational_2015, Zhang:2015:J.Chem.TheoryComput., Khuntawee:2016:CarbohydratePolymers, Gebhardt:2016:FluidPhaseEquilibria, wickstrom_parameterization_2016}. +Additionally, CDs are familiar host-guest systems to many computational researchers and are known to be tractable for simulation \cite{Mark:1994:J.Am.Chem.Soc., Luzhkov:1999:ChemicalPhysicsLetters, Bea:2002:TheorChemAcc, Chen:2004:BiophysicalJournal, Sellner:2008:J.Phys.Chem.B, Cai:2009:J.Phys.Chem.B, Wickstrom:2013:J.Chem.TheoryComput., Shi:2014:TheorChemAcc, henriksen_computational_2015, Zhang:2015:J.Chem.TheoryComput., Khuntawee:2016:CarbohydratePolymers, Gebhardt:2016:FluidPhaseEquilibria, wickstrom_parameterization_2016}. However, CDs offer a contrast to the previous two benchmark series in a number of ways. First, the hosts are more flexible than CB7 or the GDCCs, as their glucose units are joined by only a single bond, which allows greater rotation of each monomer relative to its neighbors. -Second, the binding cavity has two, non-equivalent entryways, allowing asymmetric guests to bind in either a ``head-in'' or ``head-out'' orientation. +Second, the binding cavity has two, non-equivalent openings or ``rims'', allowing asymmetric guests to bind in either a ``head-in'' or ``head-out'' orientation. It is also worth noting that, because the CDs are glucose polymers, they may be best modeled with dedicated carbohydrate force fields \cite{Kirschner:2008:J.Comput.Chem., Cezard:2011:PhysicalChemistryChemicalPhysics, Guvench:2011:J.Chem.TheoryComput., Xiong:2015:CarbohydrateResearch}, rather than generalized small molecule force fields. These features combine to add complexity in predicting the dominant binding poses for guests, although the default assumption is that polar functional groups will orient towards solvent near the wider rim (also called the secondary face) and hydrophobic groups will nestle into the hydrophobic cavity of the narrow rim (also called the primary face). \paragraph{The proposed CD benchmark series includes an $\alpha$-CD set and a $\beta$-CD set} For each CD set (Tables~\ref{cd_benchmark1} and \ref{cd_benchmark2}), we have selected guests which include n-alkyl ammonium cations, neutral cyclic alcohols, and n-alkyl or aromatic carboxylate anions. -The experiments were all taken from the same ITC study \cite{rekharsky_thermodynamic_1997} and were performed under uniform conditions (50 mM sodium phosphate buffer, pH 6.90, 298K). +The experimental data all originates from the same ITC study \cite{rekharsky_thermodynamic_1997} and experiments were performed under uniform conditions (50 mM sodium phosphate buffer, pH 6.90, 298K). The buffer constituents are not expected to compete for binding in the CD cavity \cite{rekharsky_thermodynamic_1995}. Both binding free energy and binding enthalpy measurements are available for all guests in the series. The binding free energies range from -1.27 to -4.62 kcal/mol and the binding enthalpies range from 1.89 to -5.46 kcal/mol. In the case of $\alpha$-CD and its guests, the binding enthalpies are uniformly negative and therefore the binding entropies are uniformly positive. In contrast, the $\beta$-CD set includes both positive and negative binding enthalpies and binding entropies. -In addition to the formal benchmark sets listed in Tables~\ref{cd_benchmark1} and \ref{cd_benchmark2}, which only include a representative sample of eight and twelve guests, respectively, we have also made available (in our GitHub repository) an expanded set of guests which feature minor variations in the carbon substituents to molecule classes mentioned above. +In addition to the formal benchmark sets listed in Tables~\ref{cd_benchmark1} and \ref{cd_benchmark2}, which only include a representative sample of eight and twelve guests, respectively, we have also made available (in our GitHub repository) an expanded set of guests which differ only slightly from the molecule classes mentioned above, by minor variations in the carbon substituents. Despite the flexibility of CD hosts, the small size of the guests combined with long timescale simulations enable by GPUs should allow adequate convergence of these thermodynamic values. @@ -679,13 +682,13 @@ \subsubsection{Cyclodextrins} \endgroup \paragraph{Challenges specific to CDs} -Due to the dynamic nature of the CD host, along with the asymmetric binding site, researchers should anticipate the following complexities: +Researchers should anticipate the following complexities due to the dynamic nature of the CD host and its asymmetric binding site: \begin{enumerate} \item{{\bf Multiple binding orientations}: Unlike CB7 or OA, CDs have a binding cavity with two unique openings to solvent. Although the default expectation is that a guest's polar functional group will orient out of the wider (secondary) rim of the cavity~\cite{rekharsky_thermodynamic_1997, rekharsky_complexation_1998}, and hydrophobic groups will favor the narrower (primary) side, it raises the possibility of the reverse type of binding or a mixture of both. Since the cavity is both narrow and hydrophobic, interchange between the two orientations is usually slow and requires release into the solvent and subsequent rebinding. -Therefore unless frequent interchange between the two orientations is observed during simulation, it is advisable to perform free energy calculations for both orientations and then combine the results according to their Boltzmann contribution \cite{henriksen_computational_2015}.} +Therefore unless frequent interchange between the two orientations is observed during simulation, it may be advisable to perform free energy calculations for both orientations and then combine the results according to their Boltzmann weights~\cite{henriksen_computational_2015}.} \item{{\bf Monomer flexibility}: Cyclodextrin molecules are known to be rather flexible~\cite{bell_new_1997, Connors:1997:Chem.Rev.}. In particular, each glucose monomer can flip conformation along the glycosidic bond, which tends to partially occlude the binding cavity. @@ -693,40 +696,32 @@ \subsubsection{Cyclodextrins} For affinity calculations which rely on physical or alchemical methods, the portion of the free energy pathway in which the guest is only partially interacting with the host is expected to be the most problematic. Conformational restraints, if properly accounted for, can help mitigate these convergence challenges~\cite{henriksen_computational_2015}.} -\item{{\bf Buffer Conditions}: CDs are not expected to bind phosphate ions from the buffer with detectable affinity~\cite{rekharsky_thermodynamic_1995, rekharsky_complexation_1998}, however, due to limitations in force fields, it is possible that the guest and buffer could compete for binding in simulation. -The solution behavior of phosphate ions, particularly the HPO$_4^{2-}$ species, can be sensitive to the particular choice of water model and may be worthwhile investigating prior to extensive binding calculations. -Alternatively, one can substitute phosphate buffer for the equivalent ionic strength of monovalent salt~\cite{henriksen_evaluating_2017}. } +\item{{\bf Buffer conditions}: CDs are not expected to bind phosphate ions from the buffer with detectable affinity~\cite{rekharsky_thermodynamic_1995, rekharsky_complexation_1998}; however, due to force field limitations, it is possible that the guest and buffer could compete for binding in simulations. +The solution behavior of phosphate ions, particularly the HPO$_4^{2-}$ species, can be sensitive to the particular choice of water model and may be worth investigating prior to extensive binding calculations. +Alternatively, one could substitute an equivalent strength of monovalent salt for the phosphate buffer~\cite{henriksen_evaluating_2017}. } -\item{{\bf Finite-size artifacts due to charge modification}: Once again, the presence of charged guests in the CD benchmarks necessitates care with regard to charge decoupling procedures which could generate artifacts in the long range Couloumb calculations ~\cite{rocklin_calculating_2013, lin_overview_2014, reif_net_2014, simonson_concepts_2016}. } +\item{{\bf Finite-size artifacts due to charge modification}: Once again, the presence of charged guests in the CD benchmarks necessitates care with regard to charge decoupling procedures which could generate artifacts in the long range Couloumb calculations~\cite{rocklin_calculating_2013, lin_overview_2014, reif_net_2014, simonson_concepts_2016}. } \end{enumerate} \paragraph{Prior studies provide additional insight into the challenges of CD} -Although experimental literature for CD binding is widely available, computational studies including more than just a few guests are rare and, since CDs have not been included in any SAMPL challenges, the unique properties of this host might not be as familiar as CB7 and OA. -As with these other hosts, the choice of buffer influences the binding measurements since any organic components will likely competitively bind. -Phosphate buffer, which is used in both of the proposed CD benchmarks, did not show any evidence of binding to the host in ITC measurements, whereas acetate buffer demonstrated competitive activity~\cite{rekharsky_thermodynamic_1995, rekharsky_complexation_1998}. -Ideally, simulations of these systems would match the experimental conditions, however phosphate buffer can be difficult to model, particularly the HPO$_4^{2-}$ species~\cite{henriksen_computational_2015, henriksen_evaluating_2017}. -One remedy for this problem is to match the ionic strength of the phosphate buffer with monovalent salt \cite{henriksen_evaluating_2017}, although the impact of this approximation is not fully understood. +Although experimental literature for CD binding is widely available, computational studies including more than just a few guests are rare and, since CDs have not been included in any SAMPL challenges, the unique properties of this host might not be as familiar to many as those of CB7 and OA. +As with these other hosts, the choice of buffer influences the binding measurements since any organic buffer components will likely competitively bind. +Phosphate buffer, which is used in both of the proposed CD benchmarks, did not show any evidence of binding to the host (as measured by ITC), whereas acetate buffer demonstrated competitive activity~\cite{rekharsky_thermodynamic_1995, rekharsky_complexation_1998}. +Ideally, simulations of these systems would match the experimental conditions, however, phosphate buffer can be difficult to model accurately, particularly the HPO$_4^{2-}$ species~\cite{henriksen_computational_2015, henriksen_evaluating_2017}. +One potential remedy to this problem is to match the ionic strength of the phosphate buffer with monovalent salt~\cite{henriksen_evaluating_2017}, although the impact of this approximation is not fully understood. -One often cited driving force for guests binding to CDs is the expulsion of structured water from the unoccupied host cavity. +Expulsion of structured water from the unoccupied host cavity is one of the often-cited driving forces for guests binding to CDs. Although the size of this effect is debated~\cite{Connors:1997:Chem.Rev., taulier_hydrophobic_2006}, it is generally accepted that approximately two to ten water molecules occupy the unbound cavity, depending on how the cavity is defined and whether the host is $\alpha$- or $\beta$-CD \cite{Connors:1997:Chem.Rev., biedermann_hydrophobic_2014}. -In simulation, the choice of host force field can play a large role in the appearance of structured water. +In simulations, the choice of host force field can play a large role in the appearance of structured water. If the glycosidic torsional barriers are low, the CD host can contort itself in such a way as to expel these presumably high energy waters~\cite{henriksen_evaluating_2017}. -There are a some recent large-scale computational studies of CD binding free energies which deserve mention. -One study, from the Levy group~\cite{Wickstrom:2013:J.Chem.TheoryComput.}, used the BEDAM method~\cite{gallicchio_bedam_2015-1} to compute binding free energies for 57 guests to $\beta$-CD. -The method uses an implicit representation of solvent, although guest occupancy of water sites in the CD cavity are incorporated into the model. -The overall agreement with experiment was mediocre, with an RMSE value of 1.44 kcal/mol and R$^2$ value of 0.43. -Levy and co-workers noted that some guests, particularly those with ammoniums, oriented their polar group out the narrow, primary rim of the CD cavity. -Another study, by the van der Spoel group~\cite{zhang_evaluation_2016}, computed umbrella sampling PMFs for removal of 75 neutral guests from $\beta$-CD using several Generalized Born implicit solvent models. -These solvent models did not include consideration of structured water and the results were sobering: across all solvent models, RMSEs ranged between 1.9 - 4.8 kcal/mol and R$^2$ values were less than 0.3. -The authors noted that 70\% of the guests preferred to orient the guest polar group through the wider, secondary opening of the host cavity, suggesting that although this orienatation is preferred, the primary orientation must be considered as well. -Finally, a recent study by the Gilson lab~\cite{henriksen_evaluating_2017} used the attach-pull-release method~\cite{henriksen_computational_2015} to compute explicitly solvated binding free energies and enthalpies for 43 host-guest pairs, including both $\alpha$- and $\beta$-CD hosts, for which experimental calorimetry measurements are available~\cite{rekharsky_thermodynamic_1997}. -The guests comprised three classes, including alkyl-ammoniums, cyclic alcohols, and alkyl- and aromatic-carboxylates. -The two CD benchmark sets in this work were derived this study. -Several water models and host force fields were investigated, but none of the force field combinations proved superior at both binding free energy and enthalpy. -Across force fields, the RMSE values for binding free energy ranged from 0.9 to 1.8 kcal/mol and R$^2$ from 0.44 to 0.60. -For binding enthalpy, RMSE values ranged from 0.9 to 4.0 kcal/mol and R$^2$ from 0.30 to 0.77. -Once again, a preference of ammonium guests to orient out of the narrow, primary opening was observed. It remains to be determined whether experimental evidence will support this preference. +%DLM: This may be too extensive an overview of prior computational work; we ended up electing (in the other sections) to focus on major issues highlighted by the computational work and NOT to overview individual studies, as there are too many. In other words, if, in the other sections, we provided the level of detail which is given here, the paper would be far far too long. I suggest making this more concise and focusing primarily on the conclusions, giving references. +%Proposed condensed version -- I tried to hit on the main take-away messages, but feel free to adjust or make your own condensed version. +There are a some recent large-scale computational studies of CD binding free energies which provide further insights into what performance and challenges to expect. +With implicit solvent models, it appears particularly important to include treatment of water sites in the CD cavity; depending on how these sites are handled and the solvent model, RMS errors can be as low as 1.44 kcal/mol (with an $R^2$ of 0.43)~\cite{Wickstrom:2013:J.Chem.TheoryComput.} or as high as 1.9-4.8 kcal/mol ($R^2$ of less than 0.3)~\cite{zhang_evaluation_2016}, with these studies examining 57 and 75 guests, respectively. +An extensive explicit solvent study compared a variety of different force fields and water models, with RMS errors of 0.9 to 1.8 kcal/mol ($R^2$ of 0.44 to 0.6) ans also examined accuracy on binding enthalpies~\cite{henriksen_evaluating_2017}. +Overall, guests with ammoniums appear to prefer to orient their polar groups out of the narrow, primary opening of the CD cavity~\cite{Wickstrom:2013:J.Chem.TheoryComput., henriksen_evaluating_2017} (though this is yet to be determined experimentally) but one study suggests that as many as 70\% of neutral, polar guests prefer to orient their polar group toward the secondary, wider opening~\cite{zhang_evaluation_2016}, so both orientations will likely need to be considered. + To aid the adoption of these systems as benchmarks, input files for the CD systems proposed here are available in our GitHub repository. @@ -1019,12 +1014,12 @@ \section{FUTURE BENCHMARK SYSTEMS} The following subsections discuss additional systems that have already been used to test computational methods and that may be suited for development as new benchmark systems in the near future. \subsection{Host-Guest Systems} -The cyclodextrins (CDs; Section~\ref{sec:hgbenchmarks}) are a particularly promising source of additional host-guest benchmark sets. -The CDs also appear easier to derivatize than the cucurbiturils. +Depending on whether the existing host-guest systems end up being sufficient for the community's needs, there may be further opportunities for host-guest benchmark systems. +The cyclodextrins (CDs; Section~\ref{sec:hgbenchmarks}) are a particularly promising source of additional host-guest benchmark sets; one interesting possibility may be new CD derivatives, as the CDs appear easier to derivatize than the cucurbiturils. In particular, varied substituents may be appended by reactions involving secondary hydroxyls at the wide entry and primary hydroxyls at the narrow entry~\cite{Fromming:1994:CyclodextrinsinPharmacy, Dodziuk:2006:, Szente:1999:AdvancedDrugDeliveryReviews, Qu:2002:JournalofInclusionPhenomena, Jindrich:2005:J.Org.Chem.}, with the caveat that generating pure products can be difficult, because there are so many hydroxyls that may be modified. Binding data on CD derivatives could be quite useful as a means of adding chemical diversity to host-guest benchmark sets, and such data are already available for some derivatives (see, e.g., \cite{rekharsky_complexation_1998, Faugeras:2012:Eur.J.Org.Chem.}). -A number of additional CD (and other host guest) binding systems are available for download in electronic format at the BindingDB~\cite{Liu:2007:Nucl.AcidsRes., Gilson:2016:Nucl.AcidsRes.} website (\url{www.bindingdb.org/bind/HostGuest.jsp}) +A number of additional CD (and other host-guest) binding systems are available for download in electronic format at the BindingDB~\cite{Liu:2007:Nucl.AcidsRes., Gilson:2016:Nucl.AcidsRes.} website (\url{www.bindingdb.org/bind/HostGuest.jsp}) \subsection{Protein-Ligand Systems} On the order of a million experimental protein-ligand binding measurements are currently accessible through open-access databases, notably BindingDB~\cite{Liu:2007:Nucl.AcidsRes.}, ChEMBL~\cite{Bento:2014:NuclAcidsRes} and PubChem~\cite{Kim:2016:NucleicAcidsRes.}.