Conversation
…ration is still in progress and will be updated)
CD mol2/pdb/sd files coordinates starting in the binding pocket
|
It looks like a lot of those |
|
Also, if those antechamber-format 500+ files is also a lot. Many of these formats ( |
|
The original CB7 and OA files were added by @Janeyin600, so I followed her format for the CDs. I agree that the current mol2 format is not ideal for universal usage. I like the idea of putting them all into a single file. The GAFF version is mentioned in the README (GAFF v1.7). |
|
Thanks, @nhenriksen ! I'm super swamped right now because of a couple of deadlines early next week, but hopefully can review this in detail next week. In terms of mol2 files -- I think I agree it would be good if we provided GAFF and "standard" mol2 files of everything. OpenEye tools can do this conversion easily. If you're on board with that, I could add a script for it to the repo and probably handle conversions of everything (or you can do it if you guys have your OE license renewed yet). Thoughts? |
|
@davidlmobley I think I can get this done today. |
|
I've redone all the CD benchmark files. All the mol2 files use SYBYL atom types now. They are all generated from the original bound structures (secondary orientation), which means that all guest coordinates fit into the host coordinates from a single file (that way the aligning procedure that @andrrizzi used is no longer necessary). I've also generated multi-molecule versions for the Before making commits, I just want to follow-up on some items:
|
I think @jchodera has a bit of an aversion to "lots of files in the repo", but I agree with you that for most "normal simulation" users, multi-molecule mol2 files are not useful -- they basically mean you either need to rely on a chemistry toolkit for parsing them (OpenEye software, RDKit, etc.) or write your own tool to separate them out into individual files. Thus, my preference is actually to have the individual files, OR provide a tool which can break them into these, otherwise we're just asking other people to do the work so that our repo can stay tidy. Sorry, I should have answered that aspect before.
I don't think we need GAFF types, no.
Sounds great.
They should be there! So I'm all for having them. :) I don't think our tools will correctly process the PDB files without them. |
|
I've updated the input files and READMEs for the host-guest benchmark files. (For some reason I screwed up the commit process and it did it twice ... not sure how that was possible.)
Things to be addressed in the future:
|
|
@GHeinzelmann - can you see Niel's note above? Specifically:
@nhenriksen - thanks for all the work. I'll review as soon as I can. At the moment I'm slightly swamped dealing with SAMPL so I'm not going to get to it as soon as I hoped to (I'd hoped to have reviewed it already!). It looks like you're being quite thorough though. I can create issues for these:
I think I can fix the second of these, at least for the GDCC case, by docking into the host using the code I just figured out. Alternatively your strategy of extracting from the equilibrated files should work. Do you have thoughts on which is preferable? @Janeyin600 might also want to weigh in? |
|
@davidlmobley Whenever you get to reviewing is fine, no urgency on my part. If you want to get the CB7/GDCC guests docked with your code, that would be great! |
… one-sentence-per-line for better version control
davidlmobley
left a comment
There was a problem hiding this comment.
@nhenriksen - overall this looks great! Thanks so much for all your work. I'll put some feedback here for you to address (sorry if it's not formatted that nicely but I'm in an airport between flights and only have a moment to info dump) and also will submit a PR to your branch that includes some changes that address some minor issues.
Changes suggested:
- The updated Figure 1 removes TEMOA and adds beta-CD, though the benchmark includes both beta-CD and gamma-CD. I think my preference would be to just expand the figure (making it two rows perhaps) to keep TEMOA and add beta and gamma-CD
- The Figure 1 caption is not correct since it still refers to TEMOA, and doesn't say which cyclodextrin is pictured (but see point just prior)
- I'm thinking it would be convenient for Fig. 1 to say which is the "wider end" vs the "narrower end"; I didn't notice they were asymmetric at first from just the text.
- "The behavior of phosphate ions, particularly the HPO_4^{2-} species, can be sensitive to the particular choice of water model..." seems to need a reference.
- The SMILES for the aromatic guests use different kekulization than those in the other tables; e.g. benzoic acid shows up in Table IV and Table VI with different SMILES. If you guys have the OpenEye tools again I can easily generate SMILES for you, or you can do it, but I'd prefer to treat them consistently. (To say it another way: The aromatic rings get "aromatic carbon" (small c1ccc(cc1)) notation in all my SMILES strings, whereas yours use alternating singles and doubles (C1=CC=C(C=C1)) notation.
Questions:
- "Despite the flexibility of CD hosts, the small size of the guests combined with long timescale simulations enable by GPUs should allow adequate convergence of these thermodynamic values." Would it be accurate to say, "appears to allow adequate convergence..." and give a reference to your study? This would be a little stronger of a statement.
- "Since the cavity is both narrow and hydrophobic, interchange between the two orientations is usually slow and requires release into the solvent and subsequent rebinding"; can you say anything about the timescales for this from your experience or prior work, e.g. "These events may take as long as XYZ microseconds for some guests" or similar?
- Your tables use "mol/kg" units for buffer. I realize this is probably what the experiments were reported with, but it's inconvenient for computational folks and could introduce a source of error in setup (if everyone is converting). Maybe we want to put "(approximately XYZ mM)" in parens to help people out?
I took the liberty of going ahead and adding you as an author (will be in my PR)! Thanks for all your work on this. It's much appreciated.
|
Also, I forgot this comment:
|
|
BTW I think we are almost ready to merge, so once you get these edits in (plus my PR) we should go ahead and send the LaTeX/PDF to Mike so he can edit. @GHeinzelmann - please also review. :) |
Incorporate minor edits of new cyclodextrin material into nieldev branch
|
A few clarifications ...
The figure is already 3 rows due to the various views and takes up slightly more than half the page. My thinking was that due to the similarity of OA/TEMOA and α-CD/β-CD, we could just use a single representative. But assuming we do want to display all 5 hosts, shall I aim for a full page figure then?
Good idea.
I don't have a reference for that because it's unpublished work. We could add an "(unpublished)" to it?
Good point, I'll redo those.
Sure, that works.
It's hard to say, since I haven't done any long time-scale on/off type simulations. For strong binders, I don't see any unbinding on the microsecond timescale, so interchange would definitely exceed that due to all the additional time it takes to diffuse around the box once it does successfully unbind. I could say something like "Sampling these events is expected to require microsecond-scale simulations or longer."
Actually I think molality is nicer to deal with than molarity, since it won't depend on volume/temp./pressure. But I agree that it is less common. At lower concentrations, it will have essentially the same molarity as molality since 1 kg of the solution will be about 1 Liter. I couldn't find any exact data for density of 50 mM sodium phosphate, but I did find it for potassium phosphate, which shows that it is very close to 1 g/cm^3. So yes, I think we could say "(approximately 50 mM)" in the table footnotes.
Yes, I can add them ... although the table might get squeezed for space. I'll try it out. |
- added smiles strings - updated naming to reflect expected protonation state.
- added smiles - edited names to reflect protonation state
True, it was getting back. But, we're electronic-only now and don't have to worry about page limits, so, why not full-page? Or we can split it into three separate figures, one for each category of host, if having a full-page figure bothers you. :) I'd rather be able to see all three of them, myself -- otherwise first thing I'll do wonder what the other ones look like and wish for a figure. Ha.
Sure, cite yourself...? I think we just need to say how we know.
That sounds good to me; you could even do better and say "... because previous work has not observed binding and unbinding events in microsecond-length simulations [ref your paper]." Otherwise, all sounds good. |
|
@davidlmobley I've addressed your requested changes in the latest commit. I had to shrink the images in Fig 1 to get them to fit. The figure might work better on a landscape page rather than portrait, but that might be more trouble than it's worth to do. All the |
|
@nhenriksen - Thanks! Hopefully I'll review late today. I'll go ahead and send TeX/PDF to Mike, CC you. |
The Zhang 2016 paper has a bunch more of the same guests, but they were all treated as neutral, whereas the experiment had the guests charged. I've omitted them to avoid confusion. Also, I just realized that names like "4-methylphenylacetate" may have caused confusion in the literature. Rekharsky 1997 used that name to refer to the charged version of 4-methylphenylacetic acid, but both Wickstrom 2013 or Zhang 2016 interpreted it as the structure that is more commonly known as "p-tolyl acetate" which is totally different. This occurred for several different related compounds. I've left those off the list as well, because clearly the numbers won't be comparable.
davidlmobley
left a comment
There was a problem hiding this comment.
@nhenriksen - this is looking great. README files look excellent, the new Figure 1 looks good to me, and the changes appear to have addressed my issues.
I'm marking this as approved. I think we just need Mike to take a look, but I've sent it to him and CCd him, so hopefully that doesn't take long.
Again, thank you for all your work on this. I hope it feels worthwhile -- I'm convinced it will be (people are going to use your data/files/etc.!).
|
Also, random comments on your last commit:
Gosh, that's a huge issue. Yes, definitely want to avoid confusion.
This is part of why I don't like to use compound names as identifiers ever -- using a compound name is begging a human to interpret it (partly because automated name parsers sometimes make mistakes, and/or can't interpret names that humans can), and once you get a human involved they will misinterpret things. Hence, SMILES and/or PubChem compound IDs... I'm starting to work with some people on some best practices documents for MD simulations. I should probably mention the "please don't use compound names" thing in those. :) |
No description provided.