Contributions on Encore and MemoryReader to Blog on 0.16.0 release #30
Conversation
orbeckst
left a comment
There was a problem hiding this comment.
Very nice summary, see comments for improvements.
_posts/2016-xx-xx-release-0.16.md
Outdated
|
|
||
| The ENCORE ensemble similarity library has been integrated with MDAnalysis. It implements a variety of techniques for calculating similarities between structural ensembles (trajectories), as described in this publication: | ||
|
|
||
| Tiberti M, Papaleo E, Bengtsen T, Boomsma W, Lindorff-Larsen K (2015), ENCORE: Software for Quantitative Ensemble Comparison. PLoS Comput Biol 11(10): e1004415. doi:10.1371/journal.pcbi.1004415. |
There was a problem hiding this comment.
Can you make the doi a link?
Tiberti M, Papaleo E, Bengtsen T, Boomsma W, Lindorff-Larsen K (2015), ENCORE: Software for Quantitative
Ensemble Comparison. PLoS Comput Biol 11(10): e1004415.
doi:[10.1371/journal.pcbi.1004415](http://doi.org/10.1371/journal.pcbi.1004415).|
|
||
| MDAnalysis typically reads trajectories from files on-demand, so that it can efficiently deal with large trajectories - even those that do not fit in memory. However, in some cases, both for convenience and for efficiency, it can be an advantage to work with trajectories directly in memory. In this release, we have introduced a MemoryReader, which makes this possible. | ||
|
|
||
| The MemoryReader works with numpy arrays, using the same format as that used by for instance `DCDReader.timeseries()`. You can create a Universe directly from such an array: |
There was a problem hiding this comment.
Just a comment/question: .timeseries() is currently only implemented for DCDs; there's an open issue for Gromacs formats MDAnalysis/mdanalysis#186.
I think, it would make the post too complicated to mention here that this particular example only works with DCDs but for the future we need to think if we want to support .timeseries() more widely. The Universe(..., in_memory=True) approach takes care of this, of course.
There was a problem hiding this comment.
We actually have a more flexible option than timeseries now; see the "Convenience functions to create a new analysis" section of the post. The new AnalysisFromFunction works with any trajectory and with arbitrarily complex functions. Your MemoryReader can also speed that one up significantly. I had actually planned to make a Jupyter notebook as a gist for the post to show the speed-up.
There was a problem hiding this comment.
@orbeckst, @kain88-de, Interesting comments. If you prefer, I can certainly change the example so that I extract the array by iterating over the coordinates, which is basically what we do internally in Universe.transfer_to_memory if there is no .timeseries available. It would just be slightly more verbose - and perhaps less clear what is going on.
There was a problem hiding this comment.
I would like to avoid the timeseries API. We only support it for DCD. Showing the AnalysisFromFunction will work for any trajectory. So the code can easily be copy pasted by users. The code would be.
coordinates = AnalysisFromFunction(lambda ag: ag.positions.copy(),
                                   u.trajectory, u.atoms).run().results
This code is slightly more verbose but works for any reader, even chain readers!
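As a side note on why the lambda above copies the positions for each frame, here is a minimal sketch. It assumes a reader that reuses a single coordinate buffer from frame to frame. `FakeReader` is a hypothetical stand-in written only for this illustration and is not an MDAnalysis class; the sketch needs nothing but NumPy.

```python
import numpy as np


class FakeReader:
    """Hypothetical stand-in for a trajectory reader (not MDAnalysis API).

    It yields the *same* array object for every frame and overwrites it
    in place, which is the situation where a per-frame copy matters.
    """

    def __init__(self, n_frames, n_atoms):
        self.n_frames = n_frames
        self._buffer = np.zeros((n_atoms, 3))

    def __iter__(self):
        for frame in range(self.n_frames):
            self._buffer[:] = frame  # pretend to load frame number `frame`
            yield self._buffer


reader = FakeReader(n_frames=4, n_atoms=2)

# Without a copy, every list entry refers to the one shared buffer,
# so all "frames" end up holding the data of the last frame.
aliased = np.array([positions for positions in reader])

# With a copy per frame, each frame keeps its own data.
copied = np.array([positions.copy() for positions in reader])
```

The resulting `copied` array has shape `(n_frames, n_atoms, 3)`, i.e. frames first.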
There was a problem hiding this comment.
OK, I've changed the first example, so it no longer duplicates an existing trajectory, but instead now constructs a trajectory from a random coordinate array, as suggested by @kain88-de below.
Btw, nice trick with the AnalysisFromFunction. If you can find a way to still sneak that in, you should certainly feel free to do so.
There was a problem hiding this comment.
I think I'll add the AnalysisFromFunction example in the section of the release notes where we introduce the feature.
@orbeckst you mentioned simulating an X-ray example by adding Gaussian noise. Would that be added per frame? Because adding it per frame can be done with a small modification to the above AnalysisFromFunction code.
coordinates = AnalysisFromFunction(lambda ag: ag.positions.copy() + np.random.normal(size=ag.positions.shape),
                                   u.trajectory, u.atoms).run().results
There was a problem hiding this comment.
Something like this, but with the Gaussians' widths set from the B-factor of each atom via Bfactor2RMSF().
There was a problem hiding this comment.
Actually, thinking about it now, I was originally thinking of an example that creates a new numpy array: Start with a crystal structure (1 frame) and generate a fake ensemble of positions compatible with the crystallographic B-factors. (I think we have real PDBs in the test datafiles.) I am not sure yet what one would use this ensemble for... or maybe run things like HOLE on it to get pore radii that take the resolution of a crystal structure into account or use it as a negative control for PCA.
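To make the idea of a fake ensemble concrete, here is a rough NumPy-only sketch. It assumes isotropic B-factors and the standard relation B = (8π²/3)⟨u²⟩; whether this matches what `Bfactor2RMSF()` computes is an assumption. The function name `fake_ensemble` and the toy coordinates and B-factors are made up for illustration.

```python
import numpy as np


def fake_ensemble(positions, bfactors, n_frames, seed=0):
    """Return an (n_frames, n_atoms, 3) array of perturbed coordinates.

    Assumes isotropic B-factors, B = (8 * pi**2 / 3) * <u**2>, so the
    standard deviation along each Cartesian axis is sqrt(B / (8 * pi**2)).
    """
    rng = np.random.default_rng(seed)
    sigma = np.sqrt(np.asarray(bfactors) / (8.0 * np.pi ** 2))  # one per atom
    noise = rng.normal(size=(n_frames,) + positions.shape)
    return positions[np.newaxis] + noise * sigma[np.newaxis, :, np.newaxis]


# Toy input: three atoms with increasing B-factors (in Å^2).
positions = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [3.0, 0.0, 0.0]])
bfactors = np.array([10.0, 20.0, 40.0])
ensemble = fake_ensemble(positions, bfactors, n_frames=5000)

# Per-atom RMSF of the generated ensemble, to compare with sqrt(3B / (8 pi^2)).
deviations = ensemble - ensemble.mean(axis=0)
rmsf = np.sqrt((deviations ** 2).sum(axis=2).mean(axis=0))
```

An array like `ensemble` is frames-first, so it would need to be passed in whatever dimension order the reader expects.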
_posts/2016-xx-xx-release-0.16.md
Outdated
| # Create a new Universe directly from these coordinates | ||
| # using the MemoryReader | ||
| universe2 = Universe(PDB_small, coordinates, | ||
| format=MemoryReader) |
There was a problem hiding this comment.
Maybe this comment is not the right place but would it be worthwhile to also allow format="MEMORY"? (It would save the explicit import of MemoryReader.)
(Although I do like the elegance of using the reader directly as a format...)
There was a problem hiding this comment.
This should work automatically for any array_like ideally without specifying the format. I created an issue for it.
There was a problem hiding this comment.
@orbeckst and @kain88-de, as far as I can see, your two posts refer to slightly different behaviours. Should we create an issue for the feature that @orbeckst refers to as well? I agree that it would be convenient not always to need the explicit import.
There was a problem hiding this comment.
I think if my issue is fixed then we don't need a format="MEMORY" any longer since we don't need to specify the format at all.
There was a problem hiding this comment.
@kain88-de, I see, right. I haven't made any changes to the blog post regarding this. I guess this will depend on whether the issue gets fixed.
There was a problem hiding this comment.
Either format="MEMORY" or direct recognition would be fine with me, but for further discussion see MDAnalysis/mdanalysis#1049. For this post I suggest we keep what definitely works, namely explicitly using the MemoryReader class. (If MDAnalysis/mdanalysis#1049 gets fixed we can change the post.)
|
|
||
| ```python | ||
| universe = Universe(PDB_small, DCD, in_memory=True) | ||
| ``` |
There was a problem hiding this comment.
And you can also switch a trajectory over to a MemoryReader after you loaded it using
universe = Universe(PDB_small, DCD)  # normal DCDReader
universe.transfer_to_memory()        # DCDReader is replaced with a MemoryReader
There was a problem hiding this comment.
Yeah, showing this off is really important, since I see it as the main way people will use the MemoryReader.
There was a problem hiding this comment.
@orbeckst, @kain88-de This has now been made explicit. I actually now mention this use-case first, and only thereafter the constructor option.
_posts/2016-xx-xx-release-0.16.md
Outdated
| universe = Universe(PDB_small, DCD, in_memory=True) | ||
| ``` | ||
|
|
||
| Likewise, the `rms_fit_trj` function in the analysis/align.py module also has an `in_memory` flag, allowing it to do in-place alignments in memory. |
There was a problem hiding this comment.
rms_fit_trj() is being deprecated: use AlignTraj instead.
Check that AlignTraj has the functionality and then mention MDAnalysis.analysis.align.AlignTraj here.
(If AlignTraj does not have in_memory, raise an issue asap...)
_posts/2016-xx-xx-release-0.16.md
Outdated
| ``` | ||
| Similarities are written in a square symmetric matrix having the same dimensions and ordering as the input list, with each element being the similarity value for a pair of the input ensembles. | ||
|
|
||
| The encore library includes a general interface to various clustering and dimensionality reduction algorithms (through the scikit-learn package), which makes it easy to switch between clustering and dimensionality reduction algorithms when using the `ces` and `dres` functions. The clustering and dimensionality reduction functionality is also directly available through the `cluster` and `reduce_dimensionality` functions. For instance, to cluster the conformations from the two universes defined above, we can write: |
_posts/2016-xx-xx-release-0.16.md
Outdated
|
|
||
| ## Incorporation of the ENCORE ensemble similarity library | ||
|
|
||
| The ENCORE ensemble similarity library has been integrated with MDAnalysis. It implements a variety of techniques for calculating similarities between structural ensembles (trajectories), as described in this publication: |
There was a problem hiding this comment.
Mention the module and link to the docs:
The **ENCORE** ensemble similarity library has been integrated with MDAnalysis as
[MDAnalysis.analysis.encore](http://www.mdanalysis.org/mdanalysis/documentation_pages/analysis/encore.html).
Note: the link to the docs is broken because of MDAnalysis/mdanalysis#1047.
There was a problem hiding this comment.
Fixed. I hope to have time to look at your comments for MDAnalysis/mdanalysis#1047 the coming days.
_posts/2016-xx-xx-release-0.16.md
Outdated
| universe = Universe(PDB_small, DCD) | ||
|
|
||
| # Extract coordinates | ||
| coordinates = universe.trajectory.timeseries() |
There was a problem hiding this comment.
Please remove the timeseries. It is a rarely used feature. I think it is enough to point out that calling universe.transfer_to_memory() will speed up their code.
There was a problem hiding this comment.
@kain88-de I wanted to illustrate how easy it is to generate a Universe from an existing numpy array, which is actually one of the prime use-cases when we use it in our group. Also, it is actually quite convenient that MemoryReader supports the same interface as .timeseries, including the format option for specifying which dimension is which. It is therefore something I would really like to highlight. If you have strong opinions about this, I can certainly remove it, but otherwise, I would prefer to keep it in. Perhaps we can add a transfer_to_memory example first, and then this afterwards?
There was a problem hiding this comment.
Good point to showcase this here; I hadn't thought about that. But your example is only an explicit version of transfer_to_memory. Maybe we can have another example that can't be replaced with a transfer_to_memory?
The only one I can think of is creating a random trajectory.
coordinates = np.random.uniform(size=(100, u.atoms.n_atoms, 3)).cumsum(0)
universe = Universe(PDB_small, coordinates, format=MemoryReader)
There was a problem hiding this comment.
@kain88-de , your example is now included.
There was a problem hiding this comment.
I like the Brownian dynamics example, but you need to change the order of the dimensions (and then accumulate along the frame axis, which becomes axis 1):
coordinates = np.random.uniform(size=(u.atoms.n_atoms, 100, 3)).cumsum(1)
(If we could easily construct a topology on the fly with only, say, Ar atoms, then the example would almost make sense as an ideal gas. Alternatively, we could just add a Gaussian random offset to the initial coordinates, with the width determined by the B-factor, and so "simulate" an ensemble of X-ray structures...)
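One detail worth spelling out for the random-walk example: the cumulative sum has to run along the frame axis, and which axis that is depends on the dimension order. The following is a small NumPy-only sketch under that reading; the sizes and the seed are arbitrary.

```python
import numpy as np

n_atoms, n_frames = 5, 100
rng = np.random.default_rng(42)
steps_fac = rng.uniform(size=(n_frames, n_atoms, 3))

# 'fac' layout: frames are axis 0, so accumulate along axis 0.
walk_fac = steps_fac.cumsum(axis=0)

# 'afc' layout: frames are axis 1, so accumulate along axis 1.
steps_afc = steps_fac.transpose(1, 0, 2)
walk_afc = steps_afc.cumsum(axis=1)

# Both layouts describe the same trajectory.
same = np.allclose(walk_afc, walk_fac.transpose(1, 0, 2))

# Accumulating an 'afc' array along axis 0 sums over atoms instead,
# which is not a random walk in time.
over_atoms = steps_afc.cumsum(axis=0)
```

In other words, swapping the first two entries of `size` without also changing the `cumsum` axis changes what the example computes.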
_posts/2016-xx-xx-release-0.16.md
Outdated
| [array([1, 1, 1, 1, 2]), array([1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1]), array([1, 1, 1, 1, 1]), array([1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1, 1]), array([2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2]), array([2, 2, 2, 2]), array([2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2]), array([2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2]), array([2, 2, 2, 2]), array([2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2, 2])] | ||
| ``` | ||
|
|
||
| For further details, see the documentation of the individual functions within Encore. |
There was a problem hiding this comment.
I would reduce the examples to 1 and link to the others in the docs.
There was a problem hiding this comment.
Done. I've moved this code to the docs for the cluster.py module. Will push the changes to the docs themselves in a few days (need to go over them all again as part of MDAnalysis/mdanalysis#1047)
|
@wouterboomsma thanks for the chapters. If you work on @orbeckst comments it will be good to go. |
orbeckst
left a comment
There was a problem hiding this comment.
See comments; obviously MDAnalysis/mdanalysis#1059 is a roadblock. I suggest you move ahead and pretend for this blog post that AlignTraj gets fixed soon so that we can merge.
_posts/2016-xx-xx-release-0.16.md
Outdated
| universe = Universe(PSF, DCD, in_memory=True) | ||
| ``` | ||
|
|
||
| Likewise, the `rms_fit_trj` function in the analysis/align.py module also has an `in_memory` flag, allowing it to do in-place alignments in memory. |
There was a problem hiding this comment.
rms_fit_trj() is deprecated, see earlier comments.
There was a problem hiding this comment.
@orbeckst, have now replaced "rms_fit_trj function" with "AlignTraj class", under the assumption that this will become true prior to release.
_posts/2016-xx-xx-release-0.16.md
Outdated
| ``` | ||
|
|
||
| The MemoryReader will work just as any other reader. In particular, you can iterate over it as usual, or use the `.timeseries()` method to retrieve a reference to the raw array in any format: | ||
| The MemoryReader will work just as any other reader. In particular, you can iterate over it as usual, or use the `.timeseries()` method to retrieve a reference to the raw array in any order of the dimensions ('fac' means ('frames','atoms','coordinates')): |
There was a problem hiding this comment.
or use the `.timeseries()` method (if available in the trajectory reader)
There was a problem hiding this comment.
Is it still fac order? The coordinates = np.random.uniform(size=(universe.atoms.n_atoms, 100, 3)).cumsum(0) example uses afc ordering.
There was a problem hiding this comment.
The default order of timeseries is 'afc'. I always found this a bit unnatural, but I made it the default for MemoryReader as well to be consistent with timeseries. When manually constructing a MemoryReader you can explicitly specify which format you wish to use, but when doing it implicitly through the Universe constructor, you are forced to use the default, i.e., 'afc'. When I wrote the code, I thought timeseries was a big deal in MDAnalysis. From your and @kain88-de's comments, I gather that perhaps this is not the case, which means we might also reconsider making 'fac' the default. Up to you guys.
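For anyone following the 'afc' versus 'fac' discussion, converting between the two layouts is a plain transpose. Below is a minimal sketch; `reorder` is a hypothetical helper written for this comment and is not part of MDAnalysis.

```python
import numpy as np


def reorder(array, src, dst):
    """Transpose a 3D coordinate array from layout `src` to layout `dst`.

    Layout strings use 'f' (frames), 'a' (atoms), 'c' (coordinates),
    e.g. 'afc' or 'fac'. Hypothetical helper, not part of MDAnalysis.
    """
    if sorted(src) != ["a", "c", "f"] or sorted(dst) != ["a", "c", "f"]:
        raise ValueError("layouts must be permutations of 'afc'")
    # Result axis i is the input axis that carries the same letter.
    return np.transpose(array, [src.index(axis) for axis in dst])


afc = np.zeros((10, 100, 3))        # 10 atoms, 100 frames
fac = reorder(afc, "afc", "fac")    # 100 frames, 10 atoms
back = reorder(fac, "fac", "afc")
```

The transpose returns a view rather than a copy, so switching layouts this way does not duplicate the coordinate data.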
There was a problem hiding this comment.
@orbeckst , btw in this part of the text, we refer specifically to the MemoryReader, so we don't really need to mention "if available in the trajectory reader".
There was a problem hiding this comment.
@orbeckst I've removed the "order of the dimensions ('fac' means ('frames','atoms','coordinates'))" as you suggested.
_posts/2016-xx-xx-release-0.16.md
Outdated
| ## Incorporation of the ENCORE ensemble similarity library | ||
|
|
||
| The ENCORE ensemble similarity library has been integrated with MDAnalysis. It implements a variety of techniques for calculating similarities between structural ensembles (trajectories), as described in this publication: | ||
| The **ENCORE** ensemble similarity library has been integrated with MDAnalysis as [MDAnalysis.analysis.encore](http://www.mdanalysis.org/mdanalysis/documentation_pages/analysis/encore.html). It implements a variety of techniques for calculating similarities between structural ensembles (trajectories), as described in this publication: |
There was a problem hiding this comment.
Sorry, I wasn't really clear about where the docs for releases live: on PyPI, but we have redirects, so the link should be
http://docs.mdanalysis.org/documentation_pages/analysis/encore.html
(assuming that PR #1047 gets merged.)
…release. Fixed link to Encore documentation.
- fixes #1056 (comment)
- added entry to reST sphinx docs
- added/changed docs in coordinates/memory.py:
  - included AnalysisFromFunction example from MDAnalysis/MDAnalysis.github.io#30 (comment)
  - included transfer_to_memory() example
  - included example for how to make an in-memory trajectory of a sub-system with Merge() and Universe.load_new() from #915 (comment)
  - minor reST fixes in the MemoryReader docs (e.g., renamed the Examples section because the numpy format napoleon parser treats this heading specially and does not recognize it as a standard section)
|
@orbeckst just introduced the following example to the MemoryReader documentation: with the following note: Perhaps we should just make a final decision on this matter now - since changing it later will likely break some code. Should we change the default order of MemoryReader to 'fac'? Since 'fac' seems to be the default in |
|
Whoops. This was not really the right place to post this. Sorry for the confusion. Perhaps we should raise an issue instead? |
|
@wouterboomsma yes please open an issue. |
* Draft of blog post contribution for Encore and MemoryReader
* Replaced reference to rms_fit_trj with AlignTraj in blog post for 0.16 release. Fixed link to Encore documentation.
A first draft of some paragraphs on Encore and MemoryReader to add to the 0.16.0 release blog post.