-
Notifications
You must be signed in to change notification settings - Fork 48
Contributions on Encore and MemoryReader to Blog on 0.16.0 release #30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
169b90c
218c8b3
d2a3fea
e188c2e
558887a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -18,7 +18,7 @@ You can upgrade with `pip install --upgrade MDAnalysis` | |
|
|
||
| # Noticable Changes | ||
|
|
||
| ## Attach arbitraty time series to your trajectories | ||
| ## Attach arbitrary time series to your trajectories | ||
|
|
||
| Our GSoC student @fiona-naughton has implemented an auxillary reader to add | ||
| arbitrary time series to a universe. The time series are kept in sync with the | ||
|
|
@@ -119,6 +119,115 @@ If you are using the low-level qcprot algorithm your self intead of our provided | |
| wrappers you have to change your code since the API has changed. For more see | ||
| the [CHANGELOG]. | ||
|
|
||
| ## MemoryReader: Reading trajectories from memory | ||
|
|
||
| MDAnalysis typically reads trajectories from files on-demand, so that it can efficiently deal with large trajectories - even those that do not fit in memory. However, in some cases, both for convenience and for efficiency, it can be an advantage to work with trajectories directly in memory. In this release, we have introduced a MemoryReader, which makes this possible. | ||
|
|
||
| The MemoryReader works with numpy arrays, using the same format as that used by for instance `DCDReader.timeseries()`. You can create a Universe directly from such an array: | ||
|
|
||
| ```python | ||
| import numpy as np | ||
| from MDAnalysis import Universe | ||
| from MDAnalysisTests.datafiles import DCD, PSF | ||
| from MDAnalysis.coordinates.memory import MemoryReader | ||
|
|
||
| # Create a Universe using a DCD reader | ||
| universe = Universe(PSF, DCD) | ||
|
|
||
| # Create a numpy array with random coordinates (100 frames) for the same topology | ||
| coordinates = np.random.uniform(size=(universe.atoms.n_atoms, 100, 3)).cumsum(0) | ||
|
|
||
| # Create a new Universe directly from these coordinates | ||
| universe2 = Universe(PSF, coordinates, format=MemoryReader) | ||
| ``` | ||
|
|
||
| The MemoryReader will work just as any other reader. In particular, you can iterate over it as usual, or use the `.timeseries()` method to retrieve a reference to the raw array: | ||
|
|
||
| ```python | ||
| coordinates_fac = universe2.trajectory.timeseries(format='fac') | ||
| ``` | ||
|
|
||
| Certain operations can be speeded up by moving a trajectory to memory, and we have therefore | ||
| added functionality to directly transfer any existing trajectory to a MemoryReader using `Universe.transfer_to_memory`: | ||
|
|
||
| ```python | ||
| universe = Universe(PSF, DCD) | ||
| universe.transfer_to_memory() # Switches to a MemoryReader representation | ||
| ``` | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. And you can also switch a trajectory over to a universe = Universe(PDB_small, DCD) # normal DCDReader
universe.transfer_to_memory() # DCDReader was replaced with MemoryReader
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah showing of this is really important since I see it as the main way people will use the MemoryReader.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @orbeckst, @kain88-de This has now been made explicit. I actually now mention this use-case first, and only thereafter the constructor option. |
||
|
|
||
| You can also do this directly upon construction of a Universe, by using the `in_memory` flag: | ||
|
|
||
| ```python | ||
| universe = Universe(PSF, DCD, in_memory=True) | ||
| ``` | ||
|
|
||
| Likewise, the `AlignTraj` class in the analysis/align.py module also has an `in_memory` flag, allowing it to do in-place alignments in memory. | ||
|
|
||
|
|
||
| ## Incorporation of the ENCORE ensemble similarity library | ||
|
|
||
| The **ENCORE** ensemble similarity library has been integrated with MDAnalysis as [MDAnalysis.analysis.encore](http://docs.mdanalysis.org/documentation_pages/analysis/encore.html). It implements a variety of techniques for calculating similarities between structural ensembles (trajectories), as described in this publication: | ||
|
|
||
| Tiberti M, Papaleo E, Bengtsen T, Boomsma W, Lindorff-Larsen K (2015), ENCORE: Software for Quantitative Ensemble Comparison. PLoS Comput Biol 11(10): e1004415. doi:[10.1371/journal.pcbi.1004415](http://doi.org/10.1371/journal.pcbi.1004415). | ||
|
|
||
| Using the similarity measures is simply a matter of loading the trajectories or experimental ensembles that one would like to compare as MDAnalysis.Universe objects: | ||
|
|
||
| ```python | ||
| from MDAnalysis import Universe | ||
| import MDAnalysis.analysis.encore as encore | ||
| from MDAnalysis.tests.datafiles import PSF, DCD, DCD2 | ||
| u1 = Universe(PSF, DCD) | ||
| u2 = Universe(PSF, DCD2) | ||
| ``` | ||
|
|
||
| and running the similarity measures on them, choosing among 1) the Harmonic Ensemble Similarity measure: | ||
|
|
||
| ```python | ||
| hes_similarities, details = encore.hes([u1, u2]) | ||
| print hes_similarities | ||
| ``` | ||
| ``` | ||
| [[ 0. 38279683.9587939] | ||
| [ 38279683.9587939 0. ]] | ||
| ``` | ||
|
|
||
| 2) the Clustering Ensemble Similarity measure: | ||
|
|
||
| ```python | ||
| ces_similarities, details = encore.ces([u1, u2]) | ||
| print ces_similarities | ||
| ``` | ||
| ``` | ||
| [[ 0. 0.68070702] | ||
| [ 0.68070702 0. ]] | ||
| ``` | ||
|
|
||
| or 3) the Dimensionality Reduction Ensemble Similarity measure: | ||
|
|
||
| ```python | ||
| dres_similarities, details = encore.dres([u1, u2]) | ||
| print dres_similarities | ||
| ``` | ||
| ``` | ||
| [[ 0. 0.65434461] | ||
| [ 0.65434461 0. ]] | ||
| ``` | ||
| Similarities are written in a square symmetric matrix having the same dimensions and ordering as the input list, with each element being the similarity value for a pair of the input ensembles. | ||
|
|
||
| The encore library includes a general interface to various clustering and dimensionality reduction algorithms (through the [scikit-learn](http://scikit-learn.org/) package), which makes it easy to switch between clustering and dimensionality reduction algorithms when using the `ces` and `dres` functions. The clustering and dimensionality reduction functionality is also directly available through the `cluster` and `reduce_dimensionality` functions. For instance, to cluster the conformations from the two universes defined above, we can write: | ||
| ```python | ||
| cluster_collection = encore.cluster([u1,u2]) | ||
| print cluster_collection | ||
| ``` | ||
| ``` | ||
| 0 (size:5,centroid:1): array([ 0, 1, 2, 3, 98]) | ||
| 1 (size:5,centroid:6): array([4, 5, 6, 7, 8]) | ||
| 2 (size:7,centroid:12): array([ 9, 10, 11, 12, 13, 14, 15]) | ||
| … | ||
| ``` | ||
| In addition to standard cluster membership information, the `cluster_collection` output keep track of the origin of each conformation, so you check how the different trajectories are represented in each cluster. For further details, see the documentation of the individual functions within Encore. | ||
|
|
||
|
|
||
| # Minor Enhancements | ||
|
|
||
| - No more deprecation warning spam when MDAnalyis is imported | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a comment/question:
.timeseries()is currently only implemented for DCDs; there's an open issue for Gromacs formats MDAnalysis/mdanalysis#186.I think, it would make the post too complicated to mention here that this particular example only works with DCDs but for the future we need to think if we want to support
.timeseries()more widely. TheUniverse(..., in_memory=True)approach takes care of this, of course.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We actually have a more flexible option then the timeseries now, see the Convenience functions to create a new analysis section of the post. The new
AnalysisFromFunctionworks with any trajectory and complex function. Your MemoryReader can also speed up that one significantly. I actually planned to make an jupyter notebook as a gist for the post to show the speed up.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@orbeckst, @kain88-de, Interesting comments. If you prefer, I can certainly change the example so that I extract the array by iterating over the coordinates, which is basically what we do internally in
Universe.transfer_to_memoryif there is no.timeseriesavailable. It would just be slightly more verbose - and perhaps less clear what is going on.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to avoid the timeseries API. We only support it for DCD. Showing the AnalysisFromFunction will work for any trajectory. So the code can easily be copy pasted by users. The code would be.
This code is slightly more verbose but works for any reader, even chain readers!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I've changed the first example, so it no longer duplicates an existing trajectory, but instead now constructs a trajectory from a random coordinate array, as suggested by @kain88-de below.
Btw, nice trick with the
AnalysisFromFunction. If you can find a way to still sneak that in, you should certainly feel free to do so.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I add the AnalysisFromFunction example in the section of the release notes where we introduce the feature.
@orbeckst you mentioned to simulate a x-ray example by adding gaussian noise. Would that be added per frame? Because adding them per frame can be done with a small modification to the above AnalysisFromFunction code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like this, but with the Gaussians' widths set from the B-factor of each atom via Bfactor2RMSF().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, thinking about it now, I was originally thinking of an example that creates a new numpy array: Start with a crystal structure (1 frame) and generate a fake ensemble of positions compatible with the crystallographic B-factors. (I think we have real PDBs in the test datafiles.) I am not sure yet what one would use this ensemble for... or maybe run things like HOLE on it to get pore radii that take the resolution of a crystal structure into account or use it as a negative control for PCA.