From 169b90c066eb95568bcf0252b9a876e09c3b8e57 Mon Sep 17 00:00:00 2001
From: Wouter Boomsma
Date: Fri, 21 Oct 2016 16:33:44 +0200
Subject: [PATCH 1/5] Draft of blog post contribution for Encore and MemoryReader

---
 _posts/2016-xx-xx-release-0.16.md | 115 +++++++++++++++++++++++++++++-
 1 file changed, 114 insertions(+), 1 deletion(-)

diff --git a/_posts/2016-xx-xx-release-0.16.md b/_posts/2016-xx-xx-release-0.16.md
index 679277f7..48007627 100644
--- a/_posts/2016-xx-xx-release-0.16.md
+++ b/_posts/2016-xx-xx-release-0.16.md
@@ -18,7 +18,7 @@ You can upgrade with `pip install --upgrade MDAnalysis`
 
 # Noticable Changes
 
-## Attach arbitraty time series to your trajectories
+## Attach arbitrary time series to your trajectories
 
 Our GSoC student @fiona-naughton has implemented an auxillary reader to add
 arbitrary time series to a universe. The time series are kept in sync with the
@@ -119,6 +119,119 @@ If you are using the low-level qcprot algorithm your self intead of our
 provided wrappers you have to change your code since the API has changed. For
 more see the [CHANGELOG].
 
+## MemoryReader: Reading trajectories from memory
+
+MDAnalysis typically reads trajectories from files on-demand, so that it can efficiently deal with large trajectories - even those that do not fit in memory. However, in some cases, both for convenience and for efficiency, it can be an advantage to work with trajectories directly in memory. In this release, we have introduced a MemoryReader, which makes this possible.
+
+The MemoryReader works with numpy arrays, using the same format as that used by for instance `DCDReader.timeseries()`. You can create a Universe directly from such an array:
+
+```python
+from MDAnalysis import Universe
+from MDAnalysisTests.datafiles import DCD, PDB_small
+from MDAnalysis.coordinates.memory import MemoryReader
+
+# Create a Universe using a DCD reader
+universe = Universe(PDB_small, DCD)
+
+# Extract coordinates
+coordinates = universe.trajectory.timeseries()
+
+# Create a new Universe directly from these coordinates
+# using the MemoryReader
+universe2 = Universe(PDB_small, coordinates,
+                     format=MemoryReader)
+```
+
+The MemoryReader will work just as any other reader. In particular, you can iterate over it as usual, or use the `.timeseries()` method to retrieve a reference to the raw array in any format:
+
+```python
+coordinates_fac = universe2.trajectory.timeseries(format='fac')
+```
+
+Certain operations can be speeded up by moving a trajectory to memory. To facilitate this operation, the constructor of `Universe` takes an `in_memory` flag which will automatically convert any trajectory to a MemoryReader:
+
+```python
+universe = Universe(PDB_small, DCD, in_memory=True)
+```
+
+Likewise, the `rms_fit_trj` function in the analysis/align.py module also has an `in_memory` flag, allowing it to do in-place alignments in memory.
+
+
+## Incorporation of the ENCORE ensemble similarity library
+
+The ENCORE ensemble similarity library has been integrated with MDAnalysis. It implements a variety of techniques for calculating similarities between structural ensembles (trajectories), as described in this publication:
+
+    Tiberti M, Papaleo E, Bengtsen T, Boomsma W, Lindorff-Larsen K (2015)
+    ENCORE: Software for Quantitative Ensemble Comparison.
+    PLoS Comput Biol 11(10): e1004415. doi:10.1371/journal.pcbi.1004415
+
+Using the similarity measures is simply a matter of loading the trajectories or experimental ensembles that one would like to compare as MDAnalysis.Universe objects:
+
+```python
+from MDAnalysis import Universe
+import MDAnalysis.analysis.encore as encore
+from MDAnalysis.tests.datafiles import PSF, DCD, DCD2
+u1 = Universe(PSF, DCD)
+u2 = Universe(PSF, DCD2)
+```
+
+and running the similarity measures on them, choosing among 1) the Harmonic Ensemble Similarity measure:
+
+```python
+hes_similarities, details = encore.hes([u1, u2])
+print hes_similarities
+```
+```
+[[ 0. 38279683.9587939]
+ [ 38279683.9587939 0. ]]
+```
+
+2) the Clustering Ensemble Similarity measure:
+
+```python
+ces_similarities, details = encore.ces([u1, u2])
+print ces_similarities
+```
+```
+[[ 0. 0.68070702]
+ [ 0.68070702 0. ]]
+```
+
+or 3) the Dimensionality Reduction Ensemble Similarity measure:
+
+```python
+dres_similarities, details = encore.dres([u1, u2])
+print dres_similarities
+```
+```
+[[ 0. 0.65434461]
+ [ 0.65434461 0. ]]
+```
+Similarities are written in a square symmetric matrix having the same dimensions and ordering as the input list, with each element being the similarity value for a pair of the input ensembles.
+
+The encore library includes a general interface to various clustering and dimensionality reduction algorithms (through the scikit-learn package), which makes it easy to switch between clustering and dimensionality reduction algorithms when using the `ces` and `dres` functions. The clustering and dimensionality reduction functionality is also directly available through the `cluster` and `reduce_dimensionality` functions. For instance, to cluster the conformations from the two universes defined above, we can write:
+```python
+cluster_collection = encore.cluster([u1,u2])
+print cluster_collection
+```
+```
+0 (size:5,centroid:1): array([ 0, 1, 2, 3, 98])
+1 (size:5,centroid:6): array([4, 5, 6, 7, 8])
+2 (size:7,centroid:12): array([ 9, 10, 11, 12, 13, 14, 15])
+…
+```
+In addition to standard cluster membership information, the `cluster_collection` output keep track of the origin of each conformation, so you check how the different trajectories are represented in each cluster:
+```python
+[ccfor cluster in cluster_collection:
+    print cluster.metadata["ensemble_membership"]
+```
+```
+[array([1, 1, 1, 1, 2]), array([1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1]), array([1, 1, 1, 1, 1]), array([1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1, 1]), array([2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2]), array([2, 2, 2, 2]), array([2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2]), array([2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2]), array([2, 2, 2, 2]), array([2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2, 2])]
+```
+
+For further details, see the documentation of the individual functions within Encore.
+
+
 # Minor Enhancements
 
 - No more deprecation warning spam when MDAnalyis is imported

From 218c8b3fc13493e3d42befcff2aef50923c4326d Mon Sep 17 00:00:00 2001
From: Wouter Boomsma
Date: Fri, 21 Oct 2016 16:38:22 +0200
Subject: [PATCH 2/5] Minor corrections to blog post contribution for Encore and MemoryReader

---
 _posts/2016-xx-xx-release-0.16.md | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/_posts/2016-xx-xx-release-0.16.md b/_posts/2016-xx-xx-release-0.16.md
index 48007627..649bccb8 100644
--- a/_posts/2016-xx-xx-release-0.16.md
+++ b/_posts/2016-xx-xx-release-0.16.md
@@ -161,9 +161,7 @@ Likewise, the `rms_fit_trj` function in the analysis/align.py module also has an
 
 The ENCORE ensemble similarity library has been integrated with MDAnalysis. It implements a variety of techniques for calculating similarities between structural ensembles (trajectories), as described in this publication:
 
-    Tiberti M, Papaleo E, Bengtsen T, Boomsma W, Lindorff-Larsen K (2015)
-    ENCORE: Software for Quantitative Ensemble Comparison.
-    PLoS Comput Biol 11(10): e1004415. doi:10.1371/journal.pcbi.1004415
+    Tiberti M, Papaleo E, Bengtsen T, Boomsma W, Lindorff-Larsen K (2015), ENCORE: Software for Quantitative Ensemble Comparison. PLoS Comput Biol 11(10): e1004415. doi:10.1371/journal.pcbi.1004415.
 
 Using the similarity measures is simply a matter of loading the trajectories or experimental ensembles that one would like to compare as MDAnalysis.Universe objects:
 
@@ -222,8 +220,7 @@ print cluster_collection
 ```
 In addition to standard cluster membership information, the `cluster_collection` output keep track of the origin of each conformation, so you check how the different trajectories are represented in each cluster:
 ```python
-[ccfor cluster in cluster_collection:
-    print cluster.metadata["ensemble_membership"]
+print [cluster.metadata["ensemble_membership"] for cluster in cluster_collection]
 ```
 ```
 [array([1, 1, 1, 1, 2]), array([1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1]), array([1, 1, 1, 1, 1]), array([1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1, 1]), array([2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2]), array([2, 2, 2, 2]), array([2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2]), array([2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2]), array([2, 2, 2, 2]), array([2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2, 2])]

From d2a3fea45aa57afefd3580f54bfba2c4a20abc2f Mon Sep 17 00:00:00 2001
From: Wouter Boomsma
Date: Tue, 1 Nov 2016 00:04:54 +0100
Subject: [PATCH 3/5] Changes to Encore and MemoryReader contribution to 0.16 release blog post.

---
 _posts/2016-xx-xx-release-0.16.md | 43 +++++++++++++++----------------
 1 file changed, 21 insertions(+), 22 deletions(-)

diff --git a/_posts/2016-xx-xx-release-0.16.md b/_posts/2016-xx-xx-release-0.16.md
index 649bccb8..585d812c 100644
--- a/_posts/2016-xx-xx-release-0.16.md
+++ b/_posts/2016-xx-xx-release-0.16.md
@@ -127,31 +127,38 @@ The MemoryReader works with numpy arrays, using the same format as that used by
 
 ```python
 from MDAnalysis import Universe
-from MDAnalysisTests.datafiles import DCD, PDB_small
+from MDAnalysisTests.datafiles import DCD, PSF
 from MDAnalysis.coordinates.memory import MemoryReader
 
 # Create a Universe using a DCD reader
-universe = Universe(PDB_small, DCD)
+universe = Universe(PSF, DCD)
 
-# Extract coordinates
-coordinates = universe.trajectory.timeseries()
+# Create a numpy array with random coordinates (100 frames)
+# for the same topology
+coordinates = np.random.uniform(size=(100, universe.atoms.n_atoms, 3)).cumsum(0)
 
 # Create a new Universe directly from these coordinates
-# using the MemoryReader
-universe2 = Universe(PDB_small, coordinates,
-                     format=MemoryReader)
+universe2 = Universe(PDB_small, coordinates, format=MemoryReader)
 ```
 
-The MemoryReader will work just as any other reader. In particular, you can iterate over it as usual, or use the `.timeseries()` method to retrieve a reference to the raw array in any format:
+The MemoryReader will work just as any other reader. In particular, you can iterate over it as usual, or use the `.timeseries()` method to retrieve a reference to the raw array in any order of the dimensions ('fac'='(frames,atoms,coordinates)):
 
 ```python
 coordinates_fac = universe2.trajectory.timeseries(format='fac')
 ```
 
-Certain operations can be speeded up by moving a trajectory to memory. To facilitate this operation, the constructor of `Universe` takes an `in_memory` flag which will automatically convert any trajectory to a MemoryReader:
+Certain operations can be speeded up by moving a trajectory to memory, and we have therefore
+added functionality to directly transfer any existing trajectory to a MemoryReader using `Universe.transfer_to_memory`:
 
 ```python
-universe = Universe(PDB_small, DCD, in_memory=True)
+universe = Universe(PSF, DCD)
+universe.transfer_to_memory() # Switches to a MemoryReader representation
+```
+
+You can also do this directly upon construction of a Universe, by using the `in_memory` flag:
+
+```python
+universe = Universe(PSF, DCD, in_memory=True)
 ```
 
 Likewise, the `rms_fit_trj` function in the analysis/align.py module also has an `in_memory` flag, allowing it to do in-place alignments in memory.
@@ -159,9 +166,9 @@ Likewise, the `rms_fit_trj` function in the analysis/align.py module also has an
 
 ## Incorporation of the ENCORE ensemble similarity library
 
-The ENCORE ensemble similarity library has been integrated with MDAnalysis. It implements a variety of techniques for calculating similarities between structural ensembles (trajectories), as described in this publication:
+The **ENCORE** ensemble similarity library has been integrated with MDAnalysis as [MDAnalysis.analysis.encore](http://www.mdanalysis.org/mdanalysis/documentation_pages/analysis/encore.html). It implements a variety of techniques for calculating similarities between structural ensembles (trajectories), as described in this publication:
 
-    Tiberti M, Papaleo E, Bengtsen T, Boomsma W, Lindorff-Larsen K (2015), ENCORE: Software for Quantitative Ensemble Comparison. PLoS Comput Biol 11(10): e1004415. doi:10.1371/journal.pcbi.1004415.
+    Tiberti M, Papaleo E, Bengtsen T, Boomsma W, Lindorff-Larsen K (2015), ENCORE: Software for Quantitative Ensemble Comparison. PLoS Comput Biol 11(10): e1004415. doi:[10.1371/journal.pcbi.1004415](http://doi.org/10.1371/journal.pcbi.1004415).
 
 Using the similarity measures is simply a matter of loading the trajectories or experimental ensembles that one would like to compare as MDAnalysis.Universe objects:
 
@@ -207,7 +214,7 @@ print dres_similarities
 ```
 Similarities are written in a square symmetric matrix having the same dimensions and ordering as the input list, with each element being the similarity value for a pair of the input ensembles.
 
-The encore library includes a general interface to various clustering and dimensionality reduction algorithms (through the scikit-learn package), which makes it easy to switch between clustering and dimensionality reduction algorithms when using the `ces` and `dres` functions. The clustering and dimensionality reduction functionality is also directly available through the `cluster` and `reduce_dimensionality` functions. For instance, to cluster the conformations from the two universes defined above, we can write:
+The encore library includes a general interface to various clustering and dimensionality reduction algorithms (through the [scikit-learn](http://scikit-learn.org/) package), which makes it easy to switch between clustering and dimensionality reduction algorithms when using the `ces` and `dres` functions. The clustering and dimensionality reduction functionality is also directly available through the `cluster` and `reduce_dimensionality` functions. For instance, to cluster the conformations from the two universes defined above, we can write:
 ```python
 cluster_collection = encore.cluster([u1,u2])
 print cluster_collection
@@ -218,15 +225,7 @@ print cluster_collection
 2 (size:7,centroid:12): array([ 9, 10, 11, 12, 13, 14, 15])
 …
 ```
-In addition to standard cluster membership information, the `cluster_collection` output keep track of the origin of each conformation, so you check how the different trajectories are represented in each cluster:
-```python
-print [cluster.metadata["ensemble_membership"] for cluster in cluster_collection]
-```
-```
-[array([1, 1, 1, 1, 2]), array([1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1]), array([1, 1, 1, 1, 1]), array([1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1, 1]), array([1, 1, 1, 1, 1, 1, 1]), array([2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2]), array([2, 2, 2, 2]), array([2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2]), array([2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2]), array([2, 2, 2, 2]), array([2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2, 2]), array([2, 2, 2, 2, 2, 2, 2])]
-```
-
-For further details, see the documentation of the individual functions within Encore.
+In addition to standard cluster membership information, the `cluster_collection` output keep track of the origin of each conformation, so you check how the different trajectories are represented in each cluster. For further details, see the documentation of the individual functions within Encore.
 
 
 # Minor Enhancements

From e188c2ee707eb7c35716b1df4f8da4673901f5cd Mon Sep 17 00:00:00 2001
From: Wouter Boomsma
Date: Tue, 1 Nov 2016 00:19:52 +0100
Subject: [PATCH 4/5] Minor edits in Encore and MemoryReader contribution to 0.16 release blog post.

---
 _posts/2016-xx-xx-release-0.16.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/_posts/2016-xx-xx-release-0.16.md b/_posts/2016-xx-xx-release-0.16.md
index 585d812c..c6e83362 100644
--- a/_posts/2016-xx-xx-release-0.16.md
+++ b/_posts/2016-xx-xx-release-0.16.md
@@ -126,6 +126,7 @@ MDAnalysis typically reads trajectories from files on-demand, so that it can eff
 The MemoryReader works with numpy arrays, using the same format as that used by for instance `DCDReader.timeseries()`. You can create a Universe directly from such an array:
 
 ```python
+import numpy as np
 from MDAnalysis import Universe
 from MDAnalysisTests.datafiles import DCD, PSF
 from MDAnalysis.coordinates.memory import MemoryReader
@@ -133,15 +134,14 @@ from MDAnalysis.coordinates.memory import MemoryReader
 # Create a Universe using a DCD reader
 universe = Universe(PSF, DCD)
 
-# Create a numpy array with random coordinates (100 frames)
-# for the same topology
-coordinates = np.random.uniform(size=(100, universe.atoms.n_atoms, 3)).cumsum(0)
+# Create a numpy array with random coordinates (100 frames) for the same topology
+coordinates = np.random.uniform(size=(universe.atoms.n_atoms, 100, 3)).cumsum(0)
 
 # Create a new Universe directly from these coordinates
-universe2 = Universe(PDB_small, coordinates, format=MemoryReader)
+universe2 = Universe(PSF, coordinates, format=MemoryReader)
 ```
 
-The MemoryReader will work just as any other reader. In particular, you can iterate over it as usual, or use the `.timeseries()` method to retrieve a reference to the raw array in any order of the dimensions ('fac'='(frames,atoms,coordinates)):
+The MemoryReader will work just as any other reader. In particular, you can iterate over it as usual, or use the `.timeseries()` method to retrieve a reference to the raw array in any order of the dimensions ('fac' means ('frames','atoms','coordinates')):
 
 ```python
 coordinates_fac = universe2.trajectory.timeseries(format='fac')
@@ -168,7 +168,7 @@ Likewise, the `rms_fit_trj` function in the analysis/align.py module also has an
 
 The **ENCORE** ensemble similarity library has been integrated with MDAnalysis as [MDAnalysis.analysis.encore](http://www.mdanalysis.org/mdanalysis/documentation_pages/analysis/encore.html). It implements a variety of techniques for calculating similarities between structural ensembles (trajectories), as described in this publication:
 
-    Tiberti M, Papaleo E, Bengtsen T, Boomsma W, Lindorff-Larsen K (2015), ENCORE: Software for Quantitative Ensemble Comparison. PLoS Comput Biol 11(10): e1004415. doi:[10.1371/journal.pcbi.1004415](http://doi.org/10.1371/journal.pcbi.1004415).
+Tiberti M, Papaleo E, Bengtsen T, Boomsma W, Lindorff-Larsen K (2015), ENCORE: Software for Quantitative Ensemble Comparison. PLoS Comput Biol 11(10): e1004415. doi:[10.1371/journal.pcbi.1004415](http://doi.org/10.1371/journal.pcbi.1004415).
 
 Using the similarity measures is simply a matter of loading the trajectories or experimental ensembles that one would like to compare as MDAnalysis.Universe objects:
 

From 558887ad94c16c471fdf484235def38cfcc60011 Mon Sep 17 00:00:00 2001
From: Wouter Boomsma
Date: Tue, 1 Nov 2016 08:46:39 +0100
Subject: [PATCH 5/5] Replaced reference to rms_fit_trj to AlignTraj in blog post for 0.16 release. Fixed link to Encore documentation.

---
 _posts/2016-xx-xx-release-0.16.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/_posts/2016-xx-xx-release-0.16.md b/_posts/2016-xx-xx-release-0.16.md
index c6e83362..87871a4d 100644
--- a/_posts/2016-xx-xx-release-0.16.md
+++ b/_posts/2016-xx-xx-release-0.16.md
@@ -141,7 +141,7 @@ coordinates = np.random.uniform(size=(universe.atoms.n_atoms, 100, 3)).cumsum(0)
 universe2 = Universe(PSF, coordinates, format=MemoryReader)
 ```
 
-The MemoryReader will work just as any other reader. In particular, you can iterate over it as usual, or use the `.timeseries()` method to retrieve a reference to the raw array in any order of the dimensions ('fac' means ('frames','atoms','coordinates')):
+The MemoryReader will work just as any other reader. In particular, you can iterate over it as usual, or use the `.timeseries()` method to retrieve a reference to the raw array:
 
 ```python
 coordinates_fac = universe2.trajectory.timeseries(format='fac')
@@ -161,12 +161,12 @@ You can also do this directly upon construction of a Universe, by using the `in_
 universe = Universe(PSF, DCD, in_memory=True)
 ```
 
-Likewise, the `rms_fit_trj` function in the analysis/align.py module also has an `in_memory` flag, allowing it to do in-place alignments in memory.
+Likewise, the `AlignTraj` class in the analysis/align.py module also has an `in_memory` flag, allowing it to do in-place alignments in memory.
 
 
 ## Incorporation of the ENCORE ensemble similarity library
 
-The **ENCORE** ensemble similarity library has been integrated with MDAnalysis as [MDAnalysis.analysis.encore](http://www.mdanalysis.org/mdanalysis/documentation_pages/analysis/encore.html). It implements a variety of techniques for calculating similarities between structural ensembles (trajectories), as described in this publication:
+The **ENCORE** ensemble similarity library has been integrated with MDAnalysis as [MDAnalysis.analysis.encore](http://docs.mdanalysis.org/documentation_pages/analysis/encore.html). It implements a variety of techniques for calculating similarities between structural ensembles (trajectories), as described in this publication:
 
 Tiberti M, Papaleo E, Bengtsen T, Boomsma W, Lindorff-Larsen K (2015), ENCORE: Software for Quantitative Ensemble Comparison. PLoS Comput Biol 11(10): e1004415. doi:[10.1371/journal.pcbi.1004415](http://doi.org/10.1371/journal.pcbi.1004415).
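
Note on array layout: the patches in this series build the coordinate array with the atom axis first and then request `timeseries(format='fac')`, where 'fac' stands for (frames, atoms, coordinates). The following is only a minimal NumPy sketch of how those two layouts relate. It does not call MDAnalysis, the sizes are made up for illustration, and the label 'afc' for the atoms-first layout is an assumption made by analogy with 'fac'.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
n_frames, n_atoms = 100, 5

# Atoms-first layout: (atoms, frames, xyz). Calling this 'afc' is an assumption.
coords_afc = np.random.uniform(size=(n_atoms, n_frames, 3))

# Swap the first two axes to get the frames-first layout: (frames, atoms, xyz).
coords_fac = np.transpose(coords_afc, (1, 0, 2))

# np.transpose returns a view, so the coordinate data is not copied.
assert coords_fac.shape == (n_frames, n_atoms, 3)
assert np.shares_memory(coords_afc, coords_fac)

# The position of atom 2 in frame 7 is the same entry in both layouts.
assert np.array_equal(coords_fac[7, 2], coords_afc[2, 7])
```

Because the reordered array is a view, changing the layout this way costs no extra memory, which may be why a reader can hand out "a reference to the raw array" in a different axis order.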