Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
111 changes: 110 additions & 1 deletion _posts/2016-xx-xx-release-0.16.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ You can upgrade with `pip install --upgrade MDAnalysis`

# Noticable Changes

## Attach arbitraty time series to your trajectories
## Attach arbitrary time series to your trajectories

Our GSoC student @fiona-naughton has implemented an auxillary reader to add
arbitrary time series to a universe. The time series are kept in sync with the
Expand Down Expand Up @@ -119,6 +119,115 @@ If you are using the low-level qcprot algorithm your self intead of our provided
wrappers you have to change your code since the API has changed. For more see
the [CHANGELOG].

## MemoryReader: Reading trajectories from memory

MDAnalysis typically reads trajectories from files on-demand, so that it can efficiently deal with large trajectories - even those that do not fit in memory. However, in some cases, both for convenience and for efficiency, it can be an advantage to work with trajectories directly in memory. In this release, we have introduced a MemoryReader, which makes this possible.

The MemoryReader works with numpy arrays, using the same format as that used by for instance `DCDReader.timeseries()`. You can create a Universe directly from such an array:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a comment/question: .timeseries() is currently only implemented for DCDs; there's an open issue for Gromacs formats MDAnalysis/mdanalysis#186.

I think, it would make the post too complicated to mention here that this particular example only works with DCDs but for the future we need to think if we want to support .timeseries() more widely. The Universe(..., in_memory=True) approach takes care of this, of course.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually have a more flexible option then the timeseries now, see the Convenience functions to create a new analysis section of the post. The new AnalysisFromFunction works with any trajectory and complex function. Your MemoryReader can also speed up that one significantly. I actually planned to make an jupyter notebook as a gist for the post to show the speed up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@orbeckst, @kain88-de, Interesting comments. If you prefer, I can certainly change the example so that I extract the array by iterating over the coordinates, which is basically what we do internally in Universe.transfer_to_memory if there is no .timeseries available. It would just be slightly more verbose - and perhaps less clear what is going on.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to avoid the timeseries API. We only support it for DCD. Showing the AnalysisFromFunction will work for any trajectory. So the code can easily be copy pasted by users. The code would be.

coordinates = AnalysisFromFunction(lambda ag: ag.positions.copy(), 
                                   u.trajectory, u.atoms).run().results

This code is slightly more verbose but works for any reader, even chain readers!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I've changed the first example, so it no longer duplicates an existing trajectory, but instead now constructs a trajectory from a random coordinate array, as suggested by @kain88-de below.
Btw, nice trick with the AnalysisFromFunction. If you can find a way to still sneak that in, you should certainly feel free to do so.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I add the AnalysisFromFunction example in the section of the release notes where we introduce the feature.

@orbeckst you mentioned to simulate a x-ray example by adding gaussian noise. Would that be added per frame? Because adding them per frame can be done with a small modification to the above AnalysisFromFunction code.

coordinates = AnalysisFromFunction(lambda ag: ag.positions.copy() + np.random.normal(size=ag.positions.shape), 
                                   u.trajectory, u.atoms).run().results

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this, but with the Gaussians' widths set from the B-factor of each atom via Bfactor2RMSF().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, thinking about it now, I was originally thinking of an example that creates a new numpy array: Start with a crystal structure (1 frame) and generate a fake ensemble of positions compatible with the crystallographic B-factors. (I think we have real PDBs in the test datafiles.) I am not sure yet what one would use this ensemble for... or maybe run things like HOLE on it to get pore radii that take the resolution of a crystal structure into account or use it as a negative control for PCA.


```python
import numpy as np
from MDAnalysis import Universe
from MDAnalysisTests.datafiles import DCD, PSF
from MDAnalysis.coordinates.memory import MemoryReader

# Create a Universe using a DCD reader
universe = Universe(PSF, DCD)

# Create a numpy array with random coordinates (100 frames) for the same topology
coordinates = np.random.uniform(size=(universe.atoms.n_atoms, 100, 3)).cumsum(0)

# Create a new Universe directly from these coordinates
universe2 = Universe(PSF, coordinates, format=MemoryReader)
```

The MemoryReader will work just as any other reader. In particular, you can iterate over it as usual, or use the `.timeseries()` method to retrieve a reference to the raw array:

```python
coordinates_fac = universe2.trajectory.timeseries(format='fac')
```

Certain operations can be speeded up by moving a trajectory to memory, and we have therefore
added functionality to directly transfer any existing trajectory to a MemoryReader using `Universe.transfer_to_memory`:

```python
universe = Universe(PSF, DCD)
universe.transfer_to_memory() # Switches to a MemoryReader representation
```
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And you can also switch a trajectory over to a MemoryReader after you loaded it using

universe = Universe(PDB_small, DCD)    # normal DCDReader
universe.transfer_to_memory()          # DCDReader was replaced with MemoryReader

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah showing of this is really important since I see it as the main way people will use the MemoryReader.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@orbeckst, @kain88-de This has now been made explicit. I actually now mention this use-case first, and only thereafter the constructor option.


You can also do this directly upon construction of a Universe, by using the `in_memory` flag:

```python
universe = Universe(PSF, DCD, in_memory=True)
```

Likewise, the `AlignTraj` class in the analysis/align.py module also has an `in_memory` flag, allowing it to do in-place alignments in memory.


## Incorporation of the ENCORE ensemble similarity library

The **ENCORE** ensemble similarity library has been integrated with MDAnalysis as [MDAnalysis.analysis.encore](http://docs.mdanalysis.org/documentation_pages/analysis/encore.html). It implements a variety of techniques for calculating similarities between structural ensembles (trajectories), as described in this publication:

Tiberti M, Papaleo E, Bengtsen T, Boomsma W, Lindorff-Larsen K (2015), ENCORE: Software for Quantitative Ensemble Comparison. PLoS Comput Biol 11(10): e1004415. doi:[10.1371/journal.pcbi.1004415](http://doi.org/10.1371/journal.pcbi.1004415).

Using the similarity measures is simply a matter of loading the trajectories or experimental ensembles that one would like to compare as MDAnalysis.Universe objects:

```python
from MDAnalysis import Universe
import MDAnalysis.analysis.encore as encore
from MDAnalysis.tests.datafiles import PSF, DCD, DCD2
u1 = Universe(PSF, DCD)
u2 = Universe(PSF, DCD2)
```

and running the similarity measures on them, choosing among 1) the Harmonic Ensemble Similarity measure:

```python
hes_similarities, details = encore.hes([u1, u2])
print hes_similarities
```
```
[[ 0. 38279683.9587939]
[ 38279683.9587939 0. ]]
```

2) the Clustering Ensemble Similarity measure:

```python
ces_similarities, details = encore.ces([u1, u2])
print ces_similarities
```
```
[[ 0. 0.68070702]
[ 0.68070702 0. ]]
```

or 3) the Dimensionality Reduction Ensemble Similarity measure:

```python
dres_similarities, details = encore.dres([u1, u2])
print dres_similarities
```
```
[[ 0. 0.65434461]
[ 0.65434461 0. ]]
```
Similarities are written in a square symmetric matrix having the same dimensions and ordering as the input list, with each element being the similarity value for a pair of the input ensembles.

The encore library includes a general interface to various clustering and dimensionality reduction algorithms (through the [scikit-learn](http://scikit-learn.org/) package), which makes it easy to switch between clustering and dimensionality reduction algorithms when using the `ces` and `dres` functions. The clustering and dimensionality reduction functionality is also directly available through the `cluster` and `reduce_dimensionality` functions. For instance, to cluster the conformations from the two universes defined above, we can write:
```python
cluster_collection = encore.cluster([u1,u2])
print cluster_collection
```
```
0 (size:5,centroid:1): array([ 0, 1, 2, 3, 98])
1 (size:5,centroid:6): array([4, 5, 6, 7, 8])
2 (size:7,centroid:12): array([ 9, 10, 11, 12, 13, 14, 15])
```
In addition to standard cluster membership information, the `cluster_collection` output keep track of the origin of each conformation, so you check how the different trajectories are represented in each cluster. For further details, see the documentation of the individual functions within Encore.


# Minor Enhancements

- No more deprecation warning spam when MDAnalyis is imported
Expand Down