[WIP] Bauhaus PCA by jdetle · Pull Request #896 · MDAnalysis/mdanalysis

jdetle · 2016-07-08T20:29:40Z

Changes made in this Pull Request:

Added PCA Module conforming to Bauhaus API

TODO

PR Checklist

Tests?
Docs?
CHANGELOG updated?
Issue raised/referenced?

kain88-de · 2016-07-08T20:42:33Z

package/MDAnalysis/analysis/pca.py

+
+
+    def _single_frame(self):
+        self._xyz[self._frame_index] = self._atoms.positions.copy()


Nope this should slowly calculate the covariance matrix. Otherwise you are possibly loading hundreds of gigabytes

So are you saying I should start calculating the covariance matrix here with a loop inside _single_frame?

What I'm saying is that the sum goes over the frames. So you have to slowly build up the sum for each entry of the matrix as you iterate over the frame. Something like this should do the trick.

def _prelude(self): n_dim = self.atoms.n_atom * 3 self.cov = np.zeros(n_dim, n_dim) def _single_frame(self): x = self.atoms.positions.flat # maybe you need to use ravel as this only iterates! self.cov += np.dot(x[:, np.newaxis], x[:, np.newaxis].T) def _conclude(self): self.cov /= self.n_frames - 1

Please check the dot product. To get more information about the correct normalization have a look at the np.cov docs

We can't use np.cov directly in _single_frame because it works different for 1D and 2D arrays.

The dot product could be faster in cython but I doubt it will be noticeable compared with the slow loading of
a trajectory.

I guess that's a good point. You're only as slow as your bottleneck and loading from a disk is pretty slow.

Okay, I took a few hours to mull it over just to be sure but this method that you wrote out about above will not be zero-mean. This will give a different answer than the method you wrote out in your gist. Did you overlook this or is it intentional?

for i, ts in enumerate(u.trajectory[0:-1:1]): x = ca.positions.flatten() mean += x mean /= u.trajectory.n_frames for i, ts in enumerate(u.trajectory[0:-1:1]): x = ca.positions.flatten() x -= mean prac_cov += np.dot(x[:, np.newaxis], x[:, np.newaxis].T) prac_cov /= u.trajectory.n_frames -1

This is a snippet for code that replicates the np.cov method

sorry about the confusion. But yeah my example wasn't complete. It was a quick example how I thought it should work.

coveralls · 2016-07-08T21:03:07Z

Coverage increased (+0.03%) to 80.414% when pulling 41f1ef6 on jdetle:PCA into 5d78f03 on MDAnalysis:develop.

kain88-de · 2016-07-20T21:24:43Z

about the transform. For a PCA it makes sense to accept a atomgroup and return a array of the transformed coordinates. This allows to apply the same PCA on different simulations. Returning the array is consistent with the DIffusionMaps implementation.

jdetle · 2016-07-20T21:41:57Z

Okay, and this atom group should be the same atom group used to generate
the covariance matrix, right?

On Wed, Jul 20, 2016 at 2:24 PM Max Linke notifications@github.com wrote:

about the transform. For a PCA it makes sense to accept a atomgroup and
return a array of the transformed coordinates. This allows to apply the
same PCA on different simulations. Returning the array is consistent with
the DIffusionMaps implementation.

—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#896 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/AKcARt1H9oyPEu4BfeixQxN7wTILv2WUks5qXpIcgaJpZM4JIXwX
.

Have a wonderful day,
John J. Detlefs

kain88-de · 2016-07-20T21:52:39Z

Well kind of yeah. Lets say we have a test like this.

if self.atoms.n_atoms != atoms.n_atoms:
    raise ValueError()
elif self.atoms != atoms:  # I hope that checks for atom types and not reference id
    warnings.warn('transform for different atom types')

This could be something people might want to do in exotic cases. But we warn the normal user that he should be careful.

coveralls · 2016-07-21T23:31:07Z

Coverage increased (+0.006%) to 83.146% when pulling 3e245b6 on jdetle:PCA into 9a632a9 on MDAnalysis:develop.

coveralls · 2016-07-21T23:56:40Z

Coverage increased (+0.03%) to 83.168% when pulling 3e245b6 on jdetle:PCA into 9a632a9 on MDAnalysis:develop.

kain88-de · 2016-07-22T07:26:20Z

package/MDAnalysis/analysis/pca.py

+        self._n_atoms = self._atoms.n_atoms
+        self._calculated = False
+
+    def fit(self, n_components=None, start=None, stop=None,


why fit? This goes away from the consistency we are trying to achieve for the analysis methods. They all should have a run method that then does everything required. So I would assume I can use the PCA like this.

>>> pca = analysis.pca.PCA(atomgroup).run() >>> pca.variance >>> pca_coord = pca.transform(atomgroup, n_components=4) >>> print (pca_coord.shape) (n_frames, 4)

Sorry thats a good point. I called it fit in an unnecessary attempt to mirror the sklearn api.

coveralls · 2016-07-24T02:01:54Z

Coverage increased (+0.02%) to 83.159% when pulling 8ee2426 on jdetle:PCA into 9a632a9 on MDAnalysis:develop.

kain88-de · 2016-07-25T19:38:45Z

package/MDAnalysis/analysis/pca.py


-    def __init__(self, atomgroup, select='all', n_components=None,
-                 **kwargs):
+    def __init__(self, atomgroup, select='all', mean_free=True, reference=None,


Please call the option demean this is used in scikit-learn and statsmodel to mean that the mean should be subtracted before any calculations are done. I would like to use the same terminology to other python science packages to minimize surprise and friction for people using mdanalysis.

Oh okay. Haven't really got to addressing that yet but thanks in advance.

kain88-de · 2016-07-26T07:47:18Z

package/MDAnalysis/analysis/pca.py

+        self.cov = np.zeros((n_dim, n_dim))
+        self.mean = np.zeros(n_dim,)
+
+        for i, ts in enumerate(self._u.trajectory[self.start:self.stop:self.step]):


this doesn't need to be done anymore. You have aligned the structures.

jdetle · 2016-07-26T22:14:01Z

Right now I'm kind of struggling to get through what should be insignificant roadblocks so I'm just going to right some crappy code and iterate from there.

kain88-de · 2016-07-27T09:34:16Z

package/MDAnalysis/analysis/pca.py

+                                             self._atoms, self._ref_cog,
+                                             self._atoms.center_of_geometry())
+            # now all structures are aligned to reference
+            x = mobile_atoms.positions.ravel()


move this out of the if-else clause then you can skip the else clause completely. (and remove code duplication)

mobile_atoms is different than self._atoms right? Or am I misunderstanding things?

yeah you are right. I overlooked that.

jdetle · 2016-08-12T20:25:07Z

I think I've been creating a headache for you by squashing commits and force pushing. I'll stop that now.

kain88-de · 2016-08-12T20:29:36Z

package/MDAnalysis/analysis/pca.py

+
+        if mean is None:
+            warning.warn('In order to demean to generate the covariance matrix'
+                         ' the frames have to be iterated over twice. To avoid'


I normally let the white space at the end of the previous line. This is also what we do in most line breaking strings in mdanalysis (afaik that is some kind of unspoken standard between languages)

kain88-de · 2016-08-12T20:30:39Z

package/MDAnalysis/analysis/pca.py

+
+This module constructs a covariance matrix wherein each element of the matrix
+is denoted by (i,j) row-column coordinates. The (i,j) coordinate is the
+influence of the of the ith frame's coordinates on the jth frame's coordinates


This text is still wrong. I doesn't refer to a frame but simple a coordinates.

kain88-de · 2016-08-12T20:34:20Z

package/MDAnalysis/analysis/pca.py

+    After initializing and calling method with a universe or an atom group,
+    principal components ordering the atom coordinate data by decreasing
+    variance will be available for analysis. Please refer to the
+    :ref:`PCA-tutorial` for more detailed instructions.


Sorry this doesn't work in an interactive session. You have to do a simple code example.
Something like this is enough to get a very short overview. It is still good to link to the tutorial but that is only seen in the online docs as a link.

pca = PCA(atomgroup, select='backbone').run() pca_space = pca.transform(atomgroup.select_atoms('backbone'), 3)

kain88-de · 2016-08-12T20:35:49Z

Don't worry about the force push. It's just that I don't see always the changes in github since you change (rightly) different lines then I commend on.

kain88-de · 2016-08-12T20:37:35Z

I'm more sorry I only comment on the doc issues now and still see some minor things that I missed earlier. Seems like a lot of small issues that I could have mentioned all at once to save you some work.

jdetle · 2016-08-12T20:49:36Z

No problem on my end, makes me feel like I have work to do :)

kain88-de · 2016-08-12T21:05:08Z

package/MDAnalysis/analysis/pca.py

-    variance will be available for analysis. Please refer to the
-    :ref:`PCA-tutorial` for more detailed instructions.
+    variance will be available for analysis. As an example:
+        >>> pca = PCA(atomgroup, select='backbone').run()


You have to wrap this in a code block so it shows up in the Sphinx docs right?

Nope, I just fixed a typo there though. Wait, I don't know if I understood your question. I gave it the code blocks so the code is formatted in sphinx, yes.

I use in my code blocks

.. code::python >>> # now comes the code

Have a look around for this in the other doc strings and you see what I mean.

I just checked this is already correctly rendered so you can ignore my comment.

coveralls · 2016-08-12T21:18:09Z

Changes Unknown when pulling 6099a79 on jdetle:PCA into * on MDAnalysis:develop*.

kain88-de · 2016-08-13T06:39:16Z

package/MDAnalysis/analysis/pca.py

+    >>> from MDAnalysis.tests.datafiles import PSF, DCD
+
+Given a universe containing trajectory data we can perform PCA using
+:class:`PCA`:: and retrieve the principal components.


the last :: are not necessary. And please mention in the test that PCA is also a class. In the rendered docs this reads to perform PCA we can use PCA. something like to perform a PCA we can use the PCA class is better.

kain88-de · 2016-08-13T06:43:49Z

One last comment about the doc text/

kain88-de · 2016-08-13T06:45:13Z

package/MDAnalysis/analysis/pca.py

+variance is explained by the components. This cumulated variance by the
+components is conveniently stored in the one-dimensional array attribute
+`cumulated_variance`. The value at the ith index of `cumulated_variance` is the
+sum of the variances from 0 to i.


from 0 to 1 and not to i

soory seems like I can't read in the morning.

kain88-de · 2016-08-14T17:01:39Z

Thanks @jdetle

orbeckst · 2016-08-18T17:37:34Z

@jdetle , this PR is as good as any of the other ones that you contributed to say many thanks for all your excellent contributions to MDAnalysis – both the code and the community – during your GSoC. It would be great if you were to hang around a bit more even after GSoC.

[WIP] Bauhaus PCA

kain88-de reviewed Jul 8, 2016
View reviewed changes

jdetle force-pushed the PCA branch from 41f1ef6 to e91e655 Compare July 21, 2016 23:06

kain88-de reviewed Jul 22, 2016
View reviewed changes

kain88-de reviewed Jul 25, 2016
View reviewed changes

jdetle mentioned this pull request Jul 26, 2016

[Feature Request] Write a trajectory from an AtomGroup and an Array #915

Open

kain88-de reviewed Jul 26, 2016
View reviewed changes

kain88-de reviewed Jul 27, 2016
View reviewed changes

jdetle force-pushed the PCA branch from ddbc5dc to b1f4d84 Compare August 12, 2016 20:26

kain88-de reviewed Aug 12, 2016
View reviewed changes

jdetle force-pushed the PCA branch from b1f4d84 to 97c0e82 Compare August 12, 2016 20:29

kain88-de reviewed Aug 12, 2016
View reviewed changes

Doc fixes, test updates

a52d9b8

kain88-de reviewed Aug 12, 2016
View reviewed changes

More docs fixes

e0617e0

jdetle force-pushed the PCA branch from 97c0e82 to e0617e0 Compare August 12, 2016 20:45

kain88-de reviewed Aug 12, 2016
View reviewed changes

Doc fix [skip ci]

6099a79

jdetle force-pushed the PCA branch from 8dd8819 to 6099a79 Compare August 12, 2016 21:08

kain88-de reviewed Aug 13, 2016
View reviewed changes

jdetle force-pushed the PCA branch from e8ab932 to 8bb6226 Compare August 13, 2016 22:27

PCA docs updates

1cf340c

jdetle force-pushed the PCA branch from 8bb6226 to 1cf340c Compare August 13, 2016 22:46

kain88-de merged commit b375b4b into MDAnalysis:develop Aug 14, 2016

jdetle deleted the PCA branch August 14, 2016 17:47

abiedermann pushed a commit to abiedermann/mdanalysis that referenced this pull request Jan 5, 2017

Merge pull request MDAnalysis#896 from jdetle/PCA

98cacda

[WIP] Bauhaus PCA



		def _single_frame(self):
		self._xyz[self._frame_index] = self._atoms.positions.copy()

Conversation

jdetle commented Jul 8, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

TODO

PR Checklist

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jdetle Jul 21, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coveralls commented Jul 8, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kain88-de commented Jul 20, 2016

Uh oh!

jdetle commented Jul 20, 2016

Uh oh!

kain88-de commented Jul 20, 2016

Uh oh!

coveralls commented Jul 21, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coveralls commented Jul 21, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

coveralls commented Jul 24, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jdetle commented Jul 26, 2016

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jdetle commented Aug 12, 2016

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kain88-de commented Aug 12, 2016

Uh oh!

kain88-de commented Aug 12, 2016

Uh oh!

jdetle commented Aug 12, 2016

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jdetle Aug 12, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jdetle commented Jul 8, 2016 •

edited

Loading

jdetle Jul 21, 2016 •

edited

Loading

coveralls commented Jul 8, 2016 •

edited

Loading

coveralls commented Jul 21, 2016 •

edited

Loading

coveralls commented Jul 21, 2016 •

edited

Loading

coveralls commented Jul 24, 2016 •

edited

Loading

jdetle Aug 12, 2016 •

edited

Loading

coveralls commented Aug 12, 2016 •

edited

Loading