Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 8 additions & 0 deletions package/CHANGELOG
Original file line number Diff line number Diff line change
Expand Up @@ -187,6 +187,9 @@ Enhancements
to transfer data from HDF5 file into timestep. (PR #3293)

Changes
* `analysis.pca.PCA` class now stores `p_components`, `variance`,
`cumulated_variance` and `mean_atoms` using the
`analysis.base.Results` class (Issues #3275 #3285)
* Timestep now stores information in 'C' memory layout instead of
the previous 'F' default (PR #1738)
* `analysis.rdf.InterRDF` and `analysis.rdf.InterRDF_s` now store
Expand Down Expand Up @@ -281,6 +284,11 @@ Changes
* Added OpenMM coordinate and topology converters (Issue #2863, PR #2917)

Deprecations
* The attributes `p_components`, `variance`, `cumulated_variance` and
`mean_atoms` in `analysis.pca.PCA` are now deprecated in favour of
`results.p_components`, `results.variance`, `results.cumulated_variance`
and `results.mean_atoms`. They will be removed in 3.0.0
(Issues #3275 #3285)
* The `bins`, `edges`, `count`, `rdf` attributes for `analysis.rdf.InterRDF`
and `analysis.rdf.InterRDF_s`, and `cdf` attributes for
`analysis.rdf.InterRDF_s` are now deprecated in favour of `results.bins`,
Expand Down
193 changes: 130 additions & 63 deletions package/MDAnalysis/analysis/pca.py
Original file line number Diff line number Diff line change
Expand Up @@ -43,12 +43,13 @@
components are the eigenvectors of this matrix.

For each eigenvector, its eigenvalue is the variance that the eigenvector
explains. Stored in :attr:`PCA.cumulated_variance`, a ratio for each number of
eigenvectors up to index :math:`i` is provided to quickly find out how many
principal components are needed to explain the amount of variance reflected by
those :math:`i` eigenvectors. For most data, :attr:`PCA.cumulated_variance`
explains. Stored in :attr:`PCA.results.cumulated_variance`, a ratio for each
number of eigenvectors up to index :math:`i` is provided to quickly find out
how many principal components are needed to explain the amount of variance
reflected by those :math:`i` eigenvectors. For most data,
:attr:`PCA.results.cumulated_variance`
will be approximately equal to one for some :math:`n` that is significantly
smaller than the total number of components, these are the components of
smaller than the total number of components. These are the components of
interest given by Principal Component Analysis.

From here, we can project a trajectory onto these principal components and
Expand All @@ -67,30 +68,34 @@
:data:`~MDAnalysis.tests.datafiles.DCD`). This tutorial shows how to use the
PCA class.

First load all modules and test data
First load all modules and test data::

import MDAnalysis as mda
import MDAnalysis.analysis.pca as pca
from MDAnalysis.tests.datafiles import PSF, DCD

>>> import MDAnalysis as mda
>>> import MDAnalysis.analysis.pca as pca
>>> from MDAnalysis.tests.datafiles import PSF, DCD

Given a universe containing trajectory data we can perform Principal Component
Analyis by using the class :class:`PCA` and retrieving the principal
components.
components.::

u = mda.Universe(PSF, DCD)
PSF_pca = pca.PCA(u, select='backbone')
PSF_pca.run()

>>> u = mda.Universe(PSF, DCD)
>>> PSF_pca = pca.PCA(u, select='backbone')
>>> PSF_pca.run()

Inspect the components to determine the principal components you would like
to retain. The choice is arbitrary, but I will stop when 95 percent of the
variance is explained by the components. This cumulated variance by the
components is conveniently stored in the one-dimensional array attribute
``cumulated_variance``. The value at the ith index of `cumulated_variance`
is the sum of the variances from 0 to i.
:attr:`PCA.results.cumulated_variance`. The value at the ith index of
:attr:`PCA.results.cumulated_variance` is the sum of the variances from 0 to
i.::

n_pcs = np.where(PSF_pca.results.cumulated_variance > 0.95)[0][0]
atomgroup = u.select_atoms('backbone')
pca_space = PSF_pca.transform(atomgroup, n_components=n_pcs)

>>> n_pcs = np.where(PSF_pca.cumulated_variance > 0.95)[0][0]
>>> atomgroup = u.select_atoms('backbone')
>>> pca_space = PSF_pca.transform(atomgroup, n_components=n_pcs)

From here, inspection of the ``pca_space`` and conclusions to be drawn from the
data are left to the user.
Expand All @@ -104,6 +109,10 @@

.. autofunction:: cosine_content

.. autofunction:: rmsip

.. autofunction:: cumulative_overlap

"""
import warnings

Expand All @@ -124,32 +133,82 @@ class PCA(AnalysisBase):

After initializing and calling method with a universe or an atom group,
principal components ordering the atom coordinate data by decreasing
variance will be available for analysis. As an example:
variance will be available for analysis. As an example:::

pca = PCA(universe, select='backbone').run()
pca_space = pca.transform(universe.select_atoms('backbone'), 3)

>>> pca = PCA(universe, select='backbone').run()
>>> pca_space = pca.transform(universe.select_atoms('backbone'), 3)

generates the principal components of the backbone of the atomgroup and
then transforms those atomgroup coordinates by the direction of those
variances. Please refer to the :ref:`PCA-tutorial` for more detailed
instructions.

Parameters
----------
universe : Universe
Universe
select : string, optional
A valid selection statement for choosing a subset of atoms from
the atomgroup.
align : boolean, optional
If True, the trajectory will be aligned to a reference
structure.
mean : array_like, optional
Optional reference positions to be be used as the mean of the
covariance matrix.
n_components : int, optional
The number of principal components to be saved, default saves
all principal components
verbose : bool (optional)
Show detailed progress of the calculation if set to ``True``.

Attributes
----------
p_components: array, (n_atoms * 3, n_components)
The principal components of the feature space,
results.p_components: array, (n_atoms * 3, n_components)
Principal components of the feature space,
representing the directions of maximum variance in the data.
The column vector p_components[:, i] is the eigenvector
corresponding to the variance[i].
variance : array (n_components, )
The raw variance explained by each eigenvector of the covariance

.. versionadded:: 2.0.0

p_components: array, (n_atoms * 3, n_components)
Alias to the :attr:`results.p_components`.

.. deprecated:: 2.0.0
Will be removed in MDAnalysis 3.0.0. Please use
:attr:`results.p_components` instead.

results.variance : array (n_components, )
Raw variance explained by each eigenvector of the covariance
matrix.
cumulated_variance : array, (n_components, )

.. versionadded:: 2.0.0

variance : array (n_components, )
Alias to the :attr:`results.variance`.

.. deprecated:: 2.0.0
Will be removed in MDAnalysis 3.0.0. Please use
:attr:`results.variance` instead.

results.cumulated_variance : array, (n_components, )
Percentage of variance explained by the selected components and the sum
of the components preceding it. If a subset of components is not chosen
then all components are stored and the cumulated variance will converge
to 1.

.. versionadded:: 2.0.0

cumulated_variance : array, (n_components, )
Alias to the :attr:`results.cumulated_variance`.

.. deprecated:: 2.0.0
Will be removed in MDAnalysis 3.0.0. Please use
:attr:`results.cumulated_variance` instead.


Methods
-------
transform(atomgroup, n_components=None)
Expand All @@ -161,11 +220,10 @@ class PCA(AnalysisBase):
-----
Computation can be sped up by supplying precalculated mean positions.

.. versionchanged:: 2.0.0
``mean_atoms`` removed, as this did not reliably contain the mean
positions.
``mean`` input now accepts coordinate arrays instead of atomgroup.

.. versionchanged:: 0.19.0
The start frame is used when performing selections and calculating
mean positions. Previously the 0th frame was always used.
.. versionchanged:: 1.0.0
``n_components`` now limits the correct axis of ``p_components``.
``cumulated_variance`` now accurately represents the contribution of
Expand All @@ -174,34 +232,16 @@ class PCA(AnalysisBase):
``p_components``, ``cumulated_variance`` will not sum to 1.
``align=True`` now correctly aligns the trajectory and computes the
correct means and covariance matrix.

.. versionchanged:: 0.19.0
The start frame is used when performing selections and calculating
mean positions. Previously the 0th frame was always used.
.. versionchanged:: 2.0.0
``mean_atoms`` removed, as this did not reliably contain the mean
positions.
``mean`` input now accepts coordinate arrays instead of atomgroup.
:attr:`p_components`, :attr:`variance` and :attr:`cumulated_variance`
are now stored in a :class:`MDAnalysis.analysis.base.Results` instance.
"""

def __init__(self, universe, select='all', align=False, mean=None,
n_components=None, **kwargs):
"""
Parameters
----------
universe : Universe
Universe
select : string, optional
A valid selection statement for choosing a subset of atoms from
the atomgroup.
align : boolean, optional
If True, the trajectory will be aligned to a reference
structure.
mean : array_like, optional
Optional reference positions to be be used as the mean of the
covariance matrix.
n_components : int, optional
The number of principal components to be saved, default saves
all principal components
verbose : bool (optional)
Show detailed progress of the calculation if set to ``True``.
"""
super(PCA, self).__init__(universe.trajectory, **kwargs)
self._u = universe

Expand Down Expand Up @@ -281,6 +321,30 @@ def _conclude(self):
self._calculated = True
self.n_components = self._n_components

@property
def p_components(self):
wmsg = ("The `p_components` attribute was deprecated in "
"MDAnalysis 2.0.0 and will be removed in MDAnalysis 3.0.0. "
"Please use `results.p_components` instead.")
warnings.warn(wmsg, DeprecationWarning)
return self.results.p_components

@property
def variance(self):
wmsg = ("The `variance` attribute was deprecated in "
"MDAnalysis 2.0.0 and will be removed in MDAnalysis 3.0.0. "
"Please use `results.variance` instead.")
warnings.warn(wmsg, DeprecationWarning)
return self.results.variance

@property
def cumulated_variance(self):
wmsg = ("The `cumulated_variance` attribute was deprecated in "
"MDAnalysis 2.0.0 and will be removed in MDAnalysis 3.0.0. "
"Please use `results.cumulated_variance` instead.")
warnings.warn(wmsg, DeprecationWarning)
return self.results.cumulated_variance

@property
def n_components(self):
return self._n_components
Expand All @@ -290,10 +354,10 @@ def n_components(self, n):
if self._calculated:
if n is None:
n = len(self._variance)
self.variance = self._variance[:n]
self.cumulated_variance = (np.cumsum(self._variance) /
self.results.variance = self._variance[:n]
self.results.cumulated_variance = (np.cumsum(self._variance) /
np.sum(self._variance))[:n]
self.p_components = self._p_components[:, :n]
self.results.p_components = self._p_components[:, :n]
self._n_components = n

def transform(self, atomgroup, n_components=None, start=None, stop=None,
Expand Down Expand Up @@ -348,7 +412,7 @@ def transform(self, atomgroup, n_components=None, start=None, stop=None,
n_frames = len(range(start, stop, step))

dim = (n_components if n_components is not None else
self.p_components.shape[1])
self.results.p_components.shape[1])

dot = np.zeros((n_frames, dim))

Expand Down Expand Up @@ -390,12 +454,13 @@ def rmsip(self, other, n_components=None):

See also
--------
rmsip
:func:`~MDAnalysis.analysis.pca.rmsip`


.. versionadded:: 1.0.0
"""
try:
a = self.p_components
a = self.results.p_components
except AttributeError:
raise ValueError('Call run() on the PCA before using rmsip')

Expand Down Expand Up @@ -442,14 +507,14 @@ def cumulative_overlap(self, other, i=0, n_components=None):

See also
--------
cumulative_overlap
:func:`~MDAnalysis.analysis.pca.cumulative_overlap`

.. versionadded:: 1.0.0

.. versionadded:: 1.0.0
"""

try:
a = self.p_components
a = self.results.p_components
except AttributeError:
raise ValueError(
'Call run() on the PCA before using cumulative_overlap')
Expand Down Expand Up @@ -533,6 +598,7 @@ def rmsip(a, b, n_components=None):
0 indicates that they are mutually orthogonal, whereas 1 indicates
that they are identical.


.. versionadded:: 1.0.0
"""
n_components = util.asiterable(n_components)
Expand Down Expand Up @@ -586,6 +652,7 @@ def cumulative_overlap(a, b, i=0, n_components=None):
0 indicates that they are mutually orthogonal, whereas 1 indicates
that they are identical.


.. versionadded:: 1.0.0
"""

Expand Down
Loading