Skip to content

Allow PCA to run with given reference mean values#3296

Merged
orbeckst merged 15 commits intoMDAnalysis:developfrom
fiona-naughton:fix-pca-mean
May 26, 2021
Merged

Allow PCA to run with given reference mean values#3296
orbeckst merged 15 commits intoMDAnalysis:developfrom
fiona-naughton:fix-pca-mean

Conversation

@fiona-naughton
Copy link
Contributor

@fiona-naughton fiona-naughton commented May 9, 2021

Fixes #2728

Changes made in this Pull Request:

  • Fix shape of positions array from provided reference atomgroup so the PCA will actually run
  • Update the corresponding test to actually run with given mean values

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@pep8speaks
Copy link

pep8speaks commented May 9, 2021

Hello @fiona-naughton! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-05-26 21:51:01 UTC

@codecov
Copy link

codecov bot commented May 9, 2021

Codecov Report

Merging #3296 (4eb73a5) into develop (3500131) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff            @@
##           develop    #3296   +/-   ##
========================================
  Coverage    93.60%   93.61%           
========================================
  Files          176      176           
  Lines        22819    22820    +1     
  Branches      3223     3224    +1     
========================================
+ Hits         21360    21363    +3     
+ Misses        1408     1406    -2     
  Partials        51       51           
Impacted Files Coverage Δ
package/MDAnalysis/analysis/pca.py 100.00% <100.00%> (+1.34%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3500131...4eb73a5. Read the comment docs.

# create dummy atomgroup to populate with the mean values
num_atoms = len(u.select_atoms(SELECTION))
ag = mda.Universe.empty(num_atoms, trajectory=True).select_atoms('all')
ag.positions = pca_aligned.mean.reshape((num_atoms, 3))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

according to the docs we should just be able to use mean_atoms back as the atomgroup with the mean positions right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the current docs, yes - but, per discussion in #3285, it seems that mean_atoms does not actually contain the mean positions. If we fix that, then we could just use mean_atoms here; I decided to do it roundabout like this for now with a view of getting the more 'urgent' changes merged rather than debating exactly what mean_atoms should be doing (imo the doc changes in #3285 reflect what mean_atoms seem to actually be storing atm so it's not a big after that goes through).

All that being said, there is still the option to fix mean_atoms here (or in 3285), if that seems better? (Personally, it feels like both means and mean_atoms would be better as position arrays rather than atomgroups, but I'm not super familiar with pca and how people use this part of the input/output so others should weigh in first)

Copy link
Member

@IAlibay IAlibay May 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO part of fixing issue #2728 is also fixing mean_atoms (i.e. the whole cycle of outputing the mean to reading it).

Given that it didn't work before I'd be ok with switching to a position array (less likely to get overwritten if ts changes), but I think @orbeckst had some views here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember if I had views here.

But looking at the docs and how this is being used, I'd be ok with changing means to be a coordinate array (N, 3) so that one can pass through means=ag.positions.

We didn't deprecate anything here so we can

  1. either argue that this is a fix of something that never worked anyway and switch to array or
  2. for 2.x, also allow passing of means=ag and then internally pull the positions out (try: means = means.positions; except AttributeError: pass; means = np.ravel(means)) and deprecate passing of ag.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mean_atoms is really fragile

        self.mean_atoms = self._atoms
        self.mean_atoms.positions = self._atoms.positions

This will get overwritten immediately unless it's an in-memory universe and then suddenly frame 0 contains the mean. Updating positions in the Universe is unpredictable and we should get rid of mean_atoms.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd vote for just fixing it properly now (as a coordinate array) and not going through a deprecation cycle.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me.

@IAlibay IAlibay added this to the 2.0 milestone May 10, 2021
Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I consider the whole "supply/use mean" broken since the beginning. This means that whatever we want to do, we can do without deprecation: it's a fix (but we can add code to honor the docs).

  1. I'd get rid of PCA.mean_atoms: it doesn't give the right positions most of the time and it's not clear why you'd need it.
  2. Make means a (N. 3) array so that you can pass means=ag.positions. Store internally self.means = np.asarray(means) and maybe the ravelled versioln self._xmean = np.ravel(means) (for speed — although for a normal protein with 3816 atoms, I got for ravel 39.2 µs ± 176 ns per loop, so speed is probably not an issue). (Optionally, allow means=ag and extract the positions from the ag — this would make it backwards compatible with the docs.)

It would also be good if the docs showed the run() method.

Parenthetically, why do we require Universe and select? Wouldn't it be cleaner to just input the AtomGroup?

# create dummy atomgroup to populate with the mean values
num_atoms = len(u.select_atoms(SELECTION))
ag = mda.Universe.empty(num_atoms, trajectory=True).select_atoms('all')
ag.positions = pca_aligned.mean.reshape((num_atoms, 3))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember if I had views here.

But looking at the docs and how this is being used, I'd be ok with changing means to be a coordinate array (N, 3) so that one can pass through means=ag.positions.

We didn't deprecate anything here so we can

  1. either argue that this is a fix of something that never worked anyway and switch to array or
  2. for 2.x, also allow passing of means=ag and then internally pull the positions out (try: means = means.positions; except AttributeError: pass; means = np.ravel(means)) and deprecate passing of ag.

# create dummy atomgroup to populate with the mean values
num_atoms = len(u.select_atoms(SELECTION))
ag = mda.Universe.empty(num_atoms, trajectory=True).select_atoms('all')
ag.positions = pca_aligned.mean.reshape((num_atoms, 3))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mean_atoms is really fragile

        self.mean_atoms = self._atoms
        self.mean_atoms.positions = self._atoms.positions

This will get overwritten immediately unless it's an in-memory universe and then suddenly frame 0 contains the mean. Updating positions in the Universe is unpredictable and we should get rid of mean_atoms.

@fiona-naughton
Copy link
Contributor Author

mean_atoms has been removed and I switched mean to an array. I decided against allowing atomgroups to be passed in for now and depreciating, since part of the original error means that setting mean didn't work anyway, so that implies people weren't using it anyway (without patching the code themselves) - but I'm open to if others think it's better we do.

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent!

(I'm just fixing 2 pep8 things)

@orbeckst orbeckst self-assigned this May 26, 2021
@orbeckst
Copy link
Member

@IAlibay I'll merge this unless you object or CI is unhappy.

@orbeckst
Copy link
Member

Sigh, not sure what sphinx is unhappy about:

/home/runner/work/mdanalysis/mdanalysis/package/MDAnalysis/analysis/pca.py:docstring of MDAnalysis.analysis.pca.PCA.transform:5:Inline literal start-string without end-string.

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall lgtm thanks @fiona-naughton, please do merge in the following PEP8 fixes

(I lack sleep, so now all PEP8 messages are movie meme-based).

PEP8inator II — The Blackening

Co-authored-by: Irfan Alibay <IAlibay@users.noreply.github.com>
@orbeckst orbeckst merged commit 560feed into MDAnalysis:develop May 26, 2021
@orbeckst
Copy link
Member

Thank you @fiona-naughton for fixing this one!

@IAlibay IAlibay added the defect label Sep 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

PCA test sometimes fails...

4 participants