Allow PCA to run with given reference mean values #3296
orbeckst merged 15 commits into MDAnalysis:develop from
Hello @fiona-naughton! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-05-26 21:51:01 UTC
Codecov Report
@@           Coverage Diff            @@
##           develop   #3296   +/-   ##
========================================
  Coverage    93.60%   93.61%
========================================
  Files          176      176
  Lines        22819    22820     +1
  Branches      3223     3224     +1
========================================
+ Hits         21360    21363     +3
+ Misses        1408     1406     -2
  Partials        51       51
# create dummy atomgroup to populate with the mean values
num_atoms = len(u.select_atoms(SELECTION))
ag = mda.Universe.empty(num_atoms, trajectory=True).select_atoms('all')
ag.positions = pca_aligned.mean.reshape((num_atoms, 3))
According to the docs, we should be able to just use mean_atoms as the AtomGroup holding the mean positions, right?
From the current docs, yes. But per the discussion in #3285, it seems that mean_atoms does not actually contain the mean positions. If we fix that, then we could just use mean_atoms here. I decided to do it in this roundabout way for now, with a view to getting the more 'urgent' changes merged rather than debating exactly what mean_atoms should be doing (imo the doc changes in #3285 reflect what mean_atoms seems to actually be storing atm, so it's not a big deal once that goes through).
All that being said, there is still the option to fix mean_atoms here (or in #3285), if that seems better? (Personally, it feels like both means and mean_atoms would be better as position arrays rather than AtomGroups, but I'm not super familiar with PCA and how people use this part of the input/output, so others should weigh in first.)
I don't remember if I had views here.
But looking at the docs and how this is being used, I'd be ok with changing means to be a coordinate array (N, 3) so that one can pass through means=ag.positions.
We didn't deprecate anything here so we can
- either argue that this is a fix of something that never worked anyway and switch to array or
- for 2.x, also allow passing of means=ag and then internally pull the positions out (try: means = means.positions; except AttributeError: pass; means = np.ravel(means)) and deprecate passing of ag.
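For illustration only, the second option could look roughly like the sketch below. `coerce_mean` and `FakeGroup` are hypothetical names made up for this example and are not part of MDAnalysis; the deprecation warning is left out, and the shape check is an added assumption.

```python
import numpy as np


def coerce_mean(mean, n_atoms):
    """Hypothetical helper: return the mean as a flat (3 * n_atoms,) array."""
    # Unwrap anything exposing ``.positions`` (e.g. an AtomGroup);
    # otherwise assume ``mean`` is already array-like.
    mean = getattr(mean, "positions", mean)
    flat = np.ravel(np.asarray(mean, dtype=float))
    if flat.shape != (3 * n_atoms,):
        raise ValueError(
            f"expected {3 * n_atoms} mean coordinates, got {flat.size}"
        )
    return flat


class FakeGroup:
    """Stand-in for an AtomGroup so the sketch runs without MDAnalysis."""

    def __init__(self, positions):
        self.positions = np.asarray(positions, dtype=float)


coords = np.arange(12, dtype=float).reshape(4, 3)
from_array = coerce_mean(coords, n_atoms=4)             # (N, 3) array input
from_group = coerce_mean(FakeGroup(coords), n_atoms=4)  # object with .positions
```

Both call styles end up with the same flat array, so the rest of the analysis would not need to care which form the caller used.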
mean_atoms is really fragile
self.mean_atoms = self._atoms
self.mean_atoms.positions = self._atoms.positions
This will get overwritten immediately unless it's an in-memory universe and then suddenly frame 0 contains the mean. Updating positions in the Universe is unpredictable and we should get rid of mean_atoms.
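A toy sketch of the aliasing problem, assuming nothing about MDAnalysis internals: `ToyGroup` is a hypothetical stand-in for an AtomGroup, and "loading the next frame" is simulated by a plain assignment.

```python
import numpy as np


class ToyGroup:
    """Hypothetical stand-in for an AtomGroup: it only holds positions."""

    def __init__(self, positions):
        self.positions = np.array(positions, dtype=float)


atoms = ToyGroup([[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]])
mean_positions = np.array([[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]])

# Plain assignment binds a second name to the same object; nothing is copied.
mean_atoms = atoms
mean_atoms.positions = mean_positions

# Simulate the reader loading the next frame into the same group.
next_frame = np.array([[5.0, 5.0, 5.0], [6.0, 6.0, 6.0]])
atoms.positions = next_frame

# mean_atoms now reports the new frame, not the mean.
lost_mean = np.array_equal(mean_atoms.positions, next_frame)

# An independent array keeps the mean regardless of what the group does.
mean = mean_positions.copy()
```

Storing the mean as its own array, as suggested in this thread, sidesteps the shared state entirely.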
I'd vote for just fixing it properly now (as a coordinate array) and not going through a deprecation cycle.
orbeckst
left a comment
I consider the whole "supply/use mean" broken since the beginning. This means that whatever we want to do, we can do without deprecation: it's a fix (but we can add code to honor the docs).
- I'd get rid of PCA.mean_atoms: it doesn't give the right positions most of the time and it's not clear why you'd need it.
- Make means a (N, 3) array so that you can pass means=ag.positions. Store internally self.means = np.asarray(means) and maybe the ravelled version self._xmean = np.ravel(means) (for speed, although for a normal protein with 3816 atoms, I got for ravel 39.2 µs ± 176 ns per loop, so speed is probably not an issue). (Optionally, allow means=ag and extract the positions from the ag; this would make it backwards compatible with the docs.)
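A minimal sketch of that storage scheme, assuming a hypothetical `MeanHolder` class rather than the actual PCA class. The attribute names follow the suggestion above; the shape check is an added assumption.

```python
import numpy as np


class MeanHolder:
    """Hypothetical container for the mean; not the real PCA class."""

    def __init__(self, means):
        # Keep the user-facing (N, 3) coordinate array.
        self.means = np.asarray(means, dtype=float)
        if self.means.ndim != 2 or self.means.shape[1] != 3:
            raise ValueError("means must have shape (N, 3)")
        # Flat (3N,) version, ordered x0, y0, z0, x1, y1, z1, ...
        self._xmean = np.ravel(self.means)


positions = np.arange(12, dtype=float).reshape(4, 3)
holder = MeanHolder(positions)

# For a C-contiguous input, np.ravel returns a view instead of a copy.
is_view = np.shares_memory(holder.means, holder._xmean)
```

The flat array can be turned back into coordinates with `holder._xmean.reshape(-1, 3)`.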
It would also be good if the docs showed the run() method.
Parenthetically, why do we require Universe and select? Wouldn't it be cleaner to just input the AtomGroup?
orbeckst
left a comment
Excellent!
(I'm just fixing 2 pep8 things)
@IAlibay I'll merge this unless you object or CI is unhappy.
Sigh, not sure what sphinx is unhappy about:
IAlibay
left a comment
Overall lgtm thanks @fiona-naughton, please do merge in the following PEP8 fixes
(I lack sleep, so now all PEP8 messages are movie meme-based).
PEP8inator II — The Blackening Co-authored-by: Irfan Alibay <IAlibay@users.noreply.github.com>
Thank you @fiona-naughton for fixing this one!
Fixes #2728