Skip to content

Added fitting transformations#1991

Merged
tylerjereddy merged 9 commits intoMDAnalysis:developfrom
davidercruz:fitting-transforms
Jul 10, 2019
Merged

Added fitting transformations#1991
tylerjereddy merged 9 commits intoMDAnalysis:developfrom
davidercruz:fitting-transforms

Conversation

@davidercruz
Copy link
Contributor

@davidercruz davidercruz commented Jul 17, 2018

Changes made in this Pull Request:

  • Adds transformation based on alignto, from the MDAnalysis.analysis.align module, as a way to remove rotations and translations of a given AtomGroup in a trajectory
  • Adds a transformation that removes translations overall or on xy, xz or yz planes.
  • added tests

TODO in this PR:

  • removal of rotations and translations on a given plane.

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@codecov
Copy link

codecov bot commented Jul 17, 2018

Codecov Report

Merging #1991 into develop will increase coverage by 0.03%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           develop   #1991      +/-   ##
==========================================
+ Coverage    89.66%   89.7%   +0.03%     
==========================================
  Files          172     173       +1     
  Lines        21421   21493      +72     
  Branches      2792    2799       +7     
==========================================
+ Hits         19208   19280      +72     
  Misses        1616    1616              
  Partials       597     597
Impacted Files Coverage Δ
package/MDAnalysis/transformations/fit.py 100% <100%> (ø)
package/MDAnalysis/transformations/__init__.py 100% <100%> (ø) ⬆️
transformations/__init__.py 100% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2f87852...df3684c. Read the comment docs.

@davidercruz
Copy link
Contributor Author

I added a simpler form of alignto, but modified to remove rotation and translation on a given plane. I added a test to the output coordinates to check if the code is working. I'll add more extensive tests later

@davidercruz
Copy link
Contributor Author

The calculations were wrong in the fit_rot_trans function. They were passing the tests because the reference wasn't rotated in relation to the mobile atomgroup. I modified my tests to correct this.

This fit_rot_trans function can replace the alignto version in the same file. It is simpler but also works with planes.

While correcting the errors in this function I realised I made some mistakes in PR #1937. I'll have to open an issue to correct those.

Now, I face an issue. When I use atom masses as weights for alignto (and consequently fit_rot_trans) the resulting position arrays are close - but not in the decimal range as it happens when every atom has the same weight (weight = None). Furthermore, subsequent alignments result in improved RMSD.
Shouldn't the output of the first be the best fitting possible for those two atomgroups and weights?

If alignto stays the same, how do I test this "similarity" in my transformations?

Copy link
Contributor

@jbarnoud jbarnoud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I probably missed many things. The quality of this PR is much better than the previous ones. I am ery happy with the progress I see :)

@jbarnoud
Copy link
Contributor

Now, I face an issue. When I use atom masses as weights for alignto (and consequently fit_rot_trans) the resulting position arrays are close - but not in the decimal range as it happens when every atom has the same weight (weight = None). Furthermore, subsequent alignments result in improved RMSD.
Shouldn't the output of the first be the best fitting possible for those two atomgroups and weights?

I am not sure I understand. Could you elaborate?

return wrapped


def fit_rot_trans(ag, reference, plane=None, weights=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what makes this different from alignto but it seems to me that alignto here is just a special case of this function. Why not have it all in one instead of two confusingly similar functions?

@jbarnoud
Copy link
Contributor

Also, for all your PRs, have a look at the coverage report.

@davidercruz
Copy link
Contributor Author

@jbarnoud @kain88-de Sorry for the delay. The way I was calculating the plane rotation and translation fittings was not correct. I was trying to find an alternative to finding the euler angles of the rmsd fitting rotation matrix.
One idea would be to perform a 2D rmsd fitting. I decided then to commit what I had and then research on how to write that fitting.
However, when I was testing my code against the output of trjconv -fit rotxy+transxy on the trajectory of a membrane peptide and, to my surprise, they were equal ( .07A RMSD)!

Now, I just need some suggestions for testing this plane fitting feature.

@jbarnoud
Copy link
Contributor

jbarnoud commented Jul 26, 2018 via email

@davidercruz
Copy link
Contributor Author

davidercruz commented Jul 26, 2018

@jbarnoud Actually, rotxy+transxy does'n work like that. When fitting to the xy plane, only the z coordinates are preserved. Thus the group that you choose to fit will not rotate on the z axis (so you rotate it on Z every frame so it becomes as close as possible to the reference), but will translate on the z axis - but not in the x and y axis. This makes this fitting useful when, for example, you want to see molecule insertion in membranes, or keep the molecule as static as possible while keeping the membrane horizontal.

@jbarnoud
Copy link
Contributor

jbarnoud commented Jul 26, 2018 via email

@orbeckst
Copy link
Member

@davidercruz (and @jbarnoud ?) could you please summarize what needs to be done to get this PR done?

I am not looking for a promise that this gets done over the next weekend/holidays but something that will help to get the work done. This is an open source project so I understand that everyone is short on time and priorities shift but it would be a real shame if this didn't make it into MDAnalysis. It is always possible that someone else might find the energy to finish it (e.g., @ianmkenney in my group has recently been playing with the OTFT) but in this case one should make it easy for others to pick up the work, especially if one has not been able to finish things.

@jbarnoud
Copy link
Contributor

jbarnoud commented Dec 14, 2018 via email

@orbeckst
Copy link
Member

@jbarnoud @richardjgowers @davidercruz any chance that this gets into 0.20.0 #2240 (as we had promised in the 0.19.x blog post)?

orbeckst added a commit to MDAnalysis/binder-notebook that referenced this pull request Apr 26, 2019
- fix #13
- needs MDAnalysisData >= 0.7.0 for the "membrane peptide" dataset
- needs MDAnalysis >= 0.20.0 for the transformations
  (currently PRs MDAnalysis/mdanalysis#2038 ,
  MDAnalysis/mdanalysis#1991 , and
  MDAnalysis/mdanalysis#1973 )
- updated text in notebook
- adde a few more empty lines for clarity
Copy link
Member

@tylerjereddy tylerjereddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestEncore.test_triangular_matrix was failing in some CI entries--do we need some skips there? I know we've had issues with encore test fragility in the past & doesn't seem to be related to changes here.

I ran coverage and full test suite on this branch locally & it seemed okay. Pretty unlikely to break any old code at first glance, since it is effectively a new module & no old tests modified.

We should mark as "resolved" all reviewer comments that have been addressed, so that it is clearer which are outstanding--I'll try to start on that now.

@tylerjereddy
Copy link
Member

Oh right, we patched a CVE vulnerability in NumPy upstream related to pickling and np.load, so we need to adjust our code a bit for encore I think, otherwise CI will fail.

@tylerjereddy
Copy link
Member

I've made a note to return to this PR Thursday--hopefully by then the CVE-related test failure is resolved / merged & this PR can be presented with CI all green / coverage solid. Looks like we almost have an approving review from @jbarnoud, so shouldn't be that hard to push this over the line now.

@davidercruz
Copy link
Contributor Author

Thank you @tylerjereddy

orbeckst added a commit to MDAnalysis/binder-notebook that referenced this pull request Apr 29, 2019
- fix #13
- needs MDAnalysisData >= 0.7.0 for the "membrane peptide" dataset
- needs MDAnalysis >= 0.20.0 for the transformations
  (currently PRs MDAnalysis/mdanalysis#2038 ,
  MDAnalysis/mdanalysis#1991 , and
  MDAnalysis/mdanalysis#1973 )
- updated text in notebook
- adde a few more empty lines for clarity
@tylerjereddy
Copy link
Member

Rebased on latest develop branch--hopefully this gets us to green CI with a cov report for reviewers to see.

@tylerjereddy tylerjereddy force-pushed the fitting-transforms branch from ba26315 to 20b6594 Compare May 1, 2019 01:24
@tylerjereddy
Copy link
Member

CI was all green, but I've tried to bump test coverage up to 100% on the patch diff. Marked one more reviewer comment as resolved.

@tylerjereddy
Copy link
Member

Our CI is definitely prone to issues---looks like CI helpers is frequently stalling in Travis. Whoever is / was in favor of CI helpers may want to work with them to smooth it out.

@tylerjereddy
Copy link
Member

@richardjgowers Does the bottom right plot above make intuitive sense to you?

@richardjgowers
Copy link
Member

@tylerjereddy I'm not 100% on how this all works, but looks like a bug to me, I'd think that the center should still remain constant, just it's defined as COM not COG.

@tylerjereddy
Copy link
Member

looks like a bug to me

I'm not so sure--I've been thinking about this and produced a plot that tracks the center of mass and that is indeed fixed as you can see below for the same case as bottom right centroid-tracking plot above. So, this means that fit_translation in the xy plane, weighted for atomic masses, will hold the COM relatively constant, but can lead to "mobility" of the centroid. If the peptide is tilting around in the bilayer or something like that, this might make sense that the compensations look like mobility from the centroid standpoint?

pr_1991_fig2

@tylerjereddy
Copy link
Member

If we can agree that is reasonable then I'll move on to evaluating the other fitting function.

@orbeckst
Copy link
Member

orbeckst commented Jun 4, 2019

I also noticed that the machinery seems to simply apply the transformations to each ts--so this approach is more about standardizing transformations & making it easy to stack them vs. say a magically faster approach to applying them under the hood? That's probably still quite useful of course to avoid all the disk space from -ur compact -pbc mol processed trajectories eventually.

Stacking should be faster because loading data from disk is typically 5-10x more costly than the actual calculation, at least for "simple" (or highly optimized) things such as RMSD. If you can load once and calculate twice, you can probably get almost 2x overall speed-ups.

@orbeckst
Copy link
Member

orbeckst commented Jun 4, 2019

So, this means that fit_translation in the xy plane, weighted for atomic masses, will hold the COM relatively constant, but can lead to "mobility" of the centroid. If the peptide is tilting around in the bilayer or something like that, this might make sense that the compensations look like mobility from the centroid standpoint?

The only time when COG is always guaranteed to be the same as COM is when all masses are identical. Otherwise they should be different (in general – I think with an inversion symmetry they will still be identical but that seems far fetched here and easily destroyed by thermal motions).

If we can agree that is reasonable then I'll move on to evaluating the other fitting function.

I agree that this is reasonable. Thank you for the detailed evaluation!

Copy link
Member

@tylerjereddy tylerjereddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expanding the figure panel to include the second fitting function in this PR suggests that they can produce very similar results, which is a good sign:

pr_1991

However, my first draft comparison of the rmsd-to-reference profiles is a little strange on the magnitude side, albeit still sensibly favoring the the RMSD-based approach. Nonetheless, I suspect these numbers are a little off--perhaps something is off in my test code:

pr_1991_rmsd

I've assumed that using MDAnalysis.analysis.rms.rmsd(ag.positions, copy_of_ref_positions) would suffice throughout.

Once we sort through this RMSD thing, I think I can just push in my minor changes & that's about it.

@orbeckst
Copy link
Member

@tylerjereddy the RMSD is so high because the peptide in the trajectory is broken across the box. I just visualized it.

If this is all then I think this looks sensible.

Copy link
Member

@tylerjereddy tylerjereddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've pushed in revisions to address my own minor doc refinements & am now marking this as approved.

There's just one unresolved reviewer comment left, and while Max's concern that alignto has some API overlap is solid, I do suggest we accept the minor blemish & move on / forward, given our constraints on reviewer bandwidth and so on.

@tylerjereddy
Copy link
Member

@orbeckst Thanks, I often work on WSL in the evenings, so it is a little more annoying to try doing visualizations from command line.

@tylerjereddy
Copy link
Member

tylerjereddy commented Jun 12, 2019

Feel free to rebase if you feel the need, but as long as there are no merge conflicts I suspect most of our CI just does merges into develop anyway.

@tylerjereddy
Copy link
Member

Looks like one of the Travis entries has a setup issue of some sort. Maybe a second version of NumPy is getting picked up or something like that.

@zemanj
Copy link
Member

zemanj commented Jun 13, 2019

The problem is that conda installs a too old scipy version

scipy: 0.19.1-py35_blas_openblas_202 conda-forge [blas_openblas]

even though we require >1.0. Later, when the $BUILD_CMD is evaluated, the build notices the missing requirement

Collecting scipy>=1.0.0 (from MDAnalysis==0.19.3.dev0)

And a little further below:

ERROR: scipy 1.3.0 has requirement numpy>=1.13.3, but you'll have numpy 1.10.4 which is incompatible.

@tylerjereddy
Copy link
Member

Oy, thanks. That should probably be simplified & ideally hard fail earlier on if there's a problem, I suspect.

@orbeckst
Copy link
Member

@zemanj could you please raise an issue with your problem diagnosis #1991 (comment) ? Many thanks!!

@tylerjereddy
Copy link
Member

tylerjereddy commented Jul 9, 2019

Rebased on latest develop & force pushed -- hopefully CI is finally green

@tylerjereddy
Copy link
Member

CI is all green -- in it goes. The original PR description includes a checkmark for CHANGELOG entry, but I don't see that. It can always be done in a follow-up.

Thanks @davidercruz & others.

@tylerjereddy tylerjereddy merged commit b1c68e4 into MDAnalysis:develop Jul 10, 2019
@orbeckst
Copy link
Member

Thank you all for the hard work getting it in!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants