Skip to content

*Group.unwrap#2189

Merged
richardjgowers merged 21 commits intoMDAnalysis:developfrom
zemanj:unwrap
Feb 8, 2019
Merged

*Group.unwrap#2189
richardjgowers merged 21 commits intoMDAnalysis:developfrom
zemanj:unwrap

Conversation

@zemanj
Copy link
Member

@zemanj zemanj commented Jan 30, 2019

Fixes (in part) #1004, #1185

Changes made in this Pull Request:

  • Added *Group.unwrap(self, box=None, compound='fragments', reference='com', inplace=True)

Since I'd really like to have this feature and there wasn't too much progress going on lately in PRs #2012 and #2038, I decided to go ahead and get this going. Using the method in other methods such as *Group.center*() and the like should be straightforward but is not yet covered here.

Since in most cases, boxes are orthorhombic and the majority of molecules are not split accross boundaries, I implemented a short-cut so that only those fragments that are potentially broken are made whole. This yielded a speed-up of a factor between 4 and 5 compared to a simple loop blindly calling make_whole(fragment) on every fragment in the group (as it's implemented in PR #2038 at the moment).

Tests and changelog entry are yet to come.

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@codecov
Copy link

codecov bot commented Jan 30, 2019

Codecov Report

Merging #2189 into develop will increase coverage by 0.04%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2189      +/-   ##
===========================================
+ Coverage    89.48%   89.53%   +0.04%     
===========================================
  Files          157      157              
  Lines        18819    18883      +64     
  Branches      2723     2741      +18     
===========================================
+ Hits         16841    16906      +65     
  Misses        1377     1377              
+ Partials       601      600       -1
Impacted Files Coverage Δ
package/MDAnalysis/core/groups.py 97.6% <100%> (+0.17%) ⬆️
package/MDAnalysis/core/topologyattrs.py 97.14% <100%> (ø) ⬆️
package/MDAnalysis/coordinates/PDB.py 89.93% <0%> (+0.4%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 992bd73...986e59f. Read the comment docs.

@kain88-de
Copy link
Member

You can try a DNA as an example. In GROMACS the two DNA strands are modeled as individual fragments. An unwrap algorithm should however still keep both strands together. With Gromacs that required several passes over the trajectory. I will check if I can share an example configuration.

@zemanj
Copy link
Member Author

zemanj commented Jan 30, 2019

We still have the problem that using make_whole() on periodic molecules (think, for example, of an infinitely periodic graphene sheet or carbon nanotube) is likely to yield unwanted results.
But maybe... if people try unwrapping periodic molecules, we should give them what they deserve. 😁

Another issue is that find_fragments() does not correctly handle dummy particles since for some reason these don't show up as bonded atoms. For example, unwrapping TIP4P water from a gromacs topology / trajectory fails.

@zemanj
Copy link
Member Author

zemanj commented Jan 30, 2019

I just thought about having a "lazy" make_whole()utility which is able to perform unwrapping without bond information. This would be possible if the compound to unwrap does not span more than half a box length (in ortho boxes).
We could then make unwrap() an attribute of GroupBase and add a use_bonds kwarg which defaults to True. What do you think?

@zemanj
Copy link
Member Author

zemanj commented Jan 30, 2019

@kain88-de Regarding the DNA example: Right now my implementation does not keep groups of individual compounds together, since this would additionally require making the COMs or COGs of the compounds whole. Such a functionality should either go into a separate method or it should be accessible via an optional kwarg. Furthermore, this would indeed require a method which is able to compute the necessary shifts from the (naturally non-bonded) COMs or COGs of the respective compounds.

@jbarnoud
Copy link
Contributor

Si this is supposed to supersede #2012, right? Is everything from #2012 in here already?

@zemanj
Copy link
Member Author

zemanj commented Jan 30, 2019

@jbarnoud That's a bit hard to judge since #2012 still has some TODOs and as far as I could see several lines of code which need fixing. Nevertheless, I'm pretty sure my implementation already covers most of what #2012 is aiming at.

I didn't fully get what the choose_image() method is supposed to be good for; IMHO unwrapping the centers of compounds (see my comment above) to keep them as closely together as possibe cannot be achieved by solely arranging the compounds' centers around a specific position in the cell.
Maybe @richardjgowers can shed some light on what exactly the choose_image() function accomplishes?
The "keep clusters of compounds intact" functionality is not yet included in this PR, though (but will be very soon).
What I found problematic in #2012 is that the unwrap() method effectively acts on atoms which aren't part of the group (group.fragments >= group!). This is handled properly in this PR (I hope).

Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this, but a few suggested tweaks


Note
----
* This is the inverse transformation of packing atoms into the primary
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's some ascii in make_whole which you can steal for this. Maybe also a seealso for pack_into_box (which maybe we can rename wrap?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*Group.wrap() already exists and extends the functionality of *Group.pack_into_box(). I'm in favor of deprecating pack_into_box().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized that pack_into_box() can do out-of-place packing, while wrap() always acts inplace. That should be changed. Furthermore, wrap() potentially acts on atoms outside of the calling group. That's dangerous and should be changed as well.

if ortho:
positions = unique_atoms.positions
# Only unwrap if necessary:
if np.any(np.abs(positions - positions[0]) > hbox):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this optimisation is cute, but maybe deserves to live inside of make_whole

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very true! That way, the optimization will always be applied.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also if it's done in the Cython level you can exit the any() loop early, I think this will do the whole calculation then evaluate the boolean values

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For an atomgroup consisting of 500 fragments with 25 atoms per fragment, i measured the following runtimes:

  • without optimization: 154 ms,
  • optimization in make_whole(): 65 ms
  • optimization in Python: 66.5 ms

Turns out the speed-up is not that great anymore. This simple loop runs in 22 ms on the same system:

def group_make_whole(ag):
    box = ag.dimensions
    fragments = ag.fragments
    if np.any(box[3:] != 90.0):
        for f in fragments:
            mda.lib.mdamath.make_whole(f)
    else:
        hbox = 0.5 * box[:3]
        for f in fragments:
            pos = f.positions
            if np.any(np.abs(pos - pos[0]) > hbox):
                mda.lib.mdamath.make_whole(f)

Seems to be the rest of the machinery that slows down group.unwrap().

try:
compound_indices = unique_atoms.molnums
except AttributeError:
raise NoDataError("Cannot use compound='molecules': "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the existing attributeerror should suffice rather than adding another layer here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I put this here intentionally since the keyword is called 'molecules' but the AttributeError would be raised for molnums, which might be confusing. Same as in *Group.center(), btw.

@richardjgowers
Copy link
Member

@zemanj choose_image is meant to translate an AtomGroup in boxlengths (ie not really moving it in terms of PBC) to minimise its distance from a given reference point. Its to reduce code duplication as it appears elsewhere too.

@zemanj
Copy link
Member Author

zemanj commented Jan 30, 2019

@richardjgowers

Its to reduce code duplication as it appears elsewhere too.

Where else is it used? I can't find it...

@richardjgowers
Copy link
Member

@zemanj this is doing the image shift: https://github.com/MDAnalysis/mdanalysis/blob/develop/package/MDAnalysis/core/groups.py#L1251

But more generally I think AtomGroup.choose_image is useful enough to have it solo. Can be a future PR to limit the scope of this one though

@zemanj
Copy link
Member Author

zemanj commented Jan 30, 2019

@richardjgowers Requests are addressed, I'll write the tests tomorrow.

@richardjgowers
Copy link
Member

@zemanj ok cool, this is looking really good. Once we've merged this it should be fairly easy to

  • make center_of_mass() just call self.unwrap(compound='group') to get coordinates to operate on.
  • make the transformation thing just call this method of Groups

@zemanj
Copy link
Member Author

zemanj commented Feb 4, 2019

Even though unwrapping requires Bonds, I moved *Group.unwrap() from core.topologyattrs.Bonds to core.groups.GroupBase for two reasons:

  1. unwrap() should also be accessible from ResidueGroup and SegmentGroup to be in line with *Group.wrap()
  2. If the topology has no Bonds, the error (NoDataError) and corresponding message should be coherent and sensible for all kinds of *Groups.

One can only accomplish both by either adding a custom __getattr__() not only for AtomGroup but also for the other ones, or by moving the unwrap() method to GroupBase and have it check for the Bonds attribute of its atoms. I chose the latter possibility since AtomGroup.__getattr__() already has a comment stating it's deprecated (although I don't really know why).

@zemanj
Copy link
Member Author

zemanj commented Feb 4, 2019

I changed the dtype of molnums from int to np.int64 to make appveyor happy. The Windows build complained about molnums being of type long instead of int64_t when passing it to lib.mdamath.unique_int_1d(). Should be all good to go now.

@zemanj
Copy link
Member Author

zemanj commented Feb 4, 2019

There are at least two minor issues (unnecessary import in topologyattr.py and missing test for a universe without masses) I still need to take care of, so this is not yet ready for review.

.. versionadded:: 0.20.0
"""
atoms = self.atoms
# bail out early if no bonds in topology:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a requires decorator that can do this type of thing (and create uniform error messages)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richardjgowers I know. The @requires() decorator only works on AtomGroups, that's why it's not used here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One could extend the @requires decorator to check for GroupBase instances instead of AtomGroup and then check the group's .atoms member for the given attribute(s). However, I don't like this solution since it

  1. doesn't allow to check for ResiduGroup or SegmentGroup attributes and
  2. more importantly, isn't lightweight anymore since it involves object instantiation when accessing the .atoms member.

The check I implemented doesn't instantiate additional objects and its impact on performance should therefore be negligible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, really we could have something like assert 'bonds' in u.topology which works... I'll raise an issue. I just like error messages being uniform, so hitting eg 'no masses' always looks similar

@zemanj
Copy link
Member Author

zemanj commented Feb 6, 2019

Ok all checks passed. I'll just add changelog entries and this is good to go.
@MDAnalysis/coredevs pls review

@tylerjereddy
Copy link
Member

I restarted the timed out Mac entry in the Travis CI matrix. The test suite ran to completion and then the job seemed to hang. Not sure if that's been happening here lately, but travis_wait can always be added in another PR if that proves problematic moving forward.

100 % patch diff coverage is very nice to see--definitely comforting if reviewers don't really have the bandwidth to go over everything. Looks like Richard has checked more thoroughly than I.

Does this mean MDAnalysis will soon be able to process my virus trajectories (rhombic dodecahedron with superstructure split all over the place requiring all kinds of trjconv processing & translations)?

@zemanj
Copy link
Member Author

zemanj commented Feb 6, 2019

@tylerjereddy

Does this mean MDAnalysis will soon be able to process my virus trajectories (rhombic dodecahedron with superstructure split all over the place requiring all kinds of trjconv processing & translations)?

If you mean that your system's periodic unit cell is a rhombic dodecahedron, then the answer is no.
Otherwise, that depends on what you want to achieve. This PR so far only addresses the problem of making bonded structures (i.e., single molecules) whole.
If you have non-bonded structures such as molecule clusters or complexes in the system and need to keep those intact across PBC, that will require cluster unwrapping. Cluster unwrapping is about to come in a follow-up PR very soon.

@richardjgowers
Copy link
Member

@zemanj I think you can represent a rhombic dodec as a triclinic cell (I think I read somewhere a triclinic cell can represent any 3d tiling shape...)

@tylerjereddy I think this will make your unsplitting dreams come true... but if you could check before we merge that would be great. Am I right that you can use a triclinic representation?

is_unwrapped = True
for i in range(1, natoms):
for j in range(3):
if fabs(oldpos[i, j] - oldpos[0, j]) >= half_box[j]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fabsulous

from MDAnalysis.lib.mdamath import triclinic_box


def unwrap_test_universe(have_bonds=True, have_masses=True, have_molnums=True,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

niiiiice

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah writing tests took longer than implementing the feature itself... The cool thing about this is that the minimal test universe makes the test lightning fast and (hopefully) covers all relevant situations. I also designed it with cluster unwrapping in mind.

@zemanj
Copy link
Member Author

zemanj commented Feb 6, 2019

@richardjgowers @tylerjereddy

I think you can represent a rhombic dodec as a triclinic cell (I think I read somewhere a triclinic cell can represent any 3d tiling shape...)

Not sure about any space-filling 3d shape (sounds plausible, though), but definitely true for a rhombic dodecahedron (gromacs is doing it that way).

@tylerjereddy
Copy link
Member

I probably won't get a chance to test on my system before the merge, but sounds like these are solid steps in that direction.

I guess when I do get around to it, I can open an issue / PR if there's something special that needs tweaking or adding in.

@richardjgowers richardjgowers merged commit 3d73967 into MDAnalysis:develop Feb 8, 2019
@zemanj
Copy link
Member Author

zemanj commented Feb 8, 2019

@richardjgowers thanks! 👍

@zemanj zemanj mentioned this pull request Nov 15, 2019
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants