Skip to content

Box centering transformation#1946

Merged
jbarnoud merged 22 commits intoMDAnalysis:developfrom
davidercruz:center-transform
Jul 4, 2018
Merged

Box centering transformation#1946
jbarnoud merged 22 commits intoMDAnalysis:developfrom
davidercruz:center-transform

Conversation

@davidercruz
Copy link
Contributor

Adds a centering function

This function takes an AtomGroup as argument and will translate it's center of mass/geometry to the center of the unit cell. Optional arguments include a user defined box, AtomGroup center type (mass or geometry) and whether to use PBC or not in AtomGroup center calculation.

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@davidercruz davidercruz changed the title Center transform Box centering transformation Jun 19, 2018
@coveralls
Copy link

coveralls commented Jun 19, 2018

Coverage Status

Coverage increased (+0.02%) to 89.91% when pulling 94f0196 on davidercruz:center-transform into 691e785 on MDAnalysis:develop.

Copy link
Contributor

@jbarnoud jbarnoud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some comments from a very quick read through. I'll review the PR in more details latter.

Your tests should cover erroneous uses to make sure it fails as expected. Please add tests for each exception you raise. There are many reasons to do that but here are the two that comes to my mind right away:

  • exceptions are part of the API, if the way the exceptions are raises changes we break some user's code and we want to know about it as soon as possible;
  • raising an exception is code, but it is hopefully rarely executed, this means that we are less likely to catch en error in that code. It became painfully obvious that raising an exception is code when we ported MDAnalysis to python 3 and replaced the old %-based string formating by the more recent format-based one; suddenly we had a lot of unusable errors while the tests were happily passing.

Also you need to test more cases:

  • is the pbc argument properly honored?
  • is the correct box used?
  • what if I give none of the optional argument?
  • what if I give only some?

The tests are not only there to make sure that your code works now. They are mostly there to make sure that if I do something stupid because I try to translate half of the code base to perl at 3 AM after a mixture of beer and coffee, everything I break becomes red!

"""

pbc_arg = pbc
if isinstance(ag, AtomGroup):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than explicitly test for the type of the argument, it would be better to try running the method and raise the correct error if it fails. This allows somebody to give as argument anything that roughly behave like an atom group. The python philosophy in that matter is that if it looks like a duck and quacks like a duck, then it is a duck.

try:
    ag_center = ag.center_of_geometry(...)
except AttributeError:
    raise ValueError('{:r} is not an AtomGroup'.format(ag))

else:
boxcenter = np.sum(ts.triclinic_dimensions, axis=0) / 2
vector = boxcenter - ag_center
translate(vector)(ts)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it is good to reuse existing function, I am worried this would create a big overhead for just a sum.

return reference, transformed

def test_translate():
ref_u, trans_u = translate_universes()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The expected use of a fixture is to let pytest call the function. Here, it should be:

def test_translate(translate_universes):
    ref_u, trans_u = translate_universes
    ...

@davidercruz
Copy link
Contributor Author

davidercruz commented Jun 20, 2018

@jbarnoud Thank you.
I added some more tests. One of the tests made me realize that there was no distinction between failing because the input arg was not an AtomGroup object, or because it did not have masses. Now it's a bit clearer - but I think it can be better.

One thing regarding the box argument when pbc=True: currently the custom box is not taken into account when calculating the center of mass/geometry with PBC. I could change these two methods to correct this. What do you think?

with pytest.raises(ValueError):
translate(vector)(ts)

def test_center_in_box_args(translate_universes):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be one test function per error. The way it is written, the function will stop at the first failing assertion. However, knowing if the other ones pass or fail is a valuable information to figure out what happens.

with pytest.raises(ValueError):
center_in_box(ag, pbc=True)(ts)
# what if a wrong center type name is passed?
bad_center = "charge"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may implement one day a center of charge. Having something obviously wrong such as "invalid" makes the test future proof and clearer (though, not so much clearer as the variable name is already good).



def test_center():
def test_translate_vector(translate_universes):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes me realize: how is the dtype of ts.positions affected by the dtype of the vector?

Copy link
Contributor Author

@davidercruz davidercruz Jun 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had an error in #1902 where if the vector was not np.float32 some floating point errors would pop up, and the transformed positions would be slightly different from the expected

reference = make_Universe(trajectory=True)
transformed = make_Universe(trajectory=True)
transformed.trajectory.ts.dimensions = np.array([372.,373.,374.,90,90,90])
transformed = make_Universe(['masses'], trajectory=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to assign masses so that the center of mass and the center of geometry are different.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using the ['masses'} extra in make_Universe generates masses for the atoms. The centers are different for the transformed universe

@jbarnoud
Copy link
Contributor

One thing regarding the box argument when pbc=True: currently the custom box is not taken into account when calculating the center of mass/geometry with PBC. I could change these two methods to correct this. What do you think?

What would be the use case for a custom box in your transformation? In what case would you want to use a box size different from the one of the time step?

I do not see a use case, which does not mean there isn't one. If there is indeed no use case, then the box option of the transformation should probably be removed.

@davidercruz
Copy link
Contributor Author

@jbarnoud one use case would be to center one the atomgroup of a trajectory in the unit cell of another from a different universe, although the user could simply change the dimensions of the first to match those of the latter.

@jbarnoud
Copy link
Contributor

Then, for sure, you do not want to use the custom box for the center of mass or center of geometry as it will mess up your coordinates. It may also be less confusing to give a point to center on rather than a full custom box.

point: list, optional
overrides the unit cell center - the coordinates of the Timestep are translated so
that the center of mass/geometry of the given AtomGroup is aligned to this position
instead. Defined as a list of size 3. Example: [1, 2, 3]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"array like" rather than "list"

if box:
boxcenter = np.sum(triclinic_vectors(box), axis=0) / 2
if point:
if len(point)==3:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be done once in the parent function rather than every frame here.

Copy link
Contributor

@jbarnoud jbarnoud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more changes. But it is getting there.

* 0.18.1

Enhancements
* Added a box centering trajectory transformation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refer to the PR. At the end of the day, it may be worth grouping all the transformations in a single changelog entry.

@@ -47,9 +50,7 @@ def translate(vector):
----------
vector: list
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not supposed to be a list, but an array of anything similar. I think "array-like" is the term.

center: str, optional
used to choose the method of centering on the given atom group. Can be 'geometry'
or 'mass'
point: list, optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, "array-like".

pbc_arg = pbc
try:
if center == 'geometry':
ag_center = ag.center_of_geometry(pbc=pbc_arg)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cannot be done in the parent function. The way it is written now means that the center is not updated every frame. What you can do, though, is to choose the method in the parent method:

# In the parent function
if center =='geometry':
    center_method = functools.partial(ag.center_of_geometry, pbc=pbc_arg)
elif ...

# In the wrapped function
ag_center = center_method()

def wrapped(ts):
ag_center = center_method()
if point:
if len(point)==3:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment disappeared, but this remains to change: you are testing and converting the point to an array for each frame while it could be done once in the parent function.


pbc_arg = pbc
point = np.array(point)
if point.any() and len(point)!=3:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if point is [0, 0, 0], then point.any() will be False. Also, I would recommend avoiding len on arrays: I got weird behaviours at times. Anyway, the logic you had before where point could be an array or None was misplaced but safer.

"""

pbc_arg = pbc
point = np.array(point)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't you specify the dtype? You had it specified, earlier.

boxcenter = np.float32(point)
else:
raise ValueError('{} is not a valid point'.format(point))
if point.any():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as before: point.any() is not safe here.

@jbarnoud
Copy link
Contributor

Also, look at the doc failure on travis.

Copy link
Contributor

@jbarnoud jbarnoud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One last thing :)


return wrapped

def center_in_box(ag, center='geometry', point=None, pbc=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it mean to have pbc=None? Is there a difference from having the default being True or False? If there is such a difference, it should be explicitly stated in the docstring; if not, then the default should be the correct boolean. At the moment, I do not now if PBC is taken into account by default or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried it and it really makes no difference. However, the default in both center_of_mass and center_of_geometry methods is None, so I tried to keep things consistent. I added the default to the docs, however, to make things clearer for the user.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then you should figure out what the default means for center_of_mass and center_of_geometry. I could already see that None was the default. The question remains: what am I going to get if I do not specify pbc?

Copy link
Contributor Author

@davidercruz davidercruz Jun 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh you were talking about the case when flags['use_pbc']

edit: I saw @richardjgowers 's comment. Nevermind

@jbarnoud jbarnoud mentioned this pull request Jun 24, 2018
4 tasks
@davidercruz
Copy link
Contributor Author

@richardjgowers Oh! Sorry.

Ok then, I will change None to False, since there's really only two intended behaviours

Copy link
Contributor

@jbarnoud jbarnoud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this may be my last round of comments.


.. autofunction:: translate
The vector can either be user defined, using the function `translate`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be

:func:`translate`

in order to create a link to the function. Similar treatment should be applied bellow for center_in_box.

Translates the coordinates of a given :class:`~MDAnalysis.coordinates.base.Timestep`
instance so that the center of geometry/mass of the given :class:`~MDAnalysis.core.groups.AtomGroup`
is centered on the unit cell. The unit cell dimensions are taken from the input Timestep object or
can be defined using the `box` argument.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last bit is out of date.

point: array-like, optional
overrides the unit cell center - the coordinates of the Timestep are translated so
that the center of mass/geometry of the given AtomGroup is aligned to this position
instead. Defined as an array of size 3. Example: [1, 2, 3]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The example is confusing, just remove it.

if point is None:
boxcenter = np.sum(ts.triclinic_dimensions, axis=0) / 2
else:
boxcenter = np.float32(point)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, point already has the correct dtype, no need to convert it again.

return ts

return wrapped

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the trailing lines.

def test_center_in_box_bad_ag(translate_universes):
# this universe as a box size zero
ts = translate_universes[0].trajectory.ts
ag = translate_universes[0].residues[0].atoms
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You do not need ag in that test.

with pytest.raises(ValueError):
center_in_box(bad_ag)(ts)

def test_center_in_box_bad_point(translate_universes):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should parametrize that test as described above for the bad vector.

center_in_box(ag, center=bad_center)(ts)

def test_center_in_box_no_masses(translate_universes):
# this universe as a box size zero
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the comment you care about. What matters here is that the universe does not have masses.

@pytest.fixture()
def translate_universes():
# create the Universe objects for the tests
reference = make_Universe(trajectory=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some tests depend on that universe not having dimensions and masses. There should be a comment here warning about these requirements.


def test_center_in_box_coords_no_options(translate_universes):
# what happens when we center the coordinates arround the center of geometry of a residue?
ref_u = translate_universes[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine, but you can also write:

ref_u, trans_u = translate_universes

from functools import partial

from ..lib.mdamath import triclinic_vectors
from ..core.groups import AtomGroup
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure you need this AtomGroup import here, maybe it's old.

In general these transformations should use duck typing, ie not explicitly checking the type of things coming in, but simply expecting them to do all the things an AtomGroup does.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, the import is not used anymore.

overrides the unit cell center - the coordinates of the Timestep are translated so
that the center of mass/geometry of the given AtomGroup is aligned to this position
instead. Defined as an array of size 3.
pbc: bool, optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you've been consistent, but imho pbc is ambiguous. Could we call this wrap instead?

# MDAnalysis: A Toolkit for the Analysis of Molecular Dynamics Simulations.
# J. Comput. Chem. 32 (2011), 2319--2327, doi:10.1002/jcc.21787
#

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are very good, but it would be good to test that an AtomGroup's positions are correctly modified (ie more of an integration test rather than solely relying on unit tests). Something like:

u = mda.Universe()

u.trajectory.add_transformation()

ref = ....

assert u.atoms.positions = ref

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to be tested now.

…ext ; DummyReader didn't have a _transformations variable
Copy link
Contributor

@jbarnoud jbarnoud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny details to fix.

from functools import partial

from ..lib.mdamath import triclinic_vectors
from ..core.groups import AtomGroup
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, the import is not used anymore.

# MDAnalysis: A Toolkit for the Analysis of Molecular Dynamics Simulations.
# J. Comput. Chem. 32 (2011), 2319--2327, doi:10.1002/jcc.21787
#

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to be tested now.

transformed = make_Universe(['masses'], trajectory=True)
transformed.trajectory.ts.dimensions = np.array([372., 373., 374., 90, 90, 90])
return reference, transformed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP8 says there should be 2 new lines between functions. Please fix in the whole PR.

@davidercruz
Copy link
Contributor Author

@jbarnoud Regarding the center in plane transformations: what kind of planes do I use? xy xz and yz? Or also other custom planes?

@jbarnoud
Copy link
Contributor

jbarnoud commented Jul 4, 2018

@davidercruz I'll be happy with the 3 main planes. In practice, the xy plane is going to be the most used. You can implement custom planes, but be careful that it should not impair the performances of the base feature.

@jbarnoud jbarnoud merged commit b718350 into MDAnalysis:develop Jul 4, 2018
@jbarnoud jbarnoud self-assigned this Jul 4, 2018
@jbarnoud jbarnoud added this to the 0.18.x milestone Jul 4, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants