Skip to content

barebones working prototype of trajectory transformations#1902

Merged
jbarnoud merged 2 commits intoMDAnalysis:developfrom
davidercruz:develop
Jun 13, 2018
Merged

barebones working prototype of trajectory transformations#1902
jbarnoud merged 2 commits intoMDAnalysis:developfrom
davidercruz:develop

Conversation

@davidercruz
Copy link
Contributor

WIP of #786

Changes made in this Pull Request:

  • transformations hooks are there
  • a simple translation of coordinates by a given vector

Hey, this is my first commit for my GSOC project. The transformations are alive 🎉

From our talks, transformations would be given by name, for example:

import MDAnalysis as mda

u = mda.Universe(PSF, DCD)
u.trajectory.add_transformation(translate(1, 2, 3))
# u.trajectory.ts.positions are updated for that frame

In this implementation I chose to do it as follows:

import MDAnalysis as mda

u = mda.Universe(PSF, DCD)
u.trajectory.add_transformation(mda.transformations.translate, (1, 2, 3))

Although it is more keyboard intensive to the user, it helps the user know which transformations are implemented and it allows code auto-completion (when the editor allows it). The main disadvantage is that repeating types of transformations ( two translate routines for example) are not allowed, but that is a question of changing the OrderedDict to a list of lists and updating the code accordingly.

The translation routine needs some routines to repack the atoms in the unit cell.

I need your input on checks that are missing, issues and your feedback on this approach to implement on-the-fly trajectory transformations.

Cheers

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@orbeckst
Copy link
Member

@davidercruz great to see your GSoC PR shaping up.

I added the WIP label because you said "WIP" in the summary. This means to us that you currently just want feedback on the code but that you yourself don't consider it ready to be merged. Once you want to get it merged, remove "WIP" from your description and let us know if you need someone else to remove the label (I don't know if you have permissions to set labels on your own PR or not).

Your commit is not associated with your GitHub account. Please add the email address that you're using in your commits as an email address to your GitHub account.

Have a look at the Travis tests by following the Details link. They are failing. Typically people want to see tests passing or specific questions on how to make them pass before engaging with the rest of your code.

You can ignore appveyor – we are trying to get Windows builds going (see PR #1880) but it's not there yet. Unfortunately, we need to have appveyor builds going for all PRs even though we know that they are failing.

@orbeckst
Copy link
Member

One other thing: you are working on the develop branch of your MDA fork. This will likely make your life complicated if you don't get this PR merged quickly.

It is much better to work on a feature branch.

git checkout -b feature-transformations-basics
...

and then create the PR from the feature branch. This allows you to keep your develop branch in sync with the MDAnalysis one while still working on your own branch or indeed creating other feature branches (eg if you need to add something else in MDAnalysis that can be worked on independently from the core transformations).

Copy link
Contributor

@jbarnoud jbarnoud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not think about using an ordered dict this way. I had a simple list in mind, but why not the ordered dict. The only thing I see is on this line: https://github.com/MDAnalysis/mdanalysis/pull/1902/files#diff-99815fce73651366394eb2dd8fe2f872R1207 it forces the user to write transformation that can accept an argument in addition to the time step, even if that argument is not used. I do not like that much.

Could you make a demo jupyter notebook?

return self._transforms.keys()

def _check_for_transformation(self, transform):
""" Check for the existance of an auxiliary *auxname*. If present,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This docstring is out of sync with the function it describes.

try:
self._check_for_transformation(transform)
except:
raise ValueError("No transformation named {name}".format(name=transform))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are not referring to transformations by name. The error message is confusing.

if transform in self.transformation_list:
return self._transforms[transform]
else:
raise ValueError("No transformation named {name}".format(name=transform))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as the other error message above.

for auxname in self.aux_list:
ts = self._auxs[auxname].update_ts(ts)
for transform in self.transformation_list:
ts = transform(ts, self._transforms[transform])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to break if the transformation does not take arguments in addition to the time step.

@richardjgowers
Copy link
Member

@davidercruz congrats on starting you GSOC project!

But I don't like the API here. It seems weird to store the arguments to the function and the function separately. If you had used a class...

class Translate
    def __init__(self, vector):
        self.vector

    def __call__(self, ts):
        ts.positions += self.vector
        return ts

Then the class Translate is like a function factory which returns objects which have meaning, Translate((1, 2, 3)) gives you an object which applies the 1, 2, 3 transformation. It also addresses @jbarnoud comment on the way to call transformations being consistent, they all just accept ts as the only argument. I'd be interested what other people think though.

Separate from that, the way of adding/removing transformations seems unwieldy and the limitation of only one of each type isn't great. If we added tags/names to each transformation it could look like:

u.trajectory.add_transformation('trans1', (translate, (1, 2, 3)))
u.trajectory.add_transformation('rotate', (rotate, (90, 0, 0)))
u.trajectory.add_transformation('trans2', (translate, (-1, -2, -3)))

#but later...
u.trajectory.remove_transformation('trans1')

--------
:meth:`add_trasnformation`
"""
try:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we need the check function, just rely on KeyErrors on your dict

try:
    self.transforms.pop(transform)
except KeyError:
    raise Whoops

@jbarnoud
Copy link
Contributor

I do not like transformations having to be classes. It is overly complicated for most cases. Having the transformations being generic callables means they can be a simple function, but a class instance remains an option.

I was thinking about storing the transformations in a list but I am curious about the use of an ordered dict. Storing the arguments in the dictionary has the problem I describe above, but it may make the serialization easier in some cases. It will not always solve the serialization problem, though. So it may be adding complexity to only solve half a problem.

If we go for the dictionary, I like the labels the way @richardjgowers describe them.

The way I see it, the example from @richardjgowers should look like:

def translate(*vector):
    def wrapped(ts):
        ts.position += vector
        return ts
    return wrapped

u.trajectory.add_transformation('trans1', translate(1, 2, 3))
u.trajectory.add_transformation('rotate', rotate(90, 0, 0))
u.trajectory.add_transformation('trans2', translate(-1, -2, -3))

#and later...
u.trajectory.remove_transformation('trans1')

@davidercruz
Copy link
Contributor Author

Hey there, thank you for your feedback. @orbeckst I added the two versions of the email (gmail ignores dots on emails). I will commit to the develop for now, but will change for the next PR's.

@richardjgowers and @jbarnoud:
I changed the code according to your feedback. Now the API works as follows:

u.trajectory.add_transformation('trans1', mda.transformations.translate(1,2,3))
u.trajectory.transformation_list
Out:  ['trans1']

#... things are done
 u.trajectory.remove_transformation('trans1')
 u.trajectory.transformation_list
Out: []

I like this a lot more, actually.
What do you think?

@richardjgowers
Copy link
Member

Yeah looks good, tests + docs next!

@jbarnoud your wrapper function works nicely too! (but doesn't exclude me using a class sneakily :p)

@orbeckst
Copy link
Member

Small comment (sorry, did not follow the full discussion in detail): I don't particularly like putting "types" into names so instead of

u.trajectory.transformation_list

I would prefer

u.trajectory.transformations

And if this is supposed to work like a list then it would be nice if it behaved like one, eg

u.trajectory.transformations.append(mda.transformations.translate(1,2,3))

However, given that you're naming them (which is great), why not just expose transformations with a dict-like interface

u.trajectory.transformations['trans1'] = mda.transformations.translate(1,2,3)

You can still do

u.trajectory.transformations.keys()

to get the names.

@jbarnoud
Copy link
Contributor

jbarnoud commented May 19, 2018 via email

@davidercruz
Copy link
Contributor Author

davidercruz commented May 19, 2018

The idea behind hiding the transforms dict was that it would be consistent with the auxiliaries dict and methods. That's why I created a transformations_list property - because there is a aux_list property that returns the keys of the _aux dict.

edit: now I realize that in this case I should have added a get_transformations_descriptions method as well.

@orbeckst
Copy link
Member

I see; consistency in the interface is a good reason.

@davidercruz
Copy link
Contributor Author

I added some documentation. Now I have a question about the tests. I have to unit tests for each transformation itself (inside a transformations directory in the testsuite). But I'm a little lost on what tests should I write in the context of the reader API.

@jbarnoud
Copy link
Contributor

jbarnoud commented May 21, 2018 via email

@davidercruz
Copy link
Contributor Author

@jbarnoud thank you. I had forgot about checking if sliced iterations were working. Turns out that the coordinates were not being changed so u.trajectory[0] would return the initial values instead of the translated ones.

Regarding the memory reader, as expected, the transformations are being applied every time we iterate over the trajectory. I added a new variable to the Timestep class, applied_transformations, which is a dict that has the frame as key and a list of the names of the transformations as value.

The only "problem" right now is that the trajectory must be transferred to memory before adding transformations to the workflow. If done before, the transformations are "forgotten":

u = mda.Universe(PSF, DCD)

u.trajectory.add_transformation("trans1", mda.transformations.translate(1,1,1))

u.trajectory.transformations
Out: OrderedDict([('trans1',
              <function MDAnalysis.transformations.translate.wrapped>)])

u.transfer_to_memory()
u.trajectory.transformations
Out: OrderedDict()

Another possible problem is removing, or changing transformations after the trajectory is loaded to memory: if the user removes a transformation, the positions don't revert to the original.

@orbeckst
Copy link
Member

The only "problem" right now is that the trajectory must be transferred to memory before adding transformations to the workflow. If done before, the transformations are "forgotten":

How about making the transfer_to_memory() function aware of transformations (and possibly auxiliaries, too) so that it will correctly copy the transformations?

@orbeckst
Copy link
Member

Another possible problem is removing, or changing transformations after the trajectory is loaded to memory: if the user removes a transformation, the positions don't revert to the original.

I can't think of a good way to fix this. I think you should raise an error (TypeError or whatever is raised when one tries to change a tuple's content??) if a user tries to change or delete transformations from the MemoryReader.

""" Check if a given transformation name has already been applied to a
given timestep """
if ts.frame not in ts.applied_transformations.keys():
ts.applied_transformations[ts.frame] = [transname]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this assume that the timestep is the same instance from one frame to the other? I think we can assume that some readers may not recycle the TimeStep instance and use a new one each time a frame is read.

Copy link
Contributor Author

@davidercruz davidercruz May 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. When you read from a file the same frame over and over again, it will always be a new instance of Timestep - coordinates are reset, so the transformations are always applied. But for the memoryreader, as the coordinates are changed in the "source", this will avoid the problem of applying transformations over already transformed coordinates.

def _check_and_apply_transformation(self, transname, ts):
""" Check if a given transformation name has already been applied to a
given timestep """
if ts.frame not in ts.applied_transformations.keys():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need for the .keys(): dictionaries behave as a collection of keys.

@jbarnoud
Copy link
Contributor

I am afraid the check for the memory reader is a bit brittle. It allows to add transformations at the end of the transformation workflow, but any other change to the workflow (reordering, new transformation but not at the end, removal of a transformation...) is going to break the check.

One solution would be to fix the workflow when the reader is created. This means you instantiate the reader with the workflow, and you cannot change it afterward. By doing so, you only have to check if the transformations have been applied already, and not what transformations.

Setting the transformation workflow at universe creation works for my use of the thing, and it makes the API much simpler: there is no need for a function to add or remove transformations, transformations do not have to be named anymore. Though I do not know if it would fit the expected needs of the others.

u = mda.Universe(TPR, XTC, transformations=[trans1, trans2(1, 2, 3), trans3])
# or
reader = MyReader(..., transformations=[...])
u = mda.Universe(TPR, reader)

@richardjgowers
Copy link
Member

Yeah I don't really like checking what's been applied etc.

Would it work if we overloaded MemoryReader.add_transformation to add dummy transformations?

def add_transformation(self, transname, transform):
    # apply the transformation all at once over all frames
    self._apply_transformation_to_traj(transform)
    # then add a dummy transformation that does nothing
    super().add_transformation(transname, lambda x: x)

@jbarnoud
Copy link
Contributor

@richardjgowers That would work until somebody tries to modify the transformation workflow. Combined with the fixed workflow I described above it would remove all the book keeping, though.

@davidercruz
Copy link
Contributor Author

@jbarnoud and @richardjgowers I implemented those changes. However, in the MemoryReader, I still need book keeping. How can I iterate over all the frames inside the MemoryReader? If I do that, I can ditch book keeping completely.

Cheers

@richardjgowers
Copy link
Member

@davidercruz something like

class MemoryReader
    def _apply_transformation(trans):
        for i, ts in enumerate(self):
            # figure out what the modified Timestep for this frame is
            ts = trans(ts)
            # then store these modified coordinates in the master array
            self.coordinate_array[i, :, :] = ts.positions

But there's a nice headache where you have to figure out which of the 3 dimensions of the array is the frame dimension, check out memory.py

@jbarnoud
Copy link
Contributor

@davidercruz It would be nice to have a transformations attribute in the readers as well. Then you can use a @property to make the transformations read only.

For the memory reader, make sure that you do not apply the transformations when you copy to memory and when you explicitly apply the transformations. Please test the copy to memory from both XTC and DCD; they may use different mechanisms (iteration vs timeseries).

@jbarnoud
Copy link
Contributor

Argh. I said something wrong! Transformations may need atom groups as arguments; it is therefore not possible to have them fixed at universe creation time. Having the transformation workflow read only at some point is the only way to avoid error prone book keeping, though. We could have a restriction that the add_transformations method can only be called once (subsequent calls raise an exception)?

@davidercruz
Copy link
Contributor Author

@jbarnoud So, what you are saying is that, either the user adds the transformations at universe creation,
or he/she adds them after creating the universe using add_transformations. Then either way, no more transformations can be added thereafter.
Or, do we ditch the "transformations as kwargs to the universe instancing" method and revert to the alternative behaviour?

@jbarnoud
Copy link
Contributor

I was thinking the former. But it would not be a problem for me to ditch the keyword argument to reduce the amount of tests.

@davidercruz
Copy link
Contributor Author

Ok, I think it's getting better. There's still the book keeping for the MemoryReader, but I'm looking into what @richardjgowers proposed, and I'll be in touch.

@davidercruz
Copy link
Contributor Author

@mnmelo @jbarnoud Thanks for the feedback!

I made the requested changes. I had to change the argument parsing for transformations in the Universe class for them to behave as in the add_transformations method i.e

u = mda.Universe(top,traj, transformations = thistransform(args))

workflow = [thistransform(args), anothertransform(args)]
u2 = mda.Universe(top,traj, transformations = workflow)

instead of requiring a list even if only a single transformation is needed

self._transforms = kwargs.pop('transformations', [])
self._transforms = []
trans_arg = kwargs.pop('transformations', None)
if isinstance(trans_arg, list):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to break if the user passes a tuple, a numpy array, a generator, or anything that is not a list. I would switch the test:

if callable(trans_arg):
    self._transforms.append(trans_args)
elif trans_args:
    self._transforms = trans_arg

if isinstance(trans_arg, list):
self._transforms = trans_arg
elif trans_arg:
self._transforms.append(kwargs.pop('transformations'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The argument is already popped out of the dict. It cannot be popped again. If this did not break the tests, it means you are missing a test. Did you test you can pass a callable to the universe keyword argument?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I was unsure if I would do the checking or just force the args to be lists, and when I rolled back the changes I messed up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I see, the test is still missing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I add the test in ``test_universe.py`?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a good place indeed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added tests for both cases: callable and workflow list

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You did something to the file, it appears as every line changed. Something related to windows end of lines maybe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's all good now

This list can only be modified once, and further calls of this function will
raise an exception.

u = MDAnalysis.Universe(topology, coordinates)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bloc of code should be in an explicit code block, it should star with .. code:: python to avoid sphinx to choke on *workflow as it does now.: http://www.sphinx-doc.org/en/stable/markup/code.html.

The sphinx error to prevent is:

/home/travis/build/MDAnalysis/mdanalysis/package/MDAnalysis/coordinates/DCD.py:docstring of MDAnalysis.coordinates.DCD.DCDReader.add_transformations:15:Inline emphasis start-string without end-string.


if not coordinatefile:
coordinatefile = None
# parse transformations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we need all this crud here. Will it work if you just do:

self.load_new()  # this populates self.traj
trans_arg = kwargs.whatever()
if trans_arg:
    transforms = [trans_arg] if callable(trans_arg) else trans_arg
    self.trajectory.add_transformations(transforms)

then all the transformation stuff is done in the same place

u = mda.Universe(PSF,DCD, transformations=translate([10,10,10]))
uref = mda.Universe(PSF,DCD)
ref = translate([10,10,10])(uref.trajectory[0])
assert_almost_equal(u.trajectory[0].positions, ref, decimal=6)
Copy link
Member

@richardjgowers richardjgowers Jun 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So by calling u.trajectory[0] you trigger the reload of the first frame (ie it goes through Reader.__getitem__). So currently it's not tested that doing Universe(stuff, transformations=[stuff]) applies the transformations to the Universe on creation. You should check this as assert_almost_equal(u.trajectory.ts.positions, ref)

err_msg="Coordinates disagree at frame {0:d}".format(
ts_orig.frame))

def test_transform_iteration(self, universe, transformed):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these tests look good, I'd add in a test for rewind() too, it should restore the first frame.

@jbarnoud
Copy link
Contributor

jbarnoud commented Jun 6, 2018

I think the last thing is to update the changelog.

Copy link
Contributor

@jbarnoud jbarnoud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If @richardjgowers approves it, the PR can be rebased into develop.

Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@davidercruz looks good, but I managed to break it (but it's not really your fault)

# load a single frame trajectory
u = mda.Universe('something.gro')

# add a transformation
u.trajectory.add_transformation(whatever)

# play with positions
u.atoms.positions *= 0

# reset the positions with rewind
u.trajectory.rewind()

# this should now be the 0th frame again, with the transformations
u.atoms.positions == 0 # ?

So basically SingleFrameReader.rewind doesn't respect transformations (or anything really...) . I think if you make Single.rewind just call _read_first_timestep again, and apply transformations, it should work...

except ValueError:
raise ValueError("Can't add transformations again. Please create new Universe object")
else:
self.ts = self._apply_transformations(self.ts)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentations, but if this is working it also means it's not being tested?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since else: is expecting the next line of code to have an indentation, it doesn't really matter how much spaces that indentation has. It could be a single space and it would work. But yes it goes against the style guide


def add_transformations(self, *transformations):
""" Adds all the transformations to be applied to the trajectory.
Overrides :meth:`~MDAnalysis.coordinates.base.ProtoReader.add_transformations`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't user documentation, so should be a comment. Ie this isn't helpful when I'm in a ipython repr trying to figure out how to add transformations :)

@davidercruz
Copy link
Contributor Author

davidercruz commented Jun 9, 2018

@richardjgowers while solving the rewind issue (#1929) I realized that the copy functions should be changed to call add_transformations. As they are right now, when the user creates a copy, the transformations are not copied. This could become a feature or I could change it to also copy the transformations. What do you say?

@davidercruz
Copy link
Contributor Author

Now rewind works as intended. I didn't change the copy() methods, as it can be a way to read the un-transformed coordinates without creating a new universe. On the other hand, since the auxiliaries are also copied, copying the transformations would keep things consistent.

@coveralls
Copy link

coveralls commented Jun 9, 2018

Coverage Status

Coverage increased (+0.05%) to 89.84% when pulling 74b9d46 on davidercruz:develop into 70b9e9f on MDAnalysis:develop.

@jbarnoud
Copy link
Contributor

jbarnoud commented Jun 9, 2018 via email

@richardjgowers
Copy link
Member

@davidercruz re copy stuff. You'll need to add some handling of copying transformations to the ReaderBase in this PR, ideally this should fail noisily for now...

class ReaderBase():
    def copy():
        blah blah blah

        if self.transformations:
            raise NotImplementedError("can't copy transformations, sorry")

Then you can open an issue about adding this functionality. For now it's important that it doesn't silently look like it works

@davidercruz
Copy link
Contributor Author

@jbarnoud @richardjgowers it was easy to change really, so now its done. I added tests for the copy methods in the context of the transformations.

@richardjgowers
Copy link
Member

@davidercruz looks good, but you'll need to do a rebase

@jbarnoud
Copy link
Contributor

@davidercruz The point of the rebase is not only to have your code on more up-to-date bases, but also to group your commit in logical units. Have a look at https://robots.thoughtbot.com/git-interactive-rebase-squash-amend-rewriting-history especially the part on squashing commits together.

@richardjgowers
Copy link
Member

@davidercruz put that you fixed #1929 in the CHANGELOG

…unction (Issue MDAnalysis#786)

Transformations are applied by the readers, either by being added by the user or passed as a kwarg when instancing
the universe; added tests for the modified reader classes: ProtoReader, SingleFrameReader, ChainReader and MemoryReader
… be re-read from the input file (MDAnalysis#1929)

Added tests to the rewind method in the context of transformations
@jbarnoud jbarnoud added this to the 0.18.x milestone Jun 13, 2018
@jbarnoud jbarnoud merged commit 44631d0 into MDAnalysis:develop Jun 13, 2018
@jbarnoud
Copy link
Contributor

Youhou! Congrats @davidercruz for your GSoC merged PR! 🎆

@davidercruz davidercruz mentioned this pull request Jun 21, 2018
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants