barebones working prototype of trajectory transformations#1902
barebones working prototype of trajectory transformations#1902jbarnoud merged 2 commits intoMDAnalysis:developfrom
Conversation
|
@davidercruz great to see your GSoC PR shaping up. I added the WIP label because you said "WIP" in the summary. This means to us that you currently just want feedback on the code but that you yourself don't consider it ready to be merged. Once you want to get it merged, remove "WIP" from your description and let us know if you need someone else to remove the label (I don't know if you have permissions to set labels on your own PR or not). Your commit is not associated with your GitHub account. Please add the email address that you're using in your commits as an email address to your GitHub account. Have a look at the Travis tests by following the Details link. They are failing. Typically people want to see tests passing or specific questions on how to make them pass before engaging with the rest of your code. You can ignore appveyor – we are trying to get Windows builds going (see PR #1880) but it's not there yet. Unfortunately, we need to have appveyor builds going for all PRs even though we know that they are failing. |
|
One other thing: you are working on the develop branch of your MDA fork. This will likely make your life complicated if you don't get this PR merged quickly. It is much better to work on a feature branch. and then create the PR from the feature branch. This allows you to keep your develop branch in sync with the MDAnalysis one while still working on your own branch or indeed creating other feature branches (eg if you need to add something else in MDAnalysis that can be worked on independently from the core transformations). |
jbarnoud
left a comment
There was a problem hiding this comment.
I did not think about using an ordered dict this way. I had a simple list in mind, but why not the ordered dict. The only thing I see is on this line: https://github.com/MDAnalysis/mdanalysis/pull/1902/files#diff-99815fce73651366394eb2dd8fe2f872R1207 it forces the user to write transformation that can accept an argument in addition to the time step, even if that argument is not used. I do not like that much.
Could you make a demo jupyter notebook?
| return self._transforms.keys() | ||
|
|
||
| def _check_for_transformation(self, transform): | ||
| """ Check for the existance of an auxiliary *auxname*. If present, |
There was a problem hiding this comment.
This docstring is out of sync with the function it describes.
| try: | ||
| self._check_for_transformation(transform) | ||
| except: | ||
| raise ValueError("No transformation named {name}".format(name=transform)) |
There was a problem hiding this comment.
You are not referring to transformations by name. The error message is confusing.
| if transform in self.transformation_list: | ||
| return self._transforms[transform] | ||
| else: | ||
| raise ValueError("No transformation named {name}".format(name=transform)) |
There was a problem hiding this comment.
Same as the other error message above.
| for auxname in self.aux_list: | ||
| ts = self._auxs[auxname].update_ts(ts) | ||
| for transform in self.transformation_list: | ||
| ts = transform(ts, self._transforms[transform]) |
There was a problem hiding this comment.
This is going to break if the transformation does not take arguments in addition to the time step.
|
@davidercruz congrats on starting you GSOC project! But I don't like the API here. It seems weird to store the arguments to the function and the function separately. If you had used a class... class Translate
def __init__(self, vector):
self.vector
def __call__(self, ts):
ts.positions += self.vector
return tsThen the class Separate from that, the way of adding/removing transformations seems unwieldy and the limitation of only one of each type isn't great. If we added tags/names to each transformation it could look like: u.trajectory.add_transformation('trans1', (translate, (1, 2, 3)))
u.trajectory.add_transformation('rotate', (rotate, (90, 0, 0)))
u.trajectory.add_transformation('trans2', (translate, (-1, -2, -3)))
#but later...
u.trajectory.remove_transformation('trans1') |
| -------- | ||
| :meth:`add_trasnformation` | ||
| """ | ||
| try: |
There was a problem hiding this comment.
Not sure we need the check function, just rely on KeyErrors on your dict
try:
self.transforms.pop(transform)
except KeyError:
raise Whoops|
I do not like transformations having to be classes. It is overly complicated for most cases. Having the transformations being generic callables means they can be a simple function, but a class instance remains an option. I was thinking about storing the transformations in a list but I am curious about the use of an ordered dict. Storing the arguments in the dictionary has the problem I describe above, but it may make the serialization easier in some cases. It will not always solve the serialization problem, though. So it may be adding complexity to only solve half a problem. If we go for the dictionary, I like the labels the way @richardjgowers describe them. The way I see it, the example from @richardjgowers should look like: def translate(*vector):
def wrapped(ts):
ts.position += vector
return ts
return wrapped
u.trajectory.add_transformation('trans1', translate(1, 2, 3))
u.trajectory.add_transformation('rotate', rotate(90, 0, 0))
u.trajectory.add_transformation('trans2', translate(-1, -2, -3))
#and later...
u.trajectory.remove_transformation('trans1') |
|
Hey there, thank you for your feedback. @orbeckst I added the two versions of the email (gmail ignores dots on emails). I will commit to the develop for now, but will change for the next PR's. @richardjgowers and @jbarnoud: u.trajectory.add_transformation('trans1', mda.transformations.translate(1,2,3))
u.trajectory.transformation_list
Out: ['trans1']
#... things are done
u.trajectory.remove_transformation('trans1')
u.trajectory.transformation_list
Out: []I like this a lot more, actually. |
|
Yeah looks good, tests + docs next! @jbarnoud your wrapper function works nicely too! (but doesn't exclude me using a class sneakily :p) |
|
Small comment (sorry, did not follow the full discussion in detail): I don't particularly like putting "types" into names so instead of u.trajectory.transformation_listI would prefer u.trajectory.transformationsAnd if this is supposed to work like a list then it would be nice if it behaved like one, eg u.trajectory.transformations.append(mda.transformations.translate(1,2,3))However, given that you're naming them (which is great), why not just expose u.trajectory.transformations['trans1'] = mda.transformations.translate(1,2,3)You can still do u.trajectory.transformations.keys()to get the names. |
|
We could expose the dict directly.Le 19 mai 2018 12:01 AM, Oliver Beckstein <notifications@github.com> a écrit :Small comment (sorry, did not follow the full discussion in detail): I don't particularly like putting "types" into names so instead of
u.trajectory.transformation_list
I would prefer
u.trajectory.transformations
And if this is supposed to work like a list then it would be nice if it behaved like one, eg
u.trajectory.transformations.append(mda.transformations.translate(1,2,3))
However, given that you're naming them (which is great), why not just expose transformations with a dict-like interface
u.trajectory.transformations['trans1'] = mda.transformations.translate(1,2,3)
You can still do
u.trajectory.transformations.keys()
to get the names.
—You are receiving this because you were mentioned.Reply to this email directly, view it on GitHub, or mute the thread.
|
|
The idea behind hiding the transforms dict was that it would be consistent with the auxiliaries dict and methods. That's why I created a edit: now I realize that in this case I should have added a |
|
I see; consistency in the interface is a good reason. |
|
I added some documentation. Now I have a question about the tests. I have to unit tests for each transformation itself (inside a transformations directory in the testsuite). But I'm a little lost on what tests should I write in the context of the reader API. |
|
I 'll have a look latter at the documentation you wrote. For the tests, here are a couple of ideas :
* are the transformations applied on all the frames when you iterate on a transformation?
* are the transformations applied when you iterate on the trajectory with a slice?
* what happens when you switch to a different frame with u.trajctory[5]?
* what happens if you switch to a frame and back to the previous?
* if you switch to the frame you already are at?
* if you iterate twice?
The transformations should probably be tested with in the reader base test class so you check all the readers. Make sure the tests run for the memory reader that is going to fail and annoy you.
|
|
@jbarnoud thank you. I had forgot about checking if sliced iterations were working. Turns out that the coordinates were not being changed so Regarding the memory reader, as expected, the transformations are being applied every time we iterate over the trajectory. I added a new variable to the The only "problem" right now is that the trajectory must be transferred to memory before adding transformations to the workflow. If done before, the transformations are "forgotten": u = mda.Universe(PSF, DCD)
u.trajectory.add_transformation("trans1", mda.transformations.translate(1,1,1))
u.trajectory.transformations
Out: OrderedDict([('trans1',
<function MDAnalysis.transformations.translate.wrapped>)])
u.transfer_to_memory()
u.trajectory.transformations
Out: OrderedDict()Another possible problem is removing, or changing transformations after the trajectory is loaded to memory: if the user removes a transformation, the positions don't revert to the original. |
How about making the |
I can't think of a good way to fix this. I think you should raise an error ( |
| """ Check if a given transformation name has already been applied to a | ||
| given timestep """ | ||
| if ts.frame not in ts.applied_transformations.keys(): | ||
| ts.applied_transformations[ts.frame] = [transname] |
There was a problem hiding this comment.
Doesn't this assume that the timestep is the same instance from one frame to the other? I think we can assume that some readers may not recycle the TimeStep instance and use a new one each time a frame is read.
There was a problem hiding this comment.
You are right. When you read from a file the same frame over and over again, it will always be a new instance of Timestep - coordinates are reset, so the transformations are always applied. But for the memoryreader, as the coordinates are changed in the "source", this will avoid the problem of applying transformations over already transformed coordinates.
| def _check_and_apply_transformation(self, transname, ts): | ||
| """ Check if a given transformation name has already been applied to a | ||
| given timestep """ | ||
| if ts.frame not in ts.applied_transformations.keys(): |
There was a problem hiding this comment.
There is no need for the .keys(): dictionaries behave as a collection of keys.
|
I am afraid the check for the memory reader is a bit brittle. It allows to add transformations at the end of the transformation workflow, but any other change to the workflow (reordering, new transformation but not at the end, removal of a transformation...) is going to break the check. One solution would be to fix the workflow when the reader is created. This means you instantiate the reader with the workflow, and you cannot change it afterward. By doing so, you only have to check if the transformations have been applied already, and not what transformations. Setting the transformation workflow at universe creation works for my use of the thing, and it makes the API much simpler: there is no need for a function to add or remove transformations, transformations do not have to be named anymore. Though I do not know if it would fit the expected needs of the others. u = mda.Universe(TPR, XTC, transformations=[trans1, trans2(1, 2, 3), trans3])
# or
reader = MyReader(..., transformations=[...])
u = mda.Universe(TPR, reader) |
|
Yeah I don't really like checking what's been applied etc. Would it work if we overloaded def add_transformation(self, transname, transform):
# apply the transformation all at once over all frames
self._apply_transformation_to_traj(transform)
# then add a dummy transformation that does nothing
super().add_transformation(transname, lambda x: x) |
|
@richardjgowers That would work until somebody tries to modify the transformation workflow. Combined with the fixed workflow I described above it would remove all the book keeping, though. |
|
@jbarnoud and @richardjgowers I implemented those changes. However, in the MemoryReader, I still need book keeping. How can I iterate over all the frames inside the MemoryReader? If I do that, I can ditch book keeping completely. Cheers |
|
@davidercruz something like class MemoryReader
def _apply_transformation(trans):
for i, ts in enumerate(self):
# figure out what the modified Timestep for this frame is
ts = trans(ts)
# then store these modified coordinates in the master array
self.coordinate_array[i, :, :] = ts.positionsBut there's a nice headache where you have to figure out which of the 3 dimensions of the array is the frame dimension, check out memory.py |
|
@davidercruz It would be nice to have a For the memory reader, make sure that you do not apply the transformations when you copy to memory and when you explicitly apply the transformations. Please test the copy to memory from both XTC and DCD; they may use different mechanisms (iteration vs timeseries). |
|
Argh. I said something wrong! Transformations may need atom groups as arguments; it is therefore not possible to have them fixed at universe creation time. Having the transformation workflow read only at some point is the only way to avoid error prone book keeping, though. We could have a restriction that the |
|
@jbarnoud So, what you are saying is that, either the user adds the transformations at universe creation, |
|
I was thinking the former. But it would not be a problem for me to ditch the keyword argument to reduce the amount of tests. |
|
Ok, I think it's getting better. There's still the book keeping for the MemoryReader, but I'm looking into what @richardjgowers proposed, and I'll be in touch. |
|
@mnmelo @jbarnoud Thanks for the feedback! I made the requested changes. I had to change the argument parsing for transformations in the Universe class for them to behave as in the u = mda.Universe(top,traj, transformations = thistransform(args))
workflow = [thistransform(args), anothertransform(args)]
u2 = mda.Universe(top,traj, transformations = workflow)instead of requiring a list even if only a single transformation is needed |
package/MDAnalysis/core/universe.py
Outdated
| self._transforms = kwargs.pop('transformations', []) | ||
| self._transforms = [] | ||
| trans_arg = kwargs.pop('transformations', None) | ||
| if isinstance(trans_arg, list): |
There was a problem hiding this comment.
This is going to break if the user passes a tuple, a numpy array, a generator, or anything that is not a list. I would switch the test:
if callable(trans_arg):
self._transforms.append(trans_args)
elif trans_args:
self._transforms = trans_arg
package/MDAnalysis/core/universe.py
Outdated
| if isinstance(trans_arg, list): | ||
| self._transforms = trans_arg | ||
| elif trans_arg: | ||
| self._transforms.append(kwargs.pop('transformations')) |
There was a problem hiding this comment.
The argument is already popped out of the dict. It cannot be popped again. If this did not break the tests, it means you are missing a test. Did you test you can pass a callable to the universe keyword argument?
There was a problem hiding this comment.
You are right. I was unsure if I would do the checking or just force the args to be lists, and when I rolled back the changes I messed up.
There was a problem hiding this comment.
From what I see, the test is still missing.
There was a problem hiding this comment.
Should I add the test in ``test_universe.py`?
There was a problem hiding this comment.
Sounds like a good place indeed.
There was a problem hiding this comment.
I added tests for both cases: callable and workflow list
There was a problem hiding this comment.
You did something to the file, it appears as every line changed. Something related to windows end of lines maybe?
There was a problem hiding this comment.
it's all good now
| This list can only be modified once, and further calls of this function will | ||
| raise an exception. | ||
|
|
||
| u = MDAnalysis.Universe(topology, coordinates) |
There was a problem hiding this comment.
This bloc of code should be in an explicit code block, it should star with .. code:: python to avoid sphinx to choke on *workflow as it does now.: http://www.sphinx-doc.org/en/stable/markup/code.html.
The sphinx error to prevent is:
/home/travis/build/MDAnalysis/mdanalysis/package/MDAnalysis/coordinates/DCD.py:docstring of MDAnalysis.coordinates.DCD.DCDReader.add_transformations:15:Inline emphasis start-string without end-string.
|
|
||
| if not coordinatefile: | ||
| coordinatefile = None | ||
| # parse transformations |
There was a problem hiding this comment.
I'm not sure we need all this crud here. Will it work if you just do:
self.load_new() # this populates self.traj
trans_arg = kwargs.whatever()
if trans_arg:
transforms = [trans_arg] if callable(trans_arg) else trans_arg
self.trajectory.add_transformations(transforms)then all the transformation stuff is done in the same place
| u = mda.Universe(PSF,DCD, transformations=translate([10,10,10])) | ||
| uref = mda.Universe(PSF,DCD) | ||
| ref = translate([10,10,10])(uref.trajectory[0]) | ||
| assert_almost_equal(u.trajectory[0].positions, ref, decimal=6) |
There was a problem hiding this comment.
So by calling u.trajectory[0] you trigger the reload of the first frame (ie it goes through Reader.__getitem__). So currently it's not tested that doing Universe(stuff, transformations=[stuff]) applies the transformations to the Universe on creation. You should check this as assert_almost_equal(u.trajectory.ts.positions, ref)
| err_msg="Coordinates disagree at frame {0:d}".format( | ||
| ts_orig.frame)) | ||
|
|
||
| def test_transform_iteration(self, universe, transformed): |
There was a problem hiding this comment.
these tests look good, I'd add in a test for rewind() too, it should restore the first frame.
|
I think the last thing is to update the changelog. |
jbarnoud
left a comment
There was a problem hiding this comment.
If @richardjgowers approves it, the PR can be rebased into develop.
richardjgowers
left a comment
There was a problem hiding this comment.
@davidercruz looks good, but I managed to break it (but it's not really your fault)
# load a single frame trajectory
u = mda.Universe('something.gro')
# add a transformation
u.trajectory.add_transformation(whatever)
# play with positions
u.atoms.positions *= 0
# reset the positions with rewind
u.trajectory.rewind()
# this should now be the 0th frame again, with the transformations
u.atoms.positions == 0 # ?So basically SingleFrameReader.rewind doesn't respect transformations (or anything really...) . I think if you make Single.rewind just call _read_first_timestep again, and apply transformations, it should work...
| except ValueError: | ||
| raise ValueError("Can't add transformations again. Please create new Universe object") | ||
| else: | ||
| self.ts = self._apply_transformations(self.ts) |
There was a problem hiding this comment.
indentations, but if this is working it also means it's not being tested?
There was a problem hiding this comment.
Since else: is expecting the next line of code to have an indentation, it doesn't really matter how much spaces that indentation has. It could be a single space and it would work. But yes it goes against the style guide
|
|
||
| def add_transformations(self, *transformations): | ||
| """ Adds all the transformations to be applied to the trajectory. | ||
| Overrides :meth:`~MDAnalysis.coordinates.base.ProtoReader.add_transformations` |
There was a problem hiding this comment.
this isn't user documentation, so should be a comment. Ie this isn't helpful when I'm in a ipython repr trying to figure out how to add transformations :)
|
@richardjgowers while solving the rewind issue (#1929) I realized that the |
|
Now |
|
Copy should make a copy as close as possible from the original and therefore include the transformations. Le 9 juin 2018 5:53 PM, Davide Cruz <notifications@github.com> a écrit :Now rewind works as intended. I didn't change the copy() methods, as it can be a way to read the un-transformed coordinates without creating a new universe. On the other hand, since the auxiliaries are also copied, copying the transformations would keep things consistent.
—You are receiving this because you were mentioned.Reply to this email directly, view it on GitHub, or mute the thread.
|
|
@davidercruz re copy stuff. You'll need to add some handling of copying transformations to the class ReaderBase():
def copy():
blah blah blah
if self.transformations:
raise NotImplementedError("can't copy transformations, sorry")Then you can open an issue about adding this functionality. For now it's important that it doesn't silently look like it works |
|
@jbarnoud @richardjgowers it was easy to change really, so now its done. I added tests for the copy methods in the context of the transformations. |
|
@davidercruz looks good, but you'll need to do a rebase |
|
@davidercruz The point of the rebase is not only to have your code on more up-to-date bases, but also to group your commit in logical units. Have a look at https://robots.thoughtbot.com/git-interactive-rebase-squash-amend-rewriting-history especially the part on squashing commits together. |
|
@davidercruz put that you fixed #1929 in the CHANGELOG |
…unction (Issue MDAnalysis#786) Transformations are applied by the readers, either by being added by the user or passed as a kwarg when instancing the universe; added tests for the modified reader classes: ProtoReader, SingleFrameReader, ChainReader and MemoryReader
… be re-read from the input file (MDAnalysis#1929) Added tests to the rewind method in the context of transformations
|
Youhou! Congrats @davidercruz for your GSoC merged PR! 🎆 |
WIP of #786
Changes made in this Pull Request:
Hey, this is my first commit for my GSOC project. The transformations are alive 🎉
From our talks, transformations would be given by name, for example:
In this implementation I chose to do it as follows:
Although it is more keyboard intensive to the user, it helps the user know which transformations are implemented and it allows code auto-completion (when the editor allows it). The main disadvantage is that repeating types of transformations ( two translate routines for example) are not allowed, but that is a question of changing the OrderedDict to a list of lists and updating the code accordingly.
The translation routine needs some routines to repack the atoms in the unit cell.
I need your input on checks that are missing, issues and your feedback on this approach to implement on-the-fly trajectory transformations.
Cheers
PR Checklist