On-the-fly coordinate transformations #786

@jbarnoud

On-the-fly coordinate transformations

So far, the only transformation MDAnalysis applies to coordinates read from a trajectory is a unit conversion. Any other transformation must be done by the user on a frame-by-frame basis. Yet, in some use cases, the user does not directly access the frame and therefore cannot apply the transformations. This is especially the case with analyses and visualizations. I propose a general mechanism to declare coordinate transformations that will be applied by the reader.

Use cases

The main use cases for the proposal are analyses and visualizations that require structure alignments or periodic boundary corrections.

Analyses such as RMSD calculations require the structures to be aligned. So far, the RMSD class controls the fit with a keyword and implements only a single way of doing that fit. This means that if users want a different fit, they must save an intermediate trajectory. It also means that any other analysis that requires a fit must implement its own handling of it; such implementations are redundant and potentially inconsistent across analyses.

@dotsdl recently published a blog post on nglview, a library that can visualize an MDAnalysis trajectory in a Jupyter notebook. Like most (if not all) visualization software, nglview does not correct for periodic boundary conditions, which can lead to ugly artifacts such as bonds crossing the box. MDAnalysis could fix the periodic box in a way that is transparent to the visualization library.

Also, some transformations are painful to do with the tools embedded in the simulation packages, and can require multiple calls to tools like gmx trjconv with several intermediate trajectories. With the proposed mechanism it would be possible to declare a workflow of transformations in MDAnalysis and apply them frame by frame, without intermediate files.

New APIs

User facing API

Only a few changes are needed in the user-facing API: at a minimum, a method should be added to Universe to register a transformation; a few more methods could be added to inspect and modify the transformation workflow.

import MDAnalysis as mda

u = mda.Universe(this, that)
# Register a callback as a transformation
u.add_transformation(mda.transformations.awsome_transformation)
u.add_transformation(mda.transformations.other_transformation)
for ts in u.trajectory:
    # Do something with the frame;
    # the coordinates are transformed by
    # `awsome_transformation` and
    # `other_transformation`, in order.
    pass

A transformation is implemented as a callback that takes a TimeStep as its argument and modifies it in place. Using a callback means that each transformation can be implemented in the simplest way possible for that transformation. The simplest transformations can be implemented as plain functions, as can transformations that require a constant argument:

import functools

import numpy
import MDAnalysis as mda

def constant_translation(ts, vector):
    """
    Translate all the atoms by a given vector
    """
    ts.positions += vector

# Calling my_translation(ts) will be the same as
# calling constant_translation(ts, vector) with
# vector = numpy.array([2, 3, 4])
my_translation = functools.partial(constant_translation,
                            vector=numpy.array([2, 3, 4]))

u = mda.Universe(this, that)
u.add_transformation(my_translation)

More complicated transformations can be implemented as classes with a __call__ method.
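As a rough sketch of the class-based form, the example below keeps state between frames: it remembers the centroid of the first frame it sees and translates later frames to match. `PinCentroid`, `centroid`, and `FakeTimestep` are hypothetical names made up for illustration; the stand-in stores positions as plain lists so the block runs without MDAnalysis, whereas a real TimeStep would hold a NumPy array.

```python
class FakeTimestep:
    """Hypothetical stand-in for a TimeStep; positions is a list of [x, y, z]."""

    def __init__(self, positions):
        self.positions = [list(p) for p in positions]


def centroid(positions):
    """Return the unweighted mean position as [x, y, z]."""
    n = len(positions)
    return [sum(p[k] for p in positions) / n for k in range(3)]


class PinCentroid:
    """Translate each frame so its centroid matches the first frame seen.

    Hypothetical example, not an MDAnalysis class.
    """

    def __init__(self):
        self._reference = None  # centroid of the first frame, set lazily

    def __call__(self, ts):
        current = centroid(ts.positions)
        if self._reference is None:
            # First call: remember this centroid and leave the frame untouched.
            self._reference = current
            return
        shift = [r - c for r, c in zip(self._reference, current)]
        for pos in ts.positions:
            for k in range(3):
                pos[k] += shift[k]  # modify in place, as the callback contract asks


pin = PinCentroid()
first = FakeTimestep([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
second = FakeTimestep([[5.0, 1.0, 0.0], [7.0, 1.0, 0.0]])
pin(first)
pin(second)
```

One caveat of this design: the reference depends on which frame is read first, so a stateful transformation like this one behaves differently under random access unless the reference is passed to the constructor instead.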

Ideally, transformations that require access to the topology (e.g. making molecules whole) can reach it via the universe attribute of the TimeStep, which keeps the burden on the user to a minimum.

It would be nice to be able to add several transformations in one go as a workflow:

workflow = [awsome_transform,
            other_transform,
            super_transform]
u = mda.Universe(this, that)
u.add_transformation_workflow(workflow)
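One way to picture such a workflow is as a single callable that runs its members left to right. The sketch below assumes nothing about how `add_transformation_workflow` would actually be implemented; `compose`, `shift_x`, `double`, and `FakeTimestep` are hypothetical helpers, and the point is mainly that the order of the list matters.

```python
class FakeTimestep:
    """Hypothetical stand-in for a TimeStep; positions is a list of [x, y, z]."""

    def __init__(self, positions):
        self.positions = [list(p) for p in positions]


def compose(*transformations):
    """Chain transformations into one callback, applied left to right."""

    def workflow(ts):
        for transformation in transformations:
            transformation(ts)

    return workflow


def shift_x(ts):
    """Move every atom by +1 along x (illustrative only)."""
    for pos in ts.positions:
        pos[0] += 1.0


def double(ts):
    """Scale every coordinate by 2 (illustrative only)."""
    for pos in ts.positions:
        for k in range(3):
            pos[k] *= 2.0


shift_then_double = FakeTimestep([[1.0, 2.0, 3.0]])
compose(shift_x, double)(shift_then_double)

double_then_shift = FakeTimestep([[1.0, 2.0, 3.0]])
compose(double, shift_x)(double_then_shift)
```

The two results differ, so a workflow has to be an ordered sequence rather than a set.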

Internal API

Internally, the Universe class must allow transformations to be registered and must pass them to the coordinate.base.ProtoReader class, which applies them, in order, when reading a new frame.
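A minimal sketch of that plumbing, under the assumption that the reader keeps an ordered list of callbacks and runs them on every freshly read frame. `SketchUniverse`, `SketchReader`, and `FakeTimestep` are hypothetical stand-ins, not the MDAnalysis classes, and the "trajectory" is just a list held in memory.

```python
class FakeTimestep:
    """Hypothetical stand-in for a TimeStep."""

    def __init__(self, frame, positions):
        self.frame = frame
        self.positions = [list(p) for p in positions]  # copy of the raw data


class SketchReader:
    """Hypothetical reader that applies registered callbacks on each read."""

    def __init__(self, raw_frames):
        self._raw_frames = raw_frames
        self._transformations = []

    def add_transformation(self, transformation):
        self._transformations.append(transformation)

    def _read_frame(self, index):
        # Rebuild the frame from the untouched source data, then transform it.
        ts = FakeTimestep(index, self._raw_frames[index])
        for transformation in self._transformations:
            transformation(ts)
        return ts

    def __iter__(self):
        for index in range(len(self._raw_frames)):
            yield self._read_frame(index)


class SketchUniverse:
    """Hypothetical universe that forwards registrations to its reader."""

    def __init__(self, raw_frames):
        self.trajectory = SketchReader(raw_frames)

    def add_transformation(self, transformation):
        self.trajectory.add_transformation(transformation)


def shift_x(ts):
    """Move every atom by +1 along x (illustrative only)."""
    for pos in ts.positions:
        pos[0] += 1.0


u = SketchUniverse([[[0.0, 0.0, 0.0]], [[5.0, 0.0, 0.0]]])
u.add_transformation(shift_x)
first_pass = [ts.positions[0][0] for ts in u.trajectory]
second_pass = [ts.positions[0][0] for ts in u.trajectory]
```

Because each frame is rebuilt from the source before the callbacks run, iterating twice gives the same coordinates; transformations do not accumulate across passes.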

This proposal shares problems with proposal #785 by @richardjgowers. Both proposals need a way to declare methods and to execute them when a new frame is read. Auxiliary data should be read before the coordinate transformations are executed, so that the transformations can use the auxiliary data.

Optionally, it could be useful to allow a coordinate transformation to add auxiliary data (e.g. the rotation matrix used for the fitting). This should be trivial if auxiliary data are attached to the TimeStep.
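To illustrate, here is a sketch in which a transformation records what it did. It assumes the TimeStep exposes a dict-like `aux` attribute, which is an assumption of this example rather than something either proposal defines; a centering shift is stored instead of a rotation matrix to keep the block short. `FakeTimestep` and `center_and_record` are hypothetical names.

```python
class FakeTimestep:
    """Hypothetical stand-in for a TimeStep with an assumed `aux` mapping."""

    def __init__(self, positions):
        self.positions = [list(p) for p in positions]
        self.aux = {}  # assumed container for auxiliary data


def center_and_record(ts):
    """Move the centroid to the origin and store the applied shift."""
    n = len(ts.positions)
    centroid = [sum(p[k] for p in ts.positions) / n for k in range(3)]
    for pos in ts.positions:
        for k in range(3):
            pos[k] -= centroid[k]
    # Later transformations or analyses could read this back from the frame.
    ts.aux["centering_shift"] = [-c for c in centroid]


ts = FakeTimestep([[0.0, 0.0, 0.0], [2.0, 4.0, 6.0]])
center_and_record(ts)
```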

Collection of transformations

For this proposal to be really useful, MDAnalysis must have a collection of transformations ready to use. The most obvious ones are the already implemented structure fitting and PBC correction methods.
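For instance, the simplest PBC correction, wrapping each atom back into an orthorhombic box, could look like the sketch below. It assumes the box lengths are the first three entries of `ts.dimensions` and are non-zero, and it wraps atoms independently, so it can split molecules across the box; making molecules whole would need the topology. `wrap_into_box` and `FakeTimestep` are hypothetical names, not the existing MDAnalysis routines.

```python
import math


class FakeTimestep:
    """Hypothetical stand-in for a TimeStep with box information."""

    def __init__(self, positions, dimensions):
        self.positions = [list(p) for p in positions]
        self.dimensions = list(dimensions)  # [lx, ly, lz, alpha, beta, gamma]


def wrap_into_box(ts):
    """Wrap every coordinate into [0, L) for an orthorhombic box.

    Assumes non-zero box lengths and 90 degree angles.
    """
    lengths = ts.dimensions[:3]
    for pos in ts.positions:
        for k in range(3):
            # Subtract the whole number of box lengths the atom has drifted.
            pos[k] -= lengths[k] * math.floor(pos[k] / lengths[k])


ts = FakeTimestep([[11.0, -0.5, 3.0]], [10.0, 10.0, 10.0, 90.0, 90.0, 90.0])
wrap_into_box(ts)
```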

Limitations

Some existing transformations require changing the topology. This is the case for any transformation that adds virtual particles, or that coarse-grains or back-maps a trajectory.

For example, I recently wrote a script that adds virtual particles to each frame of a trajectory to test the possible bias of an analysis. Also, several of us work with coarse-grained models and need to analyse atomistic simulations as if they were coarse-grained.

These transformations require adapting the topology of the system, and I do not see a clean way of doing it. Also, it is not clear whether such transformations are within the scope of this proposal (even though I would love to be able to use them).
