MemoryReader facelift by richardjgowers · Pull Request #2080 · MDAnalysis/mdanalysis

richardjgowers · 2018-09-19T23:02:47Z

Fixes #2076 #2077 #1041 #2081
Replaces #1537

Changes made in this Pull Request:

MemoryReader now has velocities and forces (which hopefully work identically to positions)
MemoryReader now has variable dimensions (ie NPT conditions)
Universe.transfer_to_memory takes advantage of all of the above
Universe.empty now uses a MemoryReader (now it does the above)

PR Checklist

Tests?
Docs?
CHANGELOG updated?
Issue raised/referenced?

richardjgowers · 2018-09-19T23:06:40Z

I'm a little confused what the expectations are for the MemoryReader wrt modifying attributes of Timesteps. From what I can tell, it tries to give views not copies of the underlying arrays. This means that if you edit the positions, change frame, your changes are saved (ie go back to that frame and you've still got the modified positions rather than the original). Is this correct?

Therefore, does it make sense to make .dimensions always an array of dimensions so that it can always follow this pattern?

orbeckst · 2018-09-20T00:50:13Z

Yes, I'd say the expectation is that everything is permanent. In this regard, MemoryReader is a very different beast from all the other Readers. But I think it's required so that MemoryReader can become a swiss-army knife for trajectory manipulation.

I think that the new transformations are also permanent for MemoryReader (@jbarnoud @davidercruz ?)

jbarnoud · 2018-09-20T06:01:16Z

The transformations are indeed permanent on the memory reader. In practice, the transformation is applied on all frames in one go, and then purged from the list of transformation to not apply the same transformation again when changing frame.

The expectation with the memory reader is that changes done to the coordinates are kept, on the contrary to the other readers. Having the box following the same pattern is what I would expect.

jbarnoud · 2018-09-20T06:59:19Z

package/MDAnalysis/coordinates/dummy.py

-from .base import SingleFrameReaderBase
-
-
-class DummyReader(SingleFrameReaderBase):


Shouldn't this be deprecated first?

It was only used internally, and I think it's so buggy (it doesn't iterate?) I doubt anyone found any good use for it.

richardjgowers · 2018-09-24T21:19:02Z

package/MDAnalysis/coordinates/memory.py

            filename=self.filename,
        )
        new[self.ts.frame]
-        new.ts = self.ts.copy()


this line was problematic as base.Timestep.copy() isn't respectful of MemoryReader behaviour, see #2081. I don't think removing it causes any problems, as new[self.ts.frame] should populate the Timestep with everything needed

richardjgowers · 2018-09-24T21:20:37Z

@MDAnalysis/coredevs ok this could do with some eyes, it's got some nice MR fixes and improvements

orbeckst

I have some minor comments/questions; someone else should probably also have a look.

orbeckst · 2018-09-25T00:52:49Z

package/MDAnalysis/coordinates/memory.py

        self.ts.dt = dt
-        if dimensions is not None:
-            self.ts.dimensions = dimensions
+        if dimensions is None:


you could do

if dimensions is None: pass elif dimensions.shape == (6,): dimensions = np.tile(dimensions, (self.n_frames, 1)) elif dimensions.shape != (self.n_frames, 6): raise Hell self.dimensions_array = dimensions

orbeckst · 2018-09-25T00:56:28Z

package/MDAnalysis/core/universe.py

-                n_atoms=n_atoms,
-                velocities=velocities, forces=forces)
+            coords = np.zeros((n_atoms, 3), dtype=np.float32)
+            if velocities:


Maybe ternary?

vels = np.zeros((n_atoms, 3), dtype=np.float32) if velocities else None forces = np.zeros((n_atoms, 3), dtype=np.float32) if forces else None

for compactness?

orbeckst · 2018-09-25T00:57:05Z

package/MDAnalysis/core/universe.py

+                forces = np.zeros((n_atoms, 3), dtype=np.float32)
+            else:
+                forces = None
+            u.trajectory = get_reader_for(coords)(


Does an numpy array trigger MemoryReader selection? (I think so... can't remember the discussion a while back.)

Maybe add a comment that this creates a MemoryReader.

orbeckst · 2018-09-25T01:01:00Z

package/MDAnalysis/core/universe.py

-            # This is significantly faster, but only implemented for certain
-            # trajectory file formats
-            try:
-                coordinates = self.trajectory.timeseries(


Is it true that we do not have dedicated C-level timeseries anymore and thus the self.trajectory.timeseries was essentially just doing what happened in the except block? Or are there still cases where the timeseries approach would be beneficial?

Only DCDReader ever had a fast timeseries (and still does). It didn't return dimensions information though, which is needed. We could change DCD.timeseries to have DCD.timeseries(...., dimensions=True) to then return both coordinates and dimensions.

Then maybe the base.Reader could implement the entire else: branch below (I started this in #1400 once...)

So yeah maybe I should try and use the fast DCD routine here still since it still exists, then manually grab the dimensions..

I doubt that you improve performance by cycling through the trajectory twice. Maybe leave as you have, add a note, and raise an issue so that it is not forgotten. A before/after benchmark (or Benchmark) would be good, just so that we have an idea how much slower this gets. analysis.encore uses DCD timeseries reading quite a bit, IIRC.

orbeckst · 2018-09-25T01:01:29Z

package/MDAnalysis/core/universe.py

+            pm = ProgressMeter(n_frames, interval=1,
+                               verbose=verbose, format=pm_format)
+            coordinates = np.zeros((n_frames, n_atoms, 3), dtype=np.float32)
+            # TODO: This assumes constant...


I don't understand this comment.

orbeckst · 2018-09-25T01:03:06Z

package/MDAnalysis/core/universe.py

+            has_fors = ts.has_forces
+            has_dims = ts.dimensions is not None
+
+            if has_vels:


ternary operator for readability?

velocities = np.zeros_like(coordinates) if has_vels or None

Does Python short-circuit, i.e., is the creation of np.zeros_like(coordinates) skipped if not needed?

Python (at least 3.5.3) properly short-circuits and does not execute the function in the skipped part:

In [1]: def dome(): ...: print("did it") ...: return 10 ...: In [2]: x = dome() if True else 0 did it In [3]: x = dome() if False else 0

orbeckst · 2018-09-25T01:11:42Z

testsuite/MDAnalysisTests/coordinates/test_memory.py

+        mr2 = mr_reader.copy()
+
+        ts = mr2.ts
+        setattr(ts, attr, 7)


Remind me how this works: Are you setting e.g. ts.positions to 7? All positions in the array or are you replacing the array with the integer "7"?

If it is the latter, shouldn't we doing the former?

orbeckst · 2018-09-25T01:12:17Z

testsuite/MDAnalysisTests/coordinates/test_memory.py

+        ts = mr2[1]
+        ts = mr2[0]
+
+        assert_equal(getattr(ts, attr), 7)


array comparison? But the arrays should be float32 so you really ought to use assert_almost_equal.

codecov · 2018-09-25T16:49:19Z

Codecov Report

Merging #2080 into develop will decrease coverage by <.01%.
The diff coverage is 96.15%.

@@             Coverage Diff             @@
##           develop    #2080      +/-   ##
===========================================
- Coverage    89.41%   89.41%   -0.01%     
===========================================
  Files          159      157       -2     
  Lines        18731    18759      +28     
  Branches      2696     2709      +13     
===========================================
+ Hits         16749    16774      +25     
- Misses        1381     1382       +1     
- Partials       601      603       +2

Impacted Files	Coverage Δ
package/MDAnalysis/coordinates/__init__.py	`100% <ø> (ø)`	⬆️
package/MDAnalysis/coordinates/memory.py	`96.02% <100%> (+1.74%)`	⬆️
package/MDAnalysis/core/universe.py	`94.39% <88.46%> (-0.65%)`	⬇️
coordinates/__init__.py	`100% <0%> (ø)`	⬆️
dummy.py

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update edb1643...d86dc42. Read the comment docs.

orbeckst · 2018-09-25T17:13:57Z

package/MDAnalysis/coordinates/memory.py

+            # single box, tile this to trajectory length
+            # allows modifying the box of some frames
+            dimensions = np.tile(dimensions, (self.n_frames, 1))
+        elif dimensions.shape != (self.n_frames, 6):


Does this need to be protected against AttributeError if anything is passed through that is not a numpy array?

orbeckst · 2018-09-25T17:15:35Z

package/MDAnalysis/core/universe.py

          mapping of residues to segments
        trajectory : bool, optional
-          if True, attaches a dummy reader to the Universe, therefore
+          if True, attaches a MemoryReader to the Universe, therefore


Make this (and others) a link

:class:`~MDAnalysis.coordinates.memory.MemoryReader`

orbeckst · 2018-09-25T17:18:58Z

package/MDAnalysis/coordinates/memory.py

-                                 "This must be a array of shape (6,) or "
-                                 "(n_frames, 6)".format(dimensions.shape))
-            self.dimensions_array = dimensions
+            dimensions = np.zeros((self.n_frames, 6), dtype=np.float64)


Did you decide to always have dimensions as an array, even if it contains zeros? (I assume this is more consistent with how ts.dimensions works, and it allows setting the box if required – more flexibility!) 👍

zemanj

Instead of type checking, I'd prefer calling numpy.asarray(attr) if attr is supposed to be a numpy array (eventually). That allows greater type flexibility and will hopefully throw an appropriate error message if converting to a numpy array fails.

zemanj · 2018-09-30T17:38:35Z

package/MDAnalysis/coordinates/memory.py

+            try:
+                if velocities.ndim == 2:
+                    velocities = velocities[np.newaxis, :, :]
+            except AttributeError:


This just checks whether the velocities object has an attribute ndim. For type checking, better use isinstance(). But since that's not really "pythonic" anyway, I'd prefer calling numpy.asarray(velocities) before doing attribute checks.

zemanj · 2018-09-30T17:39:42Z

package/MDAnalysis/coordinates/memory.py

+            try:
+                if forces.ndim == 2:
+                    forces = forces[np.newaxis, :, :]
+            except AttributeError:


As above,I'd prefer using numpy.asarray(forces).

zemanj · 2018-09-30T17:41:13Z

package/MDAnalysis/coordinates/memory.py

+                    raise ValueError("Provided dimensions array has shape {}. "
+                                     "This must be a array of shape (6,) or "
+                                     "(n_frames, 6)".format(dimensions.shape))
+            except AttributeError:


Again, better use numpy.asarray(dimensions)

Universe.empty now creates a MemoryReader trajectory Removed DummyReader (it was buggy) added varying box size to MemoryReader Universe.transfer_to_memory now also transfer velocities and forces MemoryReader now always has array of dimensions fixes Issue #2081 #2076 #2077 #1041

richardjgowers · 2018-10-01T15:29:20Z

Ok changed to use np.asarray, should be good to go hopefully

richardjgowers · 2018-10-04T17:47:10Z

@MDAnalysis/coredevs if this gets reviewed I can finish up 0.19.0

orbeckst · 2018-10-04T19:24:18Z

@zemanj could you please have a look when you can spare a moment?

orbeckst

This looks ok to me – but I am not an expert on the MemoryReader. If anyone else (@zemanj @jbarnoud @wouterboomsma ) might want to have a quick look then that would be great.

(We would like to merge this soon in order to get the 0.19.0 release out...)

@richardjgowers perhaps open an issue noting that we might want to revisit the MemoryReader once we have timeseries support with dimensions for all readers (so that fast timeseries-slurpers such as the one for DCD and possibly XTC) can be used.

wouterboomsma · 2018-10-09T19:02:17Z

Thanks for notifying me, @orbeckst. I haven't followed recent development in MDAnalysis very closely, and therefore don't have anything intelligent to add to what has already been discussed. Good to see that MemoryReader is still being used. Thanks for putting in the effort in bringing this up to date @richardjgowers.

richardjgowers added the Work in progress label Sep 19, 2018

jbarnoud reviewed Sep 20, 2018

View reviewed changes

richardjgowers force-pushed the issue-2076-bettermemoryreader branch from 29f904f to 91e59d9 Compare September 24, 2018 21:14

richardjgowers commented Sep 24, 2018

View reviewed changes

richardjgowers added Format-MemoryReader and removed Work in progress labels Sep 24, 2018

richardjgowers changed the title ~~Issue 2076 bettermemoryreader~~ MemoryReader facelift Sep 24, 2018

orbeckst reviewed Sep 25, 2018

View reviewed changes

orbeckst mentioned this pull request Sep 27, 2018

memory reader assumes all trajectories start at time 0 #1769

Open

zemanj reviewed Sep 30, 2018

View reviewed changes

richardjgowers force-pushed the issue-2076-bettermemoryreader branch from 68900cc to d86dc42 Compare October 1, 2018 15:27

richardjgowers added this to the 0.19.0 milestone Oct 3, 2018

orbeckst approved these changes Oct 8, 2018

View reviewed changes

richardjgowers merged commit 97db6d1 into develop Oct 9, 2018

richardjgowers deleted the issue-2076-bettermemoryreader branch October 9, 2018 00:13

		from .base import SingleFrameReaderBase


		class DummyReader(SingleFrameReaderBase):

Conversation

richardjgowers commented Sep 19, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Checklist

Uh oh!

richardjgowers commented Sep 19, 2018

Uh oh!

orbeckst commented Sep 20, 2018

Uh oh!

jbarnoud commented Sep 20, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

richardjgowers commented Sep 24, 2018

Uh oh!

orbeckst left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

codecov bot commented Sep 25, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

zemanj left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

richardjgowers commented Oct 1, 2018

Uh oh!

richardjgowers commented Oct 4, 2018

Uh oh!

orbeckst commented Oct 4, 2018

Uh oh!

orbeckst left a comment

Choose a reason for hiding this comment

Uh oh!

wouterboomsma commented Oct 9, 2018

Uh oh!

Reviewers

Assignees

Labels

richardjgowers commented Sep 19, 2018 •

edited

Loading

codecov bot commented Sep 25, 2018 •

edited

Loading

zemanj left a comment •

edited

Loading