Conversation
| from six.moves import range | ||
| import six | ||
|
|
||
| import logging as log |
|
I will be quite busy until the end of next week. I'll review the PR arount the 19th if nobody did it until then. |
| return ts | ||
|
|
||
| def timeseries(self, asel, start=None, stop=None, step=None, skip=None, | ||
| def timeseries(self, asel=None, start=None, stop=None, step=None, skip=None, |
There was a problem hiding this comment.
asel=None should be documented
| :Maintainer: Wouter Boomsma <wb@bio.ku.dk>, wouterboomsma on github | ||
|
|
||
|
|
||
| .. versionadded:: 0.15.0 |
There was a problem hiding this comment.
Strange that I missed that one. Fixed in 2637a20.
|
I ticked tests and docs (although see comments). |
|
|
||
| """ | ||
|
|
||
| format = 'memory' |
There was a problem hiding this comment.
format here must be upper case
There was a problem hiding this comment.
I followed the style used by other readers. The DCDReader class for instance has a format class variable. However, I'd be happy to change it if there is a consensus about this - just let me know.
There was a problem hiding this comment.
@richardjgowers – when I look through the readers/writers I only see lower case format.
grep -e 'Format =' MDAnalysis/coordinates/*py
Or do you mean, the format value should be "Memory""MEMORY"?
There was a problem hiding this comment.
Right sorry, I meant format = 'MEMORY'. All the guessers etc currently cast all format strings to uppercase.
|
@orbeckst You expressed some interest in the def merge_universes(universes):
"""
Merge list of universes into one
Parameters
----------
`universes` : list of Universe objects
Returns
----------
Universe object
"""
for universe in universes:
universe.transfer_to_memory()
return mda.Universe(
universes[0].filename,
np.concatenate(tuple([e.trajectory.timeseries() for e in universes]),
axis=1),
format=MemoryReader) |
|
Re: On the plus side for this PR: less work for you :-). |
|
@wouterboomsma looking good once last comments are done |
|
@wouterboomsma did you ever test how big the speed improvement is for iterative algorithms using the memory reader? Would be nice to know for annoucement news of the new version. |
|
@wouterboomsma yeah that's neater, though thinking about it though, |
|
@richardjgowers I agree completely. Fixed in f4e3b65. |
|
@wouterboomsma sorry about the double change, late night code reviews! WRT speed changes, I think we're thinking of advertising it in a blog post, so something basic like from MDAnalysis.tests.datafiles import PSF, DCD # use the reference system
u = mda.Universe(PSF, DCD)
ag = u.select_atoms('resid 1')
for ts in u.trajectory:
print ag.center_of_mass()To see if this gets much faster if the trajectory is transferred to memory (and the setup cost of transferring into memory). There's a few calculations I can think of (eg mean squared displacement) which require a lot of thrashing around the trajectory, so it'd be cool if the MemoryReader made this faster. A quick iteration test should tell if this is the case. |
That is a good example since it requires reading the trajectory a lot. Or just running a |
|
@richardjgowers, @kain88-de Generally, I wouldn't expect large improvements as long as you formulate it as a iterative problem. The benefits would arise by letting numpy do the entire calculation at once - but this means you cannot use the calculations in AtomGroup directly, at least as far as I can see. Here is an extrapolation of the example you sent, where the first two versions use the standard iterative approach with and without in_memory, and the last uses a pure numpy approach: import time
import numpy as np
import MDAnalysis as mda
from MDAnalysis.tests.datafiles import PSF, DCD
t0 = time.time()
u1 = mda.Universe(PSF, DCD)
t1 = time.time()
print "Read universe from DCD: {0}".format(t1-t0)
u2 = mda.Universe(PSF, DCD)
t0 = time.time()
u2.transfer_to_memory()
t1 = time.time()
print "Transfer_to_memory: {0}".format(t1-t0)
ag = u1.select_atoms('resid 1')
coms1 = []
t0 = time.time()
for ts in u1.trajectory:
coms1.append(ag.center_of_mass())
t1 = time.time()
print "Iterative procedure: {0}".format(t1-t0)
ag2 = u2.select_atoms('resid 1')
coms2 = []
t0 = time.time()
for ts in u2.trajectory:
coms2.append(ag2.center_of_mass())
t1 = time.time()
print "Iterative procedure in memory: {0}".format(t1-t0)
t0 = time.time()
coms3 = np.average(u2.trajectory.timeseries(ag2), weights=(ag2.masses/np.sum(ag2.masses)), axis=0)
t1 = time.time()
print "Non-iterative procedure: {0}".format(t1-t0)
# Verify that solutions are the same
assert((abs(np.array(coms1) - np.array(coms2)) < 0.0001).all())
assert((abs(np.array(coms1) - coms3) < 0.0001).all())The timings I get are: Several things are clear. The first read dominates any calculations that we do afterwards, so this is probably not the best test. Other than that, we can see that the numpy calculation is about 20 times as fast - but only if you execute it as a non-iterative procedure. I don't think the difference between "iterative procedure" and "iterative procedure in memory" are significant (they fluctuate around similar values). |
|
Btw, I got this strange TimeOutError problem again, leading to a failed Travis build. This is probably caused by the fact that |
|
@wouterboomsma DCD is a bit unfair since it is almost just a memory map. XTC are PDB should give a fairer comparison since they take longer to read. That would also be good information for people who use Gromacs. Still I'm a bit disappointed that the in memory approach doesn't help that much in the iterative approach. I hoped for better values even with DCD. This is not the fault of your memory reader though, it's a general problem of the iterations. |
…ory option of Universe constructor
|
Here is an updated version, now using XTCReader, and repeating each of the main tests a thousand time to get some more reliable numbers: import time
import numpy as np
import MDAnalysis.analysis.align as align
import MDAnalysis as mda
from MDAnalysis.tests.datafiles import PSF, DCD, GRO, XTC
# Setup universes
t0 = time.time()
u1 = mda.Universe(GRO, XTC)
t1 = time.time()
print "Read universe: {0}".format(t1-t0)
u2 = mda.Universe(GRO, XTC)
# Measure time for transfer to memory
t0 = time.time()
u2.transfer_to_memory()
t1 = time.time()
print "Transfer_to_memory: {0}".format(t1-t0)
# Standard case: iterative procedure
ag = u1.select_atoms('resid 1')
t0 = time.time()
for i in range(1000):
coms1 = []
for ts in u1.trajectory:
coms1.append(ag.center_of_mass())
t1 = time.time()
print "Iterative procedure: {0}".format(t1-t0)
# Iterative procedure in-memory
ag2 = u2.select_atoms('resid 1')
t0 = time.time()
for i in range(1000):
coms2 = []
for ts in u2.trajectory:
coms2.append(ag2.center_of_mass())
t1 = time.time()
print "Iterative procedure in memory: {0}".format(t1-t0)
# Direct numpy calculation in-memory
t0 = time.time()
for i in range(1000):
coms3 = np.average(u2.trajectory.timeseries(ag2), weights=(ag2.masses/np.sum(ag2.masses)), axis=0)
t1 = time.time()
print "Non-iterative procedure: {0}".format(t1-t0)
# Direct numpy calculation without overhead of timeseries() call
array = u2.trajectory.timeseries(ag2)
t0 = time.time()
for i in range(1000):
coms3 = np.average(array, weights=(ag2.masses/np.sum(ag2.masses)), axis=0)
t1 = time.time()
print "Non-iterative procedure excluding timeseries() call: {0}".format(t1-t0)
# Verify solutions are the same
assert((abs(np.array(coms1) - np.array(coms2)) < 0.0001).all())
assert((abs(np.array(coms1) - coms3) < 0.0001).all())Times on my laptop are now: Indeed, we here see some interesting differences. The in-memory version is about 75 times faster than the standard XTCReader. Surprisingly, in this case, it is also faster than the non-iterative numpy procedure. It turns out that this is due to the overhead in the timeseries() call. The last value shows that there is another potential factor of 10 to be gained by extracting the numpy array once and for all (or if we removed some of the overhead in the timeseries() call) |
|
Cool that's more like it. |
richardjgowers
left a comment
There was a problem hiding this comment.
Looking good.
As a tangent, would it be possible to save/load these trajectories using np.save/load (or something more robust like hdf5?). You could then even associate the .npy file extension with MemoryReader
| - ``False``: show all status messages and do not change the the logging | ||
| level (default) | ||
|
|
||
| *in_memory* |
There was a problem hiding this comment.
Maybe mention that this is faster?
There was a problem hiding this comment.
I haven't tested the speed very systematically for the alignment, so I've just added ", which can improve performance substantially in some cases".
| writer = None | ||
| if in_memory: | ||
| traj.transfer_to_memory() | ||
| frames = traj.trajectory |
There was a problem hiding this comment.
You can just move the above frames = to below this if block to simplify
There was a problem hiding this comment.
Do you mean moving the frames = traj.trajectory line from the first line of the method (line 684) to after the if-block (line 709). If so, this would not work, since the else branch in the if-statement also uses the frames variable. Perhaps I'm misunderstanding what you meant.
There was a problem hiding this comment.
Ah my mistake, didn't see that
|
CHANGELOG entry and done? |
|
@richardjgowers Regarding your tangent, do you mean implementing a |
|
Regarding the CHANGELOG, there seem to be no entries for 0.16.0 yet, and I'm not entirely sure about the format, but here is a suggestion: Enhancements
|
|
@wouterboomsma thanks, took care of the CHANGELOG for you |
|
Put it under the 0.15.1 heading in your CHANGELOG. Once you merge/rebase against develop you will see that the 0.15.1 was changed to 0.16.0 Keep your entries shorter, put them under Enhancements, and reference the PR (or issue) number. Don't forget to add authors' handles to the entry and if they don't appear yet in AUTHORS also add everyone to that file. Oliver Beckstein
|
|
Hooray, congratulations and thanks a lot! Oliver Beckstein
|
- changed "Maintained by Oliver Beckstein and Elizabeth Denning" to "Maintained by the MDAnalysis Core Devs Team" - added Wouter Boomsma (for PR #976) - removed explicit mentions for KDTree (Hamelryck) because since 0.11.0 we do no longer include the actual code (see PR #395 and #383) - removed explicit mentions for helanal (Hall) because Ben Hall is a contributing author
|
Great! Thank you all for your time in reviewing this PR. |
* Added MemoryReader and tests. * Added documentation, and removed unused import * Minor bugfixes and documentation updates related to MemoryReader. * Changed reader format string: memory->MEMORY * Removed check for existence of group.universe * Use isinstance instead of checking against format class variable. * Inreased travis process timeout to 400s * Bugfix in MemoryReader: now forces numpy array copy when using in_memory option of Universe constructor * Extra documentation and minor bugfix
JmhX7IYP+2pnu4iv7mzp+xy8W5x1pzFOZhzfS4LslB2Ac+4SENM1eAZebiZQYSYl qyYkj9bLS2RqC40dQVfAA4dMrAuWpNfgZkTeCCU7pm29TodN0hmljNG/rLoGG/iH ieA+pSultKwyryY0UZXHAMmNRIViEfy1VsjE2Hswc/AIiY9W1yfLONXVjtELjFsU jv4u1ljR+lhuP6ZllEwucKeD5ojQDyXBU9WXPi2bUwFEFpNdn5zXVRTV/kntb86d sbOE1TQpy+WeSNTyqmEtdjQRuLH21Jm9kTnWFjkKjw2lL+odUOAoOEC4tLSin30i 6KK2lVvSfXJi9e5XOL+nXWqvwr+jU5jqNV+PC60aEe9N1xhVapYroY84bIJwcuYQ 2Axu0vJWNLOpnLGAH7kYYaeEFJdsZt283kDy1paagWmXqyy3p3hh1VD8NEWWcXQ4 Da07VzZh61RvLHFWIzyXkizPjaCMCfCPIpwRACht5OJdRH0LwEhGRhJhSHEUdh7p WGNEQr5XVnd2Xb+dHfpBeq6a0vPmOv8oDXzRCnGpDeTjGE3kEQy8PGeDqm4dsD+B wOmeCYN/NmUnurpZrW2x151bKgCbcjwiTksCsIhZmN1kRHqAR+NkoHSUOiji6INw LvpyU91X7ji1yyYlhPTu =N0P/ -----END PGP SIGNATURE----- updated AUTHORS - changed "Maintained by Oliver Beckstein and Elizabeth Denning" to "Maintained by the MDAnalysis Core Devs Team" - added Wouter Boomsma (for PR MDAnalysis#976) - removed explicit mentions for KDTree (Hamelryck) because since 0.11.0 we do no longer include the actual code (see PR MDAnalysis#395 and MDAnalysis#383) - removed explicit mentions for helanal (Hall) because Ben Hall is a contributing author
MemoryReader: Support for loading MDAnalysis trajectories into a numpy array, interfaced through the standard MDAnalysis reader interface. Makes it possible to manipulate coordinates in memory - for instance aligning frames in-memory.
PR Checklist