Change positions order from 'F' to 'C' == faster wrapping #2211

zemanj wants to merge 27 commits into MDAnalysis:develop from
Conversation
Codecov Report

```
@@            Coverage Diff             @@
##           develop    #2211      +/-  ##
===========================================
- Coverage    90.71%   89.57%    -1.15%
===========================================
  Files           15      158      +143
  Lines         1982    19692    +17710
  Branches         0     2760     +2760
===========================================
+ Hits          1798    17639    +15841
- Misses         184     1458     +1274
- Partials         0      595      +595
```

Continue to review the full report at Codecov.
In some tests, Travis/conda/whatever seems to screw up. EDIT:
…- to C-contiguous
…or contiguous AtomGroups)
…argwhere_int_1d()
…ib.util.unique_int_1d
…nd fragments faster
…upBase._compound_indices() to AtomGroup.compound_indices()
…rameter to unique_int_1d and unique_masks_1d
…ns by AtomGroup._positions_view_or_copy and AtomGroup._positions_c_view_or_copy; use new util functions in *Group.wrap() and *Group.center(); new parameter check_weights in *Group.center()
Performance Report

Test Code

```python
# -*- coding: utf-8 -*-
import numpy as np
import MDAnalysis as mda
from timeit import timeit

########## HELPER FUNCTIONS ##########

def run(func, nframes):
    for ts in u.trajectory[:nframes]:
        func()

def trajectory_read(nframes):
    for ts in u.trajectory[:nframes]:
        pass

def time_group_wrap(group, compound, center, inplace, nruns, nframes_per_run,
                    t_read):
    func = lambda: group.wrap(compound=compound, center=center, inplace=inplace)
    t_wrap = timeit(lambda: run(func, nframes_per_run), number=nruns)
    t_wrap *= 1.0e6 / (nruns * nframes_per_run)
    t = t_wrap - t_read
    print("| {:9s} | {:6s} | {:7s} | {:10.1f} |"
          "".format(compound, center, str(inplace), t))

def run_measurements(groups, compounds, centers, nruns, nframes_per_run):
    t_read = timeit(lambda: trajectory_read(nframes_per_run), number=nruns)
    t_read *= 1.0e6 / (nruns * nframes_per_run)
    for cont, group in groups:
        print("\n{} {}\n".format(cont, repr(group).strip("<>")))
        print("| compound | center | inplace | µs / frame |")
        print("| :--- | :---: | :--- | ---: |")
        for compound in compounds:
            if compound == 'atoms':
                _centers = ['n/a']
            else:
                _centers = centers
            for center in _centers:
                for inplace in [False, True]:
                    time_group_wrap(group, compound, center, inplace,
                                    nruns, nframes_per_run, t_read)

########## SETUP ##########
# Load Universe:
u = mda.Universe("topol.tpr", "traj_comp.xtc")
# populate u._fraginfo:
_ = u.atoms.fragments
# Set up groups, compounds, and centers:
np.random.seed(1337)
rnd_aix = u.atoms.ix.copy()
np.random.shuffle(rnd_aix)
groups = [("contiguous", u.atoms),
          ("non-contiguous", u.atoms[::-1]),
          ("scrambled", u.atoms[rnd_aix])]
compounds = ['atoms', 'group', 'segments', 'residues', 'molecules', 'fragments']
centers = ['com', 'cog']
# measurement parameters:
nruns = 1000
nframes_per_run = 100
print("{} runs, {} frames / run".format(nruns, nframes_per_run))

########## MEASUREMENTS ##########
run_measurements(groups, compounds, centers, nruns, nframes_per_run)
```

Runtime Comparison

Test CPU: AMD Ryzen Threadripper 1920X 12-Core Processor @ 3.5 GHz, process pinned to core 0

Note that the runtimes presented below are per-frame execution times of the wrap call with the trajectory read time subtracted.

contiguous AtomGroup with 15000 atoms (u.atoms)
| compound | center | inplace | µs / frame (new) | µs / frame (old) | speedup factor (old/new) |
|---|---|---|---|---|---|
| atoms | n/a | False | 199.6 | 211.3 | 1.06 |
| atoms | n/a | True | 49.8 | 394.7 | 7.93 |
| group | com | False | 238.2 | 722.4 | 3.03 |
| group | com | True | 85.3 | 905.6 | 10.62 |
| group | cog | False | 206.4 | 599.8 | 2.91 |
| group | cog | True | 56.0 | 782.7 | 13.98 |
| segments | com | False | 489.3 | 2011.1 | 4.11 |
| segments | com | True | 339.9 | 2194.2 | 6.46 |
| segments | cog | False | 457.6 | 1858.4 | 4.06 |
| segments | cog | True | 299.7 | 2041.2 | 6.81 |
| residues | com | False | 1198.2 | 33676.5 | 28.11 |
| residues | com | True | 1035.7 | 33874.6 | 32.71 |
| residues | cog | False | 1147.2 | 33407.9 | 29.12 |
| residues | cog | True | 994.9 | 33745.0 | 33.92 |
| molecules | com | False | 843.2 | 18311.0 | 21.72 |
| molecules | com | True | 690.1 | 18520.9 | 26.84 |
| molecules | cog | False | 803.5 | 18148.0 | 22.59 |
| molecules | cog | True | 650.1 | 18377.9 | 28.27 |
| fragments | com | False | 829.2 | 25149.5 | 30.33 |
| fragments | com | True | 674.4 | 25387.2 | 37.64 |
| fragments | cog | False | 788.2 | 24823.8 | 31.49 |
| fragments | cog | True | 635.7 | 25004.9 | 39.33 |
| average | com/cog | False | 654.6 | 14447.2 | 22.07 |
| average | com/cog | True | 501.1 | 14657.2 | 29.25 |
| average | com/cog | False/True | 577.8 | 14552.2 | 25.18 |
non-contiguous AtomGroup with 15000 atoms (u.atoms[::-1])
| compound | center | inplace | µs / frame (new) | µs / frame (old) | speedup factor (old/new) |
|---|---|---|---|---|---|
| atoms | n/a | False | 197.9 | 213.0 | 1.08 |
| atoms | n/a | True | 176.8 | 415.9 | 2.35 |
| group | com | False | 233.3 | 724.2 | 3.10 |
| group | com | True | 140.6 | 906.3 | 6.45 |
| group | cog | False | 205.5 | 606.5 | 2.95 |
| group | cog | True | 112.7 | 793.2 | 7.04 |
| segments | com | False | 479.4 | 2026.0 | 4.23 |
| segments | com | True | 386.6 | 2206.9 | 5.71 |
| segments | cog | False | 440.4 | 1872.4 | 4.25 |
| segments | cog | True | 348.1 | 2062.9 | 5.93 |
| residues | com | False | 1183.0 | 33750.9 | 28.53 |
| residues | com | True | 1079.9 | 33996.4 | 31.48 |
| residues | cog | False | 1136.6 | 33460.7 | 29.44 |
| residues | cog | True | 1040.0 | 33677.7 | 32.38 |
| molecules | com | False | 854.0 | 18210.1 | 21.32 |
| molecules | com | True | 766.6 | 18426.2 | 24.04 |
| molecules | cog | False | 816.3 | 18034.4 | 22.09 |
| molecules | cog | True | 726.9 | 18206.0 | 25.05 |
| fragments | com | False | 840.0 | 25127.3 | 29.91 |
| fragments | com | True | 753.1 | 25364.3 | 33.68 |
| fragments | cog | False | 798.9 | 24892.8 | 31.16 |
| fragments | cog | True | 709.5 | 25024.5 | 35.27 |
| average | com/cog | False | 653.2 | 14447.1 | 22.12 |
| average | com/cog | True | 567.3 | 14643.7 | 25.81 |
| average | com/cog | False/True | 610.3 | 14545.4 | 23.83 |
scrambled AtomGroup with 15000 atoms (u.atoms[shuffled_indices])
| compound | center | inplace | µs / frame (new) | µs / frame (old) | speedup factor (old/new) |
|---|---|---|---|---|---|
| atoms | n/a | False | 201.7 | 213.2 | 1.06 |
| atoms | n/a | True | 378.4 | 396.1 | 1.05 |
| group | com | False | 238.7 | 723.0 | 3.03 |
| group | com | True | 411.8 | 904.5 | 2.20 |
| group | cog | False | 217.0 | 599.8 | 2.76 |
| group | cog | True | 383.6 | 784.2 | 2.04 |
| segments | com | False | 538.4 | 2015.5 | 3.74 |
| segments | com | True | 705.2 | 2201.4 | 3.12 |
| segments | cog | False | 489.3 | 1863.1 | 3.81 |
| segments | cog | True | 670.7 | 2047.4 | 3.05 |
| residues | com | False | 1771.1 | 33739.9 | 19.05 |
| residues | com | True | 1940.2 | 33851.5 | 17.45 |
| residues | cog | False | 1719.7 | 33423.1 | 19.44 |
| residues | cog | True | 1892.0 | 34512.8 | 18.24 |
| molecules | com | False | 1377.8 | 18672.9 | 13.55 |
| molecules | com | True | 1546.2 | 18348.2 | 11.87 |
| molecules | cog | False | 1323.1 | 17975.6 | 13.59 |
| molecules | cog | True | 1494.7 | 18158.8 | 12.15 |
| fragments | com | False | 1359.3 | 25020.9 | 18.41 |
| fragments | com | True | 1528.4 | 25237.3 | 16.51 |
| fragments | cog | False | 1319.2 | 24755.0 | 18.77 |
| fragments | cog | True | 1485.6 | 24898.2 | 16.76 |
| average | com/cog | False | 959.6 | 14454.7 | 15.06 |
| average | com/cog | True | 1130.6 | 14667.3 | 12.97 |
| average | com/cog | False/True | 1045.1 | 14561.0 | 13.93 |
@MDAnalysis/coredevs Should be good to go. I'm sorry that this got a little out of hand; maybe I should have split it into several PRs...
```
coordinates[i, 2] = coordinates[i, 2] + vector[2]
```

```
def _coords_add_vectors(np.ndarray[np.float32_t, ndim=2] coordinates not None,
```
Is there a compelling reason to use the old buffer interface instead of a typed memoryview here?
@tylerjereddy Yes, there is. Depending on input dtype and/or shape, I use numpy operations such as `ndarray.astype()` or in-place adds like `coordinates += vectors`. The problem with memoryviews is that they don't offer such functionality. Having memoryviews as input would require unnecessary instantiations of ndarray objects with `np.asarray()` prior to these operations... You see my point?
So we really do want to have numpy.ndarray objects as input here. As you might have noticed, I also initialize a memoryview of the same data in line 203:

```
cdef float[:, :] coords = coordinates
```

Whenever I use custom Cython operations, this memoryview is used instead of the ndarray object.
I still try to guard against introduction of the old buffer interface in new code. Usually there is a way to work around using the old approach with careful design.
I agree it is a pain to deal with asarray() stuff a lot.
Usually there is a way to work around using the old approach with careful design.
Do you have anything specific in mind? Like a Python wrapper taking care of array initialization and returning? (If that's not too costly)
Not sure, but probably not a blocking issue here if you don't have something in mind already.
I'm also not sure if Cython will ever really deprecate the old approach. I've had a hard time replacing the old buffers with memoryviews in SciPy for the reasons you outline--reviewers don't like performance drops when the buffer is just a medium for transmitting the ndarray.
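To make the trade-off in this thread concrete, here is a small pure-Python/NumPy sketch of the round trip being discussed, with a Python `memoryview` standing in for a Cython typed memoryview (this is an illustration, not the actual Cython code):

```python
import numpy as np

# A Python memoryview stands in for a Cython typed memoryview here.
arr = np.arange(6, dtype=np.float32).reshape(2, 3)
mv = memoryview(arr)

# Wrapping the memoryview back into an ndarray, as a memoryview-based
# signature would force before calling ndarray-only operations:
back = np.asarray(mv)

assert back is not arr               # a new ndarray object is created...
assert np.shares_memory(back, arr)   # ...although the buffer is shared
```

No data is copied, but a fresh ndarray object must be constructed on every call, which is exactly the overhead the ndarray-typed signature avoids.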
```
.. versionadded:: 0.20.0
"""
cdef np.intp_t n = coordinates.shape[0]
cdef np.ndarray[np.float64_t, ndim=2] center
```
This is there for a reason:
The center object is what the function returns in the end, and that should be a numpy.ndarray.
If I used a memoryview instead, I'd have to return np.asarray(center). Since center is initialized with np.zeros(...), which returns an ndarray anyway, this would mean throwing away this first ndarray object and creating a new one in the return statement, effectively killing performance.
```
cdef np.intp_t n_unique
cdef np.ndarray[np.intp_t, ndim=1] counts
cdef np.ndarray[object, ndim=1] masks
cdef np.ndarray[np.intp_t, ndim=1] unique = np.empty(n_values,
```
old buffer interfaces on lines above
That's here for the same reasons as above: counts, masks, and unique are the arrays being returned in the end and we want to avoid unnecessary calls to np.asarray(). Note that the low-level implementations of unique_int_1d() operate on memoryviews of these buffers.
```
.. versionadded:: 0.20.0
"""
cdef np.ndarray[np.intp_t, ndim=1] result = np.empty(arr.shape[0], dtype=np.intp)
```

```
return (x == INFINITY) | (x == -INFINITY)
```

```
def coords_add_vector(float[:, :] coordinates not None,
```
I don't really get this, but I guess I need to find the time to play with it myself. I don't like the idea that we're implementing what should already be done optimally in numpy. If we start having functions like this, the library could quickly balloon with similar ones...
I absolutely agree that this should be done with numpy. However, the current implementation in numpy is dead slow (about a factor of 10 slower than it could be) due to the way numpy implements broadcasting (resulting in a gazillion of `__memmove_ssse3()` calls) before calling its vectorized addition routine (`sse2_binary_add_FLOAT()`). I should probably think about a more efficient way of implementing this in numpy instead of implementing a workaround in MDAnalysis...
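For reference, the operation under discussion is just a broadcast in-place add of a single shift vector to an (N, 3) coordinate array; a minimal NumPy sketch (array sizes here are arbitrary, and no timing claims are made):

```python
import numpy as np

# Broadcast in-place add of one (3,) shift vector to an (N, 3)
# coordinate array -- the pattern whose buffered-iteration overhead
# is discussed in this thread.
coords = np.zeros((15000, 3), dtype=np.float32)
vector = np.array([1.0, 2.0, 3.0], dtype=np.float32)

coords += vector  # NumPy broadcasts `vector` across all rows

assert np.allclose(coords, vector)  # every row was shifted by the vector
```

The custom `lib.util.coords_add_vector()` in this PR performs the same row-wise addition with an explicit loop over a typed memoryview, sidestepping NumPy's broadcasting machinery.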
+1 for moving upstream instead of here if there's some fundamental flaw in NumPy operations that can be changed without drawbacks
@tylerjereddy I'm currently investigating where and how shortcuts in/around numpy's npyiter_copy_to_buffers() might be implemented. Unfortunately, understanding this section of the numpy code base is not an entirely trivial task...
@zemanj I went ahead and tried to summarize your idea in the weekly NumPy meeting community topics section. I hope that's ok & you are welcome to join / listen in on the call using the link on that doc.
Feel free to clean up my summary there too--anyone can edit that.
The call is in a few hours--12 pm Pacific, which may be rather late for you.
@tylerjereddy Thanks! I added two links, one to the gdb log and one to the call stack visualization I posted in the discussion of #2210. I won't be able to join the meeting today, so feel free to postpone the topic.
This part of numpy is really complex and probably can be improved. And yes it is complex...
The buffer code should skip copying to the buffer when the correct data is already in it (at least in most cases). But I am not sure it is very smart about trying to ensure that this actually happens. I have to check the exact case you are observing, possibly there are also some cases which could dispatch to a non-buffered path, but do not?
richardjgowers left a comment:
I think this needs the reimplementation of basic numpy operations (coords += pos) removed before we can merge
I agree, and consequently, I should also revert the change in memory layout. This has to wait until numpy offers more efficient in-place operations for C-contiguous arrays.
@zemanj @richardjgowers there's a ton of improvements here that aren't what was implemented in #3132 (i.e. going beyond 'F' -> 'C'). Probably worth opening a new PR that targets those?
Fixes #2210
Changes made in this Pull Request:

- Changed `coordinates.base.Timestep.order` to `'C'`
- Added `return_counts`, `return_masks`, and `assume_unsorted` keywords to `lib.util.unique_int_1d()` and improved its performance
- Added `lib.util.iscontiguous_int_1d()`
- Added `lib.util.indices_to_slice_1d()`
- Added `*Group.iscontiguous` property
- Added `*Group._ix_or_slice` property
- Added `AtomGroup._positions_view_or_copy` property returning a view for sliceable AtomGroups (otherwise a copy)
- Added `AtomGroup._positions_c_view_or_copy` property returning a C-contiguous view for contiguous AtomGroups (otherwise a copy)
- Added `lib.util.unique_masks_int_1d()`
- Added `lib.util.argwhere_int_1d()` (will be used in a future PR for faster unwrapping)
- Added `lib.util.check_compound()`
- Added `AtomGroup.compound_indices()`
- Added `lib.util.coords_add_vector()`
- Added `lib.util._coords_add_vectors()`
- Replaced `Universe._fragdict` by `Universe._fraginfo` property (accelerates `AtomGroup.fragments` and `AtomGroup.fragindices`)
- Faster `*Group.center()`, especially for per-compound computations and contiguous AtomGroups
- Faster `*Group.wrap()`, especially for per-compound computations and contiguous AtomGroups (up to a factor of 39.3)

PR Checklist
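The contiguity helpers in this PR hinge on one idea: when an AtomGroup's index array is a contiguous ascending range, fancy indexing can be replaced by a slice, so positions become a view rather than a copy. A simplified pure-Python sketch of that idea (these are hypothetical stand-ins, not the actual `lib.util` implementations):

```python
import numpy as np

def iscontiguous_int_1d(ix):
    """Sketch: True if ix is a strictly ascending range with step 1."""
    ix = np.asarray(ix)
    return ix.size > 0 and bool(np.all(np.diff(ix) == 1))

def indices_to_slice_1d(ix):
    """Sketch: convert a contiguous index array to an equivalent slice."""
    if iscontiguous_int_1d(ix):
        return slice(int(ix[0]), int(ix[-1]) + 1)
    return ix  # fall back to fancy indexing

positions = np.zeros((10, 3), dtype=np.float32)

view = positions[indices_to_slice_1d(np.array([2, 3, 4]))]
assert view.base is positions        # slicing yields a view

copy = positions[indices_to_slice_1d(np.array([2, 5, 4]))]
assert copy.base is not positions    # fancy indexing yields a copy
```

A view avoids both the copy on read and the write-back on in-place operations, which is where much of the speedup for contiguous AtomGroups comes from.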
coordinates.base.Timestep.orderto 'C'return_counts,return_masks, andassume_unsortedtolib.util.unique_int_1d()and improved its performancelib.util.iscontiguous_int_1d()lib.util.indices_to_slice_1d()*Group.iscontiguousproperty*Group._ix_or_slicepropertyAtomGroup._positions_view_or_copyproperty returning a view for sliceable AtomGroups (otherwise a copy)AtomGroup._positions_c_view_or_copyproperty returning a C-contiguous view for contiguous AtomGroups (otherwise a copy)lib.util.unique_masks_int_1d()lib.util.argwhere_int_1d()(will be used in a future PR for faster unwrapping)lib.util.check_compound()AtomGroup.compound_indices()lib.util.coords_add_vector()lib.util._coords_add_vectors()Universe._fragdictbyUniverse._fraginfoproperty (acceleratesAtomGroup.fragmentsandAtomGroup.fragindices)*Group.center(), especially for per-compound computations and contiguous AtomGroups*Group.wrap(), especially for per-compound computations and contiguous AtomGroups (up to a factor of 39.3)PR Checklist