Skip to content

Change positions order from 'F' to 'C' == faster wrapping#2211

Closed
zemanj wants to merge 27 commits intoMDAnalysis:developfrom
zemanj:faster_wrapping
Closed

Change positions order from 'F' to 'C' == faster wrapping#2211
zemanj wants to merge 27 commits intoMDAnalysis:developfrom
zemanj:faster_wrapping

Conversation

@zemanj
Copy link
Member

@zemanj zemanj commented Feb 28, 2019

Fixes #2210

Changes made in this Pull Request:

  • Changed coordinates.base.Timestep.order to 'C'
  • Added optional keywords return_counts, return_masks, and assume_unsorted to lib.util.unique_int_1d() and improved its performance
  • Added lib.util.iscontiguous_int_1d()
  • Added lib.util.indices_to_slice_1d()
  • Added *Group.iscontiguous property
  • Added *Group._ix_or_slice property
  • Added AtomGroup._positions_view_or_copy property returning a view for sliceable AtomGroups (otherwise a copy)
  • Added AtomGroup._positions_c_view_or_copy property returning a C-contiguous view for contiguous AtomGroups (otherwise a copy)
  • Added lib.util.unique_masks_int_1d()
  • Added lib.util.argwhere_int_1d() (will be used in a future PR for faster unwrapping)
  • Added lib.util.check_compound()
  • Added AtomGroup.compound_indices()
  • Added lib.util.coords_add_vector()
  • Added lib.util._coords_add_vectors()
  • Replaced Universe._fragdict by Universe._fraginfo property (accelerates AtomGroup.fragments and AtomGroup.fragindices)
  • Improved performance of *Group.center(), especially for per-compound computations and contiguous AtomGroups
  • Improved performance of *Group.wrap(), especially for per-compound computations and contiguous AtomGroups (up to a factor of 39.3)

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@zemanj zemanj changed the title Chang positions order from 'F' to 'C' == faster wrapping Change positions order from 'F' to 'C' == faster wrapping Feb 28, 2019
@codecov
Copy link

codecov bot commented Feb 28, 2019

Codecov Report

Merging #2211 into develop will decrease coverage by 1.14%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2211      +/-   ##
===========================================
- Coverage    90.71%   89.57%   -1.15%     
===========================================
  Files           15      158     +143     
  Lines         1982    19692   +17710     
  Branches         0     2760    +2760     
===========================================
+ Hits          1798    17639   +15841     
- Misses         184     1458    +1274     
- Partials         0      595     +595
Impacted Files Coverage Δ
package/MDAnalysis/coordinates/DLPoly.py 95.18% <100%> (ø)
package/MDAnalysis/core/groups.py 98.24% <100%> (ø)
package/MDAnalysis/transformations/rotate.py 100% <100%> (ø)
package/MDAnalysis/coordinates/base.py 93.3% <100%> (ø)
package/MDAnalysis/analysis/rms.py 90.66% <100%> (ø)
package/MDAnalysis/core/universe.py 95% <100%> (ø)
package/MDAnalysis/core/topologyattrs.py 97.75% <100%> (ø)
package/MDAnalysis/transformations/translate.py 100% <100%> (ø)
package/MDAnalysis/lib/util.py 88.59% <100%> (ø)
coordinates/base.py 94.79% <0%> (ø) ⬆️
... and 144 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9428866...09a1fbe. Read the comment docs.

@zemanj
Copy link
Member Author

zemanj commented Feb 28, 2019

In some tests, Travis/conda/whatever seems to screw up:
unable to execute 'x86_64-conda_cos6-linux-gnu-gcc': No such file or directory

EDIT:
Things seem to miraculously work now. If this happens again, we might consider adding gxx_linux-64 to CONDA_MIN_DEPENDENCIES in the Linux tests.

zemanj added 6 commits March 13, 2019 16:15
…upBase._compound_indices() to AtomGroup.compound_indices()
…rameter to unique_int_1d and unique_masks_1d
…ns by AtomGroup._positions_view_or_copy and AtomGroup._positions_c_view_or_copy; use new util functions in *Group.wrap() and *Group.center(); new parameter check_weights in *Group.center()
@zemanj
Copy link
Member Author

zemanj commented Mar 26, 2019

Performance Report

Test Code

# -*- coding: utf-8 -*-
import numpy as np
import MDAnalysis as mda
from timeit import timeit


########## HELPER FUNCTIONS ##########

def run(func, nframes):
    for ts in u.trajectory[:nframes]:
        func()

def trajectory_read(nframes):
    for ts in u.trajectory[:nframes]:
        pass

def time_group_wrap(group, compound, center, inplace, nruns, nframes_per_run,
                    t_read):
    func = lambda: group.wrap(compound=compound, center=center, inplace=inplace)
    t_wrap = timeit(lambda: run(func, nframes_per_run), number=nruns)
    t_wrap *= 1.0e6 / (nruns * nframes_per_run)
    t = t_wrap - t_read
    print("| {:9s} | {:6s} | {:7s} | {:10.1f} |"
          "".format(compound, center, str(inplace), t))

def run_measurements(groups, compounds, centers, nruns, nframes_per_run):
    t_read = timeit(lambda: trajectory_read(nframes_per_run), number=nruns)
    t_read *= 1.0e6 / (nruns * nframes_per_run)
    for cont, group in groups:
        print("\n{} {}\n".format(cont, repr(group).strip("<>")))
        print("| compound  | center | inplace | µs / frame |")
        print("| :---      | :---:  | :---    |       ---: |")
        for compound in compounds:
            if compound == 'atoms':
                _centers = ['n/a']
            else:
                _centers = centers
            for center in _centers:
                for inplace in [False, True]:
                    time_group_wrap(group, compound, center, inplace,
                                    nruns, nframes_per_run, t_read)


########## SETUP ##########

# Load Universe:
u = mda.Universe("topol.tpr", "traj_comp.xtc")
# populate u._fraginfo:
_ = u.atoms.fragments

# Set up groups, compounds, and centers:
np.random.seed(1337)
rnd_aix = u.atoms.ix.copy()
np.random.shuffle(rnd_aix)
groups = [("contiguous", u.atoms),
          ("non-contiguous", u.atoms[::-1]),
          ("scrambled", u.atoms[rnd_aix])]
compounds = ['atoms', 'group', 'segments', 'residues', 'molecules', 'fragments']
centers = ['com', 'cog']

# measurement parameters:
nruns = 1000
nframes_per_run = 100

print("{} runs, {} frames / run".format(nruns, nframes_per_run))


########## MEASUREMENTS ##########

run_measurements(groups, compounds, centers, nruns, nframes_per_run)

Runtime Comparison

Test CPU: AMD Ryzen Threadripper 1920X 12-Core Processor @ 3.5 GHz, process pinned to core 0
Test data: GROMACS xtc trajectory from a simulation of Neat BMIm BF4, 500 ion pairs (15000 atoms) in a fully periodic cubic unit cell (NpT ensemble) read from a RAM disk
Test configuration: 1000 runs, 100 frames / run

Note that the runtimes presented below are execution times of AtomGroup.wrap() only, i.e., the time needed for trajectory reading time has been subtracted.

contiguous AtomGroup with 15000 atoms (u.atoms)

compound center inplace µs / frame (new) µs / frame (old) speedup factor (old/new)
atoms n/a False 199.6 211.3 1.06
atoms n/a True 49.8 394.7 7.93
group com False 238.2 722.4 3.03
group com True 85.3 905.6 10.62
group cog False 206.4 599.8 2.91
group cog True 56.0 782.7 13.98
segments com False 489.3 2011.1 4.11
segments com True 339.9 2194.2 6.46
segments cog False 457.6 1858.4 4.06
segments cog True 299.7 2041.2 6.81
residues com False 1198.2 33676.5 28.11
residues com True 1035.7 33874.6 32.71
residues cog False 1147.2 33407.9 29.12
residues cog True 994.9 33745.0 33.92
molecules com False 843.2 18311.0 21.72
molecules com True 690.1 18520.9 26.84
molecules cog False 803.5 18148.0 22.59
molecules cog True 650.1 18377.9 28.27
fragments com False 829.2 25149.5 30.33
fragments com True 674.4 25387.2 37.64
fragments cog False 788.2 24823.8 31.49
fragments cog True 635.7 25004.9 39.33
average com/cog False 654.6 14447.2 22.07
average com/cog True 501.1 14657.2 29.25
average com/cog False/True 577.8 14552.2 25.18

non-contiguous AtomGroup with 15000 atoms (u.atoms[::-1])

compound center inplace µs / frame (new) µs / frame (old) speedup factor (old/new)
atoms n/a False 197.9 213.0 1.08
atoms n/a True 176.8 415.9 2.35
group com False 233.3 724.2 3.10
group com True 140.6 906.3 6.45
group cog False 205.5 606.5 2.95
group cog True 112.7 793.2 7.04
segments com False 479.4 2026.0 4.23
segments com True 386.6 2206.9 5.71
segments cog False 440.4 1872.4 4.25
segments cog True 348.1 2062.9 5.93
residues com False 1183.0 33750.9 28.53
residues com True 1079.9 33996.4 31.48
residues cog False 1136.6 33460.7 29.44
residues cog True 1040.0 33677.7 32.38
molecules com False 854.0 18210.1 21.32
molecules com True 766.6 18426.2 24.04
molecules cog False 816.3 18034.4 22.09
molecules cog True 726.9 18206.0 25.05
fragments com False 840.0 25127.3 29.91
fragments com True 753.1 25364.3 33.68
fragments cog False 798.9 24892.8 31.16
fragments cog True 709.5 25024.5 35.27
average com/cog False 653.2 14447.1 22.12
average com/cog True 567.3 14643.7 25.81
average com/cog False/True 610.3 14545.4 23.83

scrambled AtomGroup with 15000 atoms (u.atoms[shuffled_indices])

compound center inplace µs / frame (new) µs / frame (old) speedup factor (old/new)
atoms n/a False 201.7 213.2 1.06
atoms n/a True 378.4 396.1 1.05
group com False 238.7 723.0 3.03
group com True 411.8 904.5 2.20
group cog False 217.0 599.8 2.76
group cog True 383.6 784.2 2.04
segments com False 538.4 2015.5 3.74
segments com True 705.2 2201.4 3.12
segments cog False 489.3 1863.1 3.81
segments cog True 670.7 2047.4 3.05
residues com False 1771.1 33739.9 19.05
residues com True 1940.2 33851.5 17.45
residues cog False 1719.7 33423.1 19.44
residues cog True 1892.0 34512.8 18.24
molecules com False 1377.8 18672.9 13.55
molecules com True 1546.2 18348.2 11.87
molecules cog False 1323.1 17975.6 13.59
molecules cog True 1494.7 18158.8 12.15
fragments com False 1359.3 25020.9 18.41
fragments com True 1528.4 25237.3 16.51
fragments cog False 1319.2 24755.0 18.77
fragments cog True 1485.6 24898.2 16.76
average com/cog False 959.6 14454.7 15.06
average com/cog True 1130.6 14667.3 12.97
average com/cog False/True 1045.1 14561.0 13.93

@zemanj
Copy link
Member Author

zemanj commented Mar 26, 2019

@MDAnalysis/coredevs Should be good to go. I'm sorry that this got a little out of hand, maybe I should have split that into several PRs...

coordinates[i, 2] = coordinates[i, 2] + vector[2]


def _coords_add_vectors(np.ndarray[np.float32_t, ndim=2] coordinates not None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a compelling reason to use the old buffer interface instead of a typed memoryview here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tylerjereddy Yes, there is. Depending on input dtype and/or shape, I use numpy operations such as ndarray.astype() or radds like coordinates += vectors. The problem with memoryviews is that they don't offer such functionality. Having memoryviews as input would require unnecessary instantiations of ndarray objects with np.asarray() prior to these operations... You see my point?
So we really do want to have numpy.ndarray objects as input here. As you might have noticed, I also initialize a memoryview of the same data in line 203:

cdef float[:, :] coords = coordinates

Whenever I use custom cython operations, this memoryview is used instead of the ndarray objects.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still try to guard against introduction of the old buffer interface in new code. Usually there is a way to work around using the old approach with careful design.

I agree it is a pain to deal with asarray() stuff a lot.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually there is a way to work around using the old approach with careful design.

Do you have anything specific in mind? Like a Python wrapper taking care of array initialization and returning? (If that's not too costly)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, but probably not a blocking issue here if you don't have something in mind already.

I'm also not sure if Cython will ever really deprecate the old approach. I've had a hard time replacing the old buffers with memoryviews in SciPy for the reasons you outline--reviewers don't like performance drops when the buffer is just a medium for transmitting the ndarray.

.. versionadded:: 0.20.0
"""
cdef np.intp_t n = coordinates.shape[0]
cdef np.ndarray[np.float64_t, ndim=2] center
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

old buffer interface

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is there for a reason:
The center object is what the function returns in the end, and that should be a numpy.ndarray.
If I used a memoryview instead, I'd have to return np.asarray(center). Since center is initialized with np.zeros(...), which returns an ndarray anyway, this would mean throwing away this first ndarray object and creating a new one in the return statement, effectively killing performance.

cdef np.intp_t n_unique
cdef np.ndarray[np.intp_t, ndim=1] counts
cdef np.ndarray[object, ndim=1] masks
cdef np.ndarray[np.intp_t, ndim=1] unique = np.empty(n_values,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

old buffer interfaces on lines above

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's here for the same reasons as above: counts, masks, and unique are the arrays being returned in the end and we want to avoid unnecessary calls to np.asarray(). Note that the low-level implementations of unique_int_1d() operate on memoryviews of these buffers.


.. versionadded:: 0.20.0
"""
cdef np.ndarray[np.intp_t, ndim=1] result = np.empty(arr.shape[0], dtype=np.intp)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

old buffer interface

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same reason as above

return (x == INFINITY) | (x == -INFINITY)


def coords_add_vector(float[:, :] coordinates not None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really get this, but I guess I need to find the time to play with it myself. I don't like the idea that we're implementing what should already be done optimally in numpy. If we start having functions like this, then the library could quickly balloon with similar....

Copy link
Member Author

@zemanj zemanj Apr 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I absolutely agree that this should be done with numpy. However, the current implementation in numpy is dead slow (about a factor 10 slower than it could be) due of the way numpy implements broadcasting (resulting in a gazillion of __memmove_ssse3()s) before calling its vectorized addition routine (sse2_binary_add_FLOAT()). I should probably think about a more efficient way of implementing this in numpy instead of implementing a workaraound in MDAnalysis...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for moving upstream instead of here if there's some fundamental flaw in NumPy operations that can be changed without drawbacks

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tylerjereddy I'm currently investigating where and how shortcuts in/around numpy's npyiter_copy_to_buffers() might be implemented. Unfortunately, understanding this section of the numpy code base is not an entirely trivial task...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zemanj I went ahead and tried to summarize your idea in the weekly NumPy meeting community topics section. I hope that's ok & you are welcome to join / listen in on the call using the link on that doc.

Feel free to clean up my summary there too--anyone can edit that.

The call is in a few hours--12 pm Pacific, which may be rather late for you.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tylerjereddy Thanks! I added two links, one to the gdb log and one to the call stack visualization I posted in the discussion of #2210. I won't be able to join the meeting today, so feel free to postpone the topic.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part of numpy is really complex and probably can be improved. And yes it is complex...

The buffer code should skip copying to the buffer when the correct data is already in it (at least in most cases). But I am not sure it is very smart about trying to ensure that this actually happens. I have to check the exact case you are observing, possibly there are also some cases which could dispatch to a non-buffered path, but do not?

Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs the reimplementation of basic numpy operations (coords += pos) removed before we can merge

@zemanj
Copy link
Member Author

zemanj commented May 9, 2019 via email

@IAlibay IAlibay added the close? Evaluate if issue/PR is stale and can be closed. label May 18, 2021
@IAlibay
Copy link
Member

IAlibay commented May 23, 2021

@zemanj @richardjgowers there's a ton of improvement here that aren't what was implemented in #3132 - (i.e. going beyond 'F' -> 'C') probably worth opening a new PR for this that targets that?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

close? Evaluate if issue/PR is stale and can be closed. performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Change Memory layout of positions from Fortran- to C-contiguous

5 participants