Augment Coordinates for periodic Boundary conditions by ayushsuhane · Pull Request #1977 · MDAnalysis/mdanalysis

ayushsuhane · 2018-07-08T21:13:53Z

Changes made in this Pull Request:

Cython implementation for adding duplicate particles (periodic images) for periodic boundary conditions
class Periodic_cKDTree, similar to PeriodicKDTree but a wrapper around the faster scipy.spatial.cKDTree.

PR Checklist

Tests?
Docs?
CHANGELOG updated?
Issue raised/referenced?

ayushsuhane · 2018-07-08T21:22:02Z

Since replacement of Bio.cKDTree with scipy.spatial.cKDTree is beneficial, is this the correct way forward?

The class Periodic_cKDTree deals with both periodic and non-periodic boundary conditions. I have also defined a search_query function which can be used directly in guess_bonds in MDAnalysis.topology.guessers. With capped_distance and Periodic_cKDTree, selection.py can be rewritten so that it would be easy to include the grid search algorithm in it as well, with the actual algorithm hidden from the user (but still optional).

richardjgowers · 2018-07-08T21:35:06Z

package/MDAnalysis/lib/c_distances.pyx

+      Input coordinate array to generate duplicate images
+    box : array
+      Dimensions of the box of shape (3, 3)
+    reciprocal : array


Is this the reciprocal of box? If so, we can just calculate it in the function.

richardjgowers · 2018-07-08T21:35:34Z

package/MDAnalysis/lib/c_distances.pyx

+    cdef float other[3]
+
+    cdef int dim
+    dim = coordinates.shape[1]


We can hardcode dim to 3; it might sometimes help the compiler.

richardjgowers · 2018-07-08T21:36:11Z

package/MDAnalysis/lib/c_distances.pyx

+                    indices[p] = i
+                    p += 1
+
+


If you could fix up the formatting it would be easier to read; there are 3 blank lines here.

richardjgowers · 2018-07-08T21:36:40Z

package/MDAnalysis/lib/c_distances.pyx

+    cdef float sum1
+    cdef ssize_t dim
+
+    dim=3


hardcode dim to 3

coveralls · 2018-07-08T21:41:41Z

Coverage decreased (-0.4%) to 89.584% when pulling 408e12c on ayushsuhane:augment into 1a1b2f1 on MDAnalysis:develop.

richardjgowers · 2018-07-08T22:25:03Z

package/MDAnalysis/lib/c_distances.pyx

+def undo_augment(int[:] results, int[:] translation, int nreal):
+    """Translate augmented indices back to originals
+
+    Note: modifies results in place!


This doesn't do the modification in place any more. Is there a reason you do the copy now instead?

Actually, I thought that for complex selections it might be useful to keep all the particles (original + images). But it looks like it's better to modify the results in place; otherwise similar points will be computed multiple times as the number of selections (operations) grows.
Meanwhile, I will also try to think of a case where it might fail.
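An in-place translation of augmented indices back to original indices could look like the minimal pure-Python sketch below. It assumes, consistent with the undo_augment signature above, that indices below nreal refer to real particles and that an index k >= nreal refers to image number k - nreal, whose source particle is translation[k - nreal]. The name `undo_augment_inplace` is hypothetical.

```python
def undo_augment_inplace(results, translation, nreal):
    """Map augmented indices in ``results`` back to original indices.

    Indices < nreal already refer to real particles and are left alone.
    An index k >= nreal refers to image number k - nreal, whose source
    particle is translation[k - nreal]. ``results`` is modified in place
    and also returned for convenience.
    """
    for i in range(len(results)):
        if results[i] >= nreal:
            results[i] = translation[results[i] - nreal]
    return results


# 3 real particles (0, 1, 2); images 3 and 4 are copies of particles 2 and 0
results = [1, 3, 4, 2]
undo_augment_inplace(results, translation=[2, 0], nreal=3)
# results is now [1, 2, 0, 2]
```

Working in place avoids allocating a second array; the trade-off is that the caller no longer has the augmented indices afterwards.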

ayushsuhane · 2018-07-10T05:56:09Z

I was working on the tests, but ran into a few problems and would like to share them here:

As you can see in the augment function, the output array has shape (N, 3). If I supply only a single coordinate that lies close to a face and/or an edge, the function cannot store all of its periodic images. One possible solution is to make the output array about 6 times larger, but that will affect performance for large arrays of particles. Another solution could be dynamic allocation of memory for output and indices.
This function has another parameter, cutoff, which will be required while creating the tree. What would be a sensible default value for it?
Instead of augmenting the coordinates, another option could be to cythonize the find_images function, which should also improve performance, i.e. directly return the minimum distance from the function.
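For comparison, computing the minimum distance directly instead of augmenting amounts, for an orthorhombic box, to the minimum image convention. A minimal pure-Python sketch, assuming an orthorhombic box only (triclinic boxes need extra care); `minimum_image_distance` is a hypothetical name and is not find_images.

```python
import math


def minimum_image_distance(p, q, box_lengths):
    """Distance between p and q under the minimum image convention.

    Assumes an orthorhombic box with edge lengths ``box_lengths``.
    """
    d2 = 0.0
    for pi, qi, length in zip(p, q, box_lengths):
        d = pi - qi
        d -= length * round(d / length)  # shift to the nearest periodic image
        d2 += d * d
    return math.sqrt(d2)


print(minimum_image_distance([0.5, 0.0, 0.0], [9.5, 0.0, 0.0],
                             [10.0, 10.0, 10.0]))  # prints 1.0
```

This needs one pass per pair rather than extra image particles, but it cannot be combined directly with an ordinary (non-periodic) KDTree, which is the motivation for augmenting instead.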

jbarnoud · 2018-07-10T06:59:59Z

As you might see in the augment function, the shape of output array is (N, 3). If I supply only a single coordinate which lies close to the face and/or edge, it will not be able to calculate all the periodic images. Possible solutions are to increase the size of output array by preferably 6 times, but it will effect the performance for large array of particles. Another solution could be dynamic allocation of memory for output and indices.

I do not understand the problem. Could you elaborate?

ayushsuhane · 2018-07-10T07:09:46Z

For instance, suppose the augment function is called as augment(a, box, cutoff) with a = np.array([[1, 1, 1]], dtype=np.float32), box = np.array([10, 10, 10, 90, 90, 90], dtype=np.float32) and cutoff > 2. It should ideally create 6 image points, but since the output array is allocated as cdef float[:, :] output = np.zeros((N, 3), dtype=np.float32), it cannot store all the images, and the function in its current form will return only 1 image.

jbarnoud · 2018-07-10T08:27:28Z

But then, if you provide fewer than 6 atoms, you have the same problem, don't you?

ayushsuhane · 2018-07-10T08:37:41Z

Yes, exactly. For a larger number of atoms, it can be assumed that only a few particles will be close to the boundary, so it works, but the current form will fail for a small number of particles. We could require the output to have a minimum size of 100 or 1000, which wouldn't affect performance and would also yield the correct number of images for evenly distributed data. But if all the edge cases need to be considered, then we need a different strategy.

jbarnoud · 2018-07-10T08:45:07Z

At which point in the code do you know for sure what the size of output should be?

ayushsuhane · 2018-07-10T08:54:36Z

Only at the end. That is why it is allocated with shape (N, 3), but the returned array has shape (p, 3), where it is assumed that p < N.

jbarnoud · 2018-07-10T09:57:29Z

Can you really assume p < N? If you can have p = 6 for N = 1, then can't you have p = 6 * N? What happens if all your points are at the border of the box?

richardjgowers · 2018-07-10T12:14:15Z

Yeah, this is something I forgot to do. I think I have a solution...

richardjgowers · 2018-07-10T14:21:09Z

@ayushsuhane OK, I think we can just use a C++ vector of floats and ints:

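# distutils: language = c++
from libcpp.vector cimport vector

import numpy as np


def thing(float[:] coord, float[:] shiftX, float[:] shiftY, int i):
    # vectors grow on demand, so the number of images is not capped at N
    cdef vector[float] output
    cdef vector[int] indices
    cdef int j
    cdef Py_ssize_t n

    # each time we add an augmented coordinate
    for j in range(3):
        output.push_back(coord[j] + shiftX[j] + shiftY[j])
    indices.push_back(i)

    # at the end, convert to numpy arrays to return results
    n = indices.size()

    return (np.asarray(output, dtype=np.float32).reshape(n, 3),
            np.asarray(indices, dtype=np.int32))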
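To see why p can exceed N, and why a growable container such as std::vector helps, here is a minimal pure-Python sketch of augmentation for an orthorhombic box only. It assumes coordinates lie in the primary cell and r is less than half of each box length. It is not the PR's Cython implementation (which works with triclinic box vectors), and `augment_orthorhombic` is a hypothetical name.

```python
import itertools


def augment_orthorhombic(coordinates, box_lengths, r):
    """Create periodic images of particles lying within r of a box face.

    Assumes an orthorhombic box, coordinates inside [0, L) in each
    dimension, and r < L / 2. Returns (images, indices), where indices[k]
    is the index of the particle that images[k] was copied from.
    """
    images, indices = [], []  # grow as needed, like the C++ vectors
    for i, coord in enumerate(coordinates):
        # per dimension, the allowed shifts: 0 (none), +L if the particle
        # is near the low face, -L if it is near the high face
        options = []
        for x, length in zip(coord, box_lengths):
            shifts = [0.0]
            if x < r:
                shifts.append(length)
            elif x > length - r:
                shifts.append(-length)
            options.append(shifts)
        # every combination except the all-zero one is an image
        # (face, edge or corner image)
        for shift in itertools.product(*options):
            if any(s != 0.0 for s in shift):
                images.append([x + s for x, s in zip(coord, shift)])
                indices.append(i)
    return images, indices


images, indices = augment_orthorhombic([[1.0, 1.0, 1.0]],
                                       [10.0, 10.0, 10.0], r=1.5)
print(len(images))  # 7: three face, three edge and one corner image
```

A particle near a corner yields 7 images, so if every particle sits near a corner the output holds 7N rows; an (N, 3) buffer is therefore not enough in general.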

richardjgowers · 2018-07-11T15:45:54Z

package/MDAnalysis/lib/c_distances.pyx

+    given by pointers a and b
+    """
+    cdef ssize_t n
+    cdef float[:] result = numpy.zeros((3,), dtype=numpy.float32)


In general, you don't want any numpy calls inside a cdef function; this should all just be pure C. We only want numpy calls at the start of a function (to prepare things) and at the end (to correctly format things for the user).

codecov · 2018-07-12T00:54:42Z

Codecov Report

Merging #1977 into develop will increase coverage by <.01%.
The diff coverage is 100%.

@@             Coverage Diff             @@
##           develop    #1977      +/-   ##
===========================================
+ Coverage    88.48%   88.48%   +<.01%     
===========================================
  Files          142      142              
  Lines        17202    17203       +1     
  Branches      2635     2635              
===========================================
+ Hits         15221    15222       +1     
  Misses        1385     1385              
  Partials       596      596

Impacted Files	Coverage Δ
package/MDAnalysis/lib/distances.py	`87.22% <100%> (+0.04%)`	⬆️
package/MDAnalysis/coordinates/GRO.py	`93.87% <0%> (ø)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 50d8687...7ef48ef.

ayushsuhane · 2018-07-12T02:44:34Z

np.unique(array, axis=0) fails during the Travis build. Should I change it, or upgrade numpy during the installation on Travis?

richardjgowers · 2018-07-13T19:16:17Z

package/MDAnalysis/lib/pkdtree.py

+            self._indices = np.array(list(
+                                     itertools.chain.from_iterable(indices)),
+                                     dtype=np.int32)
+        self._indices = np.unique(self._indices)


there's a faster unique in lib._cutil that we can use

richardjgowers · 2018-07-13T19:19:46Z

package/MDAnalysis/lib/pkdtree.py

+            pairs = np.array(list(self.ckdt.query_pairs(radius)),
+                             dtype=np.int32)
+        if pairs.size > 0:
+            pairs = np.unique(np.sort(pairs), axis=0)


use the cutil.unique one here too
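When np.unique(..., axis=0) is unavailable (the axis keyword was added in NumPy 1.13) or a 1D unique routine such as the one in lib._cutil is preferred, each sorted pair can be encoded as a single integer. A minimal pure-Python sketch of the idea; `unique_pairs` is a hypothetical helper, and in practice the set/sorted step would be replaced by the 1D unique routine.

```python
def unique_pairs(pairs, n):
    """Return sorted unique unordered pairs (i, j) with i < j.

    Each pair is sorted and encoded as the single integer i * n + j, so a
    1D unique routine is enough; n must be larger than every index.
    """
    codes = set()
    for i, j in pairs:
        if i > j:
            i, j = j, i
        codes.add(i * n + j)
    return [divmod(code, n) for code in sorted(codes)]


print(unique_pairs([(3, 1), (1, 3), (0, 2), (2, 0), (0, 1)], n=4))
# prints [(0, 1), (0, 2), (1, 3)]
```

With fixed-width integers, i * n + j can overflow int32 for large systems, which is one argument for using int64 indices throughout.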

richardjgowers · 2018-07-13T19:21:03Z

package/MDAnalysis/lib/c_distances.pyx

@@ -20,7 +20,6 @@
 # J. Comput. Chem. 32 (2011), 2319--2327, doi:10.1002/jcc.21787
 #
 #


can you revert the changes to this file to keep history tidy?

Just to make sure: are you asking me to commit a revert, or to make a separate branch and force-push it to the current branch? In the latter case I guess I would have to rebase afterwards so as not to include any of the current history. Is this right?

richardjgowers

The augment changes look like they're finished, so if you can split them into a new PR we can merge those. The KDTree changes will require more work, as we need to replace the existing class (seamlessly, as it's widely used).

richardjgowers · 2018-07-13T19:48:57Z

package/MDAnalysis/lib/pkdtree.py

        return self._indices
+
+
+class Periodic_cKDTree(object):


We need to replace the existing pkdtree rather than just add another option here

richardjgowers · 2018-07-14T22:22:42Z

@ayushsuhane yeah, just a PR with only the augment functions and tests, but don't lose the progress you've made on KDTree; it's not too far from being finished either.

ayushsuhane · 2018-07-14T22:31:09Z

@richardjgowers Do we need anything else here?

I will open another PR for replacing Bio.KDTree.

richardjgowers · 2018-07-14T22:40:22Z

package/MDAnalysis/lib/_augment.pyx

+    cdef int N
+    N = results.shape[0]
+
+    for i in range(N):


i isn't cdef'd here

richardjgowers · 2018-07-14T22:42:34Z

package/MDAnalysis/lib/_augment.pyx

+      original indices of the augmented coordinates
+    """
+    cdef bint lo_x, hi_x, lo_y, hi_y, lo_z, hi_z
+    cdef int i, j, p, N


p is unused

richardjgowers · 2018-07-14T22:44:43Z

package/MDAnalysis/lib/_augment.pyx

+
+import cython
+import numpy
+cimport numpy


We're never using the cimport of numpy here. We'd be using it if we did something like cdef numpy.ndarray[float] arr1; the numpy calls on the right-hand side use the import numpy import.

richardjgowers · 2018-07-14T22:45:23Z

package/MDAnalysis/lib/_augment.pyx

+#
+
+import cython
+import numpy


Use import numpy as np, then change the calls to np.thing

richardjgowers · 2018-07-14T22:47:19Z

package/MDAnalysis/lib/_augment.pyx

+    ----------
+    coordinates : array
+      Input coordinate array to generate duplicate images
+    dm : array


Most other functions take box in the [lx, ly, lz, 90, 90, 90] form, so use that form here. Then do the conversion to triclinic vectors inside the function. You can keep the dm variable; just have it created inside the function.
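MDAnalysis already has a helper for this conversion (lib.mdamath.triclinic_vectors). The sketch below is only a standalone pure-Python illustration of the standard formula, with a along x and b in the xy-plane; `box_to_vectors` is a hypothetical name, not that function.

```python
import math


def box_to_vectors(dimensions):
    """Convert [lx, ly, lz, alpha, beta, gamma] (angles in degrees) to box vectors.

    Convention: a lies along x and b lies in the xy-plane.
    """
    lx, ly, lz, alpha, beta, gamma = dimensions
    cos_a = math.cos(math.radians(alpha))
    cos_b = math.cos(math.radians(beta))
    cos_g = math.cos(math.radians(gamma))
    sin_g = math.sin(math.radians(gamma))
    a = [lx, 0.0, 0.0]
    b = [ly * cos_g, ly * sin_g, 0.0]
    cx = lz * cos_b
    cy = lz * (cos_a - cos_b * cos_g) / sin_g
    cz = math.sqrt(lz * lz - cx * cx - cy * cy)  # fixes |c| = lz
    return [a, b, [cx, cy, cz]]


# approximately [[10, 0, 0], [0, 10, 0], [0, 0, 10]], up to rounding in cos(90 deg)
vectors = box_to_vectors([10.0, 10.0, 10.0, 90.0, 90.0, 90.0])
```

Invalid angle combinations make the argument of sqrt negative, so a production version should validate its input.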

richardjgowers · 2018-07-14T22:52:14Z

package/MDAnalysis/lib/_augment.pyx

+
+    Returns
+    -------
+    results : ndarray of ints


indices which have been translated to refer to the original indices

richardjgowers · 2018-07-14T22:52:36Z

package/MDAnalysis/lib/_augment.pyx

+    output : array
+      coordinates of duplicates generated due to periodic boundary conditions
+    indices : array
+      original indices of the augmented coordinates


.. seealso:: undo_augment

richardjgowers · 2018-07-14T22:52:52Z

package/MDAnalysis/lib/_augment.pyx

+
+@cython.boundscheck(False)
+@cython.wraparound(False)
+def augment(float[:, ::1] coordinates, float[:, ::1] dm, float r):


rename this to augment_coordinates so it's clear it works on coordinates

richardjgowers · 2018-07-14T22:54:04Z

package/MDAnalysis/lib/_augment.pyx

+@cython.wraparound(False)
+def augment(float[:, ::1] coordinates, float[:, ::1] dm, float r):
+    """Calculate augmented coordinate set
+


Add a description of what this function does. Something like "calculates which particles are within r of a box boundary and creates duplicate images of these on the opposite side of the box"

richardjgowers · 2018-07-14T22:54:20Z

package/MDAnalysis/lib/_augment.pyx

+    Parameters
+    ----------
+    coordinates : array
+      Input coordinate array to generate duplicate images


These must all be within the primary unit cell for the algorithm to work
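One way to satisfy this precondition is to wrap coordinates into the primary cell before augmenting. A minimal sketch for an orthorhombic box only; `wrap_orthorhombic` is a hypothetical helper, and MDAnalysis has its own wrapping routines.

```python
def wrap_orthorhombic(coordinates, box_lengths):
    """Wrap coordinates into the primary unit cell [0, L) of an orthorhombic box."""
    wrapped = []
    for coord in coordinates:
        # Python's % returns a result with the sign of the divisor,
        # so negative coordinates are mapped into [0, L) as well
        wrapped.append([x % length for x, length in zip(coord, box_lengths)])
    return wrapped


print(wrap_orthorhombic([[-0.5, 10.5, 3.0]], [10.0, 10.0, 10.0]))
# prints [[9.5, 0.5, 3.0]]
```

For very small negative values, floating-point rounding can return exactly L (for example -1e-20 % 10.0 gives 10.0), so a robust version would clamp that case.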

richardjgowers · 2018-07-16T15:33:34Z

testsuite/MDAnalysisTests/lib/test_augment.py

+           [1.1, -0.1, 1.1]),
+          ([1.1, 0.5, 0.5], [0.5, -0.1, 0.5]))
+
+radius = 1.5


this should be inside the test function

richardjgowers · 2018-07-16T15:33:44Z

testsuite/MDAnalysisTests/lib/test_augment.py

+
+@pytest.mark.parametrize('b, qres', product(boxes, zip(queries, images)))
+def test_augment(b, qres):
+    b = np.array(b, dtype=np.float32)


just store b as a numpy array?

richardjgowers · 2018-07-16T15:35:46Z

testsuite/MDAnalysisTests/lib/test_augment.py

+
+# Find images for several query points,
+# here in fractional coordinates using augment
+queries = ([0.1, 0.5, 0.5],  # box face


Can you reformat this so the queries and images are side by side? It's hard to read this way.

qres = [([[0.1, 0.5, 0.5]], [[1.1, 0.5, 0.5]]),  # box face
        # etc.
        ]

richardjgowers · 2018-07-16T15:36:16Z

testsuite/MDAnalysisTests/lib/test_augment.py

+radius = 1.5
+
+
+@pytest.mark.parametrize('b, qres', product(boxes, zip(queries, images)))


don't use product, just have two mark.parametrize, one for b one for qres

richardjgowers · 2018-07-16T15:36:56Z

package/setup.py

+    aug = MDAExtension('MDAnalysis.lib._augment',
+                         sources=['MDAnalysis/lib/_augment' + source_suffix],
+                         language='c++',
+                         libraries=mathlib,


can you check if we need mathlib? I thought we needed it for sqrt

richardjgowers · 2018-07-16T15:37:42Z

package/MDAnalysis/lib/_augment.pyx

+    cdef float coord[3]
+    cdef float end[3]
+    cdef float other[3]
+    cdef float[:, ::1] dm = np.zeros((3, 3), dtype=np.float32)


dm and reciprocal don't need to be numpy arrays, can we just use C floats?

ayushsuhane · 2018-07-17T05:43:37Z

Should the return datatype for indices be changed to int64 specifically? Or is the current approach of typecasting before calling unique_int_1d workable?

richardjgowers · 2018-07-17T21:12:09Z

@ayushsuhane are you talking about how unique_int takes int64s but I've used int32s here? Might be good to unify that tbh

ayushsuhane · 2018-07-18T04:54:16Z

@richardjgowers Should it handle both or everything should be converted to int64?

ayushsuhane · 2018-07-18T10:01:03Z

Meanwhile, I have changed it to np.int64. If both int types need to be handled, I came across ctypedef fused, which could be used.

@richardjgowers @jbarnoud can you review it? Once it is merged, we can focus on the KDTree, since it also requires the augment function.

richardjgowers

Code is looking good.

WRT docs, can you import augment + undo into lib.distances, then also add .. autofunction:: lib._augment.augment_coordinates to the distances docs? I.e. they should look like they're from that module even though they are written somewhere else.

richardjgowers · 2018-07-18T12:46:18Z

package/doc/sphinx/source/documentation_pages/lib/augment.rst

@@ -0,0 +1,2 @@
+.. automodule:: MDAnalysis.lib.augment


the doc build failed because you refer to lib.augment here but you named the module lib._augment elsewhere

richardjgowers · 2018-07-18T14:18:15Z

@ayushsuhane WRT dot and cross. If you could put them into _cutil and make a header file (pxd) that would also be good. If you could use fused types for them (float/double) that would be even better. If it ends up being difficult we can do it in a later PR, I'd like to get this merged


ayushsuhane · 2018-07-19T09:22:20Z

@richardjgowers were you referring to similar structure for the docs or did I get you wrong?

richardjgowers · 2018-07-19T16:03:57Z

@ayushsuhane yep that was right, thanks!

richardjgowers reviewed Jul 8, 2018

richardjgowers reviewed Jul 8, 2018

richardjgowers self-assigned this Jul 11, 2018

richardjgowers reviewed Jul 11, 2018

richardjgowers reviewed Jul 13, 2018

richardjgowers requested changes Jul 13, 2018

added Augment functionality, tests, modified CHANGELOG

900b09c

ayushsuhane force-pushed the augment branch from f4d50ab to 900b09c Compare July 14, 2018 22:08

Added a testcase for multiple queries

8c8edc7

richardjgowers requested changes Jul 14, 2018

ayushsuhane added 2 commits July 14, 2018 18:22

Modified documentation, changed the name of function

0eac39b

added six.moves import

d7dd262

richardjgowers requested changes Jul 16, 2018

cdef'd float instead of numpy array, changed the format of tests

776c83a

Modified input type to undo_augment, added documentation stub

feebb7d

richardjgowers requested changes Jul 18, 2018

moved _dot, _cross, _norm to _cutil.pyx, modified document structure for _augment

56c0b03

removed augment.rst

7ef48ef

richardjgowers approved these changes Jul 19, 2018

richardjgowers merged commit 238904d into MDAnalysis:develop Jul 19, 2018

ayushsuhane deleted the augment branch July 20, 2018 06:21

zemanj mentioned this pull request Sep 3, 2018

Improved docstrings in lib/_augment.pyx #2062

Merged

4 tasks
