Skip to content

Augment Coordinates for periodic Boundary conditions#1977

Merged
richardjgowers merged 8 commits intoMDAnalysis:developfrom
ayushsuhane:augment
Jul 19, 2018
Merged

Augment Coordinates for periodic Boundary conditions#1977
richardjgowers merged 8 commits intoMDAnalysis:developfrom
ayushsuhane:augment

Conversation

@ayushsuhane
Copy link
Copy Markdown
Contributor

Changes made in this Pull Request:

  • Cython implementation of adding duplicate particles for periodic boundary conditions
  • class Periodic_cKDTree similar to PeriodicKDTree but a wrapper around more fast scipy.spatial.cKDTree.

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@ayushsuhane
Copy link
Copy Markdown
Contributor Author

ayushsuhane commented Jul 8, 2018

Since replacement of Bio.cKDTree with scipy.spatial.cKDTree is beneficial, is this the correct way forward?

The class Periodic_cKDTree deals with both periodic and non-periodic boundary conditons. I have also defined a search_query function which can directly be used in guess_bonds in MDAnalysis.topology.guessers.py. With capped_distance and Periodic_cKDTree, the selection.py can be rewritten in a way such that it would be easy to include the grid search algorithm in it as well, where the actual algorithm will be hidden(but optional) from the user.

Input coordinate array to generate duplicate images
box : array
Dimensions of the box of shape (3, 3)
reciprocal : array
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this the reciprocal of box? if so, we can just calculate it in the function

cdef float other[3]

cdef int dim
dim = coordinates.shape[1]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can hardcode dim to 3, it might sometimes help the compiler

indices[p] = i
p += 1


Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you can fix up the formatting it would be easier to read, there's 3 blank lines here

cdef float sum1
cdef ssize_t dim

dim=3
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hardcode dim to 3

@coveralls
Copy link
Copy Markdown

coveralls commented Jul 8, 2018

Coverage Status

Coverage decreased (-0.4%) to 89.584% when pulling 408e12c on ayushsuhane:augment into 1a1b2f1 on MDAnalysis:develop.

def undo_augment(int[:] results, int[:] translation, int nreal):
"""Translate augmented indices back to originals

Note: modifies results in place!
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't do the modification in place any more, is there a reason you do the copy now instead?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I thought for complex selections it might be useful to keep all the particles (original + images). But it looks like its better to modify the results, otherwise it will lead to multiple computations of similar points with increase in number of selections(operations).
Meanwhile, I will also try to think of a case, where it might fail.

@ayushsuhane
Copy link
Copy Markdown
Contributor Author

ayushsuhane commented Jul 10, 2018

I was working on the tests, but got stuck in few problems and would like to share them here:

  1. As you might see in the augment function, the shape of output array is (N, 3). If I supply only a single coordinate which lies close to the face and/or edge, it will not be able to calculate all the periodic images. Possible solutions are to increase the size of output array by preferably 6 times, but it will effect the performance for large array of particles. Another solution could be dynamic allocation of memory for output and indices.

  2. This function has another parameter cutoff, which will be required while creating the tree. What could be a sensible default value for it?

  3. Instead of augmenting the coordinates, other option could be to cythonize the find_images function which should also improve the performance i.e. directly returning the minimum distance through the function.

@jbarnoud
Copy link
Copy Markdown
Contributor

As you might see in the augment function, the shape of output array is (N, 3). If I supply only a single coordinate which lies close to the face and/or edge, it will not be able to calculate all the periodic images. Possible solutions are to increase the size of output array by preferably 6 times, but it will effect the performance for large array of particles. Another solution could be dynamic allocation of memory for output and indices.

I do not understand the problem. Could you elaborate?

@ayushsuhane
Copy link
Copy Markdown
Contributor Author

For instance, if the augment function is called as augment(a, box, cutoff) where a = np.array([[1,1,1]], dtype = np.float32), box = np.array([10, 10, 10, 90, 90, 90], dtype=np.float32) and radius > 2. It should ideally create 6 images points, but since the output array is allocated as cdef float[:, :] output = np.zeros((N, 3), dtype=np.float32), it cannot store all the images and the function, in its current form will return only 1 image.

@jbarnoud
Copy link
Copy Markdown
Contributor

But, now, if you provide anything less than 6 atoms you have the same problem. Haven't you?

@ayushsuhane
Copy link
Copy Markdown
Contributor Author

Yes, Exactly. For larger number of atoms, it can be assumed that few particles will be close to the boundary and therefore, it works, but the current form will fail for few number of particles. Either we can put a condition that output should be minimum 100 or 1000, which wouldn't have an influence on the performance and will also yield correct number of images for evenly distributed data. But if all the edge cases needs to be considered, then we need to approach it from a different strategy.

@jbarnoud
Copy link
Copy Markdown
Contributor

At which point in the code do you know for sure what the size of output should be?

@ayushsuhane
Copy link
Copy Markdown
Contributor Author

Only at the end. That is why it is allocated a size of (N,3) but the size of returned array is (p,3) where, it is assumed that p<N

@jbarnoud
Copy link
Copy Markdown
Contributor

Can you really assume p < N? If you can have p = 6 for N = 1, then cannot you have p = 6 * N? What happens if all your points are at the border of the box?

@richardjgowers
Copy link
Copy Markdown
Member

richardjgowers commented Jul 10, 2018 via email

@richardjgowers
Copy link
Copy Markdown
Member

@ayushsuhane ok I think we can just use a cpp vector of floats and ints

from libcpp.vector cimport vector

import numpy as np


def thing():
    cdef vector[float] output
    cdef vector[int] indices
    
    # each time we add an augmented coordinate
    for j in range(3):
        output.push_back(coord[j] + shiftX[j] + shiftY[j])
    indices.push_back(i)
    
    # at the end to return results
    n = indices.size()

    return np.asarray(output).reshape(n, 3), np.asarray(indices)

@richardjgowers richardjgowers self-assigned this Jul 11, 2018
given by pointers a and b
"""
cdef ssize_t n
cdef float[:] result = numpy.zeros((3,), dtype=numpy.float32)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in general you don't want any numpy calls inside a cdef function, this should all just be pure C. We only want numpy calls at the start of a function (to prepare things) and at the end (to correctly format things for the user)

@codecov
Copy link
Copy Markdown

codecov bot commented Jul 12, 2018

Codecov Report

Merging #1977 into develop will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #1977      +/-   ##
===========================================
+ Coverage    88.48%   88.48%   +<.01%     
===========================================
  Files          142      142              
  Lines        17202    17203       +1     
  Branches      2635     2635              
===========================================
+ Hits         15221    15222       +1     
  Misses        1385     1385              
  Partials       596      596
Impacted Files Coverage Δ
package/MDAnalysis/lib/distances.py 87.22% <100%> (+0.04%) ⬆️
package/MDAnalysis/coordinates/GRO.py 93.87% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 50d8687...7ef48ef. Read the comment docs.

@ayushsuhane
Copy link
Copy Markdown
Contributor Author

ayushsuhane commented Jul 12, 2018

np.unique(array, axis=0) fails during the Travis build. Should I change it, or add upgrade numpy during the installation in Tarvis.

self._indices = np.array(list(
itertools.chain.from_iterable(indices)),
dtype=np.int32)
self._indices = np.unique(self._indices)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a faster unique in lib._cutil that we can use

pairs = np.array(list(self.ckdt.query_pairs(radius)),
dtype=np.int32)
if pairs.size > 0:
pairs = np.unique(np.sort(pairs), axis=0)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use the cutil.unique one here too

@@ -20,7 +20,6 @@
# J. Comput. Chem. 32 (2011), 2319--2327, doi:10.1002/jcc.21787
#
#
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you revert the changes to this file to keep history tidy?

Copy link
Copy Markdown
Contributor Author

@ayushsuhane ayushsuhane Jul 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure, are you asking to commit a revert or to make a separate branch and push force to the current branch, but I guess you have to rebase it afterwards to not include any of the current history. Is this right?

Copy link
Copy Markdown
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The augment changes look like they're finished, so if you can split them into a new PR we can merge those. The KDTree changes will require more work as we need to replace the existing class (seamlessly as it's well used)

return self._indices


class Periodic_cKDTree(object):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to replace the existing pkdtree rather than just add another option here

@richardjgowers
Copy link
Copy Markdown
Member

@ayushsuhane yeah just a PR with only the augment functions and tests, but don't lose the progress you've made on KDTree, it's not too far from being finished either

@ayushsuhane
Copy link
Copy Markdown
Contributor Author

ayushsuhane commented Jul 14, 2018

@richardjgowers Do we need anything else here?

I will open another PR for replacing Bio.KDTree .

cdef int N
N = results.shape[0]

for i in range(N):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i isn't cdef'd here

original indices of the augmented coordinates
"""
cdef bint lo_x, hi_x, lo_y, hi_y, lo_z, hi_z
cdef int i, j, p, N
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

p is unused


import cython
import numpy
cimport numpy
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're never using the cimport of numpy here. We'd be using it if we did something like cdef numpy.ndarray[float] arr1, the numpy calls on the right hand side use the import numpy import

#

import cython
import numpy
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as np, then change the calls to np.thing

----------
coordinates : array
Input coordinate array to generate duplicate images
dm : array
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

most other functions take box in the [lx, ly, lz,, 90 90 90] form, so use that form here. Then do the conversion to triclinic vectors inside the function. You can keep the dm variable, just have it created inside the function


Returns
-------
results : ndarray of ints
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indices which have been translated to refer to the original indices

output : array
coordinates of duplicates generated due to periodic boundary conditions
indices : array
original indices of the augmented coordinates
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.. seealso:: undo_augment


@cython.boundscheck(False)
@cython.wraparound(False)
def augment(float[:, ::1] coordinates, float[:, ::1] dm, float r):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename this to augment_coordinates so it's clear it works on coordinates

@cython.wraparound(False)
def augment(float[:, ::1] coordinates, float[:, ::1] dm, float r):
"""Calculate augmented coordinate set

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a description of what this function does. Something like "calculates which particles are within r of a box boundary and creates duplicate images of these on the opposite side of the box"

Parameters
----------
coordinates : array
Input coordinate array to generate duplicate images
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These must all be within the primary unit cell for the algorithm to work

[1.1, -0.1, 1.1]),
([1.1, 0.5, 0.5], [0.5, -0.1, 0.5]))

radius = 1.5
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be inside the test function


@pytest.mark.parametrize('b, qres', product(boxes, zip(queries, images)))
def test_augment(b, qres):
b = np.array(b, dtype=np.float32)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just store b as a numpy array?


# Find images for several query points,
# here in fractional coordinates using augment
queries = ([0.1, 0.5, 0.5], # box face
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you reformat this so the queries and images are side by side, it's hard to read this way..

qres = [
    ([[0.1, 0.5, 0.5]], [[1.1, 0.5 0.5]]),  # box face
    etc
]

radius = 1.5


@pytest.mark.parametrize('b, qres', product(boxes, zip(queries, images)))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't use product, just have two mark.parametrize, one for b one for qres

package/setup.py Outdated
aug = MDAExtension('MDAnalysis.lib._augment',
sources=['MDAnalysis/lib/_augment' + source_suffix],
language='c++',
libraries=mathlib,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you check if we need mathlib? I thought we needed it for sqrt

cdef float coord[3]
cdef float end[3]
cdef float other[3]
cdef float[:, ::1] dm = np.zeros((3, 3), dtype=np.float32)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dm and reciprocal don't need to be numpy arrays, can we just use C floats?

@ayushsuhane
Copy link
Copy Markdown
Contributor Author

Should the return datatype for indices be changed to int64 specifically? Or the current way of typecasting before calling unique_int_1d is workable.

@richardjgowers
Copy link
Copy Markdown
Member

@ayushsuhane are you talking about how unique_int takes int64s but I've used int32s here? Might be good to unify that tbh

@ayushsuhane
Copy link
Copy Markdown
Contributor Author

ayushsuhane commented Jul 18, 2018

@richardjgowers Should it handle both or everything should be converted to int64?

@ayushsuhane
Copy link
Copy Markdown
Contributor Author

Meanwhile, I just changed it to np.int64 , if required to handle both of the int types I came across ctypedef fused which can be used.

@richardjgowers @jbarnoud can you review it. Once it is merged then we can focus on the KDTree, since it also require augment function.

Copy link
Copy Markdown
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code is looking good.

WRT docs, can you import augment + undo into lib.distances, then also add in the .. autofunction:: lib._augment.augment_coordinates to the distances docs. Ie they should look like they're from that module even though they are written somewhere else

@@ -0,0 +1,2 @@
.. automodule:: MDAnalysis.lib.augment
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the doc build failed because you refer to lib.augment here but you named the module lib._augment elsewhere

@richardjgowers
Copy link
Copy Markdown
Member

@ayushsuhane WRT dot and cross. If you could put them into _cutil and make a header file (pxd) that would also be good. If you could use fused types for them (float/double) that would be even better. If it ends up being difficult we can do it in a later PR, I'd like to get this merged

@ayushsuhane
Copy link
Copy Markdown
Contributor Author

@richardjgowers were you referring to similar structure for the docs or did I get you wrong?

@richardjgowers richardjgowers merged commit 238904d into MDAnalysis:develop Jul 19, 2018
@richardjgowers
Copy link
Copy Markdown
Member

@ayushsuhane yep that was right, thanks!

@ayushsuhane ayushsuhane deleted the augment branch July 20, 2018 06:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants