
Added capped function for grid search #2008

Merged
richardjgowers merged 48 commits into MDAnalysis:develop from ayushsuhane:cappedgrid
Aug 7, 2018

Conversation

ayushsuhane (Contributor) commented Jul 27, 2018:

Following #1996, I added a capped function on top of the grid search to find all pairs between the reference and configuration coordinates, together with their respective distances. I have also modified the __init__ of class FastNS; for now, it takes the box and atom positions as input parameters.

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

Note

The function cannot handle non-periodic cases in its current state.
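As context for reviewers, the idea of a "capped" search can be summarised with a brute-force NumPy sketch. The function name and shapes below are illustrative only; this is not the grid-based implementation in this PR.

```python
import numpy as np

def capped_distance_bruteforce(reference, configuration, max_cutoff):
    """Brute-force illustration (hypothetical helper): return all (i, j)
    pairs with ||reference[i] - configuration[j]|| < max_cutoff, plus the
    corresponding distances."""
    # Full pairwise distance matrix between the two coordinate sets
    deltas = reference[:, None, :] - configuration[None, :, :]
    dists = np.linalg.norm(deltas, axis=-1)
    i_idx, j_idx = np.where(dists < max_cutoff)
    pairs = np.stack([i_idx, j_idx], axis=1)
    return pairs, dists[i_idx, j_idx]

ref = np.array([[0.0, 0.0, 0.0]])
conf = np.array([[0.5, 0.0, 0.0], [5.0, 0.0, 0.0]])
pairs, dist = capped_distance_bruteforce(ref, conf, max_cutoff=1.0)
# only the first configuration atom lies within the cutoff
```

The grid search in this PR produces the same pair list while avoiding the O(N*M) distance matrix.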

codecov bot commented Jul 28, 2018:

Codecov Report

Merging #2008 into develop will increase coverage by <.01%.
The diff coverage is 89.65%.


@@             Coverage Diff             @@
##           develop    #2008      +/-   ##
===========================================
+ Coverage    88.59%   88.59%   +<.01%     
===========================================
  Files          143      143              
  Lines        17305    17363      +58     
  Branches      2649     2658       +9     
===========================================
+ Hits         15331    15383      +52     
- Misses        1376     1379       +3     
- Partials       598      601       +3
Impacted Files Coverage Δ
package/MDAnalysis/lib/__init__.py 100% <100%> (ø) ⬆️
package/MDAnalysis/lib/distances.py 87.88% <89.47%> (+0.24%) ⬆️

Last update e90dfd2...b7751d1.

if min_cutoff is not None:
    idx = pair_distance > min_cutoff
    pairs, pair_distance = pairs[idx], pair_distance[idx]
if pairs.size > 0:
Member:
@ayushsuhane I don't really understand what this step is doing?

Contributor Author:
The problem with concatenating and using search ids is that it would not mask out pairs formed within the search ids. The pairs returned from the capped function are such that, in a pair (i, j), i comes from the reference positions and j from the configuration positions. But here it would return all indices, i.e. a pair (i, j) might have both i and j as indices of the search group.
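The masking issue described above can be shown with a small NumPy sketch (the index values are made up for illustration): after concatenating [configuration, search] coordinates, a raw search may return pairs whose members both come from the search block, and those have to be filtered out.

```python
import numpy as np

n_configuration = 3          # indices 0..2 are configuration atoms
raw_pairs = np.array([[3, 0],   # search atom vs configuration atom -> keep
                      [4, 2],   # search atom vs configuration atom -> keep
                      [3, 4]])  # both indices from the search block -> drop

# Keep only pairs whose second member indexes the configuration block
mask = raw_pairs[:, 1] < n_configuration
wanted = raw_pairs[mask]
```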

Member:
Ok so it sounds like this is much better for self_capped_distance than capped_distance right now. Ideally we'd create 2 nsgrids and tell them to search each other?

search_ids = np.arange(len(configuration), len(all_coords))
results = gridsearch.search(search_ids=search_ids)
pairs = results.get_pairs()
pairs[:, 1] = undo_augment(pairs[:, 1], mapping, len(configuration))
Member:
Why are we doing an undo_augment if we never augmented? Is this because we had to concatenate the coordinates together?

Contributor Author:
Yeah, I understand it could become confusing, but I used it since we already had a function that served the purpose. Since we are concatenating, and to get the same results as the other capped functions, all the indices should be translated back to their corresponding group indices.
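A minimal sketch of the index-translation step being discussed. This is a simplified stand-in for undo_augment that only subtracts the concatenation offset; the real helper also maps augmented ghost images back through a mapping array.

```python
import numpy as np

def translate_to_group(indices, offset):
    """Map indices from a concatenated array back to group-local indices
    (hypothetical simplified helper)."""
    return indices - offset

n_configuration = 3                     # search block starts at index 3
pairs = np.array([[3, 1], [5, 0]])      # column 0 indexes the search block
pairs[:, 0] = translate_to_group(pairs[:, 0], n_configuration)
```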


self.grid = NSGrid(self.coords_bbox.shape[0], cutoff, self.box, max_gridsize, debug=debug)
self.prepared = False
if prepare:
Member:
I don't get why prepare would be optional. Can you just make it always prepare on init and remove the prepare method?

cdef real[:] coord_i, coord_j
from collections import defaultdict

indices_buffer = defaultdict(list)
Member:
@ayushsuhane can you use C++ vecs here, and does it benchmark faster if you do?

self.fast_update(box)


cdef void fast_pbc_dx(self, rvec ref, rvec other, rvec dx) nogil:

This comment was marked as resolved.

Member:
Ok yeah I was wrong here

for i in range(DIM - 1, -1, -1):
    while dx[i] > self.c_pbcbox.hbox_diag[i]:
        for j in range(i, -1, -1):
            dx[j] -= self.c_pbcbox.box[i][j]
Member:
@ayushsuhane Please have a close look at this loop. It does subtract the correct vector to land at point C in your example.

Contributor Author:
Yes, I understand this loop. But I was wondering: even though the particles are in a brick-shaped box, can we apply minimum_image_ortho here? While the above loop will land at C, I think applying minimum_image from calc_distances.h would return the distance corresponding to AD rather than AC.

Contributor:
Even if the particles are put inside a brick-shaped box, the box itself is not brick-shaped, so the loop is needed to use the correct box vectors; we can't just rely on minimum_image_ortho.
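The difference can be sketched in pure Python (illustrative only; the real implementation is the Cython loop quoted above): for a triclinic box, wrapping must subtract whole box vectors, highest axis first, rather than only the diagonal lengths.

```python
import numpy as np

def minimum_image_triclinic(dx, box):
    """Wrap a separation vector dx using the full (lower-triangular)
    triclinic box matrix, whose rows are the box vectors. Pure-Python
    sketch of the loop discussed above."""
    dx = dx.copy()
    for i in range(2, -1, -1):              # z, then y, then x
        while dx[i] > 0.5 * box[i][i]:
            dx -= box[i]                    # subtract the whole box vector
        while dx[i] < -0.5 * box[i][i]:
            dx += box[i]
    return dx

# Tilted box: the b vector has an x component, so wrapping in y also moves x
box = np.array([[10.0, 0.0, 0.0],
                [5.0, 10.0, 0.0],
                [0.0, 0.0, 10.0]])
dx = minimum_image_triclinic(np.array([7.0, 8.0, 0.0]), box)
```

Using only the diagonal (as an orthorhombic minimum image would) leaves the x component unwrapped in this example, which is the AD-versus-AC discrepancy mentioned above.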


boxes_1 = (np.array([1, 2, 3, 90, 90, 90], dtype=np.float32), # ortho
np.array([1, 2, 3, 30, 45, 60], dtype=np.float32), # tri_box
np.array([1, 2, 3, 45, 60, 90], dtype=np.float32), # tri_box
Member:
do the tests fail with the 30 angle?

Contributor Author:
No, it was a misunderstanding. I will change it again.

ayushsuhane (Contributor Author):

I tried generating maps of cell_neighbours as well, but it looks like a bad idea, probably because building the dictionary takes relatively longer than using vectors. Here is a snippet of the code:

ctypedef cset[ns_int] intset
ctypedef cmap[ns_int, intset] intmap
cdef intmap generate_neighbors(self):
        cdef ns_int xi, yi, zi
        cdef ns_int cell
        cdef ivec cellxyz, neighbor_cellxyz
        cdef ns_int neighbor_cellid
        
        self.cell_neighbors.empty()
        
        for cell in range(self.size):
            self.cellid2cellxyz(cell, cellxyz)
            for xi in range(DIM):
                for yi in range(DIM):
                    for zi in range(DIM):
                        neighbor_cellxyz[XX] = cellxyz[XX] + (xi - 1)
                        neighbor_cellxyz[YY] = cellxyz[YY] + (yi - 1)
                        neighbor_cellxyz[ZZ] = cellxyz[ZZ] + (zi - 1)
                        # Bring the cells back to the brick shaped box
                        for n in range(DIM-1, -1, -1):
                            while neighbor_cellxyz[n] > self.ncells[n]:
                                neighbor_cellxyz[n] -= self.ncells[n]
                            while neighbor_cellxyz[n] < 0:
                                neighbor_cellxyz[n] += self.ncells[n]
                        neighbor_cellid = neighbor_cellxyz[XX] + neighbor_cellxyz[YY]*self.cell_offsets[YY] + \
                                           neighbor_cellxyz[ZZ]*self.cell_offsets[ZZ]
                        self.cell_neighbors[cell].insert(neighbor_cellid)
        return self.cell_neighbors

I think using some alternative hashmap for the indices could serve the purpose if we want to go that way. In any case, iterating over indices is the better solution in my opinion: it will also be faster for small numbers of atoms, while iterating over cells leads to too many if evaluations.
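For reference, the neighbour-cell enumeration in the snippet can be written in plain Python (illustrative sketch, not the Cython code above): each cell has up to 27 periodic neighbours, wrapped back into the grid before converting to a linear cell id.

```python
def neighbor_cells(cellxyz, ncells):
    """Collect the (up to) 27 neighbouring cell ids of one cell in a
    periodic grid. Hypothetical pure-Python sketch of generate_neighbors."""
    nx, ny, nz = ncells
    neighbors = set()
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                x = (cellxyz[0] + dx) % nx   # wrap back into the grid
                y = (cellxyz[1] + dy) % ny
                z = (cellxyz[2] + dz) % nz
                neighbors.add(x + y * nx + z * nx * ny)
    return neighbors

cells = neighbor_cells((0, 0, 0), (3, 3, 3))
```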

@richardjgowers Is this how you were envisioning creating two grids and checking the pairs? As far as benchmarks of vectors are concerned, using vectors of vectors in place of a dictionary halves the time, whereas replacing vectors with lists does not change performance significantly. However, it improves the readability of the code and we don't need to manage the memory allocations ourselves.

@seb-buch @kain88-de @jbarnoud @richardjgowers @orbeckst, please review the code. I have tried to remove most of the parts that weren't desired (specifically get_coordinates). In my opinion, coordinates can be selected directly via python after we evaluate the indices and distances; for a single query, coordinates_buffer was consuming a significant amount of time. I plan to add the noPBC option first and then add the documentation, of course only if @seb-buch is OK with it. But in any case, merging before the GSoC would be helpful.

For self_search, it returns all the pairs and indices. To obtain unique pairs, np.unique should be used. If this is how the method is desired, I will add this to the documentation.
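A short sketch of the np.unique deduplication mentioned above (toy pair data): a self search reports both (i, j) and (j, i), so sorting each pair before np.unique leaves one canonical copy.

```python
import numpy as np

# Each contact appears twice, once in each orientation
pairs = np.array([[0, 1], [1, 0], [2, 3], [3, 2]])

# Sort within each row, then drop duplicate rows
unique_pairs = np.unique(np.sort(pairs, axis=1), axis=0)
```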

While the current version is faster than the methods implemented in MDAnalysis for most cases, there are still opportunities for improvement. For instance, using probe[XX] = searchcoords[current_beadid, XX] + (xi - 1) * self.cutoff[XX] in search to evaluate the cell_id, and skipping the distance evaluations if that cell_id has already been checked; currently, it evaluates all the adjacent cells. One way could be to use a stencil[27], using every index as a flag for each probe. However, I am not sure how to deal with the periodic case in an elegant way (without increasing the number of evaluations for every particle). This would be particularly useful when the cutoff is very small (but greater than the cellsize).

continue
# find distance between search coords[i] and coords[bid]
d2 = self.box.fast_distance2(&self.coords_bbox[current_beadid, XX], &self.coords_bbox[bid, XX])
if d2 < cutoff2:
Member:
looks like npairs has the wrong indentation here; you don't want to increment if d2 < EPSILON. I'd do if d2 < cutoff2 and d2 > EPSILON:


self.indices_buffer[beadid_i].push_back(beadid_j)

dist = sqrt(dist2)


self.distances_buffer[beadid_i].push_back(dist)

for i in range(self.searchcoords.shape[0]):
sorted_indices = np.argsort(self.indices_buffer[i])
Member:
I'm not sure we have to sort, to be honest; I'd just give back indices in the order we found them. You can sort inside the tests to check results.
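A sketch of what sorting inside the tests could look like (hypothetical helper, toy data): bring both pair lists into a canonical order before comparing, so the search itself can return pairs in any order.

```python
import numpy as np

def canonical(pairs):
    # lexicographic row sort: primary key is the first column, then the second
    order = np.lexsort((pairs[:, 1], pairs[:, 0]))
    return pairs[order]

found = np.array([[2, 3], [0, 1]])      # order the search happened to produce
expected = np.array([[0, 1], [2, 3]])   # reference order used by the test
```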

richardjgowers (Member) left a comment:
Need some simple docs on each method

self.pair_distances2_buffer.push_back(distance2)
self.npairs += 1

def get_pairs(self):
Member:
docs docs docs. It took me a while to understand what all the different get_X mean in this class

Contributor Author:
Yes, I get your point. I thought we should fix the API first and that I would add the docs afterwards. However, the docs are critical even for the reviewers to understand the code; I will add them today.

seb-buch (Contributor) commented Aug 2, 2018:
I started to write some docs on my branch... However, I am officially on vacation so I cannot spend much time on them.
Anyway, I will take some time tomorrow to merge the changes from @ayushsuhane and the very little I did on the docs.

package/setup.py Outdated
['MDAnalysis/lib/nsgrid' + source_suffix],
include_dirs=include_dirs,
language='c++',
libraries=parallel_libraries,
Member:
remove parallel libs here

richardjgowers (Member):
@ayushsuhane Looks like just a precision error on the test, use assert_almost_equal
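For reference, a minimal example of the suggested fix (toy values): assert_almost_equal from numpy.testing compares to a number of decimals, tolerating the small float32 differences that an exact comparison would reject.

```python
import numpy as np
from numpy.testing import assert_almost_equal

# Two float32 distances from different code paths, differing in the last bits
a = np.float32(4.7999997)
b = np.float32(4.8000002)

# Exact equality may fail; comparing to 5 decimals passes
assert_almost_equal(a, b, decimal=5)
```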

ayushsuhane (Contributor Author):
I am not sure about either of the errors.

(1) I didn't change anything in pkdtree, but it now raises an error where it didn't before.
(2) I checked the contiguous flags separately, which passes in the tests, but raises an error with numpy. Is it because of a different numpy version? How can I get past it?

ayushsuhane (Contributor Author):
To handle the no-pbc case, I have used python methods liberally inside the if box is not None branch. In your opinion, will it have a significant effect? If yes, how should I change it? Also, are you happy with the way it handles pbc?

richardjgowers (Member):
@ayushsuhane can you link me a line? I can't find it. In general, it's ok to have some python to check things at the start, you just can't have any inside loops

if (box is None) or (np.allclose(box[:3], 0.) and box.shape[0] == 6):
bmax = np.max(coords, axis=0)
bmin = np.min(coords, axis=0)
for i in range(DIM):
Contributor Author:
Mostly looping over all the coordinates to shift the origin, since the box origin is always (0, 0, 0), which is used to get the cellids later on. Similarly, np.min and np.max also show up dark yellow if you look at the annotation.

pseudobox[DIM + i] = 90.
box = pseudobox
# shift the origin
coords -= bmin
Member:
Here you're modifying coords in place, which changes the copy that the user holds (which might lead to tears). Better would be to move the self.coords = coords.copy() call below, and modify self.coords which is the class' copy.
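A sketch of the suggested fix (toy coordinates): copy first, then shift the copy, so the caller's array stays untouched.

```python
import numpy as np

user_coords = np.array([[1.0, 2.0, 3.0],
                        [4.0, 5.0, 6.0]])

coords = user_coords.copy()   # class-owned copy, as suggested above
bmin = coords.min(axis=0)
coords -= bmin                # shift the origin of the copy only
```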

ayushsuhane (Contributor Author):
@richardjgowers @jbarnoud Any suggestions on how I can remove the error? It says the numpy array is not C-contiguous. I have already tried np.ascontiguousarray (now I have changed it to a list type); it doesn't fail the tests locally, but fails on travis.
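For reference, a minimal reproduction of the contiguity issue and the usual fix (illustrative shapes): a strided view is not C-contiguous, so a typed memoryview rejects it, while np.ascontiguousarray returns a C-ordered copy when needed.

```python
import numpy as np

# Slicing every other row produces a strided view: not C-contiguous
arr = np.zeros((10, 3), dtype=np.float32)[::2]

# ascontiguousarray copies only if the input is not already C-ordered
fixed = np.ascontiguousarray(arr)
```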

ayushsuhane (Contributor Author) commented Aug 5, 2018:

@richardjgowers Can you review it, please? If this API is OK (and once self_capped is merged), I can quickly add the self capped function along with a modified _determine_method function based on benchmarks.

# find distance between search coords[i] and coords[bid]
d2 = self.box.fast_distance2(&searchcoords_bbox[current_beadid, XX], &self.coords_bbox[bid, XX])
if d2 < cutoff2 and d2 > EPSILON:
results.add_neighbors(current_beadid, bid, d2)
Contributor Author:
Most of the capped functions have a return signature such that the same atom appearing in both groups (query and reference coordinates) will be returned as a neighbour. In that case, should I remove EPSILON here? It would be kept only in the self_search function.

Member:
Yeah remove that epsilon hack so this works as other distance methods

richardjgowers self-assigned this Aug 5, 2018
richardjgowers (Member):
@ayushsuhane CHANGELOG then we're done I think :)

ayushsuhane (Contributor Author) commented Aug 7, 2018:

I was doing benchmarks for _determine_method, but I figured we can add them in a different PR, as this one is already big. If you disagree, I can add them here as well along with the modified tests. Here is the link to the benchmarks:

https://github.com/ayushsuhane/Benchmarks_Distance/blob/master/Notebooks/BM_CappedNS.ipynb

orbeckst (Member) commented Aug 7, 2018:

@ayushsuhane thanks for the benchmarks. Looks very encouraging!

Have you also thought about adding some benchmarks to https://github.com/MDAnalysis/mdanalysis/tree/develop/benchmarks ? It would be interesting to see how code that relies on distance calculations improves with your additions. (I am not saying that this should be in this PR, but overall it would be nice if one could point to https://www.mdanalysis.org/benchmarks/ for immediate evidence how your work improved the performance of MDAnalysis.)

orbeckst (Member) left a comment:
Please update CHANGELOG. Thanks!


Enhancements

* Added a wrapper of lib.nsgrid in lib.distances.self_capped_distance
Member:
@seb-buch also needs to be added as an author as he's authored some of the commits

richardjgowers (Member):
@ayushsuhane don't worry about the encore failures on appveyor

richardjgowers merged commit e2c2b56 into MDAnalysis:develop Aug 7, 2018
ayushsuhane deleted the cappedgrid branch August 20, 2018 06:00
if cutoff < 0:
raise ValueError("Cutoff must be positive!")
if cutoff * cutoff > self.box.c_pbcbox.max_cutoff2:
raise ValueError("Cutoff greater than maximum cutoff ({:.3f}) given the PBC")


Here the maximum cutoff is not displayed when raising the error: the "{:.3f}" placeholder is never filled in because .format is not called on the message.

Contributor:
Hi Charly. Thank you for reporting the problem.

A comment on a merged pull request is likely to be forgotten, though. Would you mind opening an issue?
