Added capped function for grid search#2008
Added capped function for grid search#2008richardjgowers merged 48 commits intoMDAnalysis:developfrom
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #2008 +/- ##
===========================================
+ Coverage 88.59% 88.59% +<.01%
===========================================
Files 143 143
Lines 17305 17363 +58
Branches 2649 2658 +9
===========================================
+ Hits 15331 15383 +52
- Misses 1376 1379 +3
- Partials 598 601 +3
Continue to review full report at Codecov.
|
| if min_cutoff is not None: | ||
| idx = pair_distance > min_cutoff | ||
| pairs, pair_distance = pairs[idx], pair_distance[idx] | ||
| if pairs.size > 0: |
There was a problem hiding this comment.
@ayushsuhane I don't really understand what this step is doing?
There was a problem hiding this comment.
The problem with concatenate and search id is that it wouldn't mask the pairs within search ids. The pairs which are returned from capped function are such that in a pair (i, j) i is from the reference positions and j is from the configuration positions. But here, it will return all the indices i.e (i, j) might be such that i and j both are indices of the search group.
There was a problem hiding this comment.
Ok so it sounds like this is much better for self_capped_distance than capped_distance right now. Ideally we'd create 2 nsgrids and tell them to search each other?
package/MDAnalysis/lib/distances.py
Outdated
| search_ids = np.arange(len(configuration), len(all_coords)) | ||
| results = gridsearch.search(search_ids=search_ids) | ||
| pairs = results.get_pairs() | ||
| pairs[:, 1] = undo_augment(pairs[:, 1], mapping, len(configuration)) |
There was a problem hiding this comment.
Why are we doing an undo_augment if we never augmented? Is this because we had to concatenate the coordinates together?
There was a problem hiding this comment.
Yeah, I understand it could become confusing. But I just used it since we already had that function which served the purpose. But yes, since we are concatenating and to get the same results from other capped_functions, all the indices should be translated back to their corresponding group indices.
package/MDAnalysis/lib/nsgrid.pyx
Outdated
|
|
||
| self.grid = NSGrid(self.coords_bbox.shape[0], cutoff, self.box, max_gridsize, debug=debug) | ||
| self.prepared = False | ||
| if prepare: |
There was a problem hiding this comment.
I don't get why prepare would be optional. Can you just make it always prepare on init and remove the prepare method?
package/MDAnalysis/lib/nsgrid.pyx
Outdated
| cdef real[:] coord_i, coord_j | ||
| from collections import defaultdict | ||
|
|
||
| indices_buffer = defaultdict(list) |
There was a problem hiding this comment.
@ayushsuhane can you use C++ vecs here, and does it benchmark faster if you do?
package/MDAnalysis/lib/nsgrid.pyx
Outdated
| self.fast_update(box) | ||
|
|
||
|
|
||
| cdef void fast_pbc_dx(self, rvec ref, rvec other, rvec dx) nogil: |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
package/MDAnalysis/lib/nsgrid.pyx
Outdated
| for i in range (DIM-1, -1, -1): | ||
| while dx[i] > self.c_pbcbox.hbox_diag[i]: | ||
| for j in range (i, -1, -1): | ||
| dx[j] -= self.c_pbcbox.box[i][j] |
There was a problem hiding this comment.
@ayushsuhane Please have a close look at this loop. It does subtract the correct vector to land at point C in your example.
There was a problem hiding this comment.
Yes, I understand this loop. But I was wondering that even though the particles are in a brick shaped box, can we apply minimum_image_ortho here? While the above loop will land at C, I think if we apply minimum_image from calc_distances.h, it will return distance corresponding to AD rather than AC.
There was a problem hiding this comment.
Even if the particles are put inside a brick-shaped box, the box itself is not brick-shaped so the loop is needed to use the correct box vectors and we can't just rely on minimum_image_ortho.
|
|
||
| boxes_1 = (np.array([1, 2, 3, 90, 90, 90], dtype=np.float32), # ortho | ||
| np.array([1, 2, 3, 30, 45, 60], dtype=np.float32), # tri_box | ||
| np.array([1, 2, 3, 45, 60, 90], dtype=np.float32), # tri_box |
There was a problem hiding this comment.
do the tests fail with the 30 angle?
There was a problem hiding this comment.
No, It was some misunderstanding. I will change it again.
|
I tried generating maps of cell_neighbours as well, but it looks like it is a bad idea. Probably it is due to generating dictionary which takes relatively longer time than vectors. Here is a snippet of the code I think using some alternative hashmaps for indices can serve the purpose if we want to go that way. In any case, iterating over indices is a better solution in my opinion (as it will be faster for less number of atoms as well, while iterating over cells will lead to too many @richardjgowers Is this how you were envisioning to create two grids and check the pairs? As far as benchmarks of vectors are concerned, using vectors of vectors in place of dictionary reduces the time by half, whereas replacing vectors with list does not lead to any significant change in performance. However, it increases the readability of the code and we don't need to address the memory allocations. @seb-buch @kain88-de @jbarnoud @richardjgowers @orbeckst , Please review the code. I have tried to remove most of the part which wasn't desired (specifically the get_coordinates). In my opinion, coordinates can be selected directly via python after we evaluate the indices and distances. For a single query, coordinates_buffer was consuming a significant amount of time. I plan to add noPBC option first and then add the documentation. Ofcourse if @seb-buch is OK with it. But in any case, merging before the GSOC would be helpful. For self_search, it returns all the pairs and indices. To obtain the unique pairs, np.unique should be used. If this is how the method is desired, I will add this in the documentation. While the current version is faster than the implemented methods in MDAnalysis for most of the cases, there are still opportunities for improvements. For instance, using |
package/MDAnalysis/lib/nsgrid.pyx
Outdated
| continue | ||
| #find distance between search coords[i] and coords[bid] | ||
| d2 = self.box.fast_distance2(&self.coords_bbox[current_beadid, XX], &self.coords_bbox[bid, XX]) | ||
| if d2 < cutoff2: |
There was a problem hiding this comment.
looks like npairs has the wrong indentation here, you don't want to increment if d2 < EPSILON. I'd do if d2 cutoff and d2 > EPSILON:
package/MDAnalysis/lib/nsgrid.pyx
Outdated
|
|
||
| self.indices_buffer[beadid_i].push_back(beadid_j) | ||
|
|
||
| dist = sqrt(dist2) |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
package/MDAnalysis/lib/nsgrid.pyx
Outdated
| self.distances_buffer[beadid_i].push_back(dist) | ||
|
|
||
| for i in range(self.searchcoords.shape[0]): | ||
| sorted_indices = np.argsort(self.indices_buffer[i]) |
There was a problem hiding this comment.
I'm not sure we have to sort tbh, I'd just give back indices in the order we found them. You can sort inside the tests to check results
richardjgowers
left a comment
There was a problem hiding this comment.
Need some simple docs on each method
package/MDAnalysis/lib/nsgrid.pyx
Outdated
| self.pair_distances2_buffer.push_back(distance2) | ||
| self.npairs += 1 | ||
|
|
||
| def get_pairs(self): |
There was a problem hiding this comment.
docs docs docs. It took me a while to understand what all the different get_X mean in this class
There was a problem hiding this comment.
Yes, I get your point. I was thinking that first we should fix the API and then I would add the docs. However, even for the reviewers to understand, docs are critical. I will add them today.
There was a problem hiding this comment.
I started to write some docs on my branch... However I am officially on vacation so I do not spend much time of them.
Anyway, I will take some time tomorrow to merge the changes from @ayushsuhane and the very little I did on the docs.
package/setup.py
Outdated
| ['MDAnalysis/lib/nsgrid' + source_suffix], | ||
| include_dirs=include_dirs, | ||
| language='c++', | ||
| libraries=parallel_libraries, |
There was a problem hiding this comment.
remove parallel libs here
|
@ayushsuhane Looks like just a precision error on the test, use |
|
I am not sure about both the errors. (1) I didnt' change anything in the pkdtree, but still it raises an error while it didn't raise an error earlier. |
|
To handle the no pbc case, I have used python methods blatantly inside |
|
@ayushsuhane can you link me a line? I can't find it. In general, it's ok to have some python to check things at the start, you just can't have any inside loops |
package/MDAnalysis/lib/nsgrid.pyx
Outdated
| if (box is None) or (np.allclose(box[:3], 0.) and box.shape[0] == 6): | ||
| bmax = np.max(coords, axis=0) | ||
| bmin = np.min(coords, axis=0) | ||
| for i in range(DIM): |
There was a problem hiding this comment.
Mostly looping over all the coordinates to shift the origin since box origin is always (0,0,0) which is used to get the cellids later on. Similarly, np.min and np.max are also dark yellow if you look into annotate.
package/MDAnalysis/lib/nsgrid.pyx
Outdated
| pseudobox[DIM + i] = 90. | ||
| box = pseudobox | ||
| # shift the origin | ||
| coords -= bmin |
There was a problem hiding this comment.
Here you're modifying coords in place, which changes the copy that the user holds (which might lead to tears). Better would be to move the self.coords = coords.copy() call below, and modify self.coords which is the class' copy.
|
@richardjgowers @jbarnoud Any suggestions on how can I remove the error. It says the numpy array is not C-contiguous. I have already tried using np.ascontiguous (now I have changed it to list type), it also doesn't fail the test but fails in travis. |
|
@richardjgowers Can you review it please. If this API is OK (and once self_capped is merged), I can quickly add self capped function along with modified |
package/MDAnalysis/lib/nsgrid.pyx
Outdated
| #find distance between search coords[i] and coords[bid] | ||
| d2 = self.box.fast_distance2(&searchcoords_bbox[current_beadid, XX], &self.coords_bbox[bid, XX]) | ||
| if d2 < cutoff2 and d2 > EPSILON: | ||
| results.add_neighbors(current_beadid, bid, d2) |
There was a problem hiding this comment.
Most of the capped funcctions have a return signature such that same atom in different group (query and reference coordinates) will be obtained as a neighbour. In that case, should I remove EPSILON here. It will be included only in self_search function.
There was a problem hiding this comment.
Yeah remove that epsilon hack so this works as other distance methods
…be added again for conditional compilation
…s as input parameters, modified tests
…nction to take coordinates as input, modified tests accordingly
…fied compilation libs in setup
…d and modified tests
|
@ayushsuhane CHANGELOG then we're done I think :) |
|
I was doing benchmarks for the https://github.com/ayushsuhane/Benchmarks_Distance/blob/master/Notebooks/BM_CappedNS.ipynb |
|
@ayushsuhane thanks for the benchmarks. Looks very encouraging! Have you also thought about adding some benchmarks to https://github.com/MDAnalysis/mdanalysis/tree/develop/benchmarks ? It would be interesting to see how code that relies on distance calculations improves with your additions. (I am not saying that this should be in this PR, but overall it would be nice if one could point to https://www.mdanalysis.org/benchmarks/ for immediate evidence how your work improved the performance of MDAnalysis.) |
orbeckst
left a comment
There was a problem hiding this comment.
Please update CHANGELOG. Thanks!
|
|
||
| Enhancements | ||
|
|
||
| * Added a wrapper of lib.nsgrid in lib.distances.self_capped_distance |
There was a problem hiding this comment.
@seb-buch also needs to be added as an author as he's authored some of the commits
|
@ayushsuhane don't worry about the encore failures on appveyor |
| if cutoff < 0: | ||
| raise ValueError("Cutoff must be positive!") | ||
| if cutoff * cutoff > self.box.c_pbcbox.max_cutoff2: | ||
| raise ValueError("Cutoff greater than maximum cutoff ({:.3f}) given the PBC") |
There was a problem hiding this comment.
Here the maximum cutoff is not displayed when raising the error
There was a problem hiding this comment.
Hi Charly. Thank you for reporting the problem.
A comment on a merged pull request is likely to be forgotten, though. Would you mind opening an issue?
Following #1996 , added capped function on top of the grid search to find all the pairs from reference and configuration state and their respective distances. Also, I have modified the
__init__of classFastNS. For now, it takes box, and atom positions as input parameters.Note
The function cannot handle non-periodic cases in its current state.