Skip to content

Using Self Capped distance function in guess_bonds#2006

Merged
richardjgowers merged 5 commits intoMDAnalysis:developfrom
ayushsuhane:guessbonds
Aug 5, 2018
Merged

Using Self Capped distance function in guess_bonds#2006
richardjgowers merged 5 commits intoMDAnalysis:developfrom
ayushsuhane:guessbonds

Conversation

@ayushsuhane
Copy link
Contributor

guess_bonds function uses brute force which becomes really slow very rapidly with increase in number of atoms. Additional function over capped _distance specifically to handle two equal arrays are introduced and the guess_bonds function is modified

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@ayushsuhane
Copy link
Contributor Author

As you might notice, after evaluating all the pairs in _pkdtree_capped_self, I call calc_distances which is not efficient. Should I write another cython function to calculate distance for a single pair say A - B particle or there exists any fast implementation other than calc_distances (calc_bonds). @richardjgowers mentioned that we would want to calculate all the distances rather than evaluating them one by one.

@ayushsuhane
Copy link
Contributor Author

Sorry for such a clutter in diff. There is another PR to replace BioKDtree. I will clean the history once it is merged.

@richardjgowers
Copy link
Member

@ayushsuhane yeah there's lib.distances.guess_bonds which does pairwise distances. Something like this:

indices = KDTree.something(coords1, coords2, box)

i, j = indices
# i is array of indices referring to coords2, j is for coords2
# calc_bonds is like zip over the coordinates for distances
dists = calc_bonds(coords1[i], coords2[j], box)

winners = dists < max_cutoff

i, j = i[winners], j[winners]

def capped_distance(reference, configuration, max_cutoff, min_cutoff=None, box=None, method=None):
"""Calculates the pairs and distance within a specified distance
def capped_distance(reference, configuration, max_cutoff, min_cutoff=None,
box=None, method=None, equal=False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this equal keyword. Make a new function called self_capped_distance then it's similar to how self_distance_array looks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would result in repeating most of the code again especially the _determine_method function.

@codecov
Copy link

codecov bot commented Jul 26, 2018

Codecov Report

Merging #2006 into develop will increase coverage by 0.06%.
The diff coverage is 91.66%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2006      +/-   ##
===========================================
+ Coverage    88.45%   88.51%   +0.06%     
===========================================
  Files          143      143              
  Lines        17223    17273      +50     
  Branches      2637     2649      +12     
===========================================
+ Hits         15234    15290      +56     
+ Misses        1389     1385       -4     
+ Partials       600      598       -2
Impacted Files Coverage Δ
package/MDAnalysis/topology/guessers.py 100% <100%> (+1.86%) ⬆️
package/MDAnalysis/lib/distances.py 87.63% <91.04%> (+3.49%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c00a04a...7e7ab24. Read the comment docs.

@richardjgowers
Copy link
Member

@ayushsuhane can be rebased now

@ayushsuhane
Copy link
Contributor Author

@richardjgowers Are any changes required here. While most of the code is repeatable inorder to have same signature to capped_distance, is it something which is desired. WRT determine method, I will redo the benchmarks with nsgrid and change it in #2008 . Apparently, guess_bonds is already present in the benchmarks, I will have to check it as well and see how the performance is changing across commits. I checked with small.gro which you provided earlier, and the timing went down from 3 sec to 3 ms for 12k atoms.
https://github.com/ayushsuhane/Benchmarks_Distance/blob/master/Notebooks/guessbonds_benchmark.ipynb

Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah looks good, few comments and get the tests green and we can merge


Enhancements

* Added lib.distances.self_capped_distance to internally select the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Split into two entries

-----
Currently only supports brute force and Periodic KDtree

.. SeeAlso:: :func:'MDAnalysis.lib.distances.distance_array'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self distance array

else:
idx = np.where((dist < max_cutoff))[0]
for other_idx in idx:
j = other_idx + 1 + i
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment for this, not obvious at first

@ayushsuhane
Copy link
Contributor Author

@richardjgowers please review whenever you get the time

Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

@richardjgowers richardjgowers merged commit 6b74bac into MDAnalysis:develop Aug 5, 2018
@ayushsuhane ayushsuhane deleted the guessbonds branch August 20, 2018 05:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants