Grid based search for fast distance calculations #1957
ayushsuhane wants to merge 3 commits into MDAnalysis:develop from …
The search function currently returns pairs, squared distances, and indices of the atoms within a specified distance. More functions can be added as needed. I have modified setup.py to include the gridsearch file as well, and the method can be imported as …
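For readers unfamiliar with grid (cell-list) searches, here is a minimal pure-Python sketch of the idea behind this kind of output: index pairs plus squared distances within a cutoff. It ignores periodic boundaries, and the name `grid_search` is hypothetical. It is not the FATSLiM-derived function added in this PR, only an illustration of the technique.

```python
import math
from collections import defaultdict
from itertools import product


def grid_search(coords, cutoff):
    """Return (pairs, squared_distances) for all i < j within cutoff.

    Non-periodic sketch: coords is a sequence of (x, y, z) tuples.
    """
    if cutoff <= 0:
        raise ValueError("cutoff must be positive")
    # Bin each point into a cubic cell whose edge length equals the cutoff.
    cells = defaultdict(list)
    for idx, pos in enumerate(coords):
        key = tuple(math.floor(x / cutoff) for x in pos)
        cells[key].append(idx)

    cutoff2 = cutoff * cutoff
    pairs, dist2 = [], []
    for key, members in cells.items():
        # A neighbour within the cutoff can only sit in this cell or one of its 26 neighbours.
        for offset in product((-1, 0, 1), repeat=3):
            other = cells.get(tuple(k + o for k, o in zip(key, offset)))
            if not other:
                continue
            for i in members:
                for j in other:
                    if j <= i:  # report each unordered pair exactly once
                        continue
                    d2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
                    if d2 <= cutoff2:
                        pairs.append((i, j))
                        dist2.append(d2)
    return pairs, dist2
```

For example, `grid_search([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)], 2.0)` returns `([(0, 1)], [1.0])`. Returning squared distances avoids a square root per pair; callers that need real distances can take `math.sqrt` afterwards.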
…be added again for conditional compilation
orbeckst left a comment
Tests are obviously needed. Documentation is needed once it is clear that you are going to move forward with this approach.
Please properly acknowledge the source of the code (see comment at top).
I had some other minor comments/questions (see inline).
@@ -0,0 +1,647 @@
#cython: cdivision=True
I assume this comes directly from @seb-buch's FATSLiM. In this case, please add a full comment header that states the origin and the licence under which it has been incorporated.
I think it'd be good to use a due.cite for the entry point to these methods too.
We need to remember "add due.cite" for PRs in the future. For this PR I'd say that it can be done once @ayushsuhane works on the docs, so I wouldn't worry about it immediately.
The comment header is important to me, though, because we want to make sure that we don't appear to take code that we cannot legally use, or to fail to give proper attribution. Adding the header from the first commit shows that we take attribution seriously.
neighborhood.size = 0
neighborhood.allocated_size = NEIGHBORHOOD_ALLOCATION_INCREMENT
neighborhood.beadids = <ns_int *> malloc(NEIGHBORHOOD_ALLOCATION_INCREMENT * sizeof(ns_int))
###Modified here
(I am asking because I have no idea what I should learn from the comment.)
Yeah, I was trying to modify the code to output some additional values. I should remove them now though.
neighborhood.beaddist = <real *> malloc(NEIGHBORHOOD_ALLOCATION_INCREMENT * sizeof(real))
###
if neighborhood.beadids == NULL:
    abort()
Does abort() raise a proper Python Exception? Otherwise, shouldn't this raise RuntimeError or OSError or something else that calling code can decide to handle? (It should say that it failed to allocate memory.)
Also, when dying here, do you leak the memory from the previous allocations?
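One way to avoid the leak asked about here is to release every buffer that was already allocated before reporting the failure. The sketch below simulates this in pure Python: `FakeHeap` and `create_neighborhood` are hypothetical stand-ins for `malloc()`/`free()` and the neighborhood allocation routine, and the byte sizes are illustrative only. In the Cython code the rollback would be explicit `free()` calls on the earlier pointers.

```python
class FakeHeap:
    """Hypothetical stand-in for malloc/free that fails once its budget is used up."""

    def __init__(self, budget):
        self.budget = budget
        self.live = set()
        self._next = 1

    def malloc(self, nbytes):
        if self.budget == 0:
            return None              # plays the role of a NULL pointer
        self.budget -= 1
        ptr = self._next
        self._next += 1
        self.live.add(ptr)
        return ptr

    def free(self, ptr):
        if ptr is not None:          # free(NULL) is a no-op, as in C
            self.live.discard(ptr)


def create_neighborhood(heap, n):
    """Allocate struct, beadids and beaddist; on failure free what was allocated."""
    allocated = []
    for nbytes in (64, 8 * n, 8 * n):   # sizes are illustrative only
        ptr = heap.malloc(nbytes)
        if ptr is None:
            for p in reversed(allocated):   # roll back in reverse order
                heap.free(p)
            return None
        allocated.append(ptr)
    return allocated
```

Returning `None` (rather than aborting) leaves the decision of how to report the error to the caller.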
cdef ns_neighborhood *neighborhood = <ns_neighborhood *> malloc(sizeof(ns_neighborhood))
if neighborhood == NULL:
    abort()
Does abort() raise a proper Python Exception? Otherwise, shouldn't this raise RuntimeError or OSError or something else that calling code can decide to handle? (It should say that it failed to allocate memory.)
DEF BOX_MARGIN=1.0010
DEF MAX_NTRICVEC=12
from libc.stdlib cimport malloc, realloc, free, abort
Are calls to abort() good style in Cython? Will calling code be able to handle them properly? What kinds of exceptions do they raise?
(I have a few comments on abort below but I didn't flag all of them.)
After reading more about abort(), it seems to terminate the program. In the context of FATSLiM that is what you want, but in this context it means closing the Python interpreter. Returning an error code would be better here.
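A minimal pure-Python sketch of the error-code approach: the low-level routine reports failure through a status code, and only the Python-facing wrapper turns it into a `MemoryError` that calling code can catch. `allocate_grid`, `GridSearch` and the `max_cells` limit are hypothetical; in the Cython module the low-level part would correspond to the `nogil` function that calls `malloc()`.

```python
ERR_OK = 0
ERR_NOMEM = -1


def allocate_grid(ncells, max_cells):
    """C-style helper: return (status, grid) instead of aborting the process."""
    if ncells > max_cells:          # stands in for malloc() returning NULL
        return ERR_NOMEM, None
    return ERR_OK, [[] for _ in range(ncells)]


class GridSearch:
    """Hypothetical Python-facing wrapper that converts status codes to exceptions."""

    def __init__(self, ncells, max_cells=1000):
        status, grid = allocate_grid(ncells, max_cells)
        if status != ERR_OK:
            # The interpreter keeps running; the caller decides how to handle this.
            raise MemoryError(f"could not allocate NS grid with {ncells} cells")
        self.grid = grid
```

Unlike abort(), this keeps the interpreter alive and lets a script or test suite recover.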
# Update neighbor lists
neighborhood.beadids[neighborhood.size] = bid
### Modified here
I don't think it has to be separated. One reason why … If all the code fits into the one … One advantage of the separation is that by looking at the sources you immediately see that there is a … Bottom line: Despite possible advantages for having a …
From what I can gather from the Cython documentation, pure C Cython functions cannot propagate errors to their caller functions. Raising an exception like
grid.nbeads = NULL
if grid.nbeads == NULL:
    with gil:
        raise MemoryError('FATAL: Could not allocate memory for NS grid.nbeads (requested:' + str(sizeof(ns_int) * grid_size) + ' bytes)\n')
results in …
Typically, best practice is to handle any error with a corresponding check in a Python function, but that is not possible in this case. While there exists a library for exception handling in C++ which can be defined as … Fortunately enough, I think we can use the …
jbarnoud left a comment
The memory leak makes the code unusable in practice and should be addressed. There should be specific tests to make sure that the memory is freed, as this is something we could easily miss when the code changes in the future.
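A sketch of the shape such a regression test could take: create one search object per frame, drop it, and check that nothing is still allocated. Note that `tracemalloc` only sees memory obtained through Python's allocators, so grids allocated with plain `malloc()` would need their own bookkeeping (or an external memory measurement). `LiveCounter`, `FastNSStub` and `check_no_leak` are hypothetical names, not MDAnalysis test utilities.

```python
import gc


class LiveCounter:
    """Hypothetical bookkeeping allocator: counts grids that were never freed."""

    def __init__(self):
        self.live = 0

    def alloc(self):
        self.live += 1
        return object()              # stands in for a malloc'ed grid

    def free(self, grid):
        self.live -= 1


ALLOCATOR = LiveCounter()


class FastNSStub:
    """Hypothetical stand-in for FastNS: allocates on creation, frees on deletion."""

    def __init__(self):
        self.grid = ALLOCATOR.alloc()

    def __del__(self):
        ALLOCATOR.free(self.grid)    # plays the role of destroy_nsgrid() in __dealloc__


def check_no_leak(nframes=100):
    """Create one search object per frame and verify that nothing stays allocated."""
    for _ in range(nframes):
        ns = FastNSStub()
        del ns
    gc.collect()
    assert ALLOCATOR.live == 0, f"{ALLOCATOR.live} grids leaked"
```

Commenting out the release line in `__del__` makes `check_no_leak()` fail, which is exactly the regression the review describes.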
def __dealloc__(self):
    #destroy_nsgrid(self.grid)
From playing with @seb-buch's module in the benchmark, I found that having this line commented out leads to a memory leak. When an instance of FastNS is destroyed by the garbage collector, self.grid, which is allocated directly with malloc and therefore not managed by Python, is not freed. As a result, running a neighbor search on multiple frames eats up all the memory. I had to hard reboot a couple of times because of that specific line being commented out.
It is likely commented out for a reason, though. Maybe @seb-buch has some insight into why he commented the line.
It also means we have to be careful with memory management in this PR.
Yes, I agree. Even raising exceptions or calling abort() will lead to a memory leak. While I have never coded in Cython, I have a couple of possible ideas based on what people have done and suggested elsewhere.
The first is to use the atexit() functionality: all malloc'ed pointers are made global when they are allocated, and whenever an instance is destroyed, all the global pointers are freed (in the atexit() function) before destruction. Additionally, we can call the same function before raising or catching any exception, so that all globally registered pointers are freed first, followed by a graceful exit.
Another approach to catching exceptions is to use "cysignals/signals.pxi", which catches exceptions inside pure C code and can be used together with cdef [datatype] function() except *. This can probably be combined with atexit() as well.
I think the line just has to be uncommented. Also, the initialization should probably go in __cinit__ rather than __init__.
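In Cython, `__cinit__` is guaranteed to be called exactly once per object and `__dealloc__` runs when the object is deallocated, which is why pairing the allocation in `__cinit__` with `destroy_nsgrid(self.grid)` in `__dealloc__` is the natural fix. For pure-Python wrappers, `weakref.finalize` gives a comparable "release exactly once" guarantee. The sketch below assumes a hypothetical `destroy_grid` standing in for `destroy_nsgrid()`.

```python
import weakref

FREED = []   # records released handles so the behaviour can be checked


def destroy_grid(handle):
    """Hypothetical stand-in for destroy_nsgrid(): releases the C-level grid."""
    FREED.append(handle)


class FastNSSketch:
    """Hypothetical wrapper: allocation and release are registered together."""

    def __init__(self, handle):
        self.grid = handle
        # The callback must not reference self, or the object would never be collected.
        self._finalizer = weakref.finalize(self, destroy_grid, handle)

    def close(self):
        """Explicit early release; finalize() runs its callback at most once."""
        self._finalizer()
```

Because the finalizer marks itself dead after the first call, an explicit `close()` followed by garbage collection does not free the grid twice.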
cdef ns_int i
cdef bint initialization_ok
if self.prepared and not force:
If there is nothing to do, just do nothing. Here, you are preparing again.
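The reviewer's point, sketched in Python with hypothetical names (`PreparedGrid`, `build_count`): when the grid is already prepared and `force` is false, `prepare()` should simply return instead of rebuilding.

```python
class PreparedGrid:
    """Hypothetical object showing the early return in prepare()."""

    def __init__(self):
        self.prepared = False
        self.build_count = 0         # counts how many times the grid was (re)built

    def prepare(self, force=False):
        if self.prepared and not force:
            return                   # nothing to do: keep the existing grid
        self._build()
        self.prepared = True

    def _build(self):
        self.build_count += 1        # placeholder for the real, expensive grid setup
```

Calling `prepare()` twice builds the grid once; `prepare(force=True)` rebuilds it.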
About the use of …
As an afterthought, after reading my last two comments back to back, it may be worth doing some cleanup there: https://github.com/MDAnalysis/mdanalysis/pull/1957/files#diff-bff4d08482de32360b2c33d16e6091c5R587. I am thinking about freeing the memory before raising the exception.
if initialization_ok:
    self.prepared = True
else:
If we reach that point, we are in a stale state, so some cleanup is needed here. However, just destroying the grid is not enough: the grid should be restored to its pre-prepared state.
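One way to guarantee a restorable state is "commit on success": build the new grid in a temporary, and only assign it to the object once initialization has succeeded. On failure the object still holds whatever it had before the call. A minimal Python sketch under that assumption; `NSGridSketch` and `_init_grid` are hypothetical, and the failure condition (non-positive cutoff) is only for illustration.

```python
class NSGridSketch:
    """Hypothetical wrapper showing commit-on-success for prepare()."""

    def __init__(self):
        self.grid = None
        self.prepared = False

    def _init_grid(self, cutoff):
        """Build a new grid; return (ok, grid). Fails for non-positive cutoffs."""
        if cutoff <= 0:
            return False, None
        return True, {"cutoff": cutoff, "cells": {}}

    def prepare(self, cutoff):
        ok, new_grid = self._init_grid(cutoff)
        if not ok:
            # Nothing was swapped in, so self.grid and self.prepared still hold the
            # pre-call state (in C, a partially built grid would be freed here).
            raise ValueError(f"could not initialise grid for cutoff={cutoff!r}")
        self.grid = new_grid         # commit only after a fully successful build
        self.prepared = True
```

This avoids having to reconstruct the previous state after the fact, because it is never overwritten.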
TL;DR: Sorry for my sluggish response time, but indeed, the code is a quick-and-dirty port from FATSLiM, so there are most probably a few things that are not perfect (#euphemism).
My experience with Cython makes me think that playing with malloc()/free() is skating on thin ice (no more and no less than when you write C code), but you barely have any other option if you want really fast code (i.e. without the GIL) while needing many memory allocations and/or allocations of pure C structures. The way the NS code is implemented in FATSLiM makes this pretty much mandatory, but it may not be required here, and it might well be possible to implement the grid search while staying in the memory comfort zone (i.e. no malloc()/free()).
Is this PR superseded by PR #1996, and can it hence be closed?
Fixes #974
Include grid based search for fast selections.
The primary aim is to include grid-based search in MDAnalysis, using the FATSLiM neighbour search as the base library and adding further functionality on top of it to handle PBC, non-PBC, and triclinic cells.
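The PBC handling can be illustrated, for orthorhombic boxes only, with a minimal pure-Python sketch. Positions are wrapped into the box, cells are indexed modulo the number of cells per axis (each cell edge is at least the cutoff), and distances use the minimum-image convention. Triclinic cells need extra image vectors and are not covered. `periodic_grid_search` and `minimum_image_d2` are hypothetical names, not FATSLiM or MDAnalysis functions.

```python
from itertools import product


def minimum_image_d2(a, b, box):
    """Squared distance between points a and b in an orthorhombic periodic box."""
    d2 = 0.0
    for ai, bi, length in zip(a, b, box):
        d = ai - bi
        d -= length * round(d / length)  # shift into [-length/2, length/2]
        d2 += d * d
    return d2


def periodic_grid_search(coords, cutoff, box):
    """Return sorted [((i, j), d2), ...] for i < j within cutoff; box = (Lx, Ly, Lz)."""
    if cutoff <= 0 or any(cutoff > length / 2 for length in box):
        raise ValueError("need 0 < cutoff <= half of every box length")
    # Cells per axis; every cell edge (length / n) is at least the cutoff.
    ncell = [int(length // cutoff) for length in box]
    cells = {}
    for idx, pos in enumerate(coords):
        # Wrap into the box, then compute the cell index (the final % n guards rounding).
        key = tuple(int((x % length) / length * n) % n
                    for x, length, n in zip(pos, box, ncell))
        cells.setdefault(key, []).append(idx)

    cutoff2 = cutoff * cutoff
    found = {}
    for key, members in cells.items():
        # With fewer than three cells along an axis, several offsets wrap onto the
        # same cell, so the neighbour cells are collected in a set.
        neighbours = {tuple((k + o) % n for k, o, n in zip(key, offset, ncell))
                      for offset in product((-1, 0, 1), repeat=3)}
        for nkey in neighbours:
            for i in members:
                for j in cells.get(nkey, ()):
                    if j > i:
                        d2 = minimum_image_d2(coords[i], coords[j], box)
                        if d2 <= cutoff2:
                            found[(i, j)] = d2
    return sorted(found.items())
```

For example, in a 10 × 10 × 10 box with a cutoff of 2.0, the points (0.5, 5, 5) and (9.5, 5, 5) are found as neighbours at squared distance 1.0 across the periodic boundary. The non-PBC case corresponds to skipping the wrapping and the minimum-image shift.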
PR Checklist