"around" selection is slow and memory-intensive

As of `0.15`, performing an `around` search can be quite expensive, both in compute time and in memory. This shows up most clearly for systems with many atoms where the reference for `around` is itself also large.

This was tested on a system with ~640k atoms, where the reference has 1130 atoms (`select_atoms('around 15 name ATNAME')`). It took at least 15 seconds. Doubling the system size (and the reference with it) exhausts my 16 GB of memory.
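For a sense of scale, here is a back-of-the-envelope estimate. It assumes (I have not confirmed this in the code) that the selection materializes a dense float64 distance matrix between all atoms and the reference; `dense_distance_bytes` is a hypothetical helper, not an MDAnalysis function.

```python
def dense_distance_bytes(n_atoms, n_ref, itemsize=8):
    """Size of a dense n_atoms x n_ref distance matrix (float64 by default)."""
    return n_atoms * n_ref * itemsize

GIB = 1024 ** 3

# Approximate sizes from the test above: ~640k atoms, 1130 reference atoms.
tested = dense_distance_bytes(640_000, 1130)
# Doubling both the system and the reference quadruples the matrix.
doubled = dense_distance_bytes(2 * 640_000, 2 * 1130)

print(f"tested:  {tested / GIB:.1f} GiB")
print(f"doubled: {doubled / GIB:.1f} GiB")
```

Under that assumption the tested case needs about 5.4 GiB for the matrix alone and the doubled case about 21.6 GiB, which would be consistent with running out of 16 GB.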

From the discussion in #383, and later the addition of OpenMP (#529), I was under the impression our implementation was quite fast. But for comparison, on the same system and machine VMD does the same selection virtually instantaneously and with hardly any spike in memory usage. Granted, VMD doesn't handle PBC, but is that enough to account for the difference?

I took a look at their code and, from what I could understand, they implement several optimizations, some of which I thought we were already doing since #529:
- gridded search;
- parallelization (threading);
- pre-filtering of the search space (for compact references it's easy to define a parallelepiped that contains it with a padding equal to the cutoff, and you're sure your contacts will all be inside that volume);
- shortcutting (when an atom being searched is found to be in contact, no need to calculate further distances from it to the other reference atoms).
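To make the list above concrete, here is a minimal pure-Python sketch of three of those ideas (pre-filtering, gridded search, shortcutting). It is neither VMD's code nor MDAnalysis'; the function name `around_sketch` is made up for illustration, and it ignores PBC, threading, and the exclusion of the reference atoms from the result.

```python
import math
from collections import defaultdict
from itertools import product

def around_sketch(candidates, reference, cutoff):
    """Indices of candidate points within `cutoff` of any reference point (no PBC)."""
    if not reference or cutoff <= 0:
        return []
    cutoff2 = cutoff * cutoff

    # Pre-filtering: bounding box of the reference, padded by the cutoff.
    lo = [min(p[d] for p in reference) - cutoff for d in range(3)]
    hi = [max(p[d] for p in reference) + cutoff for d in range(3)]

    def cell_of(p):
        # Cubic cells with edge == cutoff, so any contact lies in the
        # same cell or in one of its 26 neighbours.
        return tuple(math.floor((p[d] - lo[d]) / cutoff) for d in range(3))

    # Gridded search: bin the reference points once.
    grid = defaultdict(list)
    for q in reference:
        grid[cell_of(q)].append(q)

    offsets = list(product((-1, 0, 1), repeat=3))
    hits = []
    for i, p in enumerate(candidates):
        # Skip anything outside the padded box without computing a distance.
        if any(p[d] < lo[d] or p[d] > hi[d] for d in range(3)):
            continue
        cx, cy, cz = cell_of(p)
        # Shortcutting: any() stops at the first reference point in contact.
        in_contact = any(
            sum((p[d] - q[d]) ** 2 for d in range(3)) <= cutoff2
            for dx, dy, dz in offsets
            for q in grid.get((cx + dx, cy + dy, cz + dz), ())
        )
        if in_contact:
            hits.append(i)
    return hits

candidates = [(0, 0, 0), (1, 0, 0), (5, 0, 0), (100, 100, 100), (0, 3, 0)]
reference = [(0, 0, 0.5), (0, 2.5, 0)]
print(around_sketch(candidates, reference, 1.2))  # [0, 1, 4]
```

Memory here scales with the number of atoms plus the number of occupied cells rather than with atoms times reference, and no full distance matrix is ever built.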

I admit I'm not familiar with the internals of `distance_array` and `contact_matrix`, but my question is: is there room for optimization or am I hitting MDAnalysis' limits here?

