[WIP] u.transfer_to_memory also transfers the box#1537
[WIP] u.transfer_to_memory also transfers the box#1537
Conversation
| "array ({})" | ||
| .format(provided_n_atoms, self.n_atoms)) | ||
|
|
||
| self.dimensions = dimensions |
There was a problem hiding this comment.
I'd store this as _dimensions_array to avoid someone accidentally using it thinking it was ts.dimensions?
| [slice(None)]*(2-f_index)) | ||
| ts.positions = self.coordinate_array[basic_slice] | ||
| if self.dimensions is not None: | ||
| if len(self.dimensions) == 1: |
There was a problem hiding this comment.
if self.dimensions = [np.array([1, 2, 3, 4, 5, 6])] this won't work
| coordinates = [] # TODO: use pre-allocated array | ||
| for i, ts in enumerate(self.trajectory[start:stop:step]): | ||
| coordinates.append(np.copy(ts.positions)) | ||
| dimensions.append(np.copy(ts.dimensions)) |
There was a problem hiding this comment.
annoyingly this will copy the box even for non NPT systems, but I guess it's a small cost compared to the number of atoms that will be floating around
There was a problem hiding this comment.
Is there a reliable way to detect a NPT system? I could detect boxes that are [0, 0, 0, 90, 90, 90] but I am not sure it is much better than copying the box.
There was a problem hiding this comment.
Yeah it's no big deal, and probably safer to do it this way
package/MDAnalysis/core/universe.py
Outdated
| 'copied to memory (frame {frame})') | ||
| pm = ProgressMeter(n_frames, interval=1, | ||
| verbose=verbose, format=pm_format) | ||
| for i, ts in enumerate(self.trajectory[start:stop:step]): |
There was a problem hiding this comment.
This comes from an issue I have with Readers that implement timeseries (i.e. the DCD reader): the timeseries method only returns the coordinates. So whatever the method I use for to get the coordinates, I still need to go through the trajectory to get the box.
I see 3 ways of handling this:
- a
timeserieslike method for the box to go with the one for the positions - having the box reading loginc twice: one if the
timeseriesis used that will iterate over the trajectory only to get the box, and one in the existing loop for readers that have notimeseries - what I did: having the logic only once, but iterating twice in some cases
Solution 1 is the best but implies to add a chunk to the reader, 2 the most efficient for little new development, 3 is the simplest and avoid logic duplication.
I am not sure what is the best approach here.
There was a problem hiding this comment.
Could you add the box info reader to the timeseries PR #1400?
There was a problem hiding this comment.
That would be a good idea, indeed. Especially since, it would allow to just ditch the loop in transfer to memory.
I have one doubt, though. The way I see the API, I would add a ProtoReader.timeseries_dimensions method. This would mean that in all cases transfer_to-memory will iterate twice over the trajectory. This can be a performance issue with slow readers.
Any idea before I dive in it?
There was a problem hiding this comment.
The timeseries function gets a keyword that tells it the information we would like to retrieve
There was a problem hiding this comment.
Factor of 2 in trajectory reading makes a difference for processing large data sets. I/O is probably the biggest issue in speeding up analysis (and getting good parallel performance). I would very much like to have ways to use the fastest approaches that are possible and just fall back to slow solutions if we wouldn't have the feature otherwise.
I think the point of a high-level library such as MDA is that it handles these cases seamlessly. (Otherwise, what's the point vs just using the raw lib.formats readers.)
There was a problem hiding this comment.
Yes, the _timeseries_positions approach should take care of taking advantage of full-speed when possible.
We must be very clear about this in both the general and the DCD-specific Reader docs.
There was a problem hiding this comment.
I think if #1400 gets done, then we can just call Reader.timeseriestrajectory.timeseries and that will use a fast version if a child class implements something fancy (DCD) or fall back onto essentially what is written here.
@kain88-de why do we need an xarray here? Everything is nice and square?
There was a problem hiding this comment.
@richardjgowers, just for the record: organization-wise I think it makes more sense for specialized children to override Reader.timeseries and then themselves do the choice of whether to run their own fancy code or fall back to Reader.timeseries. This way we don't need DCD-specific checks under the base Reader class.
There was a problem hiding this comment.
Yeah sorry, that's what I meant. 1400 implements Reader.timeseries which DCD overrides later.
This commit makes the memory reader deal with a non-constant box dimensions and makes `Universe.transfer_to_memory copy the box in addition to the positions. Until now, only one box was stored in the memory reader, making it unpractical for NPT systems.
4c18ae9 to
2bf90a2
Compare
|
I added a |
This commit makes the memory reader deal with a non-constant box
dimensions and makes `Universe.transfer_to_memory copy the box in
addition to the positions.
Until now, only one box was stored in the memory reader, making it
impractical for NPT systems.
Fixes #1041
PR Checklist