[WIP] u.transfer_to_memory also transfers the box by jbarnoud · Pull Request #1537 · MDAnalysis/mdanalysis

jbarnoud · 2017-07-20T17:39:36Z

This commit makes the memory reader handle non-constant box
dimensions and makes `Universe.transfer_to_memory` copy the box in
addition to the positions.

Until now, only one box was stored in the memory reader, making it
impractical for NPT systems.

Fixes #1041

PR Checklist

Tests?
Docs?
CHANGELOG updated?
Issue raised/referenced?

richardjgowers

Tests!

richardjgowers · 2017-07-21T01:19:13Z

package/MDAnalysis/coordinates/memory.py

                                 "array ({})"
                                 .format(provided_n_atoms, self.n_atoms))

+        self.dimensions = dimensions


I'd store this as _dimensions_array to avoid someone accidentally using it thinking it was ts.dimensions?

richardjgowers · 2017-07-21T01:19:52Z

package/MDAnalysis/coordinates/memory.py

                       [slice(None)]*(2-f_index))
        ts.positions = self.coordinate_array[basic_slice]
+        if self.dimensions is not None:
+            if len(self.dimensions) == 1:


If `self.dimensions == [np.array([1, 2, 3, 4, 5, 6])]`, this won't work.
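One way to sidestep the ambiguity, sketched with plain NumPy: convert whatever was passed into an array and branch on its number of axes rather than on `len`. The helper name `normalize_dimensions` and the `(n_frames, 6)` layout are assumptions for illustration, not the code in this PR.

```python
import numpy as np


def normalize_dimensions(dimensions, n_frames):
    """Return an (n_frames, 6) array of boxes, or None (hypothetical helper)."""
    if dimensions is None:
        return None
    dims = np.asarray(dimensions, dtype=np.float32)
    if dims.ndim == 1:
        # A single flat box such as [lx, ly, lz, alpha, beta, gamma].
        if dims.shape != (6,):
            raise ValueError("a single box needs 6 values, got {}".format(dims.shape))
        return np.tile(dims, (n_frames, 1))
    if dims.shape == (1, 6):
        # A single box wrapped in a list, e.g. [np.array([1, 2, 3, 4, 5, 6])].
        return np.repeat(dims, n_frames, axis=0)
    if dims.shape != (n_frames, 6):
        raise ValueError("expected shape ({}, 6), got {}".format(n_frames, dims.shape))
    return dims
```

With this, a flat box and the same box wrapped in a one-element list both end up as one row per frame.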

richardjgowers · 2017-07-21T01:20:25Z

package/MDAnalysis/core/universe.py

                coordinates = []  # TODO: use pre-allocated array
                for i, ts in enumerate(self.trajectory[start:stop:step]):
                    coordinates.append(np.copy(ts.positions))
+                    dimensions.append(np.copy(ts.dimensions))


annoyingly this will copy the box even for non NPT systems, but I guess it's a small cost compared to the number of atoms that will be floating around

Is there a reliable way to detect an NPT system? I could detect boxes that are [0, 0, 0, 90, 90, 90], but I am not sure that is much better than copying the box.

Yeah it's no big deal, and probably safer to do it this way
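For what it's worth, once the boxes have been copied it is cheap to check afterwards whether they ever changed. A minimal sketch with plain NumPy; `box_is_constant` is a hypothetical helper, and it only says whether the copied frames share one box, not which ensemble the simulation used.

```python
import numpy as np


def box_is_constant(dimensions, atol=1e-6):
    """True if every copied frame has the same box (hypothetical helper)."""
    dims = np.asarray(dimensions, dtype=float)
    # Compare every row against the first frame's box.
    return bool(np.allclose(dims, dims[0], atol=atol))
```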

richardjgowers · 2017-07-21T01:20:54Z

package/MDAnalysis/core/universe.py

+                             'copied to memory (frame {frame})')
+                pm = ProgressMeter(n_frames, interval=1,
+                                   verbose=verbose, format=pm_format)
+                for i, ts in enumerate(self.trajectory[start:stop:step]):


This comes from an issue I have with Readers that implement timeseries (i.e. the DCD reader): the timeseries method only returns the coordinates. So whichever method I use to get the coordinates, I still need to go through the trajectory to get the box.

I see 3 ways of handling this:

1. a timeseries-like method for the box to go with the one for the positions

2. having the box reading logic twice: once if the timeseries is used, iterating over the trajectory only to get the box, and once in the existing loop for readers that have no timeseries

3. what I did: having the logic only once, but iterating twice in some cases

Solution 1 is the best but requires adding a chunk of code to the reader; 2 is the most efficient for little new development; 3 is the simplest and avoids duplicating logic.

I am not sure what is the best approach here.

Could you add the box info reader to the timeseries PR #1400?

That would be a good idea, indeed, especially since it would let us drop the loop in transfer_to_memory altogether.

I have one doubt, though. The way I see the API, I would add a ProtoReader.timeseries_dimensions method. This would mean that transfer_to_memory always iterates twice over the trajectory, which can be a performance issue with slow readers.

Any ideas before I dive into it?

The timeseries function gets a keyword that tells it the information we would like to retrieve

A factor of 2 in trajectory reading makes a difference when processing large data sets. I/O is probably the biggest issue in speeding up analysis (and getting good parallel performance). I would very much like to have ways to use the fastest approaches that are possible and only fall back to slow solutions when we wouldn't have the feature otherwise.

I think the point of a high-level library such as MDA is that it handles these cases seamlessly. (Otherwise, what's the point vs just using the raw lib.formats readers.)

Yes, the _timeseries_positions approach should take care of taking advantage of full-speed when possible.

We must be very clear about this in both the general and the DCD-specific Reader docs.

I think if #1400 gets done, then we can just call ~~Reader.timeseries~~ trajectory.timeseries and that will use a fast version if a child class implements something fancy (DCD) or fall back onto essentially what is written here.

@kain88-de why do we need an xarray here? Everything is nice and square?

@richardjgowers, just for the record: organization-wise I think it makes more sense for specialized children to override Reader.timeseries and then themselves do the choice of whether to run their own fancy code or fall back to Reader.timeseries. This way we don't need DCD-specific checks under the base Reader class.

Yeah sorry, that's what I meant. 1400 implements Reader.timeseries which DCD overrides later.
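To make the override-and-fall-back arrangement concrete, here is a rough sketch. `Frame`, `BaseReader` and `FastReader` are hypothetical stand-ins, not the MDAnalysis classes; the point is only that the base class owns the generic loop and the specialised child decides when to use its own path.

```python
import numpy as np


class Frame:
    """Hypothetical stand-in for a timestep: positions plus a box."""

    def __init__(self, positions, dimensions):
        self.positions = np.asarray(positions, dtype=np.float32)
        self.dimensions = np.asarray(dimensions, dtype=np.float32)


class BaseReader:
    """Hypothetical base reader with a generic, frame-by-frame timeseries."""

    def __init__(self, frames):
        self._frames = list(frames)

    def __iter__(self):
        return iter(self._frames)

    def timeseries(self):
        # Generic fallback: one pass over the trajectory, copying each frame.
        return np.array([frame.positions for frame in self])


class FastReader(BaseReader):
    """Hypothetical child with a bulk-read path (in the spirit of DCD)."""

    def __init__(self, frames, bulk_positions=None):
        super().__init__(frames)
        self._bulk_positions = bulk_positions

    def timeseries(self):
        if self._bulk_positions is not None:
            # Format-specific fast path.
            return np.array(self._bulk_positions, dtype=np.float32)
        # The child itself chooses to fall back; the base class needs no
        # format-specific checks.
        return super().timeseries()
```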


kain88-de · 2018-01-05T13:32:25Z

I added a dimensions keyword arg to the timeseries function. That makes the reading easier. It's a simple solution for now.
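Roughly, the idea of a keyword that also collects the box in the same pass could look like the sketch below. The free function `timeseries`, its `frames` argument of `(positions, box)` pairs, and the tuple return value are assumptions for illustration; the actual signature in the branch may differ.

```python
import numpy as np


def timeseries(frames, dimensions=False):
    """Collect positions, and optionally boxes, in a single pass (hypothetical)."""
    coordinates = []
    boxes = []
    for positions, box in frames:
        # np.array copies, so later changes to the frame do not leak in.
        coordinates.append(np.array(positions, dtype=np.float32))
        if dimensions:
            boxes.append(np.array(box, dtype=np.float32))
    coordinates = np.array(coordinates)
    if dimensions:
        return coordinates, np.array(boxes)
    return coordinates
```

Because both arrays are filled in the same loop, asking for the box does not cost a second iteration over the trajectory.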

jbarnoud added the Work in progress label Jul 20, 2017

richardjgowers requested changes Jul 21, 2017


richardjgowers self-assigned this Jul 21, 2017

jbarnoud and others added 6 commits January 5, 2018 14:26

Change reader.dimensions to reader._dimensions_array

775aa33

Some cleaning on dimensions for memory reader

4a1007c

Update doc

1b40d11

read dimensions into memory

c51d160

update memory reader api

2bf90a2

kain88-de force-pushed the memory-dimensions branch from 4c18ae9 to 2bf90a2 Compare January 5, 2018 13:31

richardjgowers mentioned this pull request Sep 19, 2018

MemoryReader facelift #2080

Merged

4 tasks

orbeckst added the close? Evaluate if issue/PR is stale and can be closed. label Sep 28, 2018

richardjgowers closed this Oct 9, 2018

RMeli deleted the memory-dimensions branch December 23, 2023 21:57


5 participants
