Skip to content

Tidying Universe.__init__#2322

Merged
richardjgowers merged 20 commits intoMDAnalysis:developfrom
lilyminium:universe-clsmethods
Feb 16, 2020
Merged

Tidying Universe.__init__#2322
richardjgowers merged 20 commits intoMDAnalysis:developfrom
lilyminium:universe-clsmethods

Conversation

@lilyminium
Copy link
Member

@lilyminium lilyminium commented Aug 8, 2019

Related to #2282
Fixes #2527

Changes made in this Pull Request:

  • Added class methods to Universe
  • Tidied __init__ by having explicit kw/args and moving the topology/coordinate sorting to external functions
  • An empty universe can no longer be created with Universe(). Use Universe.empty() Universe() creates a universe with 0 atoms
  • Failing to read a topology for coordinates (because len(coordinates)==0, or all_coordinates==True) now raises a warning

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@lilyminium lilyminium changed the title Universe clsmethods Adding classmethods to Universe Aug 8, 2019
@richardjgowers
Copy link
Member

@lilyminium I agree that universe creation is a messy process, and there's lots of confusing options!

But I think that filenames (ie mda.Universe('file.gro', 'traj.trr')) always need to be the default option. Streams work a lot like filenames I think... (I have never used this) but I think they ducktype to act like file object so these can share a "front door".

This then leaves passing the Topology object, which is probably not used often (I added it mostly as a dev way of building universes) so if anything, Universe.from_Topology should be the new class method (but probably not actually...)

I'm not sure what the no arguments use case is, especially now we have the Universe.empty() classmethod. We can probably remove this.

I really like the cleaner structure of Universe.__init__ - would it be possible to make this refactor instead something along the lines of:

class Universe
    def __init__(self, args):
        if filenames_given:
            top = self._top_from_filenames(args)
        elif streams_given:
            top = self._top_from_streams(args)
        elif Topology_given:
            top = args
        # at this point we've unified all the mess into a proper Topology object
        self._generate_from_topology()

        # mess of coordinates can also be hidden away before this too?
        coordinates = figure_out_coordinates(args)
        self.load_new(coordinates)

So we hide the messiness of the different creation arguments, but we don't force a lot of different creation methods.

In terms of docs, I get that the U.from_X classmethods are nicer and self documenting. We could add these as syntactic sugar, so they'd just route straight into U.__init__, but their docstrings could be more helpful than the init docstring, which is a little long.

@lilyminium lilyminium changed the title Adding classmethods to Universe Tidying Universe.__init__ Aug 19, 2019
@codecov
Copy link

codecov bot commented Aug 19, 2019

Codecov Report

❗ No coverage uploaded for pull request base (develop@b3a8525). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##             develop    #2322   +/-   ##
==========================================
  Coverage           ?   90.56%           
==========================================
  Files              ?      170           
  Lines              ?    22925           
  Branches           ?     2959           
==========================================
  Hits               ?    20763           
  Misses             ?     1566           
  Partials           ?      596
Impacted Files Coverage Δ
...DAnalysis/analysis/hydrogenbonds/hbond_analysis.py 95.45% <ø> (ø)
package/MDAnalysis/coordinates/base.py 93.8% <ø> (ø)
package/MDAnalysis/analysis/helanal.py 88% <ø> (ø)
package/MDAnalysis/coordinates/GRO.py 93.46% <ø> (ø)
...ckage/MDAnalysis/analysis/encore/confdistmatrix.py 67.77% <ø> (ø)
...e/MDAnalysis/analysis/encore/clustering/cluster.py 96.22% <ø> (ø)
package/MDAnalysis/coordinates/DCD.py 97.08% <ø> (ø)
package/MDAnalysis/analysis/align.py 87.32% <ø> (ø)
package/MDAnalysis/analysis/lineardensity.py 81.81% <ø> (ø)
package/MDAnalysis/analysis/contacts.py 100% <ø> (ø)
... and 27 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b3a8525...ec4172c. Read the comment docs.

Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, few comments but I like how much easier Universe.__init__ is now

Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs either mushing into gross py2 syntax or putting on ice for a little

# managed attribute holding Reader
self._trajectory = None

def __init__(self, topology, *coordinates, all_coordinates=False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So Python 2.7 doesn't support this syntax, which is why we have the fugly *args, **kwargs syntax. So if we want this merged before the Py3 switch, we can change this to args/kwargs (and lose readability for a while) else we can put this PR on ice for a while and merge it once we've dropped Py2.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only visible changes are to the more articulate argument documentation, so it's probably more efficient to just wait until Python 2 finally goes away. Same with #2333

@jbarnoud jbarnoud added the wait-python3 Requires python3-only feature and need to wait for python2 support to be dropped. label Jan 10, 2020
@orbeckst
Copy link
Member

In #2527 it came up that Universe() would be the "pythonic" way to create an empty universe (i.e., what most users would try first). In the discussion here the point

An empty universe can no longer be created with Universe(). Use Universe.empty()

is included. What was the rationale for explicitly forbidding Universe()?

For context: #2527 states that at the moment Universe() produces something that is different from Universe.empty() which raises an exception for its own repr() so either Universe() should immediately raise an exception or do the same as Universe.empty().

@richardjgowers
Copy link
Member

@orbeckst I think the signature&options and intent of mda.Universe.empty() is so different than mda.Universe(file1, file2) that it should have a separate entry point. I think 90+% of people want mda.Universe(files) so that should be what people see. The power use of .empty() should be separate in a classmethod that has all pertinent documentation there.

I thought this PR had been merged already, oops.

@orbeckst
Copy link
Member

orbeckst commented Feb 13, 2020 via email

@lilyminium
Copy link
Member Author

I changed it so that Universe() is the same as calling Universe.empty(0, 0, 0). So in that respect it can close #2527. Now that I've done it, though, I'm not sure it's a good idea -- the universe winds up being pretty useless and you cannot add atoms as people may expect. There could be a warning?

>>> import MDAnalysis as mda
>>> u = mda.Universe()
>>> print(u)
<Universe with 0 atoms>
>>> u.residues
<ResidueGroup with 0 residues>
>>> u._topology
<MDAnalysis.core.topology.Topology object at 0x105fe0590>

@orbeckst
Copy link
Member

Now that I've done it, though, I'm not sure it's a good idea -- the universe winds up being pretty useless and you cannot add atoms as people may expect. There could be a warning?

Thank you for the explanation, that helps a lot.

So the Universe is more like a numpy array (unsurprisingly...) than a list. In this case I'll back-paddle on what I said before and change my opinion:

  • Universe() should just fail — whatever the appropriate exception is when an argument is missing, I think TypeError. The error message can say that one should consider Universe.empty(n_atoms, ...). (note that np.array() raises TypeError)
  • Universe.empty() should require at least one argument. np.array([]) does not complain about anything so I would also say, we don't have to warn about anything if the user sets n_atoms=0: it's the user's decision. The important distinction is not to allow empty arguments (as you would do with list()).

Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok mushed the nice signatures into py2.7 friendly syntax, can't wait to change them back

@richardjgowers richardjgowers merged commit aca26d2 into MDAnalysis:develop Feb 16, 2020
@lilyminium lilyminium deleted the universe-clsmethods branch April 14, 2020 08:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

wait-python3 Requires python3-only feature and need to wait for python2 support to be dropped.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Universe() should either work as Universe.empty() or raise an exception

4 participants