Skip to content

Introduce operators to groups#1239

Merged
orbeckst merged 4 commits intodevelopfrom
set-operators
Mar 15, 2017
Merged

Introduce operators to groups#1239
orbeckst merged 4 commits intodevelopfrom
set-operators

Conversation

@jbarnoud
Copy link
Copy Markdown
Contributor

@jbarnoud jbarnoud commented Mar 9, 2017

Fixes #726

Add operators, notably but not only set operators, to GroupBase. Change also the meaning of the == operator: previously, a == b was testing instance identoty (same as a is b), now two groups are equals if they are at the same level, from the same universe, and with the same elements in the same order.

I would like to change the behavior of the in operator. I am not sure of what it does now:

def __contains__(self, other):                                              
        if not other.level == self.level:                                       
            # maybe raise TypeError instead?                                    
            # eq method raises Error for wrong comparisons                      
            return False                                                        
        return other.ix in self._ix

I would like to be able to do things like atom in residue_group, or residue in segment.

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@jbarnoud
Copy link
Copy Markdown
Contributor Author

jbarnoud commented Mar 9, 2017

I still need to write quite a lot of tests, but here is already the demo I am using to write the thing: https://gist.github.com/jbarnoud/80c13803b551ce4853eb1b5cb3d9e4a8

@jbarnoud
Copy link
Copy Markdown
Contributor Author

jbarnoud commented Mar 9, 2017

For test purposes I would need a system with a ridiculously large number of segments. Is there a way to alter a universe to reasign segments? I could write a function that takes a universe and recreate a Topology object from it; but is there a simplest method? @richardjgowers @dotsdl

@richardjgowers
Copy link
Copy Markdown
Member

@jbarnoud you should be able to use MDAnalysisTests.make_Universe. I think you can also make it produce a Universe with a specified shape, but it might not actually work... I really wouldn't use the TPR system, the TPR parser is one of our slowest iirc

Group with elements of `self` and `other` concatenated

"""
def _get_other_index(self, other):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this method because it does two things at once (both the level check, as well as returned .ix). I do like centralising the level check so we get consistent error messages etc, but I'd prefer something like a decorator:

@only_same_level
def __add__(self, other):
    # at this point, same_level check has happened

@only_same_level
def union(self, other):
    etc

Or if not a decorator, we could have a level check function inside each method

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The decorator is a good idea indeed. I'll move to that.


@staticmethod
def make_groups(u, level):
a = getattr(u, level)[10:100]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use the default make_Universe() system, you should be able to get away with using only 5 segments to test sets. There's nothing about the set operations which requires more than 5 objects

Copy link
Copy Markdown
Contributor Author

@jbarnoud jbarnoud Mar 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make_universe seems indeed to be the solution. I'll see how to reduce the number of element, I was just being greedy 😃

yield self._test_union, a, b, c, d
yield self._test_intersection, a, b, c, d
yield self._test_difference, a, b, c, d
yield self._test_symmetric_difference, a, b, c, d
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you've probably already thought of this, but we also need predictable behaviour when we work with duplicate items in a Group. Ie u.atoms[[0, 1, 2, 1, 2]] | u.atoms[[3, 4]] == u.atoms[[0, 1, 2, 3, 4]]

| ``s.issuperset(t)`` | | test if all elements of |
| | | ``t`` are part of ``s`` |
+-------------------------------+------------+----------------------------+
| ``s.concatenate(t)`` | ``s + t`` | new Group with elements |
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably worth mentioning that this preserves order (and duplicates?), whereas .union will return a sorted unique version

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I plan to split the table between set operations and not set operations.

| ``s.difference(t)`` | ``s - t`` | new Group with elements of |
| | | ``s`` that are not in ``t``|
+-------------------------------+------------+----------------------------+
| ``s.symmetric_difference(t)`` | ``s ^ t`` | new Group with elements |
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be cool if the symbol equivalent worked, you could override __xor__ etc

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is planned, indeed.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, right?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

@richardjgowers
Copy link
Copy Markdown
Member

We had some discussion about __contains__ briefly in #1163 where we decided to go for the explicit atom in segment.atoms rather than allowing atom in segment

@jbarnoud
Copy link
Copy Markdown
Contributor Author

Time to step away for a bit. I think I dressed all of @richardjgowers comments:

  • I replaced _get_other_index by a decorator that only checks the level; I also created an ix_array property that returns the index as an array, even for components.
  • I overloaded the operators so we can make ag1 | ag2 or ag1 ^ ag2 like for sets.
  • I added tests for groups with duplicated and scrambled elements.
  • I now use make_Universe and less elements per groups.
  • I split the table in the doc to distinguish between set operators and the others.

In addition, I added the substract method that @orbeckst suggested in #726. It is mapped on the minus sign to be consistent with the plus sign that is mapped on the concatenate method. These two operator diverge from the behavior of sets, but make more sense like that in our specific context.

All the set operators are implemented.

I still have some testing to do. Especially, I need to test that levels cannot be mixed, and that the methods work when provided an elements. I also need to fix the error messages in the @_only_same_level decorator, and to clean up the documentation.

@jbarnoud
Copy link
Copy Markdown
Contributor Author

I did not notice that ComponentBase had the comparison operators overridden already to compare levels. Maybe I should not override then in GroupBase to test subsets/supersets as it may be confusing. Any thought?

@jbarnoud jbarnoud changed the title [WIP] Introduce operators to groups Introduce operators to groups Mar 12, 2017
@jbarnoud
Copy link
Copy Markdown
Contributor Author

I think it is ready for review. I'll rebase once it is ready to be merged.

the same order. Groups that are not at the same level or that belong
to different universe cannot be compared.
"""
o_ix = other.ix_array
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you probably need to use .ix otherwise Atoms evaluate equal to AtomGroups, eg u.atoms[1] == u.atoms[[1]]

richardjgowers added a commit that referenced this pull request Mar 12, 2017
substract -> subtract

reworked subtract to use numpy set operations

tried to use self.ix and self.universe rather than shortcutting
jbarnoud added a commit that referenced this pull request Mar 12, 2017
jbarnoud pushed a commit that referenced this pull request Mar 13, 2017
substract -> subtract

reworked subtract to use numpy set operations

tried to use self.ix and self.universe rather than shortcutting
@jbarnoud
Copy link
Copy Markdown
Contributor Author

I rebased, squashed, fixed the conflict, and force pushed.

@richardjgowers
Copy link
Copy Markdown
Member

Ok this is a big change(tm) so we should get a +2, poke @MDAnalysis/coredevs

One thing I thought of looking at it today, we implement & ^ and |, which follow standard set behaviour, but the implementation of - doesn't. - is more like the opposite of +, which makes sense for MDA, but might trip someone up one day when one of the four set operation symbols doesn't do what they expect. So maybe remove the symbol versions and force people to use methods for & ^ and |

| | | common to ``s`` and ``t`` |
+-------------------------------+------------+----------------------------+
| ``s.difference(t)`` | | new Group with elements of |
| | | ``s`` that are not in ``t``|
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't s - t be the equivalent here?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see why not: the treatment as set.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, currently its:

ag1 = u.atoms[[1, 2, 1, 2, 3, 4]]
ag2 = u.atoms[[3, 4, 5]]

ag1 + ag2 == [1, 2, 1, 2, 3, 4, 3, 4, 5]
ag1 & ag2 == [3, 4]
ag1 | ag2 == [1, 2, 3, 4, 5]
ag1 - ag2 == [1, 2, 1, 2]  # ie take from ag1 unless in ag2

@jbarnoud Maybe we should make __sub__ == difference and leave subtract as method only access?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See subtraction discussion here: #726 (comment)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richardjgowers I went back and forth to many time about it, I cannot have an opinion on that anymore. The current version follows @orbeckst suggestion.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vote for the current status. Since + is not a set operator I think that it's more intuitive that - also isn't.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, we could use another unused operator for difference() (such as %).
Not sure I love such a hackish idea, though.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are converging on using - for the set difference.

@jbarnoud
Copy link
Copy Markdown
Contributor Author

An other thing people use to set operators can trip on is that sets overload the comparison operators to test if a set is a subset/superset of an other. I chose not to do the same here, to avoid confusion with how the comparison operators are used for components in mda.

All these symbol ambiguity may be annoying for a user that often use sets and their operators. On the other hand, I mostly wrote these methods to be able to do this:

dppc_upper = upper_leaflet & dppc

@orbeckst
Copy link
Copy Markdown
Member

orbeckst commented Mar 14, 2017 via email

@jbarnoud
Copy link
Copy Markdown
Contributor Author

@orbeckst / is free indeed. I do not see anything that would clash with using it for differences.

@richardjgowers
Copy link
Copy Markdown
Member

richardjgowers commented Mar 14, 2017

We can't use symbols differently to the rest of Python, ie people shouldn't have to relearn the language because of our design choices. As far as I can see we can either do:

  1. Only use + and - for the methods concatenate and subtract, set methods have to go through the methods. This means that + and - mostly work as opposites to each other (but don't imply a circular operation)
  2. Use + for concatenate, and all set symbols for set operations, then subtract doesn't have a symbol associated with it. This means that & and - all work as expected according to the larger Python ecosystem

I think I'm in camp 2 so that @jbarnoud 's snippet of c = a & b just works

@orbeckst
Copy link
Copy Markdown
Member

orbeckst commented Mar 14, 2017

I am fine with keeping .subtract() just as a method. I don't really see it being used as much as +. We cannot really change + anymore and in any case, its use is consistent with how list concatenation works in Python.

I think I favor not overloading - in this case, though. Because at least then we do not induce the expectation that + is doing a set-add.

EDIT: I mean, assign - to the set difference, given that this is standard Python behaviors for set

So yes: Option 2 in #1239 (comment)

Copy link
Copy Markdown
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking really nice as far as I can tell.

I am just blocking it until the - operator is resolved.

| ``s.concatenate(t)`` | ``s + t`` | new Group with elements |
| | | from ``s`` and from ``t`` |
+-------------------------------+------------+----------------------------+
| ``s.subtract(t)`` | ``s - t`` | new Group with elements |
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove - operator for subtract(). (see discussion)

| | | contain the same elements |
| | | in the same order |
+-------------------------------+------------+----------------------------+
| ``x in s`` | | test if ``x`` is part of |
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "part of s" mean? In exact contigious order or "any elements of x anywhere in s"?

Probably the former when I compare this to issubset() below but that could be made clearer.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll clarify this. It means component x is part of group s.

| | | common to ``s`` and ``t`` |
+-------------------------------+------------+----------------------------+
| ``s.difference(t)`` | | new Group with elements of |
| | | ``s`` that are not in ``t``|
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are converging on using - for the set difference.

return other.ix in self.ix

def __sub__(self, other):
return self.subtract(other)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

change to self.difference(other)

"""
return self._ix

@property
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just call it ix? The added _array sounds superfluous.

But I admit to my ignorance of the topology system: apparently there's also a ix already defined and that's not what you want? Under which circumstances is ix_array != ix? For Atom? Add some more docs. Perhaps add a SeeAlso or link to :meth:ix.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self.ix has a different meaning in ComponentBase and in GroupBase. For groups, self.ix == self.ix_array; both are the array of indices of the components. For components, instead, self.ix is the index of the component as an integer. self.ix_array gives a common interface between groups and components to get the index as an array without having to check for the type.

I'll add a word in the doc.

@_only_same_level
def difference(self, other):
"""Elements from this Group that do not appear in another

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the same as subtract.

One can also use -:

ag1 - ag2

or

ag1.difference(ag2)

@_only_same_level
def symmetric_difference(self, other):
"""Group of elements which are only in one of this Group or another

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One can also use ^.

@_only_same_level
def intersection(self, other):
"""Group of elements which are in both this Group and another

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One can also use &.

return a, b, c, d, e

@staticmethod
def make_groups_duplicated_and_scrumbled(u, level):
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like "scrumbled" (... even if you mean "scrambled") ;-)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argghh. I wrote it with a french accent.

assert_(str(sg) == '<SegmentGroup [<Segment 4AKE>]>')



Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Impressive battery of tests!

The only thing that seems to be missing is tests for the operator versions, -, &, |, ^.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Their binding is tested test_shortcut_overriding.

@kain88-de
Copy link
Copy Markdown
Member

Does the change of behavior for - break any code inside of MDAnalysis or break existing user code in a way that no warnings/exceptions are shown and just the results change? I've recently noticed I made a change that silently changed results for alignto and I was lucky to notice it, see #1242.

@richardjgowers
Copy link
Copy Markdown
Member

@kain88-de this is purely adding things, ie - et al don't exist yet, so should be fine

class GroupBase(_MutableBase):
"""Base class from which a Universe's Group class is built.

Instances of :class:`GroupBase` provide the following operations that
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should link to these docs from another place. People who use MDAnalysis will most likely not look at the docs for the low level classes like GroupBase. The set operations are a quite neat addition and others will likely want to use them. So should this be linked/show-cased in the normal doc rst files or should we just link to this from AtomGroup/ResidueGroup/SegmentGroup?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was definitely going to include these operations in #1190 which is the tutorial/introduction docs. But yeah, adding them to GroupBase is effectively hiding them from users

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we link to this from the high-level groups? I haven't done a lot of cross linking in rst docs. We could definitely add a See Also section to the high level groups.

@jbarnoud
Copy link
Copy Markdown
Contributor Author

So the consensus is - for difference?

Copy link
Copy Markdown
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments but you addressed all my previous questions.

| ``x in s`` | | test if ``x`` is part of |
| | | ``s`` |
+-------------------------------+------------+----------------------------+
| ``x not in s`` | | test if ``x`` is not part |
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you remove "x not in s"?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not in is not really an operator. It is just the logical opposite of in.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worthwhile pointing out in the docs that this works as expected?

I am fine with leaving it out as you did, given that everything "just works"™ in a pythonic fashion.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully it "just works".

AtomGroups can be compared and combined using group operators. For
instance, AtomGroups can be concatenated::

ag_concat = ag1 + ag2 # or ag_concat = ag1.concat(ag2)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't put the alternative in a comment. Either a separate code line or mention it in text with a link to concat().

treats the AtomGroups as sets, duplicates are removed from the resulting
group, and atoms are ordered::

ag_union = ag1 | ag2 # or ag_union = ag1.union(ag2)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above comment

new :class:`AtomGroup` for multiple matches. This makes it
difficult to use the feature consistently in scripts.

AtomGroups can be compared and combined using group operators. For
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice write-up!

:meth:`difference` is the set version of :meth:`subtract`. The resulting
group is sorted and deduplicated.

All set methods are listed in the table bellow. These methods treats the
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

below

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

methods treat

(plural)

| ``s.difference(t)`` | ``s - t`` | new Group with elements of |
| | | ``s`` that are not in ``t``|
+-------------------------------+------------+----------------------------+
| ``s.symmetric_difference(t)`` | ``s ^ t`` | new Group with elements |
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, right?

@richardjgowers richardjgowers self-requested a review March 15, 2017 17:11
@jbarnoud
Copy link
Copy Markdown
Contributor Author

I started to actually use these operators for real applications. Here is just a little example because I relly enjoy the feature:

# Split lipids by leaflet and lipid type
lipids = saturated + unsaturated
# We assume that each lipid is a single residue with its atoms
# ordered from the head to the tails
top_head = sum(r.atoms[0] for r in lipids.residues)
top_head_z = top_head.positions[:, 2]
upper_mask = top_head_z > top_head_z.mean()
upper = lipids.residues[upper_mask].atoms
lower = lipids - upper
upper_saturated = upper & saturated
lower_saturated = lower & saturated
upper_unsaturated = upper & unsaturated
lower_unsaturated = lower & unsaturated

@orbeckst
Copy link
Copy Markdown
Member

Blog post!!!

@orbeckst
Copy link
Copy Markdown
Member

@jbarnoud can you do a final clean up of the history and then I am happy to merge (unless anyone else has any further comments).

@richardjgowers
Copy link
Copy Markdown
Member

richardjgowers commented Mar 15, 2017 via email

jbarnoud and others added 4 commits March 15, 2017 21:53
See #726

This commit introduce several method to the GroupBase class, including
all the set operators, and a subtract method. Some operator symbols are
overridden.
substract -> subtract

reworked subtract to use numpy set operations

tried to use self.ix and self.universe rather than shortcutting
@jbarnoud
Copy link
Copy Markdown
Contributor Author

@orbeckst I reworked the history. There is now 4 commits instead on plenty. I did not squash everything together to avoid loosing track @richardjgowers' contribution.

@orbeckst
Copy link
Copy Markdown
Member

Perfect! I'll just wait for travis, just to make extra sure.

@orbeckst orbeckst self-assigned this Mar 15, 2017
@orbeckst orbeckst merged commit 1b165be into develop Mar 15, 2017
@orbeckst orbeckst deleted the set-operators branch March 15, 2017 23:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants