More flexibility with wildcards in selection#2551
More flexibility with wildcards in selection#2551orbeckst merged 22 commits intoMDAnalysis:developfrom
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #2551 +/- ##
===========================================
- Coverage 90.68% 90.68% -0.01%
===========================================
Files 169 169
Lines 22833 22828 -5
Branches 2940 2939 -1
===========================================
- Hits 20707 20702 -5
Misses 1540 1540
Partials 586 586 Continue to review full report at Codecov.
|
richardjgowers
left a comment
There was a problem hiding this comment.
This looks good, thanks for adding tests. One question, is there a performance difference? Ie if you do select_atoms('name H?') on a large (~100k atoms) system, how does the timing of this compare?
| assert ag == ag_wild | ||
|
|
||
| def test_wildcard_double_selection(self, universe): | ||
| ag = universe.select_atoms('resname ASN or resname ASP or resname HSD') |
There was a problem hiding this comment.
a shortcut here is resname ASN ASP HSD
richardjgowers
left a comment
There was a problem hiding this comment.
Will also need a CHANGELOG entry and adding yourself to AUTHORS
|
Thank you for the feedback. I implemented the suggested changes and added two more corner case tests that I could think of. As for the performance, I measured it using a bigger system (340K atoms) and the old version which doesn't support different wildcards performed slightly better (0.6s vs 0.9s). I am not aware of the underlying implementation of fnmatch but we could also try using the python re module. |
richardjgowers
left a comment
There was a problem hiding this comment.
Ok cool, I just wanted to check we weren't tanking performance by making this change. Couple tweaks needed then we'll be good to go
package/CHANGELOG
Outdated
| mm/dd/yy richardjgowers, kain88-de, lilyminium, p-j-smith, bdice, joaomcteixeira, | ||
| PicoCentauri, davidercruz, jbarnoud, RMeli, IAlibay, mtiberti, CCook96, | ||
| Yuan-Yu, xiki-tempula | ||
| Yuan-Yu, xiki-tempula, Iv-Hristov |
There was a problem hiding this comment.
So because this is your first ever contribution, you also need to add your name to the AUTHORS file
package/CHANGELOG
Outdated
| * Added parmed to setup.py | ||
|
|
||
| Enhancements | ||
| * Changed selection wildcards to support multiple wildcards |
There was a problem hiding this comment.
reference the initial issue #
package/MDAnalysis/core/selection.py
Outdated
|
|
||
| Supports the use of one wildcard at the start, | ||
| end, and middle of strings | ||
| Supports multiple wildcards, based on fnmatch |
There was a problem hiding this comment.
add a .. versionchanged:: thing
package/MDAnalysis/core/selection.py
Outdated
| mask |= np.char.startswith(values, val[:wc_pos]) | ||
| mask &= np.char.endswith(values, val[wc_pos+1:]) | ||
|
|
||
| values = getattr(group, self.field).astype(np.str_) |
There was a problem hiding this comment.
so I think I was calling .astype(np.str_) here because we were pushing it into a np.char function. Now we're using fnmatch this likely isn't necessary, try removing this?
|
@IAlibay @richardjgowers Thank you for the feedback! I have combined all the wildcard tests to eliminate code duplication as Irfan suggested. |
package/MDAnalysis/core/selection.py
Outdated
| Supports the use of one wildcard at the start, | ||
| end, and middle of strings | ||
| .. versionchanged:: 0.21 | ||
| Supports multiple wildcards, based on fnmatch |
There was a problem hiding this comment.
@Iv-Hristov the text entry in a versionchanged needs to be indented (this is what is causing Travis to fail).
So:
.. versionchanged:: 1.0.0
Supports...
orbeckst
left a comment
There was a problem hiding this comment.
@Iv-Hristov this looks like an excellent contribution.
But the documentation is missing. fnmatch extends the simple *-globbing that we had before. It is very important that this is documented. Find the parts in the documentation that explained the globbing and update them (e.g. Selections should now get its own section on pattern matching).
@lilyminium will need to know this, too, for the AtomSelection Language section of the User Guide.
I know that this seems a lot of extra work for only a few lines of code. But that's why we make you do PRs, so that you get a real idea what it means to produce software that are used by many people.
|
@orbeckst Thank you for the feedback! I will make sure to do that later today. |
|
I think I documented all the functionality and added two more tests. However, I am having a problem where pytest.mark.parametrize complains if I try and use square brackets as an input because it expects a string. I tried a few escape characters such as '', '\', '\Q... \E' but nothing seemed to work. Does anyone have any experience with what escape sequence might work so that I can squish all the tests together? This is what gives the error "tuple object not callable": ` |
|
You missed the comma after line EDIT: Good practice is to have a comma even after the last element so that you can easily add more elements to the list without the highly informative "tuple object not callable" error ;-) |
orbeckst
left a comment
There was a problem hiding this comment.
Looks really good, just one typo in the docs.
| Pattern matching | ||
| ---------------- | ||
|
|
||
| The pattern matching notation described bellow is used to specify |
| ---------------- | ||
|
|
||
| The pattern matching notation described bellow is used to specify | ||
| patterns for matching strings: |
There was a problem hiding this comment.
| patterns for matching strings: | |
| patterns for matching strings (based on :mod:`fnmatch`): |
|
Congratulations @Iv-Hristov on your first merged PR. Nice contribution! |
Fixes #2436
Changes made in this Pull Request:
PR Checklist