-
Notifications
You must be signed in to change notification settings - Fork 823
Add SMARTS selection #2883
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add SMARTS selection #2883
Changes from all commits
8ef63ac
ecd95e5
517f9a5
51ed059
b932512
b1596b3
350a995
4c1bbfb
8fcf9d2
5569155
1d2bb91
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -600,6 +600,60 @@ def apply(self, group): | |
| return group[group.aromaticities].unique | ||
|
|
||
|
|
||
| class SmartsSelection(Selection): | ||
| """Select atoms based on SMARTS queries. | ||
|
|
||
| Uses RDKit to run the query and converts the result to MDAnalysis. | ||
| Supports chirality. | ||
| """ | ||
| token = 'smarts' | ||
|
|
||
| def __init__(self, parser, tokens): | ||
| # The parser will add spaces around parentheses and then split the | ||
| # selection based on spaces to create the tokens | ||
| # If the input SMARTS query contained parentheses, the query will be | ||
| # split because of that and we need to reconstruct it | ||
| # We also need to keep the parentheses that are not part of the smarts | ||
| # query intact | ||
| pattern = [] | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it simpler to enforce no white space allowed in the pattern?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. White spaces are added automatically by the parser around parentheses: |
||
| counter = {"(": 0, ")": 0} | ||
| # loop until keyword but ignore parentheses as a keyword | ||
| while tokens[0] in ["(", ")"] or not is_keyword(tokens[0]): | ||
| # keep track of the number of open and closed parentheses | ||
| if tokens[0] in ["(", ")"]: | ||
| counter[tokens[0]] += 1 | ||
| # if the char is a closing ")" but there's no corresponding | ||
| # open "(" then we've reached then end of the smarts query and | ||
| # the current token ")" is part of a grouping parenthesis | ||
| if tokens[0] == ")" and counter["("] < (counter[")"]): | ||
| break | ||
| # add the token to the pattern and remove it from the tokens | ||
| val = tokens.popleft() | ||
| pattern.append(val) | ||
| self.pattern = "".join(pattern) | ||
|
|
||
| def apply(self, group): | ||
| try: | ||
| from rdkit import Chem | ||
| except ImportError: | ||
| raise ImportError("RDKit is required for SMARTS-based atom " | ||
| "selection but it's not installed. Try " | ||
| "installing it with \n" | ||
| "conda install -c conda-forge rdkit") | ||
| pattern = Chem.MolFromSmarts(self.pattern) | ||
| if not pattern: | ||
| raise ValueError(f"{self.pattern!r} is not a valid SMARTS query") | ||
| mol = group.convert_to("RDKIT") | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Might be nice to try and cache this in Topology.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The mol (topology, not coordinates) is already cached in the converter so that we don't need to rebuild it for every frame of a trajectory: |
||
| matches = mol.GetSubstructMatches(pattern, useChirality=True) | ||
| # convert rdkit indices to mdanalysis' | ||
| indices = [ | ||
| mol.GetAtomWithIdx(idx).GetIntProp("_MDAnalysis_index") | ||
| for match in matches for idx in match] | ||
| # create boolean mask for atoms based on index | ||
| mask = np.in1d(range(group.n_atoms), np.unique(indices)) | ||
| return group[mask] | ||
|
|
||
|
|
||
| class ResidSelection(Selection): | ||
| """Select atoms based on numerical fields | ||
|
|
||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.