-
Notifications
You must be signed in to change notification settings - Fork 765
Open
Labels
Description
Is your feature request related to a problem?
OpenForceField has a really nice function called chemical_environment_matches that returns only the tagged atoms in a SMARTS substructure search. e.g. If you passed '[#6]1:[#6]:[#6:4]:[#6:3]:[#7:2]:[#6:1]:1', it would search the entire substructure but only return the tagged atoms, and ordered by the tag number.
Describe the solution you'd like
Instead of going openff.Molecule.from_rdkit(atoms.convert_to("RDKIT")).chemical_environment_matches("my smarts query"), it would be super handy to implement it directly in MDAnalysis. The way OpenFF does it is just by looking at the GetAtomMapNum()s of the smarts molecule; it looks fairly straightforward. Their code is below (MIT license):
# Set up query.
qmol = Chem.MolFromSmarts(smirks) # cannot catch the error
if qmol is None:
raise ValueError(
'RDKit could not parse the SMIRKS string "{}"'.format(smirks)
)
# Create atom mapping for query molecule
idx_map = dict()
for atom in qmol.GetAtoms():
smirks_index = atom.GetAtomMapNum()
if smirks_index != 0:
idx_map[smirks_index - 1] = atom.GetIdx()
map_list = [idx_map[x] for x in sorted(idx_map)]
# Perform matching
matches = list()
# choose the largest unsigned int without overflow
# since the C++ signature is a uint
max_matches = np.iinfo(np.uintc).max
for match in rdmol.GetSubstructMatches(
qmol, uniquify=False, maxMatches=max_matches, useChirality=True
):
mas = [match[x] for x in map_list]
matches.append(tuple(mas))
return matchesDescribe alternatives you've considered
Don't do that.
Additional context
richardjgowers