Skip to content

Implement tagged smarts matching #3348

@lilyminium

Description

@lilyminium

Is your feature request related to a problem?

OpenForceField has a really nice function called chemical_environment_matches that returns only the tagged atoms in a SMARTS substructure search. e.g. If you passed '[#6]1:[#6]:[#6:4]:[#6:3]:[#7:2]:[#6:1]:1', it would search the entire substructure but only return the tagged atoms, and ordered by the tag number.

Describe the solution you'd like

Instead of going openff.Molecule.from_rdkit(atoms.convert_to("RDKIT")).chemical_environment_matches("my smarts query"), it would be super handy to implement it directly in MDAnalysis. The way OpenFF does it is just by looking at the GetAtomMapNum()s of the smarts molecule; it looks fairly straightforward. Their code is below (MIT license):

# Set up query.
        qmol = Chem.MolFromSmarts(smirks)  # cannot catch the error
        if qmol is None:
            raise ValueError(
                'RDKit could not parse the SMIRKS string "{}"'.format(smirks)
            )

        # Create atom mapping for query molecule
        idx_map = dict()
        for atom in qmol.GetAtoms():
            smirks_index = atom.GetAtomMapNum()
            if smirks_index != 0:
                idx_map[smirks_index - 1] = atom.GetIdx()
        map_list = [idx_map[x] for x in sorted(idx_map)]

        # Perform matching
        matches = list()

        # choose the largest unsigned int without overflow
        # since the C++ signature is a uint
        max_matches = np.iinfo(np.uintc).max
        for match in rdmol.GetSubstructMatches(
            qmol, uniquify=False, maxMatches=max_matches, useChirality=True
        ):
            mas = [match[x] for x in map_list]
            matches.append(tuple(mas))

        return matches

Describe alternatives you've considered

Don't do that.

Additional context

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions