17.3 Maximum Common Substructure Search

Another method of pattern matching looks for substructures shared between the target molecule and a pattern, but does not require the whole pattern to match. Maximum Common Substructure (MCS) searching, as implemented in OEChem, uses much of the same machinery as standard substructure searching. An instance of OEMCSSearch is constructed from a SMILES string, an OEMolBase or an OEQMolBase, and the pattern is then matched against a target. OEMatchBases are returned in an analogous fashion, but there the similarity with substructure searching ends. An MCS search can return matches ranging in size from the smaller of the two molecules being compared down to a single atom.
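
To make that size range concrete, here is a minimal pure-Python sketch of the idea. It is not OEChem's algorithm. It assumes each molecule is given as a list of element symbols plus a list of bonded index pairs, it ignores bond orders and aromaticity, and it searches exhaustively, so it is only practical for very small graphs. The names neighbor_table, toy_mcs and min_atoms are hypothetical and are not part of OEChem.

```python
def neighbor_table(n_atoms, bonds):
    """Build {atom index: set of bonded atom indices} from index pairs."""
    table = dict((i, set()) for i in range(n_atoms))
    for i, j in bonds:
        table[i].add(j)
        table[j].add(i)
    return table


def toy_mcs(atoms_a, bonds_a, atoms_b, bonds_b, min_atoms=1):
    """Return the largest connected common substructure found, as a
    {index in A: index in B} dict, or {} if it has fewer than min_atoms atoms.
    """
    nbrs_a = neighbor_table(len(atoms_a), bonds_a)
    nbrs_b = neighbor_table(len(atoms_b), bonds_b)
    best = [{}]   # one-element list so the nested function can update it
    seen = set()  # partial mappings that have already been expanded

    def compatible(mapping, a, b):
        # Same element, and bonded (or not bonded) consistently with
        # every atom pair that is already part of the mapping.
        if atoms_a[a] != atoms_b[b]:
            return False
        for a2, b2 in mapping.items():
            if (a2 in nbrs_a[a]) != (b2 in nbrs_b[b]):
                return False
        return True

    def grow(mapping):
        key = frozenset(mapping.items())
        if key in seen:
            return
        seen.add(key)
        if len(mapping) > len(best[0]):
            best[0] = dict(mapping)
        used_b = set(mapping.values())
        # Extend only through bonds, so the mapped atoms stay connected.
        for a_old, b_old in list(mapping.items()):
            for a in nbrs_a[a_old]:
                if a in mapping:
                    continue
                for b in nbrs_b[b_old]:
                    if b not in used_b and compatible(mapping, a, b):
                        mapping[a] = b
                        grow(mapping)
                        del mapping[a]

    # Every pair of atoms with the same element is a possible starting point.
    for a in range(len(atoms_a)):
        for b in range(len(atoms_b)):
            if atoms_a[a] == atoms_b[b]:
                grow({a: b})

    return best[0] if len(best[0]) >= min_atoms else {}


# 1-propanol (C-C-C-O) against ethanolamine (N-C-C-O)
propanol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])
ethanolamine = (["N", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])

mcs = toy_mcs(propanol[0], propanol[1], ethanolamine[0], ethanolamine[1])
print(sorted(mcs.items()))  # [(1, 1), (2, 2), (3, 3)]
```

Both molecules have four atoms, but only the three-atom C-C-O fragment is shared, so the result is smaller than either molecule. Passing min_atoms=4 to this sketch would discard that result and return an empty mapping.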

When an MCS is constructed from a SMILES, standard SMARTS matching rules apply for what constitutes a match. If it is constructed with an OEQMol, whatever atom and bond expressions have been applied to the OEQMol also apply to subsequent matches in the MCS search. Finally, there is a constructor that takes an OEMolBase directly, along with an atom expression and a bond expression as defined above. This avoids the need to pre-define a query molecule.
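
The role of an atom expression can be pictured with a small pure-Python sketch. It assumes the expression is a set of option flags that decide which atom properties must agree for two atoms to count as equivalent. The constants MATCH_ELEMENT and MATCH_AROMATICITY and the function atoms_match are hypothetical stand-ins for illustration only. They are not the OEExprOpts constants, and real OEChem atoms are objects rather than dictionaries.

```python
# Hypothetical option flags, combined with | in the same spirit as an
# atom expression. These are illustration only, not OEChem constants.
MATCH_ELEMENT = 1
MATCH_AROMATICITY = 2


def atoms_match(atom_a, atom_b, options):
    """Compare only the properties selected by the option flags."""
    if options & MATCH_ELEMENT and atom_a["element"] != atom_b["element"]:
        return False
    if options & MATCH_AROMATICITY and atom_a["aromatic"] != atom_b["aromatic"]:
        return False
    return True


benzene_carbon = {"element": "C", "aromatic": True}
cyclohexane_carbon = {"element": "C", "aromatic": False}

# Comparing elements only: the two carbons are equivalent.
print(atoms_match(benzene_carbon, cyclohexane_carbon, MATCH_ELEMENT))
# Comparing elements and aromaticity: they are not.
print(atoms_match(benzene_carbon, cyclohexane_carbon,
                  MATCH_ELEMENT | MATCH_AROMATICITY))
```

The looser the expression, the more atoms are considered equivalent, and the larger the common substructure a search can report.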

The next example creates a molecule by parsing a SMILES string, builds an MCS object from this molecule, and then searches each target molecule in an input file for the MCS.

# ch17-4.py
import sys
from openeye.oechem import *
mol1 = OEGraphMol()
OEParseSmiles(mol1, "c1ccccc1C(=O)C")
OEAssignAromaticFlags(mol1)
OETriposAtomNames(mol1)

# create an OEMCSSearch from this molecule
mcss = OEMCSSearch(mol1,
                   OEExprOpts_DefaultAtoms, OEExprOpts_DefaultBonds)
# ignore substructures smaller than 4 atoms
mcss.SetMinAtoms(4)

ifs = oemolistream("drugs.sdf")
for mol in ifs.GetOEMols():
    OETriposAtomNames(mol)
    print(mol.GetTitle())
    matchcount = 0
    # each match pairs atoms of the target with atoms of the pattern
    for mb in mcss.Match(mol):
        print("Match: %d Size: %d atoms" % (matchcount, mb.NumAtoms()))
        for mp in mb.GetAtoms():
            sys.stdout.write(" %s->%s " %
                        (mp.target.GetName(), mp.pattern.GetName()))
        # end the line of atom pairs
        sys.stdout.write("\n")
        matchcount += 1