17.2 Maximum Common Substructure Search

The maximum substructure common to two molecular graphs can be identified using the OEMCSSearch class. The following example demonstrates how to initialize an OEMCSSearch instance with one molecule and use it to perform a maximum common substructure search against a second molecule.

#include "oechem.h"

#include <iostream>
#include <string>

using namespace std;
using namespace OEChem;
using namespace OESystem;

int main()
{
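  OEGraphMol m1,m2;

  // m1 is the pattern (dopamine); m2 is the target (a dopamine analog)
  OEParseSmiles(m1, "c1cc(O)c(O)cc1CCN");
  OEParseSmiles(m2, "c1c(O)c(O)c(Cl)cc1CCCBr");

  // Build the MCS query from m1 using the default atom and bond expressions
  OEMCSSearch mcss(m1,OEExprOpts::DefaultAtoms,OEExprOpts::DefaultBonds);

  // Iterate over every maximum common substructure match found in m2
  unsigned int count;
  OEIter<OEMatchBase> match;
  for (count=1,match = mcss.Match(m2);match;++match,count++)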
  {
    OEIter<OEMatchPair<OEAtomBase> > apr;

    cout << "Match " << count << ':' << endl;
    cout << "pattern atoms: ";
    for (apr = match->GetAtoms();apr;++apr)
      cout << apr->pattern->GetIdx() << ' ';
    cout << endl;

    cout << "target atoms:  ";
    for (apr = match->GetAtoms();apr;++apr)
      cout << apr->target->GetIdx() << ' ';
    cout << endl;

    // Build a molecule from the match and print it as a SMILES string
    OEGraphMol m3;
    OESubsetMol(m3,match,true);
    string smi;
    OECreateSmiString(smi,m3);
    cout << "match smiles = " << smi << endl;
  }

  return 0;
}

The first molecule, m1, is dopamine, and the second molecule, m2, is a dopamine analog. The OEMCSSearch instance is initialized with dopamine and with arguments that control how node (atom) and edge (bond) expressions are built for the maximum common substructure query. Please refer to section 17.4 for an explanation of expression options and the OEExprOpts namespace. The OEMCSSearch::Match method returns an iterator over the maximum common substructure(s) found in the target molecule. Each match can be passed to the OESubsetMol function, and the resulting molecule is then converted into a SMILES string. The example program also prints the atom correspondences of each match using the standard OEMatchBase::GetAtoms interface.
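
To make the idea of a match as a set of pattern/target atom pairs more concrete, the following is a minimal, library-free sketch of a brute-force common substructure search on tiny labeled graphs. It is not how OEMCSSearch is implemented and it does not use OEChem. The names ToyGraph, ToyMatch and ToyMaxCommonSubstructure are hypothetical, atoms are compared by element symbol only, bond orders are ignored, and the sketch assumes that a bond must be present in both graphs or absent in both for two pairs to be compatible. It is practical only for graphs of a handful of atoms.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy graph: labels[i] is the element symbol of atom i,
// adj[i][j] is true when atoms i and j are bonded.
struct ToyGraph {
  std::string labels;
  std::vector<std::vector<bool> > adj;

  explicit ToyGraph(const std::string &l)
      : labels(l), adj(l.size(), std::vector<bool>(l.size(), false)) {}

  void AddBond(std::size_t a, std::size_t b) {
    assert(a < labels.size() && b < labels.size());
    adj[a][b] = true;
    adj[b][a] = true;
  }
};

// A match is a list of (pattern atom index, target atom index) pairs.
typedef std::vector<std::pair<std::size_t, std::size_t> > ToyMatch;

// Returns true if the pair (p, t) can be added to the partial match 'cur'.
static bool Compatible(const ToyGraph &pat, const ToyGraph &tgt,
                       const ToyMatch &cur, std::size_t p, std::size_t t) {
  if (pat.labels[p] != tgt.labels[t]) return false;  // element must agree
  bool touches = cur.empty();  // the first pair needs no mapped neighbor
  for (std::size_t i = 0; i < cur.size(); ++i) {
    const std::size_t mp = cur[i].first;
    const std::size_t mt = cur[i].second;
    if (mp == p || mt == t) return false;  // atom already used
    // Assumption: bonding must agree in both graphs.
    if (pat.adj[p][mp] != tgt.adj[t][mt]) return false;
    if (pat.adj[p][mp]) touches = true;  // keeps the match connected
  }
  return touches;
}

// Backtracking: try every compatible extension, remember the largest match.
static void Extend(const ToyGraph &pat, const ToyGraph &tgt,
                   ToyMatch &cur, ToyMatch &best) {
  if (cur.size() > best.size()) best = cur;
  for (std::size_t p = 0; p < pat.labels.size(); ++p) {
    for (std::size_t t = 0; t < tgt.labels.size(); ++t) {
      if (Compatible(pat, tgt, cur, p, t)) {
        cur.push_back(std::make_pair(p, t));
        Extend(pat, tgt, cur, best);
        cur.pop_back();
      }
    }
  }
}

// Returns one largest connected common substructure as atom correspondences.
ToyMatch ToyMaxCommonSubstructure(const ToyGraph &pat, const ToyGraph &tgt) {
  ToyMatch cur, best;
  Extend(pat, tgt, cur, best);
  return best;
}
```

In this sketch, the first and second members of each returned pair play the same role as the pattern and target members of the atom pairs printed by the example program above. For instance, searching a three-atom C-C-O chain against a C-C-N chain yields two pairs, one for each carbon.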