17.1 Substructure Search

Substructure searches can be done in OEChem using the OESubsearch class. The OESubSearch class can be initialized with a SMARTS pattern, a query molecule OEQMolBase, or a molecule and expression options. The following example demonstrates how to initialize a OESubSearch instance with a SMARTS pattern, and perform a substructure search.

#include "oechem.h"
#include <iostream>

using namespace std;
using namespace OEChem;

int main()
{
  OEGraphMol mol;

  OEParseSmiles(mol, "Cc1ccccc1");
  OESubSearch ss("c1ccccc1");

  if (ss.SingleMatch(mol))
  {
    cout << "benzene matches toluene" << endl;
  }
  else
  {
    cout << "benzene does not match toluene" << endl;
  }

  return 0;
}

In the example program, the query pattern is benzene and the molecule in which the substructure is being searched for is toluene. Since benzene is a substructure of toluene the program will identify the substructure, and report the substructure as found.

The OESubSearch class is not only able to identify the presence or absence of a substructure, but also the node and edge correspondences of the pattern and target. The following example extends the simple match example to write out all atom correspondences between benzene and toluene.

#include "oechem.h"
#include <iostream>

using namespace std;
using namespace OEChem;
using namespace OESystem;

int main()
{
  OEGraphMol mol;

  OEParseSmiles(mol, "c1ccccc1C");
  OESubSearch ss("c1ccccc1");

  unsigned int count;
  OEIter<OEMatchBase> match;
  for (count=1,match = ss.Match(mol);match;++match,count++)
  {
    OEIter<OEMatchPair<OEAtomBase> > apr;

    cout << "Match " << count << ':' << endl;
    cout << "pattern atoms: ";
    for (apr = match->GetAtoms();apr;++apr)
      cout << apr->pattern->GetIdx() << ' ';
    cout << endl;

    cout << "target atoms:  ";
    for (apr = match->GetAtoms();apr;++apr)
      cout << apr->target->GetIdx() << ' ';
    cout << endl;
  }

  return 0;
}

The OESubSearch::Match method returns an iterator over all subgraphs. Each of the subgraphs can be queried for their node and edge correspondences. In this particular example, the benzene substructure is identified twelve times in toluene. Each of the matches differ in their node and edge correspondences to the substructure..