Substructure searches can be performed in OEChem using the OESubSearch class. An OESubSearch instance can be initialized with a SMARTS pattern, a query molecule (OEQMolBase), or a molecule together with expression options. The following example demonstrates how to initialize an OESubSearch instance with a SMARTS pattern and perform a substructure search.
    #include "oechem.h"
    #include <iostream>

    using namespace std;
    using namespace OEChem;

    int main()
    {
        // Target molecule: toluene.
        OEGraphMol mol;
        OEParseSmiles(mol, "Cc1ccccc1");

        // Query pattern: benzene, given as SMARTS.
        OESubSearch ss("c1ccccc1");

        // SingleMatch only tests whether at least one match exists.
        if (ss.SingleMatch(mol))
        {
            cout << "benzene matches toluene" << endl;
        }
        else
        {
            cout << "benzene does not match toluene" << endl;
        }

        return 0;
    }
In the example program, the query pattern is benzene and the target molecule being searched is toluene. Because benzene is a substructure of toluene, SingleMatch returns true and the program reports that benzene matches toluene.
The OESubSearch class can identify not only the presence or absence of a substructure, but also the node (atom) and edge (bond) correspondences between the pattern and the target. The following example extends the simple match example to write out all atom correspondences between benzene and toluene.
    #include "oechem.h"
    #include <iostream>

    using namespace std;
    using namespace OEChem;
    using namespace OESystem;

    int main()
    {
        // Target molecule: toluene.
        OEGraphMol mol;
        OEParseSmiles(mol, "c1ccccc1C");

        // Query pattern: benzene, given as SMARTS.
        OESubSearch ss("c1ccccc1");

        unsigned int count;
        OEIter<OEMatchBase> match;

        // Iterate over every match of the pattern in the target.
        for (count = 1, match = ss.Match(mol); match; ++match, count++)
        {
            OEIter<OEMatchPair<OEAtomBase> > apr;

            cout << "Match " << count << ':' << endl;

            // Indices of the pattern atoms in this match.
            cout << "pattern atoms: ";
            for (apr = match->GetAtoms(); apr; ++apr)
                cout << apr->pattern->GetIdx() << ' ';
            cout << endl;

            // Indices of the target atoms they are mapped onto.
            cout << "target atoms: ";
            for (apr = match->GetAtoms(); apr; ++apr)
                cout << apr->target->GetIdx() << ' ';
            cout << endl;
        }

        return 0;
    }
The OESubSearch::Match method returns an iterator over all matching subgraphs. Each match can be queried for its node and edge correspondences. In this particular example, the benzene substructure is identified twelve times in toluene, because the six-membered ring pattern can be laid onto the ring of toluene starting at any of its six atoms and in either of two directions. Each match differs in its node and edge correspondences to the pattern.
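The count of twelve can be checked independently of OEChem. The following is a minimal sketch in plain C++, not the algorithm OESubSearch uses: it treats benzene and toluene as unlabeled graphs (ignoring atom types and aromaticity) and counts, by brute-force backtracking, every one-to-one mapping of pattern atoms onto target atoms that preserves all pattern bonds. The names Graph, makeGraph, countEmbeddings and countBenzeneInToluene are hypothetical helpers introduced only for this illustration.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Undirected graph as an adjacency matrix: atoms are vertices, bonds are edges.
using Graph = std::vector<std::vector<bool> >;

// Build a graph with n vertices from a list of (begin, end) bonds.
Graph makeGraph(std::size_t n, const std::vector<std::pair<int, int> >& bonds)
{
    Graph g(n, std::vector<bool>(n, false));
    for (const auto& b : bonds)
    {
        g[b.first][b.second] = true;
        g[b.second][b.first] = true;
    }
    return g;
}

// Extend a partial mapping (pattern vertex i -> mapping[i]) one pattern
// vertex at a time and count the complete mappings. A mapping is accepted
// when it is one-to-one and every pattern bond lands on a target bond.
unsigned countEmbeddings(const Graph& pattern, const Graph& target,
                         std::vector<int>& mapping, std::vector<bool>& used)
{
    const std::size_t next = mapping.size();
    if (next == pattern.size())
        return 1;

    unsigned total = 0;
    for (std::size_t t = 0; t < target.size(); ++t)
    {
        if (used[t])
            continue;

        // Every bond from an already-mapped pattern atom to the new one
        // must also exist between the corresponding target atoms.
        bool ok = true;
        for (std::size_t p = 0; p < next && ok; ++p)
            if (pattern[p][next] && !target[mapping[p]][t])
                ok = false;
        if (!ok)
            continue;

        mapping.push_back(static_cast<int>(t));
        used[t] = true;
        total += countEmbeddings(pattern, target, mapping, used);
        used[t] = false;
        mapping.pop_back();
    }
    return total;
}

// Benzene ring (atoms 0-5) searched in toluene written as "c1ccccc1C":
// ring atoms 0-5, with the methyl carbon 6 bonded to ring atom 5.
unsigned countBenzeneInToluene()
{
    const Graph benzene = makeGraph(6, {{0, 1}, {1, 2}, {2, 3}, {3, 4}, {4, 5}, {5, 0}});
    const Graph toluene = makeGraph(7, {{0, 1}, {1, 2}, {2, 3}, {3, 4}, {4, 5}, {5, 0}, {5, 6}});

    std::vector<int> mapping;
    std::vector<bool> used(toluene.size(), false);
    return countEmbeddings(benzene, toluene, mapping, used);
}
```

Calling countBenzeneInToluene() returns 12: six choices for where the first pattern atom lands on the ring, times two directions around it. The methyl carbon never takes part in a match, since it has only one neighbor while every pattern atom needs two.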