17.1 Using SMARTS

The most common way to describe a substructure to search for is a SMARTS string. A complete description of the SMARTS language is beyond the scope of this document, but there are a couple of references available.

Assuming you have a SMARTS pattern that describes your search criteria, OEChem makes it easy to search large numbers of molecules efficiently. You can either test only whether a match exists, or get access to the actual atoms in the target molecule that matched the pattern. To search for substructures, create an instance of the OESubSearch class, initialize it with the SMARTS pattern, and then perform the search. The following example reads in a file and prints the SMILES of every molecule that contains a benzene ring (c1ccccc1).

# ch17-1.py
from openeye.oechem import *
import os,sys

# create a subsearch object and initialize
pat = OESubSearch()
pat.Init('c1ccccc1')

# open the input stream
ifs = oemolistream('drugs.sdf')

# open stdout as output stream and set to SMILES
ofs = oemolostream()
ofs.SetFormat(OEFormat_SMI)

# loop over molecules
for mol in ifs.GetOEMols():
    # just check for a match, print if found
    if pat.SingleMatch(mol):
        OEWriteMolecule(ofs, mol)

The above example only shows whether a match exists, not the actual atoms in the target molecule that match the query. OEChem provides a rich set of functions for finding the unique (or all) substructures that match a query and for extracting this information from the target molecule.

In the next example, we will use the same basic code as ch17-1.py but will retrieve the actual atom matches from the OESubSearch. While the SingleMatch method returns true or false, the Match method is a generator method that returns all the matches (as instances of OEMatchBase) in a loop. The Match method takes a molecule as its first argument and a second argument that, if true, restricts the results to unique matches. Run the following example with the second argument to Match set to 1 and then to 0 to see the difference between the two behaviors.

# ch17-2.py
from openeye.oechem import *
import os,sys

# create a subsearch object and initialize
pat = OESubSearch()
pat.Init('c1ccccc1')

# open the input stream
ifs = oemolistream('drugs.sdf')

# open stdout as output stream and set to SMILES
ofs = oemolostream()
ofs.SetFormat(OEFormat_SMI)

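# loop over molecules
for mol in ifs.GetOEMols():
    # assign Tripos-style atom names so each matched atom can be identified
    OETriposAtomNames(mol)
    print(mol.GetTitle())
    # second argument 1: return only unique matches (use 0 for all matches)
    for matchcount, matchbase in enumerate(pat.Match(mol, 1)):
        # collect the name of the target atom in each pattern/target pair
        names = [matchpair.target.GetName() for matchpair in matchbase.GetAtoms()]
        print("Match: %d %s" % (matchcount, " ".join(names)))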

As each OEMatchBase is returned from the Match generator method, a second loop can iterate over either the atoms or the bonds of the match. The above example used the GetAtoms method of OEMatchBase. Each time through the loop, matchpair is an instance of OEMatchPair, with "target" a reference to the corresponding atom in the target structure and "pattern" a reference to the matching atom in the OESubSearch instance. A corresponding GetBonds method provides a loop over all the bonds in the match.
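
To see why the unique and non-unique settings can give very different match counts, it may help to model the situation without OEChem at all. The sketch below is plain Python and calls no OEChem functions. It assumes that a match can be represented as a list of (pattern index, target index) pairs, and that two matches count as the same unique match when they cover the same set of target atoms. The helper names all_ring_matches and unique_matches and the atom indices are hypothetical, chosen only for illustration.

```python
# Illustrative model only: plain Python, no OEChem calls.
# A "match" is a list of (pattern_index, target_index) pairs, loosely
# mirroring the pattern/target pairing of a match described in the text.


def all_ring_matches(ring_atoms):
    """Enumerate every mapping of a ring pattern onto a ring of target atoms.

    The pattern can start at any ring atom and run in either direction,
    so a ring of n atoms yields 2 * n distinct mappings.
    """
    n = len(ring_atoms)
    matches = []
    for start in range(n):
        for step in (1, -1):
            match = [(p, ring_atoms[(start + step * p) % n]) for p in range(n)]
            matches.append(match)
    return matches


def unique_matches(matches):
    """Keep the first match seen for each distinct set of target atoms."""
    seen = set()
    unique = []
    for match in matches:
        key = frozenset(target for _, target in match)
        if key not in seen:
            seen.add(key)
            unique.append(match)
    return unique


# Hypothetical target atom indices forming a single six-membered ring.
ring = [3, 4, 5, 6, 7, 8]
every = all_ring_matches(ring)
unique = unique_matches(every)
print("%d total mappings, %d unique" % (len(every), len(unique)))
```

Under these assumptions a single six-membered ring gives twelve mappings of the pattern onto the same six atoms (six starting positions times two directions) but only one unique match, which is the kind of difference to look for when switching the second argument to Match between 1 and 0.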