3.4 Generating a SMILES from a Molecule

To produce a SMILES string from a molecule, call a SMILES-generation function such as OECreateCanSmiString. The next two examples use OECreateCanSmiString, which converts the given OEMolBase into a canonical SMILES string. Note the difference in calling convention between Python and C++: in C++ the caller passes an empty string as an argument, whereas in Python the SMILES string is the return value of the function.

from openeye.oechem import *
import sys

mol = OEGraphMol()

# OEParseSmiles returns true when the SMILES string parses successfully.
if OEParseSmiles(mol, "c1ccccc1"):
    smi = OECreateCanSmiString(mol)
    sys.stdout.write("Canonical SMILES is %s\n" % smi)
else:
    sys.stderr.write("SMILES string was invalid!\n")

The following more complicated example reads SMILES from stdin and writes the canonical SMILES to stdout.

#!/usr/bin/env python
# ch3-1.py
from openeye.oechem import *
import sys

mol = OEGraphMol()

for line in sys.stdin:          # ends cleanly at end-of-file
    smilein = line.strip()
    if not smilein:             # an empty line also ends the input
        break
    mol.Clear()                 # start each parse from an empty molecule
    if OEParseSmiles(mol, smilein):
        smi = OECreateCanSmiString(mol)
        sys.stdout.write("%s\n" % smi)
    else:
        sys.stderr.write("%s is an invalid SMILES!\n" % smilein)

Notice that this example uses the OEMolBase Clear method to reuse the molecule. OEParseSmiles adds the given SMILES to the current molecule rather than replacing its contents. If the line mol.Clear() were removed from the program, the output would contain longer and longer SMILES made up of disconnected fragments.
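The accumulation described above can be illustrated without OEChem. The following is a minimal sketch in which ToyMol and toy_parse are hypothetical stand-ins (they are not OEChem classes or functions) that mimic only the append-on-parse behavior. Real SMILES parsing and canonicalization are not modeled.

```python
class ToyMol(object):
    """Hypothetical stand-in for a molecule: just a list of fragments."""

    def __init__(self):
        self.fragments = []

    def Clear(self):
        # Discard everything parsed so far.
        self.fragments = []

    def smiles(self):
        # Disconnected fragments are separated by '.' in SMILES.
        return ".".join(self.fragments)


def toy_parse(mol, smi):
    """Append smi to mol, mimicking a parse that adds to the current molecule."""
    mol.fragments.append(smi)
    return True


def run(inputs, clear_each_time):
    """Return the output produced for each input, reusing one molecule."""
    mol = ToyMol()
    outputs = []
    for smi in inputs:
        if clear_each_time:
            mol.Clear()
        toy_parse(mol, smi)
        outputs.append(mol.smiles())
    return outputs


without_clear = run(["CC", "CCO", "c1ccccc1"], clear_each_time=False)
with_clear = run(["CC", "CCO", "c1ccccc1"], clear_each_time=True)
```

Without the call to Clear, each output carries every earlier input as an extra dot-separated fragment. With it, each output reflects only the current input.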

The example above is a very simple canonical SMILES creation program, but it probably doesn't do what most users expect. OEParseSmiles preserves the aromaticity present in the input SMILES string. For example, if benzene is written as "c1ccccc1", all atoms and bonds are marked as aromatic, but if it is written in a Kekulé form, "C1=CC=CC=C1", all atoms and bonds are kept aliphatic.

Input          Output
cc             c=c
C1=CC=CC=C1    C1=CC=CC=C1
C1=CN=CC=C1    C1=CC=NC=C1

A common task after creating a molecule from SMILES is to normalize its aromaticity with OEAssignAromaticFlags. The following example perceives aromaticity from the connection table before generating the canonical SMILES.

#!/usr/bin/env python
# ch3-2.py
from openeye.oechem import *
import sys

mol = OEGraphMol()
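for line in sys.stdin:          # ends cleanly at end-of-file
    smilein = line.strip()
    if not smilein:             # an empty line also ends the input
        break
    mol.Clear()                 # start each parse from an empty molecule
    if OEParseSmiles(mol, smilein):
        OEAssignAromaticFlags(mol)   # normalize aromaticity before writing
        smi = OECreateCanSmiString(mol)
        sys.stdout.write("%s\n" % smi)
    else:
        sys.stderr.write("%s is an invalid SMILES!\n" % smilein)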

And here are the results of this new version:

Input          Output
cc             C=C
C1=CC=CC=C1    c1ccccc1
C1=CN=CC=C1    c1ccncc1

This same program could also be written to construct a new molecule each time through the loop:

#!/usr/bin/env python
# ch3-3.py
from openeye.oechem import *
import sys

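for line in sys.stdin:          # ends cleanly at end-of-file
    smilein = line.strip()
    if not smilein:             # an empty line also ends the input
        break
    mol = OEGraphMol()          # new molecule each time, so no Clear() call
    if OEParseSmiles(mol, smilein):
        OEAssignAromaticFlags(mol)   # normalize aromaticity before writing
        smi = OECreateCanSmiString(mol)
        sys.stdout.write("%s\n" % smi)
    else:
        sys.stderr.write("%s is an invalid SMILES!\n" % smilein)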
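
The parse, normalize, and generate steps shared by the programs above can also be collected into one function, with the input loop kept separate so that it can be exercised without OEChem installed. The sketch below is one possible arrangement and is not part of OEChem: the names canonical_smiles and convert_lines are hypothetical, and the OEChem import is deferred into the function body so that convert_lines can be tried with any callable.

```python
import sys


def canonical_smiles(smi):
    """Return the canonical SMILES for smi, or None if it does not parse.

    Uses only the OEChem calls shown in this section. The import is
    deferred so that this code loads even where OEChem is unavailable.
    """
    from openeye.oechem import (OEGraphMol, OEParseSmiles,
                                OEAssignAromaticFlags, OECreateCanSmiString)
    mol = OEGraphMol()              # fresh molecule, so no Clear() is needed
    if not OEParseSmiles(mol, smi):
        return None
    OEAssignAromaticFlags(mol)      # perceive aromaticity before writing
    return OECreateCanSmiString(mol)


def convert_lines(lines, convert, out=sys.stdout, err=sys.stderr):
    """Write convert(line) for each non-blank line; report failures on err.

    Returns the number of lines that failed to convert.
    """
    failures = 0
    for line in lines:
        smi = line.strip()
        if not smi:
            continue                # skip blank lines
        result = convert(smi)
        if result is None:
            err.write("%s is an invalid SMILES!\n" % smi)
            failures += 1
        else:
            out.write("%s\n" % result)
    return failures
```

With OEChem available, convert_lines(sys.stdin, canonical_smiles) would behave much like ch3-3.py, except that it skips blank lines instead of stopping at the first one. Returning None for an unparsable string keeps the error reporting in the caller, which is a design choice of this sketch rather than an OEChem convention.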