28.2 Library Generation

The OELibraryGen class was designed to give programmers a high degree of control when applying chemical transformations. It was also designed for efficiency. Potentially costly preprocessing is performed a single time, before any transformations are carried out. The setup cost of an OELibraryGen instance may therefore be relatively high, and its memory use large, because the preprocessed reactants are stored in memory. Subsequent generation of products, however, is very efficient because the setup costs are paid in advance. The OELibraryGen class serves a dual purpose: it manages sets of preprocessed starting materials, and it stores a list of chemical transform operations defined by a reaction molecule.

Chemical transform operations are carried out on starting materials. Starting materials provide most of the virtual matter that goes into making virtual product molecules. The OELibraryGen class provides an interface to associate starting materials with reactant patterns through the OELibraryGen::SetStartingMaterial and OELibraryGen::AddStartingMaterial methods. These methods associate starting materials with reactant patterns using the index (reactant number) of the pattern. Reactant patterns are numbered starting at zero. Reactant number zero consists of the atom with the lowest atom index and all atoms that are members of the same connected component. The next reactant pattern begins with the next lowest atom index that is not a member of the first component, and so on. In a SMIRKS pattern the first reactant (reactant number zero) is the leftmost reactant. Disconnected reactant patterns may be grouped into a single component using component level grouping in SMIRKS, denoted by parentheses.
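The left-to-right numbering of reactant patterns in a SMIRKS string can be illustrated without the toolkit. The following is a minimal sketch in plain Python, not OEChem's own parser: the helper name split_reactant_patterns is hypothetical, and it assumes that reactant patterns are separated by dots that lie outside any parentheses or square brackets. The second SMIRKS string in the usage lines is made up solely to show component level grouping.

```python
def split_reactant_patterns(smirks):
    """Return the reactant patterns of a SMIRKS string, left to right.

    The list index of each pattern is its reactant number (starting at 0).
    Hypothetical helper for illustration only; OEChem does its own parsing.
    """
    # Everything before the first '>' is the reactant side.
    reactant_side = smirks.split(">")[0]

    patterns = []
    current = []
    paren_depth = 0    # nesting depth of '(' ... ')'
    bracket_depth = 0  # nesting depth of '[' ... ']' atom expressions

    for ch in reactant_side:
        if ch == "[":
            bracket_depth += 1
        elif ch == "]":
            bracket_depth -= 1
        elif ch == "(" and bracket_depth == 0:
            paren_depth += 1
        elif ch == ")" and bracket_depth == 0:
            paren_depth -= 1

        # A dot outside all parentheses and brackets ends a reactant pattern.
        if ch == "." and paren_depth == 0 and bracket_depth == 0:
            patterns.append("".join(current))
            current = []
        else:
            current.append(ch)

    patterns.append("".join(current))
    return patterns


# Two separate reactant patterns: reactant 0 and reactant 1.
amide = "[O:1]=[C:2][Cl:3].[N:4][H:5]>>[O:1]=[C:2][N:4]"
for number, pattern in enumerate(split_reactant_patterns(amide)):
    print(number, pattern)

# Parentheses group two disconnected fragments into a single reactant 0.
grouped = "([C:1].[O:2]).[N:3]>>[C:1][N:3].[O:2]"
for number, pattern in enumerate(split_reactant_patterns(grouped)):
    print(number, pattern)
```

The reactant number printed for each pattern is the value that would be passed as the second argument to SetStartingMaterial or AddStartingMaterial.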

Once a reaction has been defined, and starting materials have been associated with each of the reactant patterns, chemical transformations can be applied to combinations of starting materials. To achieve chemically reasonable output, attention should be given to choosing the mode of valence (or hydrogen count) correction that matches the reaction. The OELibraryGen class has three possible modes of valence correction: explicit hydrogen, implicit hydrogen, and automatic. The default mode for valence correction and SMIRKS interpretation emulates the Daylight Reaction Toolkit. In this mode, hydrogen counts are adjusted using explicit hydrogens in SMIRKS patterns. Reactions are carried out using explicit hydrogens, and valence correction occurs when explicit hydrogens are added or deleted as defined by the reaction. The following example demonstrates strict SMIRKS and explicit hydrogen handling.

#!/usr/bin/env python
# ch28-2.py

from openeye.oechem import *

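# Amide formation. The amine hydrogen is mapped explicitly ([H:5]) and
# does not appear on the product side, so the reaction deletes it.
libgen = OELibraryGen("[O:1]=[C:2][Cl:3].[N:4][H:5]>>[O:1]=[C:2][N:4]")

# Reactant 0: the acid chloride
mol = OEGraphMol()
OEParseSmiles(mol, "CC(=O)Cl")
libgen.SetStartingMaterial(mol, 0)

# Reactant 1: the amine
mol.Clear()
OEParseSmiles(mol, "NCC")
libgen.SetStartingMaterial(mol, 1)

# Print a canonical SMILES string for each generated product
for product in libgen.GetProducts():
    smi = OECreateCanSmiString(product)
    print(smi)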

In the amide bond forming reaction, a hydrogen atom attached to the nitrogen in the amine pattern is explicitly deleted when forming the product. When executed, the example generates two products in total, one for each of the two equivalent hydrogens attached to the amine nitrogen. If a unique set of products is desired, the canonical SMILES strings of the products may be stored and used to verify that each generated product is indeed unique.
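One way to keep only unique products is to collect their canonical SMILES strings in a set. The following is a minimal sketch in plain Python; the helper name unique_in_order is hypothetical, and the list of strings is an illustrative stand-in for generated products, not verified toolkit output. With OEChem, the strings would come from calling OECreateCanSmiString on each product returned by GetProducts.

```python
def unique_in_order(smiles_iter):
    """Return each SMILES string once, in order of first appearance.

    Hypothetical helper for illustration; it only compares strings, so the
    input must already be canonical for duplicates to be recognized.
    """
    seen = set()
    unique = []
    for smi in smiles_iter:
        if smi not in seen:
            seen.add(smi)
            unique.append(smi)
    return unique


# Placeholder strings standing in for two identical products (assumed values,
# chosen for illustration rather than copied from toolkit output).
generated = ["CCNC(C)=O", "CCNC(C)=O"]
print(unique_in_order(generated))  # prints ['CCNC(C)=O']
```

Because the comparison is purely textual, two products are treated as the same only if their canonical SMILES strings are identical.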

The following example demonstrates how the same basic reaction can be carried out using the implicit hydrogen correction mode. Notice that no explicit hydrogens appear in the reaction. Instead, the SMARTS implicit hydrogen count operator appears on the right-hand side of the reaction and is used to assign the implicit hydrogen count of the product nitrogen.

#!/usr/bin/env python
# ch28-3.py

from openeye.oechem import *

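# Same amide formation, written without explicit hydrogens. The 'h1' on
# the product nitrogen assigns its implicit hydrogen count.
libgen = OELibraryGen("[O:1]=[C:2][Cl:3].[N:4]>>[O:1]=[C:2][Nh1:4]")
# Switch from the default explicit hydrogen mode to implicit hydrogens
libgen.SetExplicitHydrogens(False)

# Reactant 0: the acid chloride
mol = OEGraphMol()
OEParseSmiles(mol, "CC(=O)Cl")
libgen.SetStartingMaterial(mol, 0)

# Reactant 1: the amine
mol.Clear()
OEParseSmiles(mol, "NCC")
libgen.SetStartingMaterial(mol, 1)

# Print a canonical SMILES string for each generated product
for product in libgen.GetProducts():
    smi = OECreateCanSmiString(product)
    print(smi)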

The reaction is written to work with implicit hydrogens (using the lowercase 'h' primitive), and the OELibraryGen instance is set to work in implicit hydrogen mode using the OELibraryGen::SetExplicitHydrogens method.

The final example demonstrates automatic valence correction. In implicit hydrogen mode (set using the OELibraryGen::SetExplicitHydrogens method), automatic valence correction attempts to add or subtract implicit hydrogens in order to retain the valence state observed in the starting materials. Before chemical transformations commence, the valence state of each reacting atom is recorded. After the transform operations are complete, the implicit hydrogen count is adjusted to match the beginning state of the reacting atoms. Changes in formal charge are taken into account during the valence correction.

#!/usr/bin/env python
# ch28-4.py

from openeye.oechem import *

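# Same amide formation with no hydrogen information in the reaction at all
libgen = OELibraryGen("[O:1]=[C:2][Cl:3].[N:4]>>[O:1]=[C:2][N:4]")
# Work with implicit hydrogens and let the hydrogen counts of the
# reacting atoms be corrected automatically
libgen.SetExplicitHydrogens(False)
libgen.SetValenceCorrection(True)

# Reactant 0: the acid chloride
mol = OEGraphMol()
OEParseSmiles(mol, "CC(=O)Cl")
libgen.SetStartingMaterial(mol, 0)

# Reactant 1: the amine
mol.Clear()
OEParseSmiles(mol, "NCC")
libgen.SetStartingMaterial(mol, 1)

# Print a canonical SMILES string for each generated product
for product in libgen.GetProducts():
    smi = OECreateCanSmiString(product)
    print(smi)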

In general, automatic valence correction is a convenience that allows straightforward reactions to be written in a simplified manner and reduces the burden of valence state bookkeeping. Reactions that alter the preferred valence state of an atom, oxidation for example, may not be automatically correctable.
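The bookkeeping behind automatic valence correction can be pictured with simple arithmetic. The following is a toy sketch in plain Python under stated assumptions, not the toolkit's implementation: the function name corrected_hcount is hypothetical, valence is taken to be the sum of bond orders to non-hydrogen neighbors plus the implicit hydrogen count, and formal charge changes are deliberately left out.

```python
def corrected_hcount(bond_order_before, hcount_before, bond_order_after):
    """Implicit hydrogen count that restores the valence recorded beforehand.

    Toy model (an assumption for illustration): valence is the sum of bond
    orders to non-hydrogen neighbors plus the implicit hydrogen count, and
    formal charge is ignored.
    """
    recorded_valence = bond_order_before + hcount_before
    hcount_after = recorded_valence - bond_order_after
    if hcount_after < 0:
        # The new bonding already exceeds the recorded valence, so no
        # hydrogen adjustment can restore the original state.
        raise ValueError("valence cannot be restored by adjusting hydrogens")
    return hcount_after


# Amine nitrogen of NCC: one single bond and two hydrogens before the
# reaction, two single bonds afterwards, so one hydrogen remains.
print(corrected_hcount(1, 2, 2))  # prints 1

# Carbonyl carbon of CC(=O)Cl: bond orders sum to 4 before and after
# (chlorine is replaced by nitrogen), so it keeps zero hydrogens.
print(corrected_hcount(4, 0, 4))  # prints 0

# An oxidation that raises the bond order sum of a sulfur from 2 to 4
# cannot be balanced in this model.
try:
    corrected_hcount(2, 0, 4)
except ValueError as err:
    print("not correctable:", err)
```

The last case mirrors the caveat above: when a reaction changes the preferred valence state of an atom, matching the recorded state is no longer possible, and the hydrogen counts have to be stated in the reaction itself.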

				OEChem - Python Theory Manual Version 1.3.1
