4.3 Molecular File Formats

In addition to SMILES strings, OEChem can read numerous other molecular file formats, including MDL SD files, Tripos Mol2 files and PDB files. The format of an input file or stream may be associated with an oemolstream using the SetFormat method, and may be retrieved with GetFormat. These methods take (or return) an integer constant defined in C++. The following table shows the constants and the corresponding file formats supported by OEChem. A value of OEFormat_UNDEFINED (zero) means that there is no file format associated with the oemolstream. Note that the default format associated with an oemolstream is OEFormat_SMI.

File Format       Description                           Read?   Write?
OEFormat_OEB      New Style OpenEye OEBinary            Yes     Yes
OEFormat_BIN      Old Style OEBinary                    Yes     Yes
OEFormat_CAN      Canonical SMILES                      Yes     Yes
OEFormat_FASTA    FASTA protein sequence                Yes     Yes
OEFormat_ISM      Isomeric SMILES                       Yes     Yes
OEFormat_MDL      MDL Mol File                          Yes     Yes
OEFormat_MF       Molecular Formula (Hill order)        No      Yes
OEFormat_MOL2     Tripos Sybyl mol2 file                Yes     Yes
OEFormat_MOL2H    Sybyl mol2 with explicit hydrogens    Yes     Yes
OEFormat_MOPAC    MOPAC file format(s)                  Yes     Yes
OEFormat_PDB      Protein Databank PDB file             Yes     Yes
OEFormat_SDF      MDL SD File                           Yes     Yes
OEFormat_SMI      Absolute SMILES                       Yes     Yes
OEFormat_XYZ      XMol XYZ format                       Yes     Yes
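
The read and write columns of the table determine which conversions are possible: the source format must be readable and the target format must be writable. The following sketch encodes the table as a plain Python dictionary and checks a proposed conversion. It does not use OEChem itself; the names FORMAT_SUPPORT and can_convert are hypothetical helpers introduced for illustration, and the formats are keyed by constant name as strings rather than by the integer constants that OEChem provides.

```python
# Read/write support per format, transcribed from the table above.
# Keys are the constant names as strings (an illustrative choice);
# the real constants are integers provided by OEChem.
FORMAT_SUPPORT = {
    # name:           (readable, writable)
    "OEFormat_OEB":   (True, True),
    "OEFormat_BIN":   (True, True),
    "OEFormat_CAN":   (True, True),
    "OEFormat_FASTA": (True, True),
    "OEFormat_ISM":   (True, True),
    "OEFormat_MDL":   (True, True),
    "OEFormat_MF":    (False, True),
    "OEFormat_MOL2":  (True, True),
    "OEFormat_MOL2H": (True, True),
    "OEFormat_MOPAC": (True, True),
    "OEFormat_PDB":   (True, True),
    "OEFormat_SDF":   (True, True),
    "OEFormat_SMI":   (True, True),
    "OEFormat_XYZ":   (True, True),
}


def can_convert(src, dst):
    """Return True if src can be read and dst can be written."""
    can_read, _ = FORMAT_SUPPORT[src]
    _, can_write = FORMAT_SUPPORT[dst]
    return can_read and can_write


print(can_convert("OEFormat_SDF", "OEFormat_MOL2"))  # True
print(can_convert("OEFormat_MF", "OEFormat_SDF"))    # False
```

According to the table, OEFormat_MF is the only format that cannot be used as a conversion source, since molecular formulae can be written but not read.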

The following example shows how to use oemolstreams to convert MDL SD files into Tripos Mol2 files.

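#!/usr/bin/env python
# ch4-4.py
# Convert MDL SD input into Tripos Mol2 output.
from openeye.oechem import *

# open() is called without a filename, so no file extension
# is available to determine either stream's format.
ifs = oemolistream()
ifs.open()
ofs = oemolostream()
ofs.open()

# Set both formats explicitly, before any molecule is read.
ifs.SetFormat(OEFormat_SDF)
ofs.SetFormat(OEFormat_MOL2)

# Read each molecule from the input and write it to the output.
for mol in ifs.GetOEMols():
    OEWriteMolecule(ofs, mol)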

In general, the SetFormat method should only be called on an oemolistream before the first connection table is read.
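
As a rough illustration of this ordering, the sketch below wraps the same conversion in a function that works on named files and sets both formats immediately after opening the streams, before the read loop starts. The function name convert_sdf_to_mol2 and its arguments are hypothetical; the sketch also assumes that open returns a false value on failure and that the streams provide a close method.

```python
def convert_sdf_to_mol2(in_path, out_path):
    """Convert an SD file to a Mol2 file; return the number of molecules written."""
    # Imported here so that the sketch can be loaded even where
    # the OpenEye toolkit is not installed.
    from openeye.oechem import (oemolistream, oemolostream, OEWriteMolecule,
                                OEFormat_SDF, OEFormat_MOL2)

    ifs = oemolistream()
    if not ifs.open(in_path):  # assumed to return False on failure
        raise IOError("cannot open %s for reading" % in_path)

    ofs = oemolostream()
    if not ofs.open(out_path):
        raise IOError("cannot open %s for writing" % out_path)

    # Set the formats right after opening, before the first
    # connection table is read from the input stream.
    ifs.SetFormat(OEFormat_SDF)
    ofs.SetFormat(OEFormat_MOL2)

    count = 0
    for mol in ifs.GetOEMols():
        OEWriteMolecule(ofs, mol)
        count += 1

    ifs.close()
    ofs.close()
    return count
```

A call such as convert_sdf_to_mol2("input.dat", "output.dat") would then treat the input as SD and the output as Mol2 regardless of the file names; both file names here are placeholders.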