4.7 Format control from the command line

Using the methods outlined above, the stream format can be controlled from the command line. OEChem's oemolstreams determine the format by interpreting the input and output file names.

The following is a simple example of using command-line arguments to allow OEChem programs to support many file formats at run-time.

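#!/usr/bin/env python
# ch4-7.py

from openeye.oechem import *
import sys

# Expect exactly two arguments: the input and output file names.
if len(sys.argv) != 3:
    print("usage: ch4-7.py <infile> <outfile>")
    sys.exit(1)

# Each stream picks its format from the file name it is given.
ifs = oemolistream(sys.argv[1])
ofs = oemolostream(sys.argv[2])

# Copy every molecule from the input stream to the output stream.
for mol in ifs.GetOEMols():
    OEWriteMolecule(ofs, mol)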

The example above allows a user to specify the input and output files and formats from the command line.

For instance, if the above listing is a program called foo.py, the command

prompt>foo.py file1.sdf file1.smi

will convert file1.sdf from MDL's SD format to Daylight's SMILES format.

A first extension of this idea allows access to stdin and stdout via the "-" filename.

For instance:

prompt>foo.py file2.mol2 -

This command will read file2.mol2 in MOL2 format and write the molecules to stdout in SMILES, the default format.

Thus, if you have another program, GetFromDatabase, which retrieves molecules from a database and writes them in SMILES format, you can chain it with any OEChem program. Using your operating system's redirection operators (e.g. the Unix pipe "|" or redirect ">"), you can move molecules directly from GetFromDatabase to foo.py without a temporary file.

prompt>GetFromDatabase | foo.py - file3.sdf

This command takes the SMILES format output of GetFromDatabase, feeds it to foo.py on stdin (read with the default format, OEFormat_SMI), and writes an SD format file, file3.sdf.
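The same chaining can be driven from Python instead of the shell. The following is only a sketch under stated assumptions: two tiny inline Python commands stand in for GetFromDatabase and for an OEChem program reading stdin (neither real program is run, and no OEChem call is made), and the function name run_pipeline is hypothetical. It shows the pipe itself, using the standard subprocess module.

```python
import subprocess
import sys

# Stand-in for GetFromDatabase: writes two SMILES records to stdout.
producer_cmd = [sys.executable, "-c",
                "print('CCO ethanol'); print('c1ccccc1 benzene')"]

# Stand-in for a program reading molecules on stdin: it only counts records.
consumer_cmd = [sys.executable, "-c",
                "import sys; print(sum(1 for line in sys.stdin if line.strip()))"]

def run_pipeline():
    """Connect producer stdout to consumer stdin, like `producer | consumer`."""
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout,
                                stdout=subprocess.PIPE)
    producer.stdout.close()  # let the producer see SIGPIPE if the consumer exits
    output, _ = consumer.communicate()
    producer.wait()
    return output.decode().strip()
```

As in the shell version, no temporary file is created: the records pass straight from one process to the other.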

However, for piping data through stdin and stdout to be really useful, one needs to control their formats just as one would for temporary files. To facilitate this, oemolstreams interpret a filename that consists ONLY of a format extension as a request to use stdin or stdout in that format.

Now, using our program foo.py from listing 4.7 above:

prompt>foo.py .smi .mol2

This command opens stdin with SMILES format and opens stdout with MOL2 format.
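The three kinds of names described in this section ("-", an extension-only name, and an ordinary file name) can be pictured with a small classifier. This is an illustrative sketch in plain Python, not OEChem's implementation: the names DEFAULT_FORMAT, KNOWN_EXTENSIONS and interpret_name are hypothetical, the format labels are plain strings rather than OEFormat constants, and the table covers only a few of the extensions used in this section.

```python
# The section states that SMILES is the default format for "-".
DEFAULT_FORMAT = "SMILES"

# Hypothetical lookup table, limited to extensions used in this section.
KNOWN_EXTENSIONS = {".sdf": "SDF", ".smi": "SMILES", ".mol2": "MOL2"}

def interpret_name(name):
    """Classify a command-line name as a (target, format) pair.

    target is "std" for stdin/stdout, otherwise the file path itself.
    format is None when the extension is not in the table.
    """
    if name == "-":
        return ("std", DEFAULT_FORMAT)
    lowered = name.lower()
    if lowered in KNOWN_EXTENSIONS:
        # The name is ONLY an extension, e.g. ".mol2": use a standard stream.
        return ("std", KNOWN_EXTENSIONS[lowered])
    for ext, fmt in KNOWN_EXTENSIONS.items():
        if lowered.endswith(ext):
            return (name, fmt)
    return (name, None)
```

Under these assumptions, interpret_name(".smi") and interpret_name(".mol2") both select a standard stream, which mirrors the behaviour of the foo.py command above.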

Now we have complete format control of stdin and stdout from the command line. If we have a program GenerateStructures, which only writes MOL2 format, and another program GenerateData, which only reads SD format, we can connect them on the command line through any OEChem program that takes its file specifications from command-line arguments.

prompt>GenerateStructures | foo.py .mol2 .sd | GenerateData

This command demonstrates how any OEChem program with command-line file specification can be used to pipe formatted input and output.