3.6 Format control from the command line

Using the methods outlined above, the stream format can be controlled from the command line. OEChem's oemolstreams select the format by interpreting the input and output file names.

The following is a simple example of using command-line arguments to allow OEChem programs to support many file formats at run-time.

#include "oechem.h"
#include <iostream>

using namespace OEChem;
using namespace OESystem;
using namespace std;

int main(int argc,char *argv[])
{
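  // Expect exactly two arguments: input file and output file
  if (argc != 3)
  {
    cerr << "Usage: " << argv[0] << " <infile> <outfile>" << endl;
    return 1;
  }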
  oemolistream ims(argv[1]);
  oemolostream oms(argv[2]);

  if (!ims)
  {
    cerr << "Error: Unable to read " << argv[1] << endl;
    return 1;
  }
  if (!oms)
  {
    cerr << "Error: Unable to create " << argv[2] << endl;
    return 1;
  }

  OEMol mol;
  while (OEReadMolecule(ims,mol))
    OEWriteMolecule(oms,mol);

  return 0;
}

The example above allows a user to specify the input and output files and formats from the command line.

For instance, if the above listing is built as a program called convert, the command

prompt>convert file1.sdf file1.smi

converts file1.sdf from MDL's SD format to Daylight's SMILES format.
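To make the idea of "interpreting the file name" concrete, here is a minimal sketch of deriving a format label from a file name's extension. The function name FormatFromFilename and its extension table are hypothetical and illustrative only; OEChem's actual format registry, and how it treats case or unknown extensions, may differ.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Hypothetical sketch: derive a format label from a file name's extension.
// The table is illustrative only, not OEChem's actual list of formats.
std::string FormatFromFilename(const std::string& name) {
  static const std::map<std::string, std::string> kFormats = {
      {"sdf", "SDF"}, {"sd", "SDF"}, {"mol2", "MOL2"},
      {"smi", "SMILES"}, {"pdb", "PDB"}};

  // Use the last '.' only if it comes after the last path separator.
  const std::string::size_type dot = name.rfind('.');
  const std::string::size_type sep = name.find_last_of("/\\");
  if (dot == std::string::npos || (sep != std::string::npos && dot < sep))
    return "";  // no extension: format unknown

  // Compare extensions case-insensitively (an assumption of this sketch).
  std::string ext = name.substr(dot + 1);
  std::transform(ext.begin(), ext.end(), ext.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });

  const auto it = kFormats.find(ext);
  if (it == kFormats.end())
    return "";  // unrecognized extension
  return it->second;
}
```

Under this table, FormatFromFilename("file1.smi") yields "SMILES", while a name with no recognized extension yields an empty string, which a real program would have to treat as an error or replace with a default.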

A first extension of this idea allows access to cin and cout via the "-" filename.

For instance:

prompt>convert file2.mol2 -

This command will read file2.mol2 in MOL2 format and write the molecules to cout in SMILES, the default format.

Thus, if you have another program, GetFromDatabase, that retrieves molecules from a database and writes them in SMILES format, you can chain it with any OEChem program. Using your operating system's redirection facilities (e.g., the Unix pipe "|" or redirection ">"), you can pass molecules directly from GetFromDatabase to convert without a temporary file.

prompt>GetFromDatabase | convert - file3.sdf

This convert command will take the SMILES format output from GetFromDatabase and generate an SD format file.

However, to make piping through cin and cout really useful, one needs to control the format of cin and cout just as the extension controls the format of a named file. To facilitate this, oemolstreams treat a filename that consists ONLY of a format extension (such as ".sdf") as a request to use cin or cout in that format.
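The naming rules described so far (an ordinary file name, the special name "-", and a name that is only an extension) can be summarized in a small classifier. This is a minimal sketch, not OEChem's implementation: StreamTarget and ResolveStreamName are hypothetical names, and an empty ext is assumed to mean "no extension given" (for "-" the stream then falls back to the default SMILES format).

```cpp
#include <string>

// Hypothetical sketch of the naming rules; not part of the OEChem API.
struct StreamTarget {
  bool standardStream;  // true: use cin (input) or cout (output)
  std::string path;     // file to open when standardStream is false
  std::string ext;      // extension selecting the format ("" = none given)
};

StreamTarget ResolveStreamName(const std::string& name) {
  // "-" alone: standard stream in the default format (SMILES).
  if (name == "-")
    return {true, "", ""};

  // A name that is ONLY an extension, such as ".sdf": standard stream,
  // with the format taken from that extension.
  if (name.size() > 1 && name[0] == '.' &&
      name.find_first_of("./\\", 1) == std::string::npos)
    return {true, "", name.substr(1)};

  // Anything else is an ordinary file; its extension (if any) picks the format.
  const std::string::size_type dot = name.rfind('.');
  const std::string::size_type sep = name.find_last_of("/\\");
  std::string ext;
  if (dot != std::string::npos && (sep == std::string::npos || dot > sep))
    ext = name.substr(dot + 1);
  return {false, name, ext};
}
```

Under these rules, both arguments of convert .smi .mol2 resolve to standard streams, with SMILES on input and MOL2 on output. The sketch also shows why a file literally named ".sdf" cannot be opened by name: such a name is always read as a format request.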

The following example shows the use of file extensions as filenames:

#include "oechem.h"

#include <iostream>

using namespace OEChem;
using namespace OESystem;
using namespace std;

int main()
{
  OEMol mol;
  oemolistream ims(".sdf");
  oemolostream oms(".mol2");

  if (!ims)
  {
    cerr << "Error: Unable to read SD format from cin" << endl;
    return 1;
  }
  if (!oms)
  {
    cerr << "Error: Unable to write MOL2 format to cout" << endl;
    return 1;
  }

  // Copy every molecule from cin (SDF) to cout (MOL2)
  while (OEReadMolecule(ims,mol))
    OEWriteMolecule(oms,mol);
  return 0;
}

In the example above, the input oemolistream reads from cin in SD format, and the output oemolostream writes to cout in MOL2 format. This is exactly equivalent to listing 4.4. However, this method also extends to format control of cin and cout from the command line. Note: this convention prevents you from naming files ".mol2", ".sdf", etc.

Now, using our program convert from listing 4.7 above:

prompt>convert .smi .mol2

This command opens cin with SMILES format and opens cout with MOL2 format.

Now we have complete format control of cin and cout from the command line. If we have a program GenerateStructures, which writes only MOL2 format, and another program GenerateData, which reads only SD format, we can connect them through any OEChem program that takes its file specifications from command-line arguments.

prompt> GenerateStructures | MyOEChemProgram .mol2 .sd | GenerateData

This command demonstrates how any OEChem program that takes its file specifications from the command line can be used to pipe formatted input and output.