4.6 Compressed Molecule Input and Output

For any of the molecular file formats supported by OEChem, it is often convenient to read and write compressed files or strings. Molecule streams support gzipped input and output through the zlib library. When a filename used to open a stream ends in ".gz", the suffix is recognized and the stream is read or written in gzip-compressed form. This mechanism does not interfere with format perception: the format is still determined from the extension that precedes ".gz". For instance, "fn.sdf.gz" is recognized as a gzipped file in MDL's SD format.
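The suffix rule described above can be illustrated without OEChem. The sketch below is only an illustration of the naming convention, not OEChem's actual perception code: the helper name split_gz_suffix is hypothetical, and the case-insensitive matching is an assumption of the sketch.

```python
import os


def split_gz_suffix(filename):
    """Return (format_extension, is_gzipped) for a filename.

    Hypothetical helper: it mimics the naming convention only and is
    not part of OEChem.
    """
    base, ext = os.path.splitext(filename)
    # Case-insensitive matching is an assumption made for this sketch.
    compressed = ext.lower() == ".gz"
    if compressed:
        # Strip ".gz" and look at the extension underneath it.
        base, ext = os.path.splitext(base)
    return ext.lstrip(".").lower(), compressed


fmt, gz = split_gz_suffix("fn.sdf.gz")
print(fmt, gz)
```

For "fn.sdf.gz" the helper reports the "sdf" extension together with a compressed flag, which mirrors how the ".gz" suffix and the format extension are treated as two separate pieces of information.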

The following example demonstrates the use of compressed input and output:

#!/usr/bin/env python
# ch4-6.py
import sys
from openeye.oechem import *

ifs = oemolistream()
ofs = oemolostream()

# The ".gz" suffix selects gzip compression; the file format is taken
# from the extension before it (.sdf for input, .oeb for output).
if ifs.open("drugs.sdf.gz"):
    if ofs.open("drugs.oeb.gz"):
        for mol in ifs.GetOEGraphMols():
            OEWriteMolecule(ofs, mol)
    else:
        sys.stderr.write("Unable to open output file\n")
else:
    sys.stderr.write("Unable to open input file\n")

The example above converts all of the molecules in a gzipped SD format file into a gzipped OEBinary version 2 format file.
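One way to confirm that an output file really was written in compressed form is to look at its first two bytes, since every gzip stream begins with the magic bytes 0x1f 0x8b. The following is a minimal sketch using only the Python standard library; the helper name is_gzip_file and the temporary file names are hypothetical, and the demonstration files are created with Python's gzip module rather than with OEChem.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzip_file(path):
    """Return True if the file at path starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


# Demonstration on throwaway files (names are placeholders).
tmpdir = tempfile.mkdtemp()

gz_path = os.path.join(tmpdir, "example.sdf.gz")
with gzip.open(gz_path, "wb") as f:
    f.write(b"placeholder data\n")

plain_path = os.path.join(tmpdir, "example.sdf")
with open(plain_path, "wb") as f:
    f.write(b"placeholder data\n")

print(is_gzip_file(gz_path), is_gzip_file(plain_path))
```

Applied to the example above, is_gzip_file("drugs.oeb.gz") would be expected to return True once the conversion has run, assuming the output stream was opened successfully.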