25.1.3 OEChem Examples

The above explanation should go some way to explaining OEChem's decision to attach biopolymer information to each atom, rather than have container classes for residues and chains (and presumably connected components, NMR models, etc...). The OEResidue class is therefore an additional set of fields that may be associated with an atom. It does not denote or prescribe an amino or nucleic acid but instead stores atom-specific data such as atom serial number, b-factor and occupancy, in addition to residue information, chain information, fragment information, NMR model information, etc...

The residue information associated with an atom can be set with the OEAtomSetResidue function, and is retrieved with the OEAtomGetResidue function. The PDB and Macromodel file format readers parse this information from the input file format. Additionally, OEChem allows residue information to be perceived directly from the connection table using the OEPerceiveResidues function.

For many algorithms processing biomolecules, it is convenient to maintain the atoms of the OEMolBase in sorted order to group atoms in the same residue next to one another, and residues in the same chain sequentially. This can be done conveniently in OEChem using the OEPDBOrderAtoms function. Note, that OEPercieveResidues calls OEPDBOrderAtoms automatically.

A common idiom is therefore the following code snippet:

def MyPrepareProtein(mol):
    if OEHasResidues(mol):
        OEPDBOrderAtoms(mol)
    else:
        OEPerceiveResidues(mol)

As a teaching example, the following code demonstrates one way of reporting the number of different chains in an OEMolBase.

def MyCountChains(mol):
    result = 0
    first = True
    prev = None
    for atom in mol.GetAtoms():
        res = OEAtomGetResidue(atom)
        chain = res.GetChainID()
        if first or chain != prev:
            result += 1
        first = False
        prev = chain
    return result

A slightly improved version would be to use OEChem's OESameChain function.

def MyCountChains2(mol):
    result = 0
    prev = None
    for atom in mol.GetAtoms():
        res = OEAtomGetResidue(atom)
        if prev and OESameChain(res,prev):
            continue
        prev = res
        result += 1
    return result

Clearly, a MyCountResidues function would look almost identical but use the OESameResidue function instead of OESameChain. The slightly more complicated example below, reports the number of residues in each chain.

def MyReportResidues1(mol):
    prevchain = None
    for chain in mol.GetAtoms():
        chainres = OEAtomGetResidue(chain)
        if not prevchain or not OESameChain(chainres, prevchain):
            prevres = None
            count = 0
            for residue in mol.GetAtoms():
                resres = OEAtomGetResidue(residue)
                if OESameChain(resres, chainres):
                    if not prevres or not OESameChain(resres,prevres):
                        prevres = resres
                        count += 1
            print count,"residues in chain",chainres.GetChainID()
            prevchain = chainres

Whilst the above example contains the doubly nested loops that some structural biologists like to see, the same output can be generated even more efficiently by:

def MyReportResidues2(mol):
    count = 0
    residue = None
    chain = None

    for atom in mol.GetAtoms():
        res = OEAtomGetResidue(atom)
        if not chain:
            chain = res
        elif not OESameChain(res,chain):
            print count,"residues in chain",chain.GetChainID()
            count = 0

        if not residue or not OESameResidue(res, residue):
            residue = res
            count += 1

    if count>0:
        print count,"residues in chain",chain.GetChainID()

Of course, just because OEChem uses an extremely advanced representation of biopolymers, there's absolutely nothing to prevent a user slurping this information into a FORTRAN common block or whichever representation best suits their way of thinking about the problem.