The above explanation should go some way to explaining OEChem's decision to attach biopolymer information to each atom, rather than have container classes for residues and chains (and presumably connected components, NMR models, etc...). The OEResidue class is therefore an additional set of fields that may be associated with an atom. It does not denote or prescribe an amino or nucleic acid but instead stores atom-specific data such as atom serial number, b-factor and occupancy, in addition to residue information, chain information, fragment information, NMR model information, etc...
The residue information associated with an atom can be set with the OEAtomSetResidue function, and is retrieved with the OEAtomGetResidue function. The PDB and Macromodel file format readers parse this information from the input file format. Additionally, OEChem allows residue information to be perceived directly from the connection table using the OEPerceiveResidues function.
For many algorithms processing biomolecules, it is convenient to maintain the atoms of the OEMolBase in sorted order to group atoms in the same residue next to one another, and residues in the same chain sequentially. This can be done conveniently in OEChem using the OEPDBOrderAtoms function. Note, that OEPercieveResidues calls OEPDBOrderAtoms automatically.
A common idiom is therefore the following code snippet:
def MyPrepareProtein(mol): if OEHasResidues(mol): OEPDBOrderAtoms(mol) else: OEPerceiveResidues(mol)
As a teaching example, the following code demonstrates one way of reporting the number of different chains in an OEMolBase.
def MyCountChains(mol): result = 0 first = True prev = None for atom in mol.GetAtoms(): res = OEAtomGetResidue(atom) chain = res.GetChainID() if first or chain != prev: result += 1 first = False prev = chain return result
A slightly improved version would be to use OEChem's OESameChain function.
def MyCountChains2(mol): result = 0 prev = None for atom in mol.GetAtoms(): res = OEAtomGetResidue(atom) if prev and OESameChain(res,prev): continue prev = res result += 1 return result
Clearly, a MyCountResidues function would look almost identical but use the OESameResidue function instead of OESameChain. The slightly more complicated example below, reports the number of residues in each chain.
def MyReportResidues1(mol): prevchain = None for chain in mol.GetAtoms(): chainres = OEAtomGetResidue(chain) if not prevchain or not OESameChain(chainres, prevchain): prevres = None count = 0 for residue in mol.GetAtoms(): resres = OEAtomGetResidue(residue) if OESameChain(resres, chainres): if not prevres or not OESameChain(resres,prevres): prevres = resres count += 1 print count,"residues in chain",chainres.GetChainID() prevchain = chainres
Whilst the above example contains the doubly nested loops that some structural biologists like to see, the same output can be generated even more efficiently by:
def MyReportResidues2(mol): count = 0 residue = None chain = None for atom in mol.GetAtoms(): res = OEAtomGetResidue(atom) if not chain: chain = res elif not OESameChain(res,chain): print count,"residues in chain",chain.GetChainID() count = 0 if not residue or not OESameResidue(res, residue): residue = res count += 1 if count>0: print count,"residues in chain",chain.GetChainID()
Of course, just because OEChem uses an extremely advanced representation of biopolymers, there's absolutely nothing to prevent a user slurping this information into a FORTRAN common block or whichever representation best suits their way of thinking about the problem.