27.1.3 OEChem Examples

The above explanation should go some way to explaining OEChem's decision to attach biopolymer information to each atom, rather than have container classes for residues and chains (and presumably connected components, NMR models, etc...). The OEResidue class is therefore an additional set of fields that may be associated with an atom. It does not denote or prescribe an amino or nucleic acid but instead stores atom-specific data such as atom serial number, b-factor and occupancy, in addition to residue information, chain information, fragment information, NMR model information, etc...

The residue information associated with an atom can be set with the OEAtomSetResidue function, and is retrieved with the OEAtomGetResidue function. The PDB and Macromodel file format readers parse this information from the input file format. Additionally, OEChem allows residue information to be perceived directly from the connection table using the OEPerceiveResidues function.

For many algorithms processing biomolecules, it is convenient to maintain the atoms of the OEMolBase in sorted order to group atoms in the same residue next to one another, and residues in the same chain sequentially. This can be done conveniently in OEChem using the OEPDBOrderAtoms function. Note, that OEPercieveResidues calls OEPDBOrderAtoms automatically.

A common idiom is therefore the following code snippet:

void MyPrepareProtein(OEMolBase &mol)
{
  if (OEHasResidues(mol))
  {
    OEPDBOrderAtoms(mol);
  }
  else OEPerceiveResidues(mol);
}

As a teaching example, the following code demonstrates one way of reporting the number of different chains in an OEMolBase.

unsigned int MyCountChains1(OEMolBase &mol)
{
  unsigned int result = 0;
  OEIter<OEAtomBase> atom;
  bool first = true;
  char prev = '\0';
  char chain;

  for (atom=mol.GetAtoms(); atom; ++atom)
  {
    OEResidue res = OEAtomGetResidue(atom);
    chain = res.GetChainID();
    if (first || (chain != prev))
      result++;
    first = false;
    prev = chain;
  }
  return result;
}

A slightly improved version would be to use OEChem's SameChain function.

unsigned int MyCountChains2(OEMolBase &mol)
{
  OEIter<OEAtomBase> atom;
  unsigned int result = 0;
  OEResidue prev;

  for (atom=mol.GetAtoms(); atom; ++atom)
  {
    OEResidue res = OEAtomGetResidue(atom);
    if (prev && OESameChain(res,prev))
      continue;

    prev = res;
    result++;
  }
  return result;
}

Clearly, a MyCountResidues function would look almost identical but use the OESameResidue function instead of OESameChain. The slightly more complicated example below, reports the number of residues in each chain.

void MyReportResidues1(OEMolBase &mol)
{
  OEIter<OEAtomBase> chain;
  OEResidue prevchain;

  for (chain=mol.GetAtoms(); chain; ++chain)
  {
    OEResidue chainres = OEAtomGetResidue(chain);
    if (!prevchain || !OESameChain(chainres,prevchain))
    {
      OEIter<OEAtomBase> residue;
      OEResidue prevres;

      unsigned int count = 0;
      for (residue=mol.GetAtoms(); residue; ++residue)
      {
        OEResidue resres = OEAtomGetResidue(residue);
        if (OESameChain(resres,chainres))
          if (!prevres || !OESameChain(resres,prevres))
          {
            prevres = resres;
            count++;
          }
      }

      cout << count << " residues in chain "
           << chainres.GetChainID() << endl;
      prevchain = chainres;
    }
  }
}

Whilst the above example contains the doubly nested loops that some structural biologists like to see, the same output can be generated even more efficiently by:

void MyReportResidues2(OEMolBase &mol)
{
  OEIter<OEAtomBase> atom;
  unsigned int count = 0;
  OEResidue residue;
  OEResidue chain;

  for (atom=mol.GetAtoms(); atom; ++atom)
  {
    OEResidue res = OEAtomGetResidue(atom);

    if (!chain)
      chain = res;
    else if (!OESameChain(res,chain))
    {
      cout << count << " residues in chain "
           << chain.GetChainID() << endl;
      chain = res;
      count = 0;
    }

    if (!residue || !OESameResidue(res,residue))
    {
      residue = res;
      count++;
    }
  }

  if (count)
      cout << count << " residues in chain "
           << chain.GetChainID() << endl;
}

Of course, just because OEChem uses an extremely advanced representation of biopolymers, there's absolutely nothing to prevent a user slurping this information into a FORTRAN common block or whichever representation best suits their way of thinking about the problem.