12.5 Identifying Connected Components

To help split a molecule into its connected components, for example to separate a parent compound from the counterions of its salt, or a ligand from a protein, OEChem provides the function OEDetermineComponents. This function assigns an integer index, starting from one, to each connected component (each disconnected fragment) of the OEMolBase. The order in which components are numbered is arbitrary. The caller passes a pointer to an array of at least OEMolBase::GetMaxAtomIdx() unsigned ints, which the function fills in. On return, the array maps each atom's index, as returned by OEAtomBase::GetIdx, to the index of the component containing that atom. Entries for unused atom indices are set to zero. The function returns the total number of components found, which is also the largest component index stored in the array.
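To make the contract of the output array concrete, here is a minimal sketch of connected-component labeling using union-find. It is not OEChem's implementation and does not use the OEChem API. SimpleMol and DetermineComponentsSketch are hypothetical names standing in for a molecule given as a bond list over atom indices. Like OEDetermineComponents, it writes a 1-based component label for every atom in use, writes 0 for unused indices, and returns the number of components. An atom with no bonds, such as a sodium ion, forms a component of its own.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for a molecule (not an OEChem type): atom indices lie
// in [0, maxAtomIdx), present[i] says whether index i is in use, and bonds
// lists pairs of bonded atom indices.
struct SimpleMol {
  unsigned int maxAtomIdx;
  std::vector<bool> present;                                 // size maxAtomIdx
  std::vector<std::pair<unsigned int, unsigned int>> bonds;  // atom index pairs
};

// Union-find "find" with path halving.
static unsigned int FindRoot(std::vector<unsigned int> &parent, unsigned int i) {
  while (parent[i] != i) {
    parent[i] = parent[parent[i]];
    i = parent[i];
  }
  return i;
}

// Hypothetical sketch: fills parts[idx] with a 1-based component label for
// each atom in use and 0 for unused indices; returns the component count.
unsigned int DetermineComponentsSketch(const SimpleMol &mol, unsigned int *parts) {
  std::vector<unsigned int> parent(mol.maxAtomIdx);
  for (unsigned int i = 0; i < mol.maxAtomIdx; ++i)
    parent[i] = i;

  // Merge the sets containing the two ends of every bond.
  for (const auto &b : mol.bonds) {
    assert(b.first < mol.maxAtomIdx && b.second < mol.maxAtomIdx);
    unsigned int ra = FindRoot(parent, b.first);
    unsigned int rb = FindRoot(parent, b.second);
    if (ra != rb)
      parent[ra] = rb;
  }

  // Number each set in order of the lowest atom index it contains.
  std::vector<unsigned int> label(mol.maxAtomIdx, 0);
  unsigned int count = 0;
  for (unsigned int i = 0; i < mol.maxAtomIdx; ++i) {
    if (!mol.present[i]) {
      parts[i] = 0;  // unused atom index
      continue;
    }
    unsigned int r = FindRoot(parent, i);
    if (label[r] == 0)
      label[r] = ++count;
    parts[i] = label[r];
  }
  return count;
}
```

Union-find keeps the sketch short and runs in near-linear time in the number of bonds. A breadth-first search over an adjacency list would produce the same partition.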

The following provides a short example of how to use this function.

#include "oechem.h"
#include <cstdio>
#include <cstdlib>

using namespace OESystem;
using namespace OEChem;

void MyReportParts(const OEMolBase &mol)
{
  unsigned int *parts;
  unsigned int count;
  size_t size;

  // One slot per possible atom index, not just per atom currently present.
  size = mol.GetMaxAtomIdx()*sizeof(unsigned int);
  parts = (unsigned int*)malloc(size);

  count = OEDetermineComponents(mol,parts);
  printf("The molecule has %u components\n",count);

  // Look up each atom's component label by its atom index.
  OEIter<OEAtomBase> atom;
  for (atom=mol.GetAtoms(); atom; ++atom)
    printf("atom %u is in part %u\n",atom->GetIdx(),parts[atom->GetIdx()]);

  free(parts);
}
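A common follow-up when stripping a salt is to keep only the largest component. The sketch below assumes the parts array and count returned by OEDetermineComponents and uses a hypothetical helper, LargestComponent, that is not part of OEChem. It counts atoms per component label and returns the label with the most atoms, breaking ties in favor of the lower label. Atoms whose parts entry equals that label could then be kept. "Largest by atom count" is only a heuristic: it can pick the wrong fragment when a counterion is bigger than the parent compound.

```cpp
#include <vector>

// Hypothetical helper (not an OEChem function): given an array mapping atom
// index -> component label (0 = unused index), of length size, and the number
// of components count, return the label of the component with the most atoms.
// Ties go to the lowest label; returns 0 when count is 0.
unsigned int LargestComponent(const unsigned int *parts, unsigned int size,
                              unsigned int count) {
  std::vector<unsigned int> atomsPerPart(count + 1, 0);
  for (unsigned int i = 0; i < size; ++i)
    if (parts[i] != 0 && parts[i] <= count)
      ++atomsPerPart[parts[i]];

  unsigned int best = 0;
  for (unsigned int p = 1; p <= count; ++p)
    if (best == 0 || atomsPerPart[p] > atomsPerPart[best])
      best = p;
  return best;
}
```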