1.1 OEChem and Informatics

Chemical information processing is the science of representing molecules in computers. Hence the fundamental ``object'' or data structure within a chemical information system is that of the molecule, its atoms and its bonds.

A significant problem encountered in such systems is that different applications place differing requirements or constraints on how a molecule is represented. In protein biochemistry, molecules are divided into amino acid residues with specific atom naming and conformational information such as alpha helix or beta sheet. Inorganic chemistry requires isotopic and co-ordination information on each atom, along with complex chiralities. One possible solution is to prescribe a single data structure that encodes all of the potential information required of an atom. However, such an approach suffers from the fact that ``you can't please all of the chemists, all of the time.'' For example, a requirement in the field of chemical databases and substructure searching is that a molecule representation be as compact as possible, to allow as much information to be held in memory as possible and to maximize the performance of processing databases from disk.

The alternative, even complementary, approach taken by OEChem is to concentrate on the similarities in molecule processing, rather than on the differences. Using the concept of ``polymorphism'' from object-oriented programming, it is possible to write algorithms that are mostly independent of the actual data structure used to store a molecule.
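
To make the idea concrete, the following is a minimal C++ sketch of an algorithm written once against an abstract molecule interface and reused with two different storage layouts. It is not the OEChem API: the names MolBase, CompactMol, ProteinMol, ResidueAtom and CountHeavyAtoms are hypothetical and were chosen only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical abstract interface (an assumption for illustration,
// not an OEChem class): the minimum an algorithm needs from a molecule.
class MolBase {
public:
    virtual ~MolBase() = default;
    virtual std::size_t NumAtoms() const = 0;
    virtual unsigned AtomicNum(std::size_t idx) const = 0;
};

// Compact layout: one byte per atom, as a database search might prefer.
class CompactMol : public MolBase {
    std::vector<unsigned char> elements_;
public:
    explicit CompactMol(std::vector<unsigned char> elements)
        : elements_(std::move(elements)) {}
    std::size_t NumAtoms() const override { return elements_.size(); }
    unsigned AtomicNum(std::size_t idx) const override {
        assert(idx < elements_.size());
        return elements_[idx];
    }
};

// Richer layout: each atom also carries a name and a residue label,
// as protein work might require.
struct ResidueAtom {
    unsigned atomicNum;
    std::string name;
    std::string residue;
};

class ProteinMol : public MolBase {
    std::vector<ResidueAtom> atoms_;
public:
    explicit ProteinMol(std::vector<ResidueAtom> atoms)
        : atoms_(std::move(atoms)) {}
    std::size_t NumAtoms() const override { return atoms_.size(); }
    unsigned AtomicNum(std::size_t idx) const override {
        assert(idx < atoms_.size());
        return atoms_[idx].atomicNum;
    }
};

// Written once against the interface: counts atoms other than hydrogen,
// regardless of how the molecule is stored.
std::size_t CountHeavyAtoms(const MolBase& mol) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < mol.NumAtoms(); ++i) {
        if (mol.AtomicNum(i) > 1) {
            ++count;
        }
    }
    return count;
}
```

In this sketch CountHeavyAtoms never refers to CompactMol or ProteinMol directly, so a further representation could be added without changing the algorithm. The cost is a virtual function call per atom access, which is one reason such algorithms are only ``mostly'' independent of the underlying data structure.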