27.1.2 Set Theory to the Rescue

It turns out that there are simple computer science solutions to these problems. Indeed, it was Codd's exposition of these principles for "relational" database systems that completely killed off the use of hierarchical and network database management systems within a decade of their introduction.

The premise is that rather than encode a single fragile hierarchy explicitly, each leaf or record instead maintains its own identity or position within the organization. This allows the representation of arbitrary sets and partitions of a set of records. For example, the ligand atoms of a molecule are identified by a ligand property that is true on each of them, rather than implicitly by where they are stored (in the abstract sense of an access path). Of course, each record may possess more than one property, allowing it to exist in more than one set simultaneously. Hence, this representation is generic enough to handle arbitrary Venn diagrams. Strict hierarchies and trees are therefore just an emergent property that arises when some sets are strict subsets of others. This allows elements to be organized in more than one hierarchy at once, and allows levels of a hierarchy to be elided or new levels to be introduced.
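As a minimal sketch of this idea, the following C++ stores atoms in one flat list and lets each atom carry its own boolean properties. The Atom struct, its fields, and the Select and ExampleAtoms helpers are hypothetical names chosen for illustration; they are not OEChem classes or functions.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical record type for illustration; not an OEChem class.
struct Atom {
    int  index;       // position in the flat atom list
    bool isLigand;    // membership in the "ligand" set
    bool isBackbone;  // membership in the "backbone" set
    bool isAromatic;  // membership in the "aromatic" set
};

using AtomPred = std::function<bool(const Atom&)>;

// Return the indices of every atom for which the predicate holds.
// The set is defined by the predicate, not by where the atoms are stored.
std::vector<int> Select(const std::vector<Atom>& atoms, const AtomPred& pred) {
    std::vector<int> out;
    for (const Atom& a : atoms) {
        if (pred(a)) out.push_back(a.index);
    }
    return out;
}

// A small flat list: no atom is stored "under" a ligand or any other container.
std::vector<Atom> ExampleAtoms() {
    return {
        {0, true,  false, true },   // ligand and aromatic: in two sets at once
        {1, true,  false, false},   // ligand only
        {2, false, true,  false},   // backbone only
        {3, false, false, true },   // aromatic only
    };
}
```

Calling Select with a predicate that tests isLigand yields the ligand set, and a predicate that tests both isLigand and isAromatic yields their intersection, which is the overlapping region of the corresponding Venn diagram. Adding another set only requires another property; no existing container has to be restructured.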

The next realization is that once sets, or levels in a hierarchy, are represented by boolean properties or predicates, there is no need for an explicit "name" or placeholder for a set. Instead, a set or partition can be defined by providing a representative member and a binary predicate that determines whether another record is in the same set or partition as that member. For example, to represent a protein chain, it is sufficient to specify an arbitrary atom in the appropriate chain and provide a SameChain function. Similarly, a residue can be specified by providing the exact same atom and a SameResidue function.
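The representative-plus-predicate idea can be sketched as follows. The names SameChain and SameResidue come from the text above, but the bodies given here are assumptions written for illustration, as are the Atom struct, its chainId and residueNumber fields, and the MembersOf and ExampleChains helpers; none of this is OEChem's actual implementation.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record type for illustration; not an OEChem class.
struct Atom {
    int  index;          // position in the flat atom list
    char chainId;        // chain the atom belongs to
    int  residueNumber;  // residue number within that chain
};

// Assumed predicate bodies: two atoms share a chain if their chain ids match.
bool SameChain(const Atom& a, const Atom& b) {
    return a.chainId == b.chainId;
}

// Residue numbers repeat across chains, so the chain must match as well.
bool SameResidue(const Atom& a, const Atom& b) {
    return SameChain(a, b) && a.residueNumber == b.residueNumber;
}

// The set is "named" only by a representative atom and a binary predicate.
template <typename BinaryPred>
std::vector<int> MembersOf(const std::vector<Atom>& atoms,
                           const Atom& representative,
                           BinaryPred same) {
    std::vector<int> out;
    for (const Atom& a : atoms) {
        if (same(representative, a)) out.push_back(a.index);
    }
    return out;
}

// Two chains; residue number 1 appears in both of them.
std::vector<Atom> ExampleChains() {
    return {
        {0, 'A', 1},
        {1, 'A', 1},
        {2, 'A', 2},
        {3, 'B', 1},
    };
}
```

With atom 1 as the representative, MembersOf(atoms, atoms[1], SameChain) selects atoms 0, 1 and 2, while MembersOf(atoms, atoms[1], SameResidue) selects only atoms 0 and 1. The same atom therefore stands for both its chain and its residue; only the predicate changes.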

				OEChem - C++ Theory Manual Version 1.3.1
