24.2 Extensions to Daylight SMILES

The OEChem SMILES parsers support several minor extensions to Daylight syntax. Each extension and its motivation is described below.

Quadruple Bond
In addition to ``-'', ``='' and ``#'' for specifying single, double and triple bonds respectively, OEChem also supports ``$'' for specifying quadruple bonds. An example is octachlorodirhenate(III), which is written as ``[Re-](Cl)(Cl)(Cl)(Cl)$[Re-](Cl)(Cl)(Cl)Cl''.

Unquoted and Additional Elements
In addition to the standard Daylight unquoted elements, B, C, N, O, F, P, S, Cl, Br and I, OEChem's SMILES readers also allow H, D and T to specify hydrogen, deuterium and tritium. Additionally, to support Syracuse SMILES, ``CL'' and ``BR'' are treated as ``Cl'' and ``Br''. The periodic table is also extended from 102 to 109 elements, e.g. [Sg] for seaborgium, with the addition of [D] and [T] representing [2H] and [3H] respectively.

OEChem may support ``Na'', ``Li'' and ``K'' as unquoted elements to support Syracuse SMILES at some point in the future.

Aromatic Tellurium
In order to support OpenEye's aromaticity model, which allows tellurium to be aromatic, the SMILES parser has been extended to support ``[te]'', as in tellurophene, ``[te]1cccc1'', which continues the sequence furan (``o1cccc1''), thiophene (``s1cccc1'') and selenophene (``[se]1cccc1'').

Atom Maps in Molecules
Traditionally, SMILES atom maps, e.g. [Pb:1], are only used in reaction molecules, such as [Pb:1]>>[Au:1]. However, OEChem extends this notion to allow atom maps to be used in discrete molecules. Hence, both [1*] and [*:1] may be used to mark significant sites in a molecule.

RGroup Attachment Points
As a shorthand for specifying templates for combinatorial libraries, and to support existing Cactus and JChem/Marvin usage, OEChem allows ``[R2]'' to be used as shorthand for [*:2]. For enquiring minds, the SMILES [R2:3] is interpreted as [*:3] or [R3], with the last specification taking priority.
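The rewriting rule above can be illustrated with a small string transformation. The following is only a sketch in plain Python, not OEChem's parser: the function name `expand_rgroups` and the regular expression are hypothetical, and the code assumes that the only bracket atoms of interest have the exact form [Rn] or [Rn:m].

```python
import re

# Hypothetical helper, not part of OEChem: matches [R<n>] or [R<n>:<m>].
# Element symbols such as [Rh] or [Ru] are not matched because a digit
# is required immediately after the "R".
_RGROUP = re.compile(r"\[R(\d+)(?::(\d+))?\]")


def expand_rgroups(smiles: str) -> str:
    """Rewrite RGroup shorthand as atom-mapped wildcard atoms."""

    def repl(match: re.Match) -> str:
        r_index, map_index = match.groups()
        # The last specification takes priority, so an explicit
        # atom map overrides the RGroup number.
        chosen = map_index if map_index is not None else r_index
        return f"[*:{chosen}]"

    return _RGROUP.sub(repl, smiles)


ethyl = expand_rgroups("CC[R1]")          # "CC[*:1]"
mixed = expand_rgroups("[R2:3]c1ccccc1")  # "[*:3]c1ccccc1"
metal = expand_rgroups("[Rh]")            # unchanged
```

The explicit map wins in the [R2:3] case, which mirrors the "last specification taking priority" behaviour described above.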

External Bond Attachment Points
OEChem SMILES also allows attachment points to be specified as external closures. The syntax is an ampersand followed by a ring closure. Hence the SMILES CC&1 is equivalent to the RGroup attachment SMILES CC[R1], which is equivalent to the atom mapped molecule CC[*:1]. As with ring closures, bond orders may be specified after the ampersand and before the closure index, e.g. C&=1, and two digit closures are indicated by a '%' prefix, e.g. C&%12 or C&=%12.
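To make the equivalence concrete, the sketch below rewrites external closures as branched atom-mapped wildcards using plain Python string handling. It is not OEChem's implementation. The name `external_to_atom_map` is hypothetical, and the code rests on two assumptions that the text does not state: that a bond symbol after the ampersand carries over to the bond to the wildcard atom, and that '&' never occurs inside a bracket atom. Each wildcard is emitted as a branch so that the result stays valid when further atoms follow the closure.

```python
import re

# Hypothetical pattern: '&', an optional bond symbol, then either
# '%' plus two digits or a single digit.
_EXTERNAL = re.compile(r"&([-=#$:/\\]?)(?:%(\d\d)|(\d))")


def external_to_atom_map(smiles: str) -> str:
    """Rewrite external closures (&n) as branches ending in [*:n]."""

    def repl(match: re.Match) -> str:
        bond, two_digit, one_digit = match.groups()
        index = int(two_digit or one_digit)
        # Assumption: the bond symbol applies to the new wildcard atom.
        return f"({bond}[*:{index}])"

    return _EXTERNAL.sub(repl, smiles)


single = external_to_atom_map("CC&1")      # "CC([*:1])"
double = external_to_atom_map("C&=%12")    # "C(=[*:12])"
chain = external_to_atom_map("C&1C&2")     # "C([*:1])C([*:2])"
```

The branch form CC([*:1]) describes the same molecule as CC[*:1]; the parentheses only matter when the closure is not the last token on its atom.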

One major advantage of this notation is that the SMILES parser will fuse attachment points present in a SMILES string, in the same way as it fuses ring closures. Hence, C&1C&2.Cl&1.Br&2, when parsed, produces the molecule ClCCBr. This provides a convenient method of enumerating combinatorial libraries using string concatenation.
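The string concatenation step might look like the following minimal sketch. It only builds the dot-separated SMILES strings; fusing the attachment points is left to the OEChem SMILES parser, which is not invoked here. The function name `enumerate_library` and the example substituent lists are hypothetical.

```python
from itertools import product


def enumerate_library(scaffold, *substituent_lists):
    """Join a scaffold with every combination of substituents.

    Each substituent list supplies the fragments for one attachment
    point, written with the matching external closure index.
    """
    return [
        ".".join((scaffold, *combo))
        for combo in product(*substituent_lists)
    ]


scaffold = "C&1C&2"        # scaffold from the text, two attachment points
r1 = ["Cl&1", "O&1"]       # illustrative fragments for closure 1
r2 = ["Br&2", "N&2"]       # illustrative fragments for closure 2

library = enumerate_library(scaffold, r1, r2)
# First entry is "C&1C&2.Cl&1.Br&2", the example given above.
```

Because every fragment carries its own closure index, the order of the dot-separated components does not affect which atoms end up bonded.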