The OEChem SMILES parsers support several minor extensions to Daylight syntax. Each extension and its motivation is described below.
``[Rh-](Cl)(Cl)(Cl)(Cl)$[Rh-](Cl)(Cl)(Cl)Cl''.
[Sg] for Seaborgium, with the addition of [D] and [T] representing [2H] and [3H] respectively.
OEChem may support ``Na'', ``Li'' and ``K'' as unquoted elements to support Syracuse SMILES at some point in the future.
``[te]'', such as in tellurophene, ``[te]1cccc1'', which follows in the sequence furan (``o1cccc1''), thiophene (``s1cccc1'') and selenophene (``[se]1cccc1'').
In Daylight SMILES, atom maps, such as [Pb:1], are only ever used and specified in reaction molecules, e.g. [Pb:1]>>[Au:1]. However, OEChem extends this notion to allow atom maps to be used in discrete molecules. This is often useful for denoting significant sites or attachment points in a molecule. Traditionally in SMILES, isotopes of element zero have been used to perform this role; in OEChem, however, both [*:1] and [1*] may be used.
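As a small illustration of the two attachment point notations, the sketch below rewrites the isotope form [1*] into the atom map form [*:1] by plain string substitution. It does not use OEChem; the function name `isotope_to_atom_map` is hypothetical, and the regular expression assumes the wildcard atom carries nothing but an isotope number.

```python
import re

# Matches a bracketed wildcard atom that carries only an isotope, e.g. [1*] or [12*].
# Assumption: no charge, hydrogen count or existing atom map inside the brackets.
_ISOTOPE_WILDCARD = re.compile(r"\[(\d+)\*\]")


def isotope_to_atom_map(smiles: str) -> str:
    """Rewrite every [n*] as [*:n]; all other text is left untouched."""
    return _ISOTOPE_WILDCARD.sub(r"[*:\1]", smiles)


if __name__ == "__main__":
    print(isotope_to_atom_map("CC[1*]"))
```

This is a text-level convenience only. Whether the two forms should be treated as interchangeable in a given workflow is a decision for the application.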
When external attachment points are paired within a SMILES string, they behave identically to ring closures, just using a separate index space. Hence, the SMILES ``c&1ccccc&1'' is interpreted the same way as ``c1ccccc1'', and ``C&1.C&1'' is interpreted like ``C1.C1'', i.e. the SMILES ``CC''.
However, unlike ring closures, unpaired external attachment points are allowed and are interpreted like RGroup attachment points above. Hence, the SMILES ``CC&1'' (on its own) is equivalent to the RGroup attachment SMILES CC[R1], which is equivalent to the atom-mapped molecule CC[*:1].
The major advantage of these semantics, inspired by Daylight's CHUCKLES, is that they allow convenient enumeration of combinatorial libraries using string concatenation. For example, three components of a library may be specified as ``C&1CCC&2'', ``F&1'' and ``Br&2''. Then, using the same notation, the dot-separated concatenation ``C&1CCC&2.F&1.Br&2'' is interpreted as the reaction product, i.e. ``FCCCCBr''.
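The enumeration by string concatenation described above can be sketched in a few lines of standard-library Python. The helper name `enumerate_library` is hypothetical, and the substituents ``Cl&1'' and ``I&2'' are added here purely for illustration. The sketch only builds the dot-separated strings; resolving them into products still requires a parser that understands external attachment points.

```python
from itertools import product


def enumerate_library(scaffold, *substituent_lists):
    """Yield one dot-separated SMILES per combination of substituents.

    Each substituent list holds the fragments for one attachment index.
    """
    for combo in product(*substituent_lists):
        yield ".".join((scaffold, *combo))


scaffold = "C&1CCC&2"
r1 = ["F&1", "Cl&1"]   # "Cl&1" is an illustrative addition
r2 = ["Br&2", "I&2"]   # "I&2" is an illustrative addition

library = list(enumerate_library(scaffold, r1, r2))

if __name__ == "__main__":
    for smiles in library:
        print(smiles)
```

The first string produced is the example from the text, ``C&1CCC&2.F&1.Br&2''.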
As with ring closures, bond orders may be specified after the ampersand and before the closure index, ``C&=1'', and two digit closures are indicated by a `%' prefix, i.e. ``C&%12'' or ``C&=%12''.
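To make the token layout concrete, here is a minimal sketch that locates external attachment points in a SMILES string with a regular expression and reports which indices are left unpaired. It is not OEChem's parser. The names `ExternalBond`, `external_bonds` and `unpaired_indices` are hypothetical, the set of accepted bond symbols is an assumption, and so is the rule that an index seen an odd number of times has one unpaired occurrence.

```python
import re
from collections import Counter
from typing import NamedTuple


class ExternalBond(NamedTuple):
    bond: str   # bond symbol after '&', or '' when omitted
    index: int  # closure index, 0 to 99


# '&', an optional bond symbol, then either one digit or '%' plus two digits.
# Assumption: the accepted bond symbols mirror those used for ring closures.
_EXTERNAL = re.compile(r"&([-=#$:/\\]?)(?:%(\d\d)|(\d))")


def external_bonds(smiles):
    """Return every external attachment point in order of appearance."""
    return [
        ExternalBond(m.group(1), int(m.group(2) or m.group(3)))
        for m in _EXTERNAL.finditer(smiles)
    ]


def unpaired_indices(smiles):
    """Return the sorted indices that occur an odd number of times."""
    counts = Counter(b.index for b in external_bonds(smiles))
    return sorted(i for i, n in counts.items() if n % 2 == 1)


if __name__ == "__main__":
    print(external_bonds("C&=%12"))
    print(unpaired_indices("CC&1"))
```

Under these assumptions, ``CC&1'' reports index 1 as unpaired, while ``c&1ccccc&1'' reports none.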