Cano is an open-source library for canonical SMILES and canonical layered code computation.
Canonical SMILES generated by Cano are, in Daylight and ChemAxon terminology, unique SMILES with isomeric information, that is, absolute SMILES. All significant molecular features, such as isotopes, charges, radicals, stereocenters, stereogroups, cis-trans bonds, and aromaticity, are encoded into the SMILES in canonical form. A canonical SMILES string defines the molecule independently of any particular representation (atom numbering, stereogroup numbering, explicit or implicit hydrogens). Therefore, two molecules have equal canonical SMILES if and only if they are the same molecule.
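Because equal canonical SMILES imply the same molecule (and vice versa), a canonical SMILES string can serve directly as a key for duplicate detection. The C++ sketch below assumes the canonical SMILES have already been computed, for example with Cano, using the same parameters for every molecule; `Record` and `group_by_structure` are hypothetical names for illustration and are not part of the Cano interface.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical record: a caller-defined id plus a canonical SMILES string
// computed beforehand (e.g. by Cano). Not part of the Cano API.
struct Record {
    std::string id;
    std::string canonical_smiles;
};

// Groups record ids by canonical SMILES. Since equal canonical SMILES means
// the same molecule, each returned group corresponds to one distinct molecule.
// Groups appear in order of first occurrence.
std::vector<std::vector<std::string>> group_by_structure(const std::vector<Record>& records) {
    std::unordered_map<std::string, std::size_t> group_of;  // canonical SMILES -> group index
    std::vector<std::vector<std::string>> groups;
    for (const Record& r : records) {
        auto [it, inserted] = group_of.emplace(r.canonical_smiles, groups.size());
        if (inserted) {
            groups.emplace_back();  // first time this structure is seen
        }
        groups[it->second].push_back(r.id);
    }
    return groups;
}
```

With +AROMATICITY, both `C1C=CC=CC=1` and its explicit-hydrogen form canonicalize to `c1ccccc1` (see the parameter table below), so they fall into the same group. The key only works if every string was produced with the same parameter set: a `-AROMATICITY` result such as `C1=CC=CC=C1` would not match a `+AROMATICITY` one.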
Note: 'Useless' stereocenters are ignored in both the canonical SMILES and the canonical layered codes generated by Cano. A stereocenter is considered useless when it gives no information for distinguishing stereoisomers. See the examples below.
Canonical layered code support in Cano is preliminary. The following layers are included:
The Indigo Layered Code is similar to the IUPAC InChI code but is not fully compliant with it. One notable difference is that Cano does not mark a stereocenter as '?' in the tetrahedral stereochemistry layer when the stereocenter is unspecified. As a result, Cano constructs the same layered code for molecules that differ only in their 'useless' stereocenters. See the examples below.
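To make the '?' difference concrete, the sketch below extracts the tetrahedral layer (the part after `/t`) from a layered code and drops entries marked `?`. It is an illustration of the layer syntax only, assuming at most one `/t` layer per code; `stereo_layer` and `drop_unspecified` are hypothetical helpers, not Cano functions.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Returns the tetrahedral stereo layer of a layered code (text after "/t",
// without the prefix, up to the next '/'), or "" if the code has none.
std::string stereo_layer(const std::string& code) {
    std::size_t start = code.find("/t");
    if (start == std::string::npos) return "";
    start += 2;
    std::size_t end = code.find('/', start);
    return code.substr(start, end == std::string::npos ? std::string::npos : end - start);
}

// Removes entries marked '?' (unspecified stereocenters) from a
// comma-separated stereo layer such as "13-,14?,15?".
std::string drop_unspecified(const std::string& layer) {
    std::istringstream in(layer);
    std::string entry, out;
    while (std::getline(in, entry, ',')) {
        if (entry.empty() || entry.back() == '?') continue;  // skip unspecified centers
        if (!out.empty()) out += ',';
        out += entry;
    }
    return out;
}
```

For the second input of the comparison table below, the InChI layer `13-,14?,15?,16?,17?,20?` reduces to `13-`, which equals the Indigo layer. Dropping `?` entries is not enough in general, though: for the three-molecule example below, the first InChI layer `9-,12?,13?,14-` reduces to `9-,14-`, while Cano reports only `9-`, presumably because it also leaves out stereocenters it considers useless.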
Cano is written in portable C++ and supports Linux, Windows, and Mac OS X. No third-party components are used.
Cano exposes a C interface to applications. A Java wrapper is available for all supported platforms. On Windows, there is also Cano.Net, a wrapper for the C# language. See .NET Library Reference for details.
A command-line utility based on Cano is provided. See Command-line Reference for details.
All Cano operations are thread-safe, so Cano can safely be used in multi-threaded applications.
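One common way to take advantage of this is to split a batch of molecules across worker threads. The sketch below is a hedged illustration: `Canonicalizer` is a hypothetical stand-in for whatever Cano call your application makes (through the C interface or a wrapper), and `canonicalize_all` is not a Cano function.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical placeholder for a call into Cano; not the actual Cano C interface.
using Canonicalizer = std::function<std::string(const std::string&)>;

// Canonicalizes all inputs using num_threads worker threads. Each thread
// handles a strided subset of indices and writes only to its own result
// slots, so the results vector needs no locking. This relies on the
// canonicalizer being safe to call concurrently.
std::vector<std::string> canonicalize_all(const std::vector<std::string>& inputs,
                                          const Canonicalizer& canonicalize,
                                          unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<std::string> results(inputs.size());
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            for (std::size_t i = t; i < inputs.size(); i += num_threads) {
                results[i] = canonicalize(inputs[i]);
            }
        });
    }
    for (std::thread& w : workers) w.join();  // wait for all workers
    return results;
}
```

Striding by thread index keeps the output order identical to the input order without any post-processing.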
Note: Query features are not supported for canonicalization.
Almost all features of the original Daylight SMILES format are supported, including:
The only features that are not supported are:
The following ChemAxon SMILES extensions are supported:
MDL (Symyx) Molfiles are supported. Almost all format features are supported, including:
The only features that are not supported are:
AROMATICITY), tetrahedral stereocenters (TETRAHEDRAL), and cis-trans bond information (CISTRANS).

| Input SMILES | Parameters | Resulting SMILES |
|---|---|---|
| C1C=CC=CC=1 | +AROMATICITY | c1ccccc1 |
| C1C=CC=CC=1 | -AROMATICITY | C1=CC=CC=C1 |
| C([H])1C([H])=C([H])C([H])=C([H])C([H])=1 | +AROMATICITY | c1ccccc1 |
| C([H])1C([H])=C([H])C([H])=C([H])C([H])=1 | -AROMATICITY | C1=CC=CC=C1 |
| N1(C(SCC1C(=O)N[C@@H](CCO)C)C1CC2CCC1C2)C(CN(CC)C)=O \|a:8\| | +TETRAHEDRAL | CN(CC(=O)N1C(CSC1C1CC2CC1CC2)C(=O)N[C@H](C)CCO)CC \|a:20\| |
| N1(C(SCC1C(=O)N[C@@H](CCO)C)C1CC2CCC1C2)C(CN(CC)C)=O \|a:8\| | -TETRAHEDRAL | CN(CC(=O)N1C(CSC1C1CC2CC1CC2)C(=O)NC(C)CCO)CC |
| C(NCCNC(=O)/C=C/C(O)=O)(=O)OC(C)(C)C | +CISTRANS | CC(C)(C)OC(=O)NCCNC(=O)/C=C/C(O)=O |
| C(NCCNC(=O)/C=C/C(O)=O)(=O)OC(C)(C)C | -CISTRANS | CC(C)(C)OC(=O)NCCNC(=O)C=CC(O)=O |
The table below compares the canonical layered codes produced by Cano with the InChI codes obtained from the IUPAC software.
| Input SMILES | Indigo Layered Code | IUPAC InChI |
|---|---|---|
| C1C=CC=CC=1 | Indigo=1.1/C6H6/c1-2-4-6-5-3-1/h1-6H | InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H |
| N1(C(SCC1C(=O)N[C@@H](CCO)C)C1CC2CCC1C2)C(CN(CC)C)=O | Indigo=1.1/C20H35N3O3S/c1-4-22(3)11-18(25)23-17(19(26)21-13(2)7-8-24)12-27-20(23)16-10-14-5-6-15(16)9-14/h13-17,20-21,24H,4-12H2,1-3H3/t13-/m1/s1 | InChI=1S/C20H35N3O3S/c1-4-22(3)11-18(25)23-17(19(26)21-13(2)7-8-24)12-27-20(23)16-10-14-5-6-15(16)9-14/h13-17,20,24H,4-12H2,1-3H3,(H,21,26)/t13-,14?,15?,16?,17?,20?/m1/s1 |
| C(NCCNC(=O)/C=C/C(O)=O)(=O)OC(C)(C)C | Indigo=1.1/C11H18N2O5/c1-11(2,3)18-10(17)13-7-6-12-8(14)4-5-9(15)16/h4-5,12-13,15H,6-7H2,1-3H3/b5-4+ | InChI=1S/C11H18N2O5/c1-11(2,3)18-10(17)13-7-6-12-8(14)4-5-9(15)16/h4-5H,6-7H2,1-3H3,(H,12,14)(H,13,17)(H,15,16)/b5-4+ |
The gross formula and connection layers of the Indigo Layered Codes match the corresponding layers of IUPAC InChI, as do the cis-trans layers.
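Checking this layer by layer is straightforward because both formats separate layers with `/` and (apart from the formula) start each layer with a one-letter prefix. The sketch below is a simplified parser written for the codes in the comparison table above; `split_layers` and `same_layer` are hypothetical helpers, and the parser assumes each prefix occurs at most once, which holds for these examples but not for every InChI (for instance, ones with a fixed-H layer).

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Splits a layered code such as "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H" into
// layers. The version prefix before the first '/' is skipped, the formula is
// stored under "formula", and every other layer under its one-letter prefix
// ("c", "h", "b", "t", ...). A repeated prefix overwrites the earlier layer.
std::map<std::string, std::string> split_layers(const std::string& code) {
    std::map<std::string, std::string> layers;
    std::istringstream in(code);
    std::string part;
    std::getline(in, part, '/');  // skip "Indigo=1.1" or "InChI=1S"
    bool first = true;
    while (std::getline(in, part, '/')) {
        if (first) {
            layers["formula"] = part;
            first = false;
        } else if (!part.empty()) {
            layers[part.substr(0, 1)] = part.substr(1);
        }
    }
    return layers;
}

// True if both codes contain the given layer and its contents are identical.
bool same_layer(const std::string& a, const std::string& b, const std::string& key) {
    auto la = split_layers(a), lb = split_layers(b);
    auto ia = la.find(key), ib = lb.find(key);
    return ia != la.end() && ib != lb.end() && ia->second == ib->second;
}
```

Applied to the third row of the table, the formula (`formula`), connection (`c`), and cis-trans (`b`) layers compare equal, while the hydrogen (`h`) layers differ: InChI lists some hydrogens in parenthesized mobile-hydrogen groups such as `(H,12,14)`, whereas the Indigo code lists them as ordinary hydrogens (`12-13,15H`).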
The three molecules below all specify the same mixture. This is reflected in the fact that Cano gives identical canonical SMILES and layered codes for all three.
| Molecule | Canonical SMILES | Indigo Layered Code | IUPAC InChI |
|---|---|---|---|
| 1 | C[C@@H]1CC(C(=O)N1)1N2CC(C)3CN1CC(C)(C2)C3=O | Indigo=1.1/C14H21N3O2/c1-9-4-14(11(19)15-9)16-5-12(2)6-17(14)8-13(3,7-16)10(12)18/h9,15H,4-8H2,1-3H3/t9-/m1/s1 | InChI=1S/C14H21N3O2/c1-9-4-14(11(19)15-9)16-5-12(2)6-17(14)8-13(3,7-16)10(12)18/h9H,4-8H2,1-3H3,(H,15,19)/t9-,12?,13?,14-/m1/s1 |
| 2 | C[C@@H]1CC(C(=O)N1)1N2CC(C)3CN1CC(C)(C2)C3=O | Indigo=1.1/C14H21N3O2/c1-9-4-14(11(19)15-9)16-5-12(2)6-17(14)8-13(3,7-16)10(12)18/h9,15H,4-8H2,1-3H3/t9-/m1/s1 | InChI=1S/C14H21N3O2/c1-9-4-14(11(19)15-9)16-5-12(2)6-17(14)8-13(3,7-16)10(12)18/h9H,4-8H2,1-3H3,(H,15,19)/t9-,12?,13?,14-/m1/s1 |
| 3 | C[C@@H]1CC(C(=O)N1)1N2CC(C)3CN1CC(C)(C2)C3=O | Indigo=1.1/C14H21N3O2/c1-9-4-14(11(19)15-9)16-5-12(2)6-17(14)8-13(3,7-16)10(12)18/h9,15H,4-8H2,1-3H3/t9-/m1/s1 | InChI=1S/C14H21N3O2/c1-9-4-14(11(19)15-9)16-5-12(2)6-17(14)8-13(3,7-16)10(12)18/h9H,4-8H2,1-3H3,(H,15,19)/t9-,12?,13?,14?/m1/s1 |
Note also that IUPAC InChI gives a slightly different stereocenter layer for the third molecule (t9-,12?,13?,14?) than for the first two (t9-,12?,13?,14-).
See the Downloads page for the installation package that suits your system.
See also Java library reference, .NET Library Reference, and Command-line Reference.
Copyright © 2009-2010 SciTouch LLC
This program is free software: You can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; version 3 of the License.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.
If GPL-licensed Cano does not fit your needs, please contact us at info@scitouch.net to discuss the purchase of a commercial license. You may need a commercial license if you want to: