CDK Chemoinformatics

The Chemistry Development Kit (CDK) provides a range of basic cheminformatics functions. It is used by software applications including the JChemPaint structure editor and the free, open-source workflow platform KNIME.

Earlier versions of the CDK API represented stereochemistry in several different ways, which made interconversion between file formats difficult.

Substructure Matching

A core CDK function is substructure matching, which compares a query structure against a set of target molecules to identify those that contain the desired substructure. The resulting matches can then be used to implement higher-level cheminformatics functionality such as pharmacophore perception.
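The idea behind substructure matching can be illustrated with a small backtracking search. The following is a minimal sketch in plain Java, not the CDK's implementation: the classes ToyMol and SubstructureSketch are hypothetical names invented for this example, atoms are compared by element symbol only, and bond orders, aromaticity and stereochemistry are ignored.

```java
import java.util.Arrays;

/** Toy labelled graph: element symbols plus an adjacency matrix. Hypothetical, not a CDK class. */
final class ToyMol {
    final String[] elements;
    final boolean[][] bonded;

    ToyMol(String[] elements, int[][] bonds) {
        this.elements = elements;
        this.bonded = new boolean[elements.length][elements.length];
        for (int[] b : bonds) {
            bonded[b[0]][b[1]] = true;
            bonded[b[1]][b[0]] = true;
        }
    }
}

final class SubstructureSketch {

    /** Returns true if every query atom and bond can be mapped onto distinct target atoms. */
    static boolean matches(ToyMol query, ToyMol target) {
        int[] mapping = new int[query.elements.length];
        Arrays.fill(mapping, -1);
        return extend(query, target, mapping, new boolean[target.elements.length], 0);
    }

    // Depth-first search: try to map query atom q onto each compatible, unused target atom.
    private static boolean extend(ToyMol query, ToyMol target, int[] mapping, boolean[] used, int q) {
        if (q == mapping.length) {
            return true; // all query atoms are mapped
        }
        for (int t = 0; t < target.elements.length; t++) {
            if (used[t] || !query.elements[q].equals(target.elements[t])) {
                continue;
            }
            if (!bondsPreserved(query, target, mapping, q, t)) {
                continue;
            }
            mapping[q] = t;
            used[t] = true;
            if (extend(query, target, mapping, used, q + 1)) {
                return true;
            }
            mapping[q] = -1; // backtrack and try the next candidate
            used[t] = false;
        }
        return false;
    }

    // Every bond from q to an already-mapped query atom must also exist in the target.
    private static boolean bondsPreserved(ToyMol query, ToyMol target, int[] mapping, int q, int t) {
        for (int p = 0; p < q; p++) {
            if (query.bonded[q][p] && !target.bonded[t][mapping[p]]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Heavy atoms of ethanol (C-C-O) as target, a C-O fragment as query.
        ToyMol ethanol = new ToyMol(new String[] {"C", "C", "O"}, new int[][] {{0, 1}, {1, 2}});
        ToyMol carbonOxygen = new ToyMol(new String[] {"C", "O"}, new int[][] {{0, 1}});
        System.out.println(matches(carbonOxygen, ethanol));
    }
}
```

Production toolkits use more refined algorithms and richer atom and bond comparisons, but the map-check-backtrack structure is the same.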

The CDK contains an extensive test suite. It includes unit tests, which exercise core APIs, and integration tests, which exercise several algorithms together. In addition, the CDK has extensive documentation for its features.

The CDK provides APIs for representing chemical structures, including 2D renderings and molecular-data tables. It also supports reading and writing several file formats, including SMILES, and provides a standard representation for stereochemistry.

Atom type perception is a fundamental part of the CDK library, describing the chemical features of an atom such as its number of neighbors, possible formal charges, (approximate) hybridization, and electron distribution over orbitals. Previous versions of the CDK implemented atom typing as part of different algorithms, leading to duplicated and sometimes divergent typing code. This has been unified in the current version.

Pattern Matching

KNIME-CDK is an open source cheminformatics plugin that provides functionality for converting molecules to and from common formats, substructure searching, and the generation of signatures, fingerprints and molecular properties. It is based on the community-driven CDK library and extends the KNIME workflow platform with complementary cheminformatics nodes.

Substructure matching is one of the most fundamental operations in chemoinformatics. It underlies many other functions, such as the generation of substructure-key fingerprints and fragment-based descriptors, and the resulting fingerprints can be used to determine similarity between structures. CDK v2.0 significantly improved the speed of these searches. Path-based or MACCS-like key fingerprints are commonly used alongside exact matching as a fast prescreen.
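To show how path-based keys relate to substructure searching, here is a minimal sketch in plain Java of a hashed path fingerprint used as a prescreen. It is not the CDK's fingerprinter: the class PathFingerprintSketch, the bit-vector size and the maximum path length are all assumptions chosen for illustration, and atoms are labelled by element symbol only.

```java
import java.util.BitSet;

/** Hypothetical hashed path fingerprint; illustrative only, not a CDK class. */
final class PathFingerprintSketch {
    static final int SIZE = 1024;   // assumed fingerprint length in bits
    static final int MAX_ATOMS = 4; // assumed maximum number of atoms per path

    /** Sets one bit for every simple path of up to MAX_ATOMS atoms. */
    static BitSet fingerprint(String[] elements, int[][] bonds) {
        int n = elements.length;
        boolean[][] adj = new boolean[n][n];
        for (int[] b : bonds) {
            adj[b[0]][b[1]] = true;
            adj[b[1]][b[0]] = true;
        }
        BitSet bits = new BitSet(SIZE);
        for (int start = 0; start < n; start++) {
            walk(elements, adj, start, new boolean[n], "", 1, bits);
        }
        return bits;
    }

    // Depth-first enumeration of simple paths; each path string is hashed to a bit position.
    private static void walk(String[] elements, boolean[][] adj, int atom, boolean[] visited,
                             String prefix, int length, BitSet bits) {
        String path = prefix + elements[atom];
        bits.set(Math.floorMod(path.hashCode(), SIZE));
        if (length == MAX_ATOMS) {
            return;
        }
        visited[atom] = true;
        for (int next = 0; next < adj.length; next++) {
            if (adj[atom][next] && !visited[next]) {
                walk(elements, adj, next, visited, path + "-", length + 1, bits);
            }
        }
        visited[atom] = false;
    }

    /** Prescreen: false means the query cannot be a substructure; true means an exact match is still needed. */
    static boolean mayContain(BitSet target, BitSet query) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(target);
        return missing.isEmpty();
    }

    public static void main(String[] args) {
        BitSet ethanol = fingerprint(new String[] {"C", "C", "O"}, new int[][] {{0, 1}, {1, 2}});
        BitSet carbonOxygen = fingerprint(new String[] {"C", "O"}, new int[][] {{0, 1}});
        System.out.println(mayContain(ethanol, carbonOxygen));
    }
}
```

Because every path in a substructure also occurs in the containing molecule, the prescreen never rejects a true match. Hash collisions can let non-matches through, which is why it is followed by an exact substructure search.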

In addition to matching, the CDK contains other chemoinformatics functions such as atom typing and a set of models for perceiving aromaticity. Aromaticity is determined from the structure itself: an electron-donation model decides how many pi electrons each ring atom contributes, and a cycle-finding algorithm selects the rings to test. Both parts are implemented in the CDK.
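A common textbook criterion for aromaticity is Hückel's rule, under which a planar, fully conjugated ring is aromatic when it holds 4n + 2 pi electrons. The sketch below, in plain Java, applies that rule to a single ring. It is an illustration under stated assumptions, not the CDK's aromaticity code: the class HuckelSketch is a hypothetical name, and the per-atom electron counts are supplied by the caller rather than derived from a structure.

```java
/** Hypothetical single-ring Hückel check; illustrative only, not a CDK class. */
final class HuckelSketch {

    /**
     * piElectrons[i] is the number of pi electrons that ring atom i donates.
     * A value of -1 marks an atom that cannot take part in the pi system,
     * for example an sp3 carbon.
     */
    static boolean isAromaticRing(int[] piElectrons) {
        int total = 0;
        for (int e : piElectrons) {
            if (e < 0) {
                return false; // conjugation is broken, so the ring is not aromatic
            }
            total += e;
        }
        // Hückel's rule: the total must equal 4n + 2 for some n >= 0.
        return total >= 2 && (total - 2) % 4 == 0;
    }

    public static void main(String[] args) {
        int[] benzene = {1, 1, 1, 1, 1, 1};   // six carbons, one pi electron each
        int[] pyrrole = {2, 1, 1, 1, 1};      // nitrogen lone pair plus four carbons
        int[] cyclobutadiene = {1, 1, 1, 1};  // four pi electrons
        System.out.println(isAromaticRing(benzene));
        System.out.println(isAromaticRing(pyrrole));
        System.out.println(isAromaticRing(cyclobutadiene));
    }
}
```

Real aromaticity models differ mainly in how they assign the per-atom electron counts and in which rings they consider, which is why different toolkits can disagree on borderline structures.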


The Chemistry Development Kit (CDK) is a Java library for chemoinformatics. It includes a variety of tools for analysing chemical structures. These tools can handle different types of chemical formulas and can provide isotope containers, atomic patterns, and molecular fragments. The CDK is open source and is licensed under the GNU Lesser General Public License (LGPL).

The CDK provides data structures that represent chemical concepts, together with methods to manipulate these structures and perform computations on them. It also supports a variety of cheminformatics algorithms, including ring perception, aromaticity perception, and fingerprinting. Implementations of its IChemObjectBuilder interface construct atoms and bonds as well as higher-level objects such as sets and reactions, based on a formalized representation of atoms and bonds in chemistry.

The CDK is a community-driven project that has thrived over an extended period of time. Its continued development has resulted in a high-quality, performant library for chemoinformatics. The project has adopted a rigorous code review system that requires any functionality-changing patch to be reviewed by one independent developer for the development branch and two reviewers for the stable branches.


The algorithms implemented in the CDK range from chemical structure canonicalization to molecular descriptor calculation and pharmacophore perception, for use in drug discovery applications.

The CDK also handles a wide range of chemical file formats. Because it is a Java library, it integrates readily into Java-based workflow systems such as Taverna and KNIME. It is also accessible from statistical software, for example from R through the rcdk package, for building Quantitative Structure-Activity Relationship (QSAR) models.

Several software packages rely on the CDK for functions including chemical drawing, molecule search and manipulation, and molecular descriptor calculation. Examples include the JChemPaint structure editor and the PaDEL-Descriptor software for molecular descriptor computation. Open Babel and Indigo, by contrast, are independent cheminformatics toolkits and are not built on the CDK.
