ICEP: Indiana Cheminformatics Education Portal
Chemical reactions are processes that convert one set of chemical compounds into another. There are many different classes of reaction, for example displacement reactions and acid-base reactions. Of particular interest in drug discovery are organic reactions. There are many kinds of information that can be associated with a reaction, including:
The chemical structures involved in the reaction
Any solvents involved in the reaction
Reaction conditions (often a mix of numeric and textual information)
Note that a reaction might be generic (i.e. applicable to many compounds rather than just one specific compound) or specific, and a reaction equation might supply only the simplified start and end points of a more detailed reaction mechanism. You can find information on many simple organic reactions on the web, or, even better, in an organic chemistry textbook such as Morrison and Boyd
. Here is a simple example reaction equation and mechanism: the
Reaction equations are generally represented on paper like mathematical equations, with a set of reactants on the left and products on the right. By convention, the "+" sign is used to separate reactants from each other or products from each other, and an arrow separates the reactants from the products and indicates reaction direction.
From a cheminformatics perspective, the most important concerns are representing the individual chemical structures and representing the transformation, i.e. the mapping of reactants to products (or the reaction mechanism, if stored). All other information needs only simple kinds of representation (e.g. text or numeric values). Note that the transformation captures only a subset of the information in the reaction mechanism, and it is quite possible to define transformations that have no corresponding valid reaction mechanism. However, when representing information on computer we often record the transformation in more detail than we would on paper (e.g. explicit structures, and a mapping of the atoms in one structure to the atoms in another). In a database, we might store the transformation, the reaction mechanism, or both.
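As a minimal sketch of what "more detail than on paper" can mean, the Python record below keeps structures, solvents and conditions as separate fields and stores the transformation as an explicit atom mapping. The class and field names (`ReactionRecord`, `atom_mapping`, `product_atom_for`) are hypothetical, atoms are identified by (molecule index, atom index) following the atom order in each SMILES string with hydrogens left implicit, and the example is Fischer esterification, in which the ester oxygen comes from the alcohol and the acid's OH oxygen leaves as water.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ReactionRecord:
    """Hypothetical record for one reaction; not a standard schema."""
    reactants: list[str]                      # SMILES of each reactant
    products: list[str]                       # SMILES of each product
    solvents: list[str] = field(default_factory=list)
    conditions: dict[str, str] = field(default_factory=dict)  # mixed numeric/textual values
    # Transformation: (reactant index, atom index) -> (product index, atom index)
    atom_mapping: dict[tuple[int, int], tuple[int, int]] = field(default_factory=dict)

    def product_atom_for(self, reactant: int, atom: int) -> tuple[int, int] | None:
        """Where a given reactant atom ends up, or None if it is unmapped."""
        return self.atom_mapping.get((reactant, atom))


ester = ReactionRecord(
    reactants=["CC(=O)O", "OCC"],      # acetic acid, ethanol
    products=["CC(=O)OCC", "O"],       # ethyl acetate, water
    conditions={"catalyst": "H2SO4", "temperature": "reflux"},
    atom_mapping={
        (0, 0): (0, 0), (0, 1): (0, 1), (0, 2): (0, 2),  # acid CH3, carbonyl C, =O
        (0, 3): (1, 0),                                    # acid OH oxygen ends up in water
        (1, 0): (0, 3), (1, 1): (0, 4), (1, 2): (0, 5),  # ethanol O, CH2, CH3
    },
)
print(ester.product_atom_for(1, 0))  # (0, 3): the alcohol oxygen becomes the ester oxygen
```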
We already know how to represent 2D structures, with SMILES, InChI, MDL MOL files and so on. However, we also need a way to map reactants to products and thus represent the whole reaction.
One way to do this is with Reaction SMILES and SMIRKS. Reaction SMILES extends SMILES to identify reactants, agents and products, and to record the mapping between reactant and product atoms. SMIRKS goes further and allows the definition of transformations that do not contain complete structures, but rather fragments represented in a SMARTS-like way.
Connection table formats can also be extended to incorporate reaction information. For example, an extension of the MDL MOL/SD file format called an RXN file defines structures as reactants or products. In the same way that several MOL files can be contained in an SD file, several RXN files can be contained in an RD file.
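The Python sketch below splits a Reaction SMILES of the form `reactants>agents>products` into its three parts and reads the atom-map numbers (written as `:n` inside bracket atoms, as in `[CH3:1]`) to check that every mapped reactant atom reappears in the products. It works purely on strings and does not validate the SMILES, so a real system would use a cheminformatics toolkit instead; the function names are hypothetical.

```python
import re

# An atom-map number is ":n" just before the closing bracket of a bracket atom.
ATOM_MAP = re.compile(r"\[[^\]]*:(\d+)\]")


def parse_reaction_smiles(rxn):
    """Split 'reactants>agents>products' into three lists of component SMILES."""
    parts = rxn.split(">")
    if len(parts) != 3:
        raise ValueError(f"expected exactly two '>' separators: {rxn!r}")
    # '.' separates disconnected components; an empty section (as in '>>') gives [].
    return tuple([smi for smi in part.split(".") if smi] for part in parts)


def atom_maps(smiles_list):
    """Collect all atom-map numbers used in a list of SMILES strings."""
    return {int(n) for smi in smiles_list for n in ATOM_MAP.findall(smi)}


def product_containing(products, map_number):
    """Index of the product that carries a given mapped atom, or None."""
    for i, smi in enumerate(products):
        if map_number in atom_maps([smi]):
            return i
    return None


# Fischer esterification with an acid catalyst as the agent.
rxn = ("[CH3:1][C:2](=[O:3])[OH:4].[CH3:7][CH2:6][OH:5]"
       ">[H+]>"
       "[CH3:1][C:2](=[O:3])[O:5][CH2:6][CH3:7].[OH2:4]")
reactants, agents, products = parse_reaction_smiles(rxn)
assert atom_maps(reactants) == atom_maps(products)  # every mapped atom is accounted for
print(product_containing(products, 4))  # 1: the acid's OH oxygen ends up in water
```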
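Note that there is no InChI equivalent of SMIRKS. The later RInChI (Reaction InChI) standard, developed by IUPAC and the InChI Trust, provides identifiers for specific reactions, somewhat analogous to Reaction SMILES, but it does not describe generic transformations.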
Historically, books have been published that index and describe organic reactions. The most famous is the Beilstein Handbook of Organic Chemistry. This slowly migrated into an electronic database, now one of the largest repositories of reactions. Other reaction databases also exist, and there are now some free ones, such as The Chemical Thesaurus.
It's important to recognize two things about reaction databases. First, they can only exist with a source of reactions: usually this is the scholarly literature, from which reactions have to be extracted manually. Second, reaction databases are distinct from synthesis planning systems, which attempt to assist the chemist with planning syntheses (through rules, etc.) but don't necessarily sit on top of a comprehensive, detailed reaction database.
As with simple structure databases, bespoke systems and interfaces have been developed, often supporting searches over both compounds and reactions. These systems are designed particularly for use by chemistry librarians and synthetic chemists.
To implement a reaction database, just as we can store a SMILES in a database text field, we can store a Reaction SMILES or a SMIRKS in a text field. Some chemistry database cartridges can also work with these reaction representations directly.
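A minimal sketch, using Python's built-in `sqlite3` module, of storing each Reaction SMILES in a text column alongside a hypothetical `component` table that records every molecule with its role (reactant, agent or product), so that an exact-match search can be restricted to one role. It assumes all SMILES were canonicalized by the same toolkit beforehand, because matching here is plain string equality; substructure and similarity searching would still need a chemistry cartridge. Table, column and function names are illustrative.

```python
import sqlite3

ROLES = ("reactant", "agent", "product")


def build_reaction_db(reaction_smiles_list):
    """Store each Reaction SMILES whole, plus one row per component and role."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE reaction (id INTEGER PRIMARY KEY, rxn_smiles TEXT NOT NULL)")
    con.execute("CREATE TABLE component (reaction_id INTEGER NOT NULL REFERENCES reaction(id), "
                "role TEXT NOT NULL, smiles TEXT NOT NULL)")
    con.execute("CREATE INDEX component_lookup ON component (role, smiles)")
    for rxn in reaction_smiles_list:
        rid = con.execute("INSERT INTO reaction (rxn_smiles) VALUES (?)", (rxn,)).lastrowid
        # Assumes well-formed input with exactly two '>' separators.
        for role, part in zip(ROLES, rxn.split(">")):
            for smi in part.split("."):
                if smi:
                    con.execute("INSERT INTO component VALUES (?, ?, ?)", (rid, role, smi))
    con.commit()
    return con


def find_reactions(con, smiles, role="product"):
    """Exact-match search for a component, restricted to one role."""
    rows = con.execute(
        "SELECT DISTINCT r.id, r.rxn_smiles FROM reaction AS r "
        "JOIN component AS c ON c.reaction_id = r.id "
        "WHERE c.role = ? AND c.smiles = ? ORDER BY r.id",
        (role, smiles),
    )
    return [rxn for _, rxn in rows]


con = build_reaction_db([
    "CC(=O)O.OCC>[H+]>CC(=O)OCC.O",   # esterification
    "CC(=O)OCC.O>[H+]>CC(=O)O.OCC",   # ester hydrolysis
    "C=CC=C.C=C>>C1=CCCCC1",          # Diels-Alder: butadiene + ethylene
])
print(find_reactions(con, "CC(=O)OCC"))              # only the esterification
print(find_reactions(con, "CC(=O)OCC", "reactant"))  # only the hydrolysis
```

Keeping the whole Reaction SMILES next to the per-role rows means the original record is never lost, while the indexed `component` table makes role-restricted lookups cheap.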
As well as straightforward structure, substructure and similarity searches, we also want to carry out some more specialized forms of searching on reaction databases. In particular, we often want to restrict a search to just the products or just the reactants: for example, find all the reactions whose products include an exact match to the query. This quickly leads to more advanced kinds of searching, for example:
Finding reactions that contain particular substructures in the reactants and products
Finding all instances of a named reaction (e.g. find all Diels-Alder reactions)
Finding reactions which are similar to a query reaction
Finding chains of reactions that can be used to create a particular structure from a set of starting materials
And, of course, these queries may be combined with text or numeric searching, so searching reaction databases can get quite complicated. The problem of finding chains of reactions is particularly interesting, as these chains are really paths through a graph in which compounds are linked by reactions.
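Treating compounds as nodes and each reaction as a step that needs all of its reactants, a route to a target can be found by forward chaining: repeatedly apply every reaction whose reactants are all available, remember which reaction first produced each new compound, then walk back from the target. The Python sketch below is one simple way to do this, not necessarily how production route-finding systems work; it returns a valid route but not necessarily the shortest, and the function name and reaction list (with the oxidant omitted from the oxidation step) are illustrative.

```python
def find_route(reactions, starting_materials, target):
    """Return reaction names, in runnable order, that lead to target; None if unreachable.

    reactions: list of (name, reactant_smiles_list, product_smiles_list).
    """
    available = set(starting_materials)
    made_by = {}  # compound -> (reaction name, reactants) that first produced it
    changed = True
    while changed and target not in available:
        changed = False
        for name, reactants, products in reactions:
            if set(reactants) <= available:
                for product in products:
                    if product not in available:
                        available.add(product)
                        made_by[product] = (name, reactants)
                        changed = True
    if target not in available:
        return None

    route, visited = [], set()

    def visit(compound):
        # Starting materials have no producing reaction, so the walk stops there.
        if compound in visited or compound not in made_by:
            return
        visited.add(compound)
        name, reactants = made_by[compound]
        for reactant in reactants:
            visit(reactant)  # schedule the steps that make the prerequisites first
        if name not in route:
            route.append(name)

    visit(target)
    return route


reactions = [
    ("hydration", ["C=C", "O"], ["CCO"]),                        # ethylene + water -> ethanol
    ("oxidation", ["CCO"], ["CC(=O)O"]),                         # ethanol -> acetic acid
    ("esterification", ["CC(=O)O", "CCO"], ["CC(=O)OCC", "O"]),  # -> ethyl acetate + water
]
print(find_route(reactions, ["C=C", "O"], "CC(=O)OCC"))
# ['hydration', 'oxidation', 'esterification']
```

Because a compound is only recorded once all reactants of its producing reaction were already available, the walk back from the target always terminates.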