Representing reactions

Chemical reactions are processes that convert one set of chemical compounds into another. There are many different classes of reaction, for example displacement reactions and acid-base reactions. Of particular interest in drug discovery are organic reactions. There are many kinds of information that can be associated with a reaction, including:

Note that a reaction may be generic (i.e. applicable to many compounds rather than just one specific compound) or specific, and that a reaction equation may supply only the simplified start and end points of a more detailed reaction mechanism. You can find information on many simple organic reactions on Wikipedia or, even better, in an organic chemistry textbook such as Morrison and Boyd. Here is a simple example reaction equation and mechanism: the Algar-Flynn-Oyamada reaction.

Reaction equations are generally represented on paper like mathematical equations, with a set of reactants on the left and products on the right. By convention, the "+" sign is used to separate reactants from each other or products from each other, and an arrow separates the reactants from the products and indicates reaction direction.
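As a minimal sketch of this paper convention (the class `ReactionEquation` and its fields are hypothetical, not taken from any cheminformatics library), a reaction equation can be held as two ordered lists of species and printed with "+" between species and an arrow between the two sides:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReactionEquation:
    """Hypothetical container: reactants on the left, products on the right."""
    reactants: List[str]
    products: List[str]

    def __str__(self) -> str:
        # '+' separates species on each side; the arrow gives the direction
        return f"{' + '.join(self.reactants)} -> {' + '.join(self.products)}"

    def reverse(self) -> "ReactionEquation":
        # The arrow encodes direction, so swapping the sides gives a different reaction
        return ReactionEquation(self.products, self.reactants)


neutralisation = ReactionEquation(["HCl", "NaOH"], ["NaCl", "H2O"])
print(neutralisation)            # HCl + NaOH -> NaCl + H2O
print(neutralisation.reverse())  # NaCl + H2O -> HCl + NaOH
```

Plain strings are enough for the paper view; the sections below deal with what a computer needs beyond this, namely structures and the mapping between them.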

From a cheminformatics perspective, the most important concerns are representing the individual chemical structures and representing the transformation, i.e. the mapping of reagents to products (or the reaction mechanism, if stored). All other information needs only simple kinds of representation (e.g. text, numeric values). Note that the transformation captures only a subset of the information in the reaction mechanism, and it is quite possible to define transformations that have no corresponding valid reaction mechanism. However, when representing information on a computer we often record more detail about the transformation than we would see on paper (e.g. explicit structures, mapping of one structure to another). In a database, we might store the transformation, the reaction mechanism, or both.

We already know how to represent 2D structures, with SMILES, InChI, MDL MOL files and so on. However, we also need a way to map reactants to products and thus represent the whole reaction.

One way to do this is with Reaction SMILES and SMIRKS. Reaction SMILES extends SMILES to allow identification of reactants and products, and the mapping between them. SMIRKS goes further and allows the definition of transformations that do not contain complete structures, but rather fragments represented in a SMARTS-like way. Connection table formats can be extended to incorporate a variety of reaction information: for example, an extension of the MDL MOL/SD file format, the RXN file, labels structures as reactants or products. In the same way that several MOL files can be contained in an SD file, several RXN files can be contained in an RD file.
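The layout of a Reaction SMILES can be illustrated with a small parser. This is a sketch that assumes the plain `reactants>agents>products` form, with `.` separating molecules and atom-map numbers written as `:n` inside bracket atoms (e.g. `[CH3:1]`); it ignores extensions such as component grouping or CXSMILES, and the function names are hypothetical. A production system would use a cheminformatics toolkit rather than regular expressions.

```python
import re
from typing import Dict, List, NamedTuple, Tuple

# Atom-map numbers sit inside bracket atoms, e.g. [CH3:1] or [O-:4]
ATOM_MAP = re.compile(r"\[[^\]]*?:(\d+)\]")


class ParsedReaction(NamedTuple):
    reactants: List[str]
    agents: List[str]
    products: List[str]


def parse_reaction_smiles(rxn: str) -> ParsedReaction:
    """Split 'reactants>agents>products' into lists of molecule SMILES."""
    parts = rxn.split(">")
    if len(parts) != 3:
        raise ValueError(f"expected 'reactants>agents>products', got {rxn!r}")
    # '.' separates molecules within each part; an empty part gives []
    reactants, agents, products = ([m for m in p.split(".") if m] for p in parts)
    return ParsedReaction(reactants, agents, products)


def _map_numbers(molecules: List[str]) -> Dict[int, int]:
    """Atom-map number -> index of the molecule that contains it."""
    return {int(n): i for i, smi in enumerate(molecules) for n in ATOM_MAP.findall(smi)}


def atom_mapping(rxn: ParsedReaction) -> Dict[int, Tuple[int, int]]:
    """Atom-map number -> (reactant index, product index)."""
    left, right = _map_numbers(rxn.reactants), _map_numbers(rxn.products)
    return {n: (left[n], right[n]) for n in sorted(left.keys() & right.keys())}


# Esterification of acetic acid with methanol, atom-mapped by hand
ester = parse_reaction_smiles(
    "[CH3:1][C:2](=[O:3])[OH:4].[CH3:5][OH:6]>>[CH3:1][C:2](=[O:3])[O:6][CH3:5].[OH2:4]"
)
mapping = atom_mapping(ester)
print(mapping[4])  # (0, 1): the acid's OH oxygen ends up in water
print(mapping[6])  # (1, 0): the alcohol oxygen ends up in the ester
```

The map numbers are what turn two unrelated lists of structures into a transformation: they record which reactant atom becomes which product atom.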

Note there is currently no equivalent of Reaction SMILES or SMIRKS for InChIs.

Reaction databases

Historically, books have been published that index and describe organic reactions. The most famous is the Beilstein Handbook of Organic Chemistry. This slowly migrated into the Beilstein Database, now one of the largest repositories of reactions. Other databases include CASREACT and SPRESI. There are now also some free databases, such as The Chemical Thesaurus and WebReactions.

It's important to recognize two things about reaction databases. First, they can only exist if there is a source of reactions; usually this source is the scholarly literature, from which reactions have to be extracted manually. Second, reaction databases are distinct from synthesis planning systems such as CAESA and WODCA, which attempt to assist the chemist with reaction planning (through rules, etc.) but do not necessarily sit on top of a comprehensive, detailed reaction database.

As with simple structure databases, bespoke systems and interfaces have been developed, often incorporating searching of both compounds and reactions, e.g. DiscoveryGate and SciFinder Scholar. These systems are designed particularly for chemistry librarians and synthetic chemists.

To implement a reaction database, just as we can store a SMILES in a database text field, so we can also store a Reaction SMILES or a SMIRKS in a text field. Most chemistry database cartridges can work with these. Here are the ones that do handle reactions:
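To make the text-field idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module; the table and column names are hypothetical. It also shows why a chemistry cartridge is needed: without one, the database only compares strings, so the same reaction written with a different (but valid) SMILES is not found.

```python
import sqlite3

# In-memory database; a real system would use a persistent file or server
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reactions ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT,"
    " rxn_smiles TEXT NOT NULL)"  # Reaction SMILES stored as plain text
)

conn.executemany(
    "INSERT INTO reactions (name, rxn_smiles) VALUES (?, ?)",
    [
        ("Fischer esterification", "CC(=O)O.CO>>CC(=O)OC.O"),
        ("Diels-Alder", "C=CC=C.C=C>>C1=CCCCC1"),
    ],
)

query = "SELECT name FROM reactions WHERE rxn_smiles = ?"

# Exact text match: succeeds only because the string is identical
row = conn.execute(query, ("C=CC=C.C=C>>C1=CCCCC1",)).fetchone()
print(row[0])  # Diels-Alder

# Same cyclohexene product written differently: plain SQL cannot tell
miss = conn.execute(query, ("C=CC=C.C=C>>C1CCCC=C1",)).fetchone()
print(miss)  # None
```

A cartridge fixes this by canonicalizing structures or by providing chemistry-aware operators inside SQL.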

As well as the straightforward structure, substructure and similarity searches, we also want to be able to carry out a few more specialized forms of searching on reaction databases. In particular, we often want to limit searching to just products or reagents - for example, find all the reactions that have an exact match to the query in the products. This quickly leads to more advanced kinds of searching, for example:

  • Finding reactions that contain particular substructures in the reagents and products
  • Finding different synthesis routes for a named reaction (e.g. find all Diels-Alder reactions)
  • Finding reactions which are similar to a query reaction
  • Finding chains of reactions that can be used to create a particular structure from a set of starting materials
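Restricting a search to one side of a reaction is straightforward once each Reaction SMILES is split at its `>` separators. The sketch below (hypothetical helper names) does exact matching only, and assumes every SMILES has already been canonicalized by a toolkit so that string equality stands in for structure identity; substructure and similarity searches would need a real chemistry engine.

```python
from typing import List, Set


def _side(rxn_smiles: str, index: int) -> Set[str]:
    """Molecules in one part of 'reactants>agents>products' (0 = reactants, 2 = products)."""
    return {m for m in rxn_smiles.split(">")[index].split(".") if m}


def search_by_role(reactions: List[str], query: str, role: str) -> List[str]:
    """Exact-match search restricted to the reactants or the products.

    Assumes all SMILES are already canonical, so equal strings mean equal structures.
    """
    index = {"reactants": 0, "products": 2}[role]
    return [r for r in reactions if query in _side(r, index)]


db = [
    "CC(=O)O.CO>>CC(=O)OC.O",  # esterification: water is a product
    "CC(=O)OC.O>>CC(=O)O.CO",  # hydrolysis: water is a reactant
]
print(search_by_role(db, "O", "products"))   # ['CC(=O)O.CO>>CC(=O)OC.O']
print(search_by_role(db, "O", "reactants"))  # ['CC(=O)OC.O>>CC(=O)O.CO']
```

The same query ("water") selects different reactions depending on which side it is allowed to match, which is exactly the distinction an unrestricted search would lose.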

And, of course, these queries may be combined with text / numeric searching. So searching reaction databases can get quite complicated. The problem of finding chains of reactions is particularly interesting, as these chains are really paths through a graph.
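One simple way to find such a chain, sketched here under stated assumptions, is forward chaining: treat the starting materials as available, repeatedly "fire" any reaction whose reactants are all available, then walk back from the target to keep only the reactions it actually depends on. Because a reaction needs all of its reactants at once, the structure is strictly a hypergraph rather than a simple graph. The function name and the `(name, reactants, products)` tuple layout are hypothetical; reagents such as oxidants are not modelled, and the route returned is a valid one, not necessarily the shortest or best.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Set, Tuple

# (name, reactants, products); molecules identified by canonical SMILES
Reaction = Tuple[str, FrozenSet[str], FrozenSet[str]]


def find_route(reactions: Iterable[Reaction], starting: Set[str],
               target: str) -> Optional[List[str]]:
    reactions = list(reactions)
    available = set(starting)
    made_by: Dict[str, Reaction] = {}  # molecule -> reaction that first produced it
    fired: Dict[str, int] = {}         # reaction name -> order in which it fired
    changed = True
    # Forward chaining: a reaction fires once all its reactants are available
    while changed and target not in available:
        changed = False
        for rxn in reactions:
            name, reactants, products = rxn
            if name not in fired and reactants <= available:
                fired[name] = len(fired)
                for p in products - available:
                    made_by[p] = rxn
                available |= products
                changed = True
    if target not in available:
        return None
    # Walk back from the target, collecting only the reactions it depends on
    needed: Set[str] = set()
    stack = [target]
    while stack:
        mol = stack.pop()
        if mol in made_by and made_by[mol][0] not in needed:
            name, reactants, _ = made_by[mol]
            needed.add(name)
            stack.extend(reactants)
    # Firing order is a valid order in which to carry out the steps
    return sorted(needed, key=fired.__getitem__)


# Illustrative data only (oxidants and catalysts omitted)
steps = [
    ("dehydration", frozenset({"CCO"}), frozenset({"C=C", "O"})),
    ("oxidation to aldehyde", frozenset({"CCO"}), frozenset({"CC=O"})),
    ("oxidation to acid", frozenset({"CC=O"}), frozenset({"CC(=O)O"})),
    ("esterification", frozenset({"CC(=O)O", "CCO"}), frozenset({"CC(=O)OCC", "O"})),
]
print(find_route(steps, {"CCO"}, "CC(=O)OCC"))
# ['oxidation to aldehyde', 'oxidation to acid', 'esterification']
```

The dehydration step fires during the forward pass but is pruned on the way back, since ethyl acetate does not depend on it.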