ICEP: Indiana Cheminformatics Education Portal
Representation and characterization of 3D structures
Where do 3D structures come from?
2D chemical structures can be derived from knowledge of the atoms that are present in a compound, and how they are bonded together. This is common knowledge for all substances, and thus we do not need to consider where they come from. However, there is no information available to us that would reveal what the 3D structure of a compound would be. Indeed, as we shall see, all compounds are flexible to some degree, so the 3D structure will change over time. And we must bear in mind that, as with 2D structures, we are dealing with a model, not with reality itself (which is, to the best of our knowledge so far, a grand scale fuzzy quantum event!).
There are three main sources of 3D structural information:
Computer-generated 3D structures
Dealing with conformational flexibility
Most compounds have rotatable bonds, which means that the whole molecule can flex into many different conformations in 3D (see, for example, how much a "rigid" cyclohexane chair can flex, and how much a more flexible molecule can move!). Thus there is not just one 3D structure: for any one compound there is an infinite number of possible conformers (or a finite number if, say, we consider discrete rotation units).
However, not all conformers are equal. In particular, molecules prefer to be in low energy states instead of high energy states. Therefore we may decide to store just one (low energy) conformer (and let algorithms flex the molecule as needed), or produce several conformers (say a sampling of different low energy orientations).
First, we have to decide how to determine whether a bond is rotatable. A good working definition is: any single bond which is not part of a ring, is not terminal (e.g. a bond to a methyl group) and is not in a conjugated system (e.g. an amide). However, this is not perfect: we do know conjugated system bonds can rotate to a degree (based on the degree of conjugation), and we can have flexing of rings (say between chair and boat conformations).
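The working definition above can be sketched in a few lines of Python. This is only an illustration: the `Bond` record, its `in_ring` and `conjugated` flags, and the function names are hypothetical, and the flags are assumed to have been assigned already by some other perception step. Only heavy atoms are listed, so a "terminal" bond is one where either atom has a single heavy-atom neighbour.

```python
from typing import NamedTuple


class Bond(NamedTuple):
    """Hypothetical bond record; flags are assumed to be precomputed."""
    a: int                    # index of the first heavy atom
    b: int                    # index of the second heavy atom
    order: int                # 1 = single, 2 = double, ...
    in_ring: bool = False
    conjugated: bool = False


def heavy_degree(atom, bonds):
    """Number of bonds in the heavy-atom bond table that touch this atom."""
    return sum(1 for bd in bonds if atom in (bd.a, bd.b))


def is_rotatable(bond, bonds):
    """Single, acyclic, non-conjugated and non-terminal, as in the text."""
    if bond.order != 1 or bond.in_ring or bond.conjugated:
        return False
    # A terminal bond ends in an atom with only one heavy-atom neighbour.
    return heavy_degree(bond.a, bonds) > 1 and heavy_degree(bond.b, bonds) > 1


# n-Butane heavy atoms C0-C1-C2-C3: only the central bond should qualify.
butane = [Bond(0, 1, 1), Bond(1, 2, 1), Bond(2, 3, 1)]
rotatable = [bd for bd in butane if is_rotatable(bd, butane)]
print(rotatable)
```

Because the rule is a simple filter, the imperfections noted above (partial rotation in conjugated systems, ring flexing) are not captured by it.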
When we are discussing the rotation of rotatable bonds, you will hear two terms used: the torsion angle and the dihedral angle. These two terms are synonymous, and refer to the angle between the A-B bond and the C-D bond, as seen when looking along the B-C bond, for four atoms connected in the order A-B-C-D.
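As a minimal sketch, the torsion angle for four atoms A-B-C-D can be computed from their coordinates with cross products and `atan2`. The function name `dihedral_angle` and the plain `(x, y, z)` tuples are assumptions made for illustration; degenerate cases (for example three collinear atoms) are not handled.

```python
import math


def _sub(p, q):
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dihedral_angle(a, b, c, d):
    """Signed torsion angle in degrees for atoms connected A-B-C-D."""
    b1 = _sub(b, a)          # bond vector A -> B
    b2 = _sub(c, b)          # bond vector B -> C (the rotation axis)
    b3 = _sub(d, c)          # bond vector C -> D
    n1 = _cross(b1, b2)      # normal to the plane A-B-C
    n2 = _cross(b2, b3)      # normal to the plane B-C-D
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


# A and D on the same side of the B-C bond (cis) and on opposite sides (trans).
cis = dihedral_angle((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
trans = dihedral_angle((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
print(cis, trans)
```

The cis arrangement gives an angle of 0 degrees and the trans arrangement an angle of magnitude 180 degrees.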
Representing 3D conformers on computer
In addition to the information stored in the 2D structure (the atoms and how they are connected by bonds), for 3D conformers we also need to be able to store the coordinates of atoms (relative to some origin). There is no well established linear notation for storing this information, although SLN does allow atoms to be labelled with coordinates. More usual is a connection-table type file format, often either an
file or a
file. Other formats can be used too, such as
and for the coordinates simply an
Internally, we can create a coordinate table, which is simply an extension of the atom lookup table that stores X, Y and Z coordinates for each atom relative to a defined origin. It is normal for this coordinate system to be based on Ångström units (i.e. one unit is one Ångström). Here is an example:
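A coordinate table of this kind might be sketched in Python as follows. The layout (a list of rows with `atom`, `element`, `x`, `y`, `z` keys) is an assumption for illustration, not a standard format. The molecule is water in an idealized geometry, built from an approximate O-H bond length of 0.96 Å and an H-O-H angle of about 104.5°, with the oxygen placed at the origin.

```python
import math

OH_LENGTH = 0.96    # Ångström, approximate O-H bond length in water
HOH_ANGLE = 104.5   # degrees, approximate H-O-H bond angle

theta = math.radians(HOH_ANGLE)

# One row per atom: the atom lookup table extended with X, Y, Z columns.
coordinate_table = [
    {"atom": 1, "element": "O", "x": 0.0, "y": 0.0, "z": 0.0},
    {"atom": 2, "element": "H", "x": OH_LENGTH, "y": 0.0, "z": 0.0},
    {"atom": 3, "element": "H",
     "x": OH_LENGTH * math.cos(theta),
     "y": OH_LENGTH * math.sin(theta),
     "z": 0.0},
]

print("atom el        x        y        z")
for row in coordinate_table:
    print(f"{row['atom']:>4} {row['element']:<2} "
          f"{row['x']:8.3f} {row['y']:8.3f} {row['z']:8.3f}")
```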
Once we have a coordinate table, we can derive from it a distance matrix that specifies the distance (in Ångström) between any two atoms in the conformer. For example:
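The derivation is a pairwise Euclidean distance calculation, sketched below in Python. The function name `distance_matrix` is hypothetical, and the input is the same idealized water geometry (O-H about 0.96 Å, H-O-H about 104.5°) given as plain `(x, y, z)` tuples in Ångström.

```python
import math


def distance_matrix(coords):
    """Symmetric matrix of Euclidean distances between all pairs of atoms."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)]
            for i in range(n)]


theta = math.radians(104.5)
water = [
    (0.0, 0.0, 0.0),                                    # O
    (0.96, 0.0, 0.0),                                   # H
    (0.96 * math.cos(theta), 0.96 * math.sin(theta), 0.0),  # H
]

dm = distance_matrix(water)
for row in dm:
    print(" ".join(f"{d:6.3f}" for d in row))
```

The matrix is symmetric with zeros on the diagonal, so only one triangle needs to be stored. Unlike the coordinate table, it does not depend on the choice of origin or orientation.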
Note that this also specifies a
Once we have distances, we can also use
In addition to storing coordinate tables and distance matrices for 3D conformers, we can also use various ways of specifying degrees of flexibility of a compound in 3D. For example we can specify two coordinate tables, one which stores a minimum X, Y, and Z value for an atom, and one which stores a maximum value. Or we can similarly specify minimum and maximum distance matrices.
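One way to obtain minimum and maximum distance matrices is to take the element-wise minimum and maximum over the distance matrices of several conformers. The sketch below assumes this approach; the function names are hypothetical and the two four-atom conformers use invented coordinates purely for illustration.

```python
import math


def distance_matrix(coords):
    """Symmetric matrix of Euclidean distances between all pairs of atoms."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)]
            for i in range(n)]


def distance_bounds(conformers):
    """Element-wise minimum and maximum distance matrices over conformers."""
    matrices = [distance_matrix(c) for c in conformers]
    n = len(matrices[0])
    lower = [[min(m[i][j] for m in matrices) for j in range(n)]
             for i in range(n)]
    upper = [[max(m[i][j] for m in matrices) for j in range(n)]
             for i in range(n)]
    return lower, upper


# Invented coordinates: a four-atom chain with the last atom rotated
# from the cis side to the trans side of the central bond.
conf_cis = [(0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.0, 0.0)]
conf_trans = [(0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, -1.0, 0.0)]

lower, upper = distance_bounds([conf_cis, conf_trans])
print(lower[0][3], upper[0][3])   # the 1-4 distance varies between conformers
print(lower[0][1], upper[0][1])   # a bonded distance stays fixed
```

Distances between directly bonded atoms have equal lower and upper bounds here, while the distance between the two end atoms spans a range that reflects rotation about the central bond.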
Generating and manipulating 3D structures with a computer
There are a variety of programs that will "convert" 2D structures (say in SMILES format) to 3D structures. Often these will produce "valid" 3D structures, but not necessarily energy-minimized ones (unless they are combined with an energy minimization tool as described below). These programs may output a single structure, or an ensemble of 3D structures. Most of these methods are fragment and rule based, that is, they split the 2D structure into small fragments that are then matched to a pre-defined dictionary of 3D fragments. By a series of rules and theory these are then combined into a full 3D structure. Examples of this kind of approach are
. Other methods use Distance Geometry methods to rapidly sample the "conformational space" of a molecule to look for valid conformations based on distance bounds. The most prominent current example of this approach is
Most of these methods also perform energy minimization, which can also be applied to 3D structures from any source (e.g. X-ray or NMR). An energy minimization algorithm will take a conformer as input, and will attempt to rotate and flex the molecule such that the potential energy is minimized. To do this, we can apply any one of many optimization algorithms. Some of these will only find local minima (such as hill climbing), whilst others will attempt to find global minima (such as
A pharmacophore is a set of molecular features that is required for binding to a particular protein target. It is almost always used to refer to structural features (or derivatives such as hydrogen bonding potential), and is usually used in reference to 3D structures. A pharmacophore may be defined as a set of features and distance bounds between these features in 3D, and can be generated either from a target or from a set of ligands. For example, "An OH group between 2 and 5 Ångström away from a carboxyl oxygen, both of which are 7-8 Ångström from a benzene ring":
A pharmacophore can be used as a query to a database too. Note that a pharmacophore search is like a substructure search in that it is a subgraph query on a fully-connected distance matrix graph. A pharmacophore can be represented in a variety of ways: for instance, a distance matrix of pharmacophore points (with a dictionary for point types which may contain coordinates of 3D substructures or SMARTS of 2D features). Note that we often need to be able to represent distance ranges (rather than exact distances) and we also may need to represent ambiguity in pharmacophore points.
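The example pharmacophore above (an OH group, a carboxyl oxygen and a benzene ring with distance ranges between them) can be written as a small query and tested against the feature points of a single conformer. This is a minimal sketch: the feature type labels, the function name `matches_pharmacophore` and the conformer coordinates are all hypothetical, and feature perception (finding the points in a real molecule) is assumed to have been done elsewhere.

```python
import itertools
import math

# Query: point types plus (min, max) distance bounds in Ångström,
# keyed by the positions of the two points in QUERY_TYPES.
QUERY_TYPES = ["OH", "carboxyl_O", "benzene_ring"]
QUERY_BOUNDS = {
    (0, 1): (2.0, 5.0),   # OH to carboxyl oxygen
    (0, 2): (7.0, 8.0),   # OH to ring
    (1, 2): (7.0, 8.0),   # carboxyl oxygen to ring
}


def matches_pharmacophore(features, query_types, query_bounds):
    """True if some assignment of feature points satisfies every bound.

    features is a list of (type, (x, y, z)) tuples for one conformer.
    """
    candidates = [
        [idx for idx, (ftype, _) in enumerate(features) if ftype == qtype]
        for qtype in query_types
    ]
    for assignment in itertools.product(*candidates):
        if len(set(assignment)) < len(assignment):
            continue  # the same point may not fill two query positions
        if all(
            low <= math.dist(features[assignment[i]][1],
                             features[assignment[j]][1]) <= high
            for (i, j), (low, high) in query_bounds.items()
        ):
            return True
    return False


# Invented feature points for two conformers.
hit = [("OH", (0.0, 0.0, 0.0)),
       ("carboxyl_O", (3.0, 0.0, 0.0)),
       ("benzene_ring", (1.5, 7.3, 0.0))]
miss = [("OH", (0.0, 0.0, 0.0)),
        ("carboxyl_O", (3.0, 0.0, 0.0)),
        ("benzene_ring", (1.5, 3.0, 0.0))]

print(matches_pharmacophore(hit, QUERY_TYPES, QUERY_BOUNDS))
print(matches_pharmacophore(miss, QUERY_TYPES, QUERY_BOUNDS))
```

The exhaustive search over assignments is the simplest form of the subgraph query described above. For a flexible molecule the check would be repeated over stored conformers, or applied to distance bounds rather than to one set of coordinates.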
3D descriptors and fingerprints
Just as with 2D, we can generate 3D structural or property-based descriptors. The equivalent of 2D structural keys are 3D pharmacophore "fragments". Sometimes these are called triplets or quadruplets based on the number of atoms in each of the fragments. Note that these fragments can contain distance ranges and ambiguous points just like a full pharmacophore. For a set of molecules, there are a huge number of triplets or quadruplets that can be generated, so these are usually hashed down onto a fixed number of bits.
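A triplet fingerprint of this kind can be sketched as follows. Every choice here is an assumption made for illustration: distances are binned into 1 Å bins, each triplet is described by its three (type pair, distance bin) edges, and the description is hashed with CRC32 onto 64 bits. Real systems use their own feature definitions, binning schemes and hash functions.

```python
import itertools
import math
import zlib


def triplet_fingerprint(features, n_bits=64, bin_width=1.0):
    """Hash every three-point pharmacophore triplet onto a fixed-size bitmask.

    features is a list of (type, (x, y, z)) tuples for one conformer.
    Returns an int whose set bits are the fingerprint.
    """
    bits = 0
    for trio in itertools.combinations(features, 3):
        edges = []
        for (t1, p1), (t2, p2) in itertools.combinations(trio, 2):
            dist_bin = int(math.dist(p1, p2) // bin_width)
            edges.append((tuple(sorted((t1, t2))), dist_bin))
        # Sorting makes the key independent of the order of the points.
        key = repr(sorted(edges)).encode("ascii")
        bits |= 1 << (zlib.crc32(key) % n_bits)
    return bits


# Invented feature points for one conformer.
features = [("OH", (0.0, 0.0, 0.0)),
            ("carboxyl_O", (3.0, 0.0, 0.0)),
            ("benzene_ring", (1.5, 7.3, 0.0)),
            ("donor", (4.2, 2.1, 1.3))]

fp = triplet_fingerprint(features)
print(f"{fp:064b}")
```

Four feature points give four triplets, so at most four bits are set. Different triplets can collide on the same bit, which is the usual trade-off of hashing onto a fixed number of bits.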
A variety of other kinds of descriptors can be created for 3D, ranging from atom-based (e.g. partial charges generated from semi-empirical methods) to full molecule field-based (such as electrostatic, steric and hydrophobic fields). These can be used for a variety of applications (molecular alignment, docking, and similarity).
Databases of 3D structures
A good overview of how databases of 3D structures can be used in drug discovery is given on
Pharmacophore searching is the equivalent of substructure searching in 2D: we supply a pharmacophore query and then return all of the molecules which could satisfy the query (either by flexing the molecule, or by storing multiple conformers).
Similarity searching in 3D can be simply a matter of calculating a Tanimoto coefficient or Euclidean distance between two fingerprints (as in 2D). However, there are several other ways of calculating 3D similarity that are not based on fingerprints: for example, by comparing distance matrices and mapping atoms in one molecule onto another, or by aligning the molecules to maximize the overlap of fields, and then measuring the amount of overlap between fields.
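For the fingerprint-based case, both measures are short calculations. The sketch below assumes each fingerprint is given as a collection of its "on" bit positions; the function names are hypothetical, and returning 0.0 for two empty fingerprints is an arbitrary convention chosen here.

```python
import math


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient c / (a + b - c) for two sets of on-bit positions."""
    a, b = set(fp_a), set(fp_b)
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 0.0


def euclidean(fp_a, fp_b):
    """Euclidean distance between two binary fingerprints.

    For bit vectors this is the square root of the number of differing bits.
    """
    return math.sqrt(len(set(fp_a) ^ set(fp_b)))


fp1 = {1, 2, 3, 4}
fp2 = {3, 4, 5}
print(tanimoto(fp1, fp2))    # 2 common bits out of 5 distinct bits
print(euclidean(fp1, fp2))   # 3 bits differ
```

Tanimoto is a similarity (1.0 for identical fingerprints), whereas the Euclidean measure is a distance (0.0 for identical fingerprints), so rankings based on them run in opposite directions.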
Available 3D databases
The most comprehensive database of 3D chemical structures generated by X-ray crystallography is the Cambridge Structural Database. This database contained 469,611 structures as of January 2009. The database comes with a variety of tools for viewing and analyzing the structures, including several
In particular, there is a free 500 compound subset of the database available for teaching purposes. There are a variety of databases of machine-generated 3D structures available; in particular, Indiana University hosts
, a database of PubChem structures converted to 3D with