Dalke Scientific Software: More science. Less time.
/home/writings/diary/archive/2003/10/06/drawing_molecules

Drawing molecules

Many chemicals (okay, an infinite number) can be represented as a molecular graph, with atoms for nodes and bonds for edges [*]. The nomenclature is somewhat confusing since the molecular graph as a whole is often called a molecule, while each component (connected subgraph) of that graph is a chemical molecule. A molecule data structure may contain 0 or more subgraphs representing chemical molecules.
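To make the "molecule versus chemical molecule" distinction concrete, here is a minimal sketch of a molecular graph and its connected components. The representation (a list of element symbols plus a list of index pairs) and the function name `components` are my own assumptions for illustration, not how any particular toolkit stores molecules.

```python
from collections import defaultdict

def components(atoms, bonds):
    """Return the connected components of a molecular graph.

    atoms: list of element symbols, indexed by position
    bonds: list of (atom_index, atom_index) pairs
    """
    neighbors = defaultdict(list)
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    seen = set()
    result = []
    for start in range(len(atoms)):
        if start in seen:
            continue
        # Depth-first search from an atom we have not visited yet.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            atom = stack.pop()
            component.append(atom)
            for other in neighbors[atom]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        result.append(sorted(component))
    return result

# Heavy atoms only: ethanol (C-C-O) plus a lone water oxygen in one "molecule".
atoms = ["C", "C", "O", "O"]
bonds = [(0, 1), (1, 2)]
print(components(atoms, bonds))  # [[0, 1, 2], [3]]
print(components([], []))        # [] (a molecule with zero chemical molecules)
```

One data structure, two chemical molecules. The empty case is the "0 or more" part.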

Molecules have a three dimensional structure. Small molecules are often planar graphs and can be drawn as a two dimensional depiction, with some special notations to handle chirality. (This is sometimes called two-and-a-half dimensional.) Here's a depiction of ethyl alcohol, known more affectionately as ethanol and found at the pub nearest you.

That's a very verbose depiction with all those hydrogens sticking off the heavy atoms. (There are all sorts of shorthand names in chemistry which highlight your background. I did molecular modelling of biomolecules, where hydrogen is light and everything else is heavy because hydrogen has 1/12th the mass of carbon, the next heaviest atom we dealt with. Others more interested in metals call non-metals grease. No doubt some cosmologists classify everything as hydrogen, helium, and impurities.)

It can be compacted somewhat by moving the H'es alongside the heavy atom, like the next picture. I chose to use H3C instead of CH3 for stylistic reasons. I'm sure a chemist would shun me for that. :)

Writing in the hydrogens gets tedious. As it turns out, atoms have valences, which you might recall from introductory chemistry when you learned the Lewis dot model. Carbon has a valence of 4 which means that it takes 4 single bonds, or 1 double bond and 2 single bonds, or 2 double bonds, or 1 triple bond and 1 single bond.
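The four combinations listed for carbon are just the ways to split 4 into bond orders of 1, 2, and 3. Here's a small sketch that enumerates them; the function name `bond_patterns` and its `max_order` parameter are made up for this example.

```python
def bond_patterns(valence, max_order=3):
    """List the ways to fill `valence` with bonds of order 1..max_order.

    Each pattern is a tuple of bond orders in non-increasing order.
    """
    def fill(remaining, largest):
        if remaining == 0:
            yield ()
            return
        # Try the largest allowed bond order first, then smaller ones.
        for order in range(min(largest, remaining), 0, -1):
            for rest in fill(remaining - order, order):
                yield (order,) + rest
    return list(fill(valence, max_order))

for pattern in bond_patterns(4):
    print(pattern)
# (3, 1)        1 triple bond and 1 single bond
# (2, 2)        2 double bonds
# (2, 1, 1)     1 double bond and 2 single bonds
# (1, 1, 1, 1)  4 single bonds
```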

Actually, the Lewis model would also allow a quadruple bond under the octet rule, but that's not going to happen with carbon because under the valence bond model it isn't possible to have all four electrons in the outer shell point the same way. Larger atoms, in the 3rd row of the periodic table and above, can have quadruple bonds, and it looks like some systems even have quintuple bonds. Daylight's toolkit only supports up to triple bonds. OpenEye extends SMILES to include $ as the quadruple bond symbol, and now I see ChemDraw even has a sextuple bond, which has been observed in Cr2.

Chemists just assume the valences will always be filled. (You have to get to some pretty unusual physics to break that assumption, like ultra high vacuum or extremely short timescale interactions.) Rather than listing the hydrogen counts explicitly, they decided to use an implicit hydrogen representation, where the number of hydrogens on an atom is the atom's valence plus the charge minus the sum of its bond orders. The most common exception to this is for hydrogens around a chiral center.
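The implicit hydrogen rule is simple enough to write down directly. This is only a sketch of the rule as stated above: the tiny `VALENCE` table and the function name `implicit_hydrogens` are assumptions for illustration, and real toolkits handle many more elements and special cases than this does.

```python
# Typical valences for a few organic atoms (an illustrative table, far from complete).
VALENCE = {"C": 4, "N": 3, "O": 2}

def implicit_hydrogens(element, bond_orders, charge=0):
    """Hydrogen count = valence + charge - sum of bond orders."""
    count = VALENCE[element] + charge - sum(bond_orders)
    if count < 0:
        raise ValueError("more bonds than the valence allows")
    return count

# Ethanol with only the heavy atoms written down: C-C-O
print(implicit_hydrogens("C", [1]))           # 3, the methyl carbon
print(implicit_hydrogens("C", [1, 1]))        # 2, the middle carbon
print(implicit_hydrogens("O", [1]))           # 1, the hydroxyl oxygen
print(implicit_hydrogens("N", [], charge=1))  # 4, ammonium
```

That's 3 + 2 + 1 = 6 hydrogens for ethanol, matching the explicit depiction.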

Here's ethyl alcohol drawn using implicit hydrogens. (I think chemists call the earlier version with explicit hydrogens a redundant depiction.)

As an(other) aside, the polar hydrogen model lies between the explicit and implicit models and is used in molecular mechanics. Polar hydrogens have a large partial charge and are more likely to be in long-range hydrogen bonds. Nonpolar hydrogens are mostly involved in van der Waals bonding, which is very short range. MM merges a heavy atom and its nonpolar hydrogens into a new atom type with masses and charges adjusted accordingly. This cuts the number of simulated atoms in half with hopefully only slight effect on the result.
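Here's a rough sketch of that merging step for ethanol. It only adjusts masses, not charges, and it assumes a hydrogen counts as nonpolar exactly when it is bonded to carbon. That rule, the approximate masses, and the function name `merge_nonpolar_hydrogens` are my simplifications and not taken from any particular force field.

```python
MASS = {"H": 1.008, "C": 12.011, "O": 15.999}  # approximate atomic masses

def merge_nonpolar_hydrogens(atoms, bonds):
    """Fold each hydrogen bonded to carbon into that carbon.

    atoms: list of element symbols; bonds: list of (i, j) index pairs.
    Returns a list of (label, mass) for the atoms that remain.
    Assumption: a hydrogen is nonpolar exactly when it is bonded to carbon.
    """
    merged_h = {i: 0 for i, el in enumerate(atoms) if el == "C"}
    dropped = set()
    for i, j in bonds:
        # Check the bond in both directions for an H-C pair.
        for h, heavy in ((i, j), (j, i)):
            if atoms[h] == "H" and atoms[heavy] == "C":
                merged_h[heavy] += 1
                dropped.add(h)
    result = []
    for i, el in enumerate(atoms):
        if i in dropped:
            continue
        n = merged_h.get(i, 0)
        label = el if n == 0 else f"{el}H{n}"
        result.append((label, round(MASS[el] + n * MASS["H"], 3)))
    return result

# Ethanol with every hydrogen explicit: 9 atoms.
atoms = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]
bonds = [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5), (1, 6), (1, 7), (2, 8)]
merged = merge_nonpolar_hydrogens(atoms, bonds)
print([label for label, mass in merged])  # ['CH3', 'CH2', 'O', 'H']
```

Nine atoms become four, and the one hydrogen left is the polar one on the oxygen.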

If you're a chemist drawing organic molecules all the time you'll end up drawing a lot of carbons. Standard practice is that "normal" carbons aren't drawn at all. Any bond end or bend with no element symbol is assumed to be an uncharged carbon of average atomic weight. (I suppose there are rare exceptions, like if you make isotopically pure 14C diamond or buckyballs you might ignore the 14.) But when drawing your NMR structure, use 13C. (And when you email me to correct my mistakes, remember, I'm a physicist by training and have only learned your native practices by osmosis. :)

Here's ethyl alcohol as a chemist would sketch it. (I think they call it a line drawing.)

Can't get much more terse than that.


Andrew Dalke is an independent consultant focusing on software development for computational chemistry and biology. Need contract programming, help, or training? Contact me



Copyright © 2001-2013 Andrew Dalke Scientific AB