Dalke Scientific Software: More science. Less time.
/home/writings/diary/archive/2003/10/14/systematic_name

Systematic Name - History of Chemical Nomenclature

Note: I am not a chemist, nor have I researched this thoroughly. Don't base your term paper on what I've written. Please let me know of any mistakes I've made.

The molecular formula is great for listing proportions, but it isn't enough. Chemists build on the work of others. For that to work, one chemist needs to describe a compound and other chemists need to know that the description exists. The old solution was to come up with a new, unique name for a compound, often based on its origin, and everyone simply memorized the list of names. But while chemists have exceptional memory for chemistry terms, it's impossible to memorize millions of names when there is no underlying meaning in a name and no relationship between names.
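To make "it isn't enough" concrete: ethanol and dimethyl ether have the same molecular formula, C2H6O, but their atoms are connected differently. Here's a minimal Python sketch, assuming a made-up representation where a molecule is a dict of element symbols (hydrogens spelled out) plus bonds given as pairs of atom indices. The formula is written in Hill order: carbon, then hydrogen, then everything else alphabetically.

```python
from collections import Counter

# Hypothetical molecule records: element symbols (hydrogens included
# explicitly) and bonds as pairs of atom indices.
ETHANOL = {
    "atoms": ["C", "C", "O", "H", "H", "H", "H", "H", "H"],
    "bonds": [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5), (1, 6), (1, 7), (2, 8)],
}
DIMETHYL_ETHER = {
    "atoms": ["C", "O", "C", "H", "H", "H", "H", "H", "H"],
    "bonds": [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5), (2, 6), (2, 7), (2, 8)],
}


def hill_formula(atoms):
    """Molecular formula in Hill order: C, then H, then the rest alphabetically.

    Without carbon, every element (including H) is listed alphabetically.
    """
    counts = Counter(atoms)
    order = ["C", "H"] if "C" in counts else []
    order = [el for el in order if el in counts]
    order += sorted(el for el in counts if el not in order)
    return "".join(el + (str(counts[el]) if counts[el] > 1 else "") for el in order)


def has_bond(mol, a, b):
    """True if some bond joins an atom of element a to an atom of element b."""
    atoms = mol["atoms"]
    return any({atoms[i], atoms[j]} == {a, b} for i, j in mol["bonds"])


print(hill_formula(ETHANOL["atoms"]), hill_formula(DIMETHYL_ETHER["atoms"]))
# C2H6O C2H6O
print(has_bond(ETHANOL, "C", "C"), has_bond(DIMETHYL_ETHER, "C", "C"))
# True False
```

The formula alone can't tell the two apart; only the bonds (the graph) can.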

Chemists of the 19th century came up with two ways to describe a compound that did work. The preferred way, still paramount, is the chemical diagram. It appears to have been started by Archibald Scott Couper in 1858 (I need to do more research). This is the familiar two-dimensional (also called topological) depiction of the molecule. It is very simple to understand after some training, and it builds on the human mind's ability to interpret images. It's especially powerful when comparing multiple compounds that are part of a series and whose core structures are aligned.

(It really is built on the human ability to remember pictures, and less on the ability to interpret graph topologies. The easiest way to test that is to take a depiction a chemist usually uses and flip it upside down. It will likely take that chemist a little while to recognize it. That's why the standard nomenclature for steroids specifically shows the preferred orientation for depictions and states that "[p]rojections of steroid formulae should not be oriented as in formulae 2c, 2d or 2e unless circumstances make it obligatory, e.g. in dimers formed photochemically."
Is it immediately obvious to you that all those structures are the same? If so, are you also a chemist?)

Depictions are a very good way for chemists to describe a compound, but they suffer from several notable problems. First, they are big. Consider what a chemistry paper would look like if it had to use

Rinse with [structural diagram of ethanol]
instead of
Rinse with ethanol.
That's a tiny molecule. Imagine using one of the steroid images instead.

This problem could be, and is, remedied in part by depicting the graph once in a paper, assigning it a name, and referring to the graph by that name. For common compounds, like ethanol, there's no need to show the graph at all. That still leaves the problem of coming up with a good name.

Another problem is that it's hard to say a graph out loud. Try giving a talk with circumlocutions like "okay, there's a six-element ring fused to another six-element ring fused to another six-element ring (two bonds from the first fuse), which is itself fused to a five-element ring (four bonds from the second fuse)." Again, it needs a name.

The other big problem, especially in the pre-computer days, was searching for a graph. How are graphs sorted into a list? If there's an index, does it need a (large) depiction for each index entry as well as for the record itself? A picture is said to be worth a thousand words, but surely there must be a shorter way to describe a picture: a single word that fits on a line of text. In other words, a line notation.
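Here's a rough Python sketch of why a canonical string solves the search problem. It is not how SMILES canonicalization works; it's a brute-force stand-in that tries every atom numbering and keeps the lexicographically smallest serialization, which is only practical for a handful of atoms. The names `canonical_key`, `register`, and `index` are hypothetical. Once every structure maps to exactly one string, an ordinary dictionary (or a sorted printed index) can find it no matter how it was drawn or numbered.

```python
from itertools import permutations


def canonical_key(atoms, bonds):
    """Brute-force canonical string for a small molecular graph.

    Every possible atom numbering is serialized and the smallest string
    wins. Two numberings of the same graph produce the same set of
    candidate strings, so they get the same key. Cost grows as n!.
    """
    best = None
    for order in permutations(range(len(atoms))):
        rank = {atom: pos for pos, atom in enumerate(order)}
        labels = ",".join(atoms[i] for i in order)
        edges = sorted(tuple(sorted((rank[i], rank[j]))) for i, j in bonds)
        key = labels + "|" + ";".join(f"{a}-{b}" for a, b in edges)
        if best is None or key < best:
            best = key
    return best


# A hypothetical index from canonical key to compound name.
index = {}


def register(name, atoms, bonds):
    index[canonical_key(atoms, bonds)] = name


# Heavy atoms only; hydrogens are left implicit.
register("ethanol", ["C", "C", "O"], [(0, 1), (1, 2)])
register("dimethyl ether", ["C", "O", "C"], [(0, 1), (1, 2)])

# The same ethanol graph, numbered starting from the oxygen.
print(index[canonical_key(["O", "C", "C"], [(0, 1), (1, 2)])])  # ethanol
```

Real canonicalization algorithms avoid the n! blowup by ranking atoms using graph invariants, but the idea is the same: one graph, one string.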

Chemists aren't stupid. They quickly noticed that carbon played an important role in all sorts of interesting compounds, and the study of those compounds became the field of organic chemistry. (Inorganic chemistry has its own way of doing things, in part because of the greater role of describing the crystal structure and the lower complexity of the compounds. Carbon, unlike almost every other element, easily forms long chains.)

Various nomenclature systems were used to turn the graph into a word. International codification occurred with the Geneva Congress of 1892, which evolved into what is now called IUPAC nomenclature. Its general approach starts by finding the parent. In the simplest case it's a matter of identifying the longest chain of carbons, figuring out which end is the starting end, then naming the bits and pieces that come off each side. This is recursive, because those bits and pieces may have more bits and pieces.
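The simplest case can be sketched in Python, assuming an acyclic, all-carbon skeleton with hydrogens left implicit. This is only a toy with hypothetical function names: it finds a longest chain, numbers it from whichever end gives the substituents the lowest locants, and sizes each substituent recursively. Real IUPAC rules add many more tie-breakers, and actually naming a branched substituent means applying the same procedure to it again.

```python
from collections import defaultdict


def build_graph(bonds):
    """Adjacency lists for a carbon skeleton given as (i, j) bond pairs."""
    graph = defaultdict(list)
    for a, b in bonds:
        graph[a].append(b)
        graph[b].append(a)
    return graph


def longest_chain(graph):
    """Longest simple path through an acyclic skeleton (a tree).

    A depth-first walk from every atom; fine for small molecules. Ties
    between equally long chains are not resolved by nomenclature rules.
    """
    def walk(atom, parent):
        best = [atom]
        for nxt in graph[atom]:
            if nxt != parent:
                path = [atom] + walk(nxt, atom)
                if len(path) > len(best):
                    best = path
        return best

    return max((walk(start, None) for start in list(graph)), key=len)


def branch_size(graph, atom, parent):
    """Carbons in the substituent rooted at atom; recursive, since it may branch."""
    return 1 + sum(branch_size(graph, n, atom) for n in graph[atom] if n != parent)


def parent_and_substituents(bonds):
    """Return (parent chain length, sorted [(locant, substituent size), ...])."""
    graph = build_graph(bonds)
    chain = longest_chain(graph)
    on_chain = set(chain)

    def substituents(numbering):
        found = []
        for locant, atom in enumerate(numbering, start=1):
            for n in graph[atom]:
                if n not in on_chain:
                    found.append((locant, branch_size(graph, n, atom)))
        return sorted(found)

    forward = substituents(chain)
    backward = substituents(chain[::-1])
    # Number from whichever end gives the lowest locants (first point of difference).
    best = min(forward, backward, key=lambda subs: [loc for loc, _ in subs])
    return len(chain), best


# 2-methylbutane skeleton: C0-C1-C2-C3 with C4 attached to C1.
print(parent_and_substituents([(0, 1), (1, 2), (2, 3), (1, 4)]))  # (4, [(2, 1)])
```

The result reads as "a four-carbon parent with a one-carbon branch at position 2", which is the skeleton of 2-methylbutane.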

Life (which is made of chemistry) isn't so simple. What about rings? What if there are several chains of equal length? These can all be figured out, and there is a way to do it that generates a unique word. The spanner in the works is that chemists don't want just a name for the compound; they want the name to indicate functionality. (This makes it easier to figure out what something does, and makes it possible to catalog, say, all compounds with the same function together.) Substructures like a ketone, alcohol, or aldehyde are strong indicators of functionality, so they are used to determine the parent.
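Here's a toy Python sketch of letting the functional group pick the parent. It assumes a hypothetical heavy-atom graph with bond orders and implicit hydrogens, recognizes only ketones, aldehydes, and alcohols, and uses a simplified seniority order (aldehyde over ketone over alcohol) in place of the full IUPAC table. For a molecule with both a ketone and an alcohol, the ketone wins, so the compound is named as a butanone with a hydroxy prefix.

```python
def functional_groups(atoms, bonds):
    """Locate ketone, aldehyde and alcohol groups in a heavy-atom graph.

    atoms: element symbols; bonds: (i, j, order) triples. Hydrogens are
    implicit and every atom is assumed neutral with its usual valence.
    """
    nbrs = {i: [] for i in range(len(atoms))}
    for i, j, order in bonds:
        nbrs[i].append((j, order))
        nbrs[j].append((i, order))

    def is_carbonyl_carbon(i):
        return atoms[i] == "C" and any(
            atoms[j] == "O" and order == 2 for j, order in nbrs[i])

    found = []
    for i in range(len(atoms)):
        if is_carbonyl_carbon(i):
            others = [atoms[j] for j, order in nbrs[i]
                      if not (atoms[j] == "O" and order == 2)]
            # Exactly one C=O, all other heavy neighbours carbon
            # (skips acids, esters, amides, CO2, ...).
            if len(others) == len(nbrs[i]) - 1 and all(el == "C" for el in others):
                found.append(("ketone" if len(others) == 2 else "aldehyde", i))
        elif atoms[i] == "O" and len(nbrs[i]) == 1:
            j, order = nbrs[i][0]
            if atoms[j] == "C" and order == 1 and not is_carbonyl_carbon(j):
                found.append(("alcohol", i))
    return found


SENIORITY = ["aldehyde", "ketone", "alcohol"]  # simplified, most senior first


def principal_group(atoms, bonds):
    kinds = {kind for kind, _ in functional_groups(atoms, bonds)}
    return next((kind for kind in SENIORITY if kind in kinds), None)


# 4-hydroxybutan-2-one, CC(=O)CCO: heavy atoms C0 C1 O2 C3 C4 O5.
atoms = ["C", "C", "O", "C", "C", "O"]
bonds = [(0, 1, 1), (1, 2, 2), (1, 3, 1), (3, 4, 1), (4, 5, 1)]
print(functional_groups(atoms, bonds))  # [('ketone', 1), ('alcohol', 5)]
print(principal_group(atoms, bonds))    # ketone
```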

There end up being many complications in making chemistry nomenclature fit the chemist's model of chemistry. For example, a compound can have multiple functions. Nicotinoyl morpholine and pyridyl morpholinyl ketone are both names for the same compound (according to Garfield's thesis; remember, I'm not a chemist). In the first case the morpholine is regarded as the parent structure, while in the second the ketone is the parent. Garfield says it took three years to train a chemist to use the system, and the result is a systematic name that may be quite different from the name the chemist uses.

The term "systematic name" is quite interesting. It means the word created by turning the graph into a unique name following the rules of a nomenclature system, but it's distinct from the term "canonical name" used for line notations like SMILES. I don't know why.

Garfield points out that while all compounds may have a systematic name, getting chemists to always use it is impossible. Some compounds are known by a trade name, like formaldehyde, and others by an older name because the compound was identified long before its structure was determined, like insulin. These are called trivial names, but one chemist's trivial name is another chemist's systematic name. A steroid chemist will use the term androstane and not cyclopentanophenanthrene, because the first is part of the systematic name for steroids, and because the second is just too long.

The nomenclature system, then, is very much like a dictionary. Some nations (France) have an official language (French) with an institute (L'Académie française) that prescribes the words of the general language ("email", non; "courriel", oui). Other dictionaries (the OED) describe how words are used; they are not meant to enforce usage and have liberal rules for accepting new words. In any case, you might think you're a real hep cat making up some fly words, but it's pretty pants if no one gets your groove.

The major difference is that chemical nomenclature can describe any compound (up to the limits of the chemical model used; organic chemistry nomenclature can't be used to describe iron ore) and is not limited to a finite set of words.

I really like Garfield's use of linguistics to recast the trivial/systematic dichotomy. He uses the word idiom: a term is an idiom if it can't be understood from its morphemes. I take that to mean that all morphemes (in chemistry these include eth, an, and ol) are idioms, as is insulin, while ethanol is not (being composed purely of morphemes). Seriously, read his paper. It's quite comprehensible even to a non-chemist, and it gives a good history of the problem and (in retrospect) a perspective on the state of computers in the 1950s.
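Garfield's idiom test can be phrased as a segmentation problem: can the term be split entirely into known morphemes? Here's a small Python sketch using a made-up morpheme list (far smaller than any real one) and my reading that a lone morpheme counts as an idiom. `segment` and `is_idiom` are hypothetical names for illustration, not anything from Garfield's thesis.

```python
from functools import lru_cache

# A tiny, made-up morpheme list for illustration; real nomenclature has many more.
MORPHEMES = {"meth", "eth", "prop", "but", "an", "en", "yn", "ol", "al", "one"}


def segment(term, morphemes=MORPHEMES):
    """Split term into known morphemes, or return None if no split exists.

    Tries longer morphemes first and backtracks when a choice leads nowhere.
    """
    @lru_cache(maxsize=None)
    def split(start):
        if start == len(term):
            return ()
        for end in range(len(term), start, -1):
            piece = term[start:end]
            if piece in morphemes:
                rest = split(end)
                if rest is not None:
                    return (piece,) + rest
        return None

    parts = split(0)
    return list(parts) if parts is not None else None


def is_idiom(term):
    """Idiom: a term whose meaning can't be built from smaller morphemes.

    Under this reading a lone morpheme ("eth") is an idiom, and so is a
    term with no decomposition at all ("insulin").
    """
    parts = segment(term)
    return parts is None or len(parts) == 1


for word in ["ethanol", "propanone", "eth", "insulin"]:
    print(word, segment(word), is_idiom(word))
```

With this list, "ethanol" splits into eth + an + ol and "propanone" into prop + an + one, so neither is an idiom, while "eth" and "insulin" both are.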


Andrew Dalke is an independent consultant focusing on software development for computational chemistry and biology. Need contract programming, help, or training? Contact me



Copyright © 2001-2020 Andrew Dalke Scientific AB