Dalke Scientific Software: More science. Less time.
/home/writings/diary/archive/2003/10/14/paper_databases

Paper databases -- History of Chemical Nomenclature

The Geneva Congress of 1892 defined a nomenclature for international use. This was a formalization of systems already in use. One of the most important uses of a nomenclature system was for indexes like Beilstein ("Beilstein Handbook of Organic Chemistry"), first published in 1881. It was used by chemists who wanted to find more information about a compound or related compounds. I am not a chemist and I've thankfully never had to use Beilstein, but I think I've figured it out enough to give a sense of what life was like before computers. If you want a real guide, with more details and even helpful pictures, try other sites.

(The closest I've come to something like Beilstein is looking through Gradshteyn and Ryzhik or Abramowitz and Stegun for the solution to a math equation.)

In modern-day speak, Beilstein is a database of chemical records. Each record has information about a compound, including its name, molecular formula, and a relevant publication reference, and possibly a depiction and physical properties like boiling point. It is all in German because Germany dominated the field of organic chemistry in the 1800s.

The entries are sorted by structural type into volumes (and subvolumes, with new volumes added over time). The acyclic compounds are in volumes 1, 2, 3, and 4. Acyclic compounds with no functional group are in volume 1, as are hydroxy-, oxo-, and hydroxy-oxo compounds. Acyclic carboxylic acids are in volume 2 unless they also have hydroxy- and oxo-functions, in which case they are in volume 3. And so on. This ordering gives chemists a way to browse for other structurally similar compounds with similar function.

If the systematic name is known, use the General-Sachregister (name index), which maps from name to record location (volume, subvolume, page number).

If the systematic name isn't known for a compound, first determine its molecular formula in Hill order. Go to the General-Formelregister (formula index) of Beilstein for a list of compounds with that formula. Es gibt viele Verbindungen mit ... Sorry, got carried away trying to remember enough college German to read some of the Beilstein examples. There can be many compounds with the same molecular formula. Even something simple like C2H6O could be either ethanol or dimethyl ether. All the molecular formula does is greatly reduce the number of compounds to consider. It can be reduced even more by using knowledge of German chemistry nomenclature (perhaps with the help of a handy German/English chemistry dictionary) to figure out which of those compound names are most likely to correspond to the structure.
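For readers who want to see what "Hill order" means in practice, here is a minimal Python sketch. The function name `hill_formula` and the dictionary-of-counts input are my own choices for illustration, not anything Beilstein defines. The rule it implements is the usual one: if the compound contains carbon, write C first, H second, and the remaining elements alphabetically; with no carbon, write everything alphabetically.

```python
def hill_formula(counts):
    """Return a molecular formula in Hill order.

    `counts` maps element symbols to atom counts, e.g. {"C": 2, "H": 6, "O": 1}.
    """
    symbols = [s for s, n in counts.items() if n > 0]
    if "C" in symbols:
        # Carbon first, hydrogen second, everything else alphabetically.
        ordered = ["C"] + (["H"] if "H" in symbols else [])
        ordered += sorted(s for s in symbols if s not in ("C", "H"))
    else:
        # No carbon: strictly alphabetical, hydrogen included.
        ordered = sorted(symbols)
    # A count of 1 is left implicit, as in "C2H6O".
    return "".join(s if counts[s] == 1 else f"{s}{counts[s]}" for s in ordered)


# Ethanol and dimethyl ether have the same atoms, so the same formula.
ethanol = {"C": 2, "H": 6, "O": 1}
dimethyl_ether = {"O": 1, "C": 2, "H": 6}
print(hill_formula(ethanol), hill_formula(dimethyl_ether))
```

Both calls give the same string, which is exactly why the formula index only narrows the search instead of finishing it.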

Here's where using a line notation really pays off. There are generally about 60 lines per page. From pictures I've seen, structure formulas, even when compressed for space, look like they take up about 5 lines of text. Displaying the structure in the index would require at least quintupling the number of pages used for the index. Since a record itself is only about 5 lines long, adding the structure there would mean doubling the size of an already large publication.
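The page arithmetic can be spelled out in a few lines of Python. This is only a back-of-the-envelope sketch: the 60 lines per page and 5 lines per structure come from the estimates above, while the one-line index entry and the entry count of 60,000 are assumptions I picked to make the numbers round, not real Beilstein figures.

```python
LINES_PER_PAGE = 60       # rough estimate from the text
LINES_PER_STRUCTURE = 5   # a compressed structure drawing, per the text


def pages_needed(entries, lines_per_entry, lines_per_page=LINES_PER_PAGE):
    """Pages required for `entries` items, rounding up to whole pages."""
    total_lines = entries * lines_per_entry
    return (total_lines + lines_per_page - 1) // lines_per_page


entries = 60_000  # hypothetical number of compounds, chosen for round numbers

# Index: a one-line name entry (assumed) versus a five-line drawing.
index_text = pages_needed(entries, 1)
index_drawn = pages_needed(entries, LINES_PER_STRUCTURE)

# Records: about five lines each, with or without a five-line drawing added.
records_text = pages_needed(entries, 5)
records_drawn = pages_needed(entries, 5 + LINES_PER_STRUCTURE)

print(index_drawn / index_text)      # ratio for the index
print(records_drawn / records_text)  # ratio for the records
```

The ratios do not depend on the made-up entry count; only the per-entry line counts matter.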

If that fails, the compound might still be in Beilstein. Some compounds aren't listed in the formula index but can be found by a combination of a structure-based decision and leafing through pages. For an example of the joys of a search, take a look at this page, which describes looking for the aluminum salt of 8-hydroxyquinoline (a laser dye).

It isn't listed in the formula index, so the way to find it is to use knowledge of how record entries are laid out. The dye is a heterocyclic system with one nitrogen, so it should be found in volume 21. That volume contains information about how to get to the correct subvolume, which lists the start page for compounds of the form CnH2n-11NO and, more specifically, the start page for C9H7NO. From there, manually search starting from page 1057 until it's found on page 1144. Yowza!
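To see why C9H7NO belongs under the CnH2n-11NO heading, plug in n = 9: 2 × 9 - 11 = 7 hydrogens. The small Python sketch below does that check for any simple formula string. The helper names `parse_formula` and `series_label` are mine, and the parser is an assumption-laden toy that only handles flat formulas like "C9H7NO" (no parentheses, charges, or isotopes).

```python
import re


def parse_formula(formula):
    """Turn a flat formula such as 'C9H7NO' into a dict of element counts."""
    counts = {}
    for symbol, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(digits) if digits else 1)
    return counts


def series_label(formula):
    """Return the general-formula heading, e.g. 'CnH2n-11NO' for 'C9H7NO'."""
    counts = parse_formula(formula)
    # Offset k in CnH(2n+k): how far the hydrogen count is from 2n.
    k = counts.get("H", 0) - 2 * counts["C"]
    hydrogens = "H2n" if k == 0 else f"H2n{k:+d}"
    # Remaining elements keep their counts and are listed alphabetically.
    rest = "".join(
        s if counts[s] == 1 else f"{s}{counts[s]}"
        for s in sorted(counts)
        if s not in ("C", "H")
    )
    return f"Cn{hydrogens}{rest}"


print(series_label("C9H7NO"))
```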

There are other chemistry indices like CAS, which indexes journal publications and uses its own nomenclature and search system. Given the last few essays, you should now be as competent as I am at understanding summaries of how they work.

That describes how to use Beilstein, and should provide clues as to how the database was generated. I don't know if this is what they actually did, but here is my best guess. There was a team of chemists trained in the nomenclature system (it took about three years to become an adept). They read the raw sources (books, journals, etc.), converted the information to a structure diagram, and applied the nomenclature to get the systematic name. They then searched their notecards to see if they knew about the compound already, updating the cards if they did. If not, they created new cards: one for the record, one by name, and one by formula. When it was time to publish, they went through the cards and created the printing plates, which were used to print the book. Very labor intensive, but that was state of the art. Until the 1940s there were only minor improvements in the process, like improved printing press technology, which made it easier to include depictions.
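In modern terms my guessed notecard workflow is a small record store with two indexes. Here is a Python sketch of it; since the workflow itself is a guess, so is this. The class and method names (`CardFile`, `add_report`, `names_for_formula`) and the placeholder source strings are all invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Record:
    """One record card: a compound and the sources that mention it."""
    name: str
    formula: str
    references: list = field(default_factory=list)


class CardFile:
    def __init__(self):
        self.records = []                    # record cards, in filing order
        self.by_name = {}                    # name cards: name -> record number
        self.by_formula = defaultdict(list)  # formula cards: formula -> record numbers

    def add_report(self, name, formula, reference):
        """File one finding from the literature; return its record number."""
        number = self.by_name.get(name)
        if number is None:
            # Unknown compound: create the record, name, and formula cards.
            number = len(self.records)
            self.records.append(Record(name, formula))
            self.by_name[name] = number
            self.by_formula[formula].append(number)
        # Known or new, the record card gains the new reference.
        self.records[number].references.append(reference)
        return number

    def names_for_formula(self, formula):
        """What the formula index offers: every name filed under a formula."""
        return [self.records[i].name for i in self.by_formula.get(formula, [])]


cards = CardFile()
cards.add_report("ethanol", "C2H6O", "placeholder source 1")
cards.add_report("dimethyl ether", "C2H6O", "placeholder source 2")
cards.add_report("ethanol", "C2H6O", "placeholder source 3")  # updates, no new card
print(cards.names_for_formula("C2H6O"))
```

The name index answers the "systematic name is known" case and the formula index answers the "only the formula is known" case, which mirrors the two lookup paths described earlier.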

Andrew Dalke is an independent consultant focusing on software development for computational chemistry and biology.

Copyright © 2001-2013 Andrew Dalke Scientific AB