Surely there must be a cleaner way to name a molecule.
Naming known molecules
That thought runs over and over in your head. Why do you want a name? You're looking for additional information about a chemical graph, so what about using a graph search instead of a text search? Suppose all chemical compounds were stored in a computer as graphs. To search the database, sketch the compound, then do a graph isomorphism search. Graph isomorphism is slower than a text compare, so the search could be sped up with filters. E.g., search first for a matching molecular formula and only do the graph search on the records which pass the filter.
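Here is a minimal sketch of that filtered search in Python. Everything in it is an assumption made for illustration: the `(atoms, bonds)` representation, the tiny `DATABASE`, and the function names are hypothetical, hydrogens are left implicit, bond orders are ignored, and the isomorphism test is a plain backtracking search rather than anything a production system would use.

```python
from collections import Counter

# Hypothetical representation: a molecule is (atoms, bonds), where atoms is a
# list of element symbols (heavy atoms only) and bonds is a list of (i, j)
# index pairs. Bond orders and hydrogens are ignored in this sketch.
DATABASE = {
    "ethanol": (["C", "C", "O"], [(0, 1), (1, 2)]),
    "dimethyl ether": (["C", "O", "C"], [(0, 1), (1, 2)]),
    "propane": (["C", "C", "C"], [(0, 1), (1, 2)]),
}


def formula(mol):
    """Cheap filter key: count of each element in the graph."""
    atoms, _ = mol
    return Counter(atoms)


def adjacency(mol):
    """Build a list of neighbor sets, one per atom."""
    atoms, bonds = mol
    adj = [set() for _ in atoms]
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def is_isomorphic(mol_a, mol_b):
    """Backtracking search for an atom mapping that preserves elements and bonds."""
    atoms_a, bonds_a = mol_a
    atoms_b, bonds_b = mol_b
    if len(atoms_a) != len(atoms_b) or len(bonds_a) != len(bonds_b):
        return False
    adj_a, adj_b = adjacency(mol_a), adjacency(mol_b)
    mapping = {}  # atom index in mol_a -> atom index in mol_b
    used = set()  # atom indices of mol_b already taken

    def extend(i):
        if i == len(atoms_a):
            return True  # every atom has been placed consistently
        for j in range(len(atoms_b)):
            if j in used or atoms_a[i] != atoms_b[j]:
                continue
            if len(adj_a[i]) != len(adj_b[j]):
                continue
            # Bonded (and non-bonded) pairs must agree with atoms mapped so far.
            if all((k in adj_a[i]) == (mapping[k] in adj_b[j]) for k in mapping):
                mapping[i] = j
                used.add(j)
                if extend(i + 1):
                    return True
                del mapping[i]
                used.discard(j)
        return False

    return extend(0)


def search(query):
    """Formula filter first, then the slower graph comparison on the survivors."""
    wanted = formula(query)
    return [name for name, mol in DATABASE.items()
            if formula(mol) == wanted and is_isomorphic(query, mol)]
```

Sketching ethanol with the atoms in a different order, `search((["O", "C", "C"], [(0, 1), (1, 2)]))`, drops propane at the formula filter, and only then does the graph comparison separate ethanol from dimethyl ether, which shares its formula.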
Hey! That could work! It would be even better if all the chemistry papers were put into the database, so anyone could look up a paper given the graph of a compound of interest. Oooh! And if it included published reactions as well, then people could get pointers on how to synthesize a compound.
Much to your delight (or chagrin), you find that the Chemical Abstracts Service beat you to this.
Substance identification is a special strength of CAS. It is widely known as the CAS Registry, the largest substance identification system in existence. When a chemical substance, newly encountered in the literature, is processed by CAS, its molecular structure diagram, systematic chemical name, molecular formula, and other identifying information are added to the Registry and it is assigned a unique CAS Registry Number. Registry now contains records for more than 22 million organic and inorganic substances and more than 34 million sequences.
They digitize all this information, make it searchable, and license the technology for others to develop search software for your computer. Or if you want, you can get it on paper, microfilm, or CD-ROM. All for a price of somewhere between a few hundred and nearly 30,000 dollars/year depending on who you are and what you want. (Who says information wants to be free? Actually, the cost in part reflects the service needed to keep things up to date with the literature and in part the high barrier to anyone else reproducing their database; the skills of inexpensive off-shore chemists notwithstanding.)
They are also a naming service. They assign a new, unique CAS number for every compound in the database. Ethanol is CAS# 64-17-5. You can design your compound database system to store the CAS# as the primary key. When you need more ethanol -- without the tasty impurities you'll get from your pub -- ring up your supplier and order it by CAS#. This helps make sure both parties are talking about the same thing.
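As a rough illustration of using the CAS# as a primary key, here is a small sketch with Python's built-in `sqlite3` module. The table name `compound`, its columns, and the helper functions are all hypothetical; a real compound database would carry far more than a name.

```python
import sqlite3


def make_db():
    """Create an in-memory database keyed on the CAS Registry Number."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: the CAS# is stored as text and is the primary key.
    conn.execute(
        "CREATE TABLE compound (cas TEXT PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO compound (cas, name) VALUES (?, ?)",
        [("64-17-5", "ethanol"), ("7732-18-5", "water")],
    )
    conn.commit()
    return conn


def lookup(conn, cas):
    """Return the stored name for a CAS#, or None if it is not on file."""
    row = conn.execute(
        "SELECT name FROM compound WHERE cas = ?", (cas,)
    ).fetchone()
    return row[0] if row else None
```

With this layout `lookup(make_db(), "64-17-5")` finds ethanol, and trying to insert a second record under the same CAS# raises `sqlite3.IntegrityError`, which is the "both parties are talking about the same thing" guarantee in database form.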
Problem solved. You can isolate a compound, determine its structure, get the CAS# and/or its IUPAC name, and look it up in the literature. Or is it solved?....
Andrew Dalke is an independent consultant focusing on software development for computational chemistry and biology.
Copyright © 2001-2013 Andrew Dalke Scientific AB