Dalke Scientific Software: More science. Less time.
/home/writings/diary/archive/2003/10/07/naming_known_molecules

Naming known molecules

Surely there must be a cleaner way to name a molecule.

That runs over and over in your head. Why do you want a name? You're looking for additional information about a chemical graph, so what about using a graph search instead of a text search? Suppose all chemical compounds were stored in a computer as graphs. To search the database, sketch the compound, then do a graph isomorphism search. Graph isomorphism is slower than a text compare, so the search could be sped up with filters. For example, search first for a matching molecular formula and only do the graph search on the records which pass the filter.
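Here is a minimal sketch of that filter-then-match idea, under some loud assumptions: a molecule is just a list of single-letter element symbols plus a set of bonds, hydrogens and bond orders are ignored, and the isomorphism test is a brute-force try-every-relabeling loop that is only usable for a handful of atoms. The names `make_molecule`, `formula`, `is_isomorphic`, and `search` are made up for this illustration and come from no real toolkit.

```python
from collections import Counter
from itertools import permutations

# Hypothetical representation: (list of element symbols, set of bonds).
# Each bond is a frozenset of two atom indices. Hydrogens and bond
# orders are left out to keep the sketch short.
def make_molecule(atoms, bonds):
    return (list(atoms), {frozenset(b) for b in bonds})

def formula(mol):
    """Element counts for the heavy atoms, used as the cheap filter."""
    return Counter(mol[0])

def is_isomorphic(a, b):
    """Brute-force graph isomorphism: try every atom relabeling.

    Factorial time, so this is for illustration only.
    """
    atoms_a, bonds_a = a
    atoms_b, bonds_b = b
    if len(atoms_a) != len(atoms_b) or len(bonds_a) != len(bonds_b):
        return False
    n = len(atoms_a)
    for perm in permutations(range(n)):
        # perm[i] is the atom of b that atom i of a is mapped to.
        if any(atoms_a[i] != atoms_b[perm[i]] for i in range(n)):
            continue
        mapped = {frozenset(perm[i] for i in bond) for bond in bonds_a}
        if mapped == bonds_b:
            return True
    return False

def search(database, query):
    """Return the names of records whose graph matches the query."""
    query_formula = formula(query)
    hits = []
    for name, mol in database.items():
        if formula(mol) != query_formula:  # cheap formula filter first
            continue
        if is_isomorphic(mol, query):      # expensive graph check last
            hits.append(name)
    return hits

database = {
    "ethanol": make_molecule("CCO", [(0, 1), (1, 2)]),
    "dimethyl ether": make_molecule("COC", [(0, 1), (1, 2)]),
    "methanol": make_molecule("CO", [(0, 1)]),
}

# The query is sketched with a different atom ordering: O-C-C.
query = make_molecule("OCC", [(0, 1), (1, 2)])
print(search(database, query))
```

Methanol never reaches the graph comparison because its formula differs. Ethanol and dimethyl ether share a formula, so only the graph check can tell them apart, which is exactly why the formula works as a filter and not as the whole search.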

Hey! That could work! It would be even better if all the chemistry papers were put into the database, so anyone could look up a paper given the graph of a compound of interest. Oooh! And if it included published reactions as well, then people could get pointers on how to synthesize a compound.

Much to your delight (or chagrin), you find that the Chemical Abstracts Service beat you to this.

Substance identification is a special strength of CAS. It is widely known as the CAS Registry, the largest substance identification system in existence. When a chemical substance, newly encountered in the literature, is processed by CAS, its molecular structure diagram, systematic chemical name, molecular formula, and other identifying information are added to the Registry and it is assigned a unique CAS Registry Number. Registry now contains records for more than 22 million organic and inorganic substances and more than 34 million sequences.

They digitize all this information, make it searchable, and license the technology for others to develop search software for your computer. Or if you want, you can get it on paper, microfilm, or CD-ROM. All for a price of somewhere between a few hundred and nearly 30,000 dollars per year, depending on who you are and what you want. (Who says information wants to be anthropomorphized, er, free? Actually, the cost in part reflects the service needed to keep things up to date with the literature and in part the high barrier to anyone else reproducing their database; the skills of inexpensive offshore chemists notwithstanding.)

They are also a naming service. They assign a new, unique CAS number for every compound in the database. Ethanol is CAS# 64-17-5. You can design your compound database system to store the CAS# as the primary key. When you need more ethanol -- without the tasty impurities you'll get from your pub -- ring up your supplier and order it by CAS#. This helps make sure both parties are talking about the same thing.
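As a rough sketch of the primary-key idea, here is a tiny in-memory SQLite table keyed on the CAS number. The table name `compound`, its columns, and the `lookup` helper are assumptions made up for this example, not anything CAS or a real inventory system prescribes.

```python
import sqlite3

# Hypothetical schema: the CAS number is stored as text and is the primary key.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE compound (cas_number TEXT PRIMARY KEY, name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO compound VALUES (?, ?)",
    [("64-17-5", "ethanol"), ("7732-18-5", "water")],
)

def lookup(cas_number):
    """Return the stored name for a CAS number, or None if it is unknown."""
    row = conn.execute(
        "SELECT name FROM compound WHERE cas_number = ?", (cas_number,)
    ).fetchone()
    return row[0] if row else None

print(lookup("64-17-5"))

# The primary key refuses a second record under the same CAS number,
# which is what keeps both parties talking about the same thing.
try:
    conn.execute("INSERT INTO compound VALUES (?, ?)", ("64-17-5", "grain alcohol"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Storing the number as text keeps the hyphens, so what sits in the database is the same string you would read to the supplier over the phone.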

Problem solved. You can isolate a compound, determine its structure, get the CAS# and/or its IUPAC name, and look it up in the literature. Or is it solved?....


Andrew Dalke is an independent consultant focusing on software development for computational chemistry and biology. Need contract programming, help, or training? Contact me



Copyright © 2001-2020 Andrew Dalke Scientific AB