Dalke Scientific Software: More science. Less time.
/home/writings/diary/archive/2004/01/03/available_toolkits

Chemical Informatics Toolkits

Now that you know some of the history of chemical nomenclature, how do you actually use one of these molecular representations to do science? The easiest way is to get existing software which does what you want. For example, you can buy an integrated set of tools from Tripos, MDL, or Accelrys. They can be used for many of the tasks needed for chemical research, and they are extensible and scriptable so that new capabilities can be added by customers.

There are limits to the extensibility. These programs were designed as applications, with the assumption that the application will always be in charge. Suppose, though, that you want to write an Excel add-in which uses parts of Sybyl to compute molecular properties. That can't be done by a customer because Sybyl can't be used as a library that way. (Or at least not that I know of.) You're somewhat stuck, and your add-in will need to use a workaround like starting Sybyl without a GUI and passing it an SPL script.
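The general shape of that workaround can be sketched in a few lines of Python. This is only an illustration under assumptions: the command list is a placeholder, the function name run_batch_script is made up for this example, and I am not showing Sybyl's real command-line options. The sketch assumes the application accepts the path to a script file as its last argument and writes its results to standard output. The Python interpreter stands in for the application so that the example runs anywhere.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_batch_script(command, script_text, suffix=".txt", timeout=60):
    """Run an external application on a generated script and return its stdout.

    `command` is the (hypothetical) argument list that starts the application
    in batch mode; the path to the temporary script file is appended to it.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        # Write the script where the child process can read it.
        script_path = Path(tmpdir) / ("job" + suffix)
        script_path.write_text(script_text)
        # check=True raises CalledProcessError if the application fails.
        result = subprocess.run(
            list(command) + [str(script_path)],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,
        )
    return result.stdout


if __name__ == "__main__":
    # Stand-in for the real application: the Python interpreter itself.
    output = run_batch_script([sys.executable], "print(6 * 7)\n", suffix=".py")
    print(output.strip())
```

The cost of this approach is visible even in the sketch: every call starts a new process, the data goes through a file and back through text output, and the caller has to parse whatever the application prints. That is the overhead a library interface avoids.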

If you are writing a new application or a new plug-in, need direct access to the molecule's data structure, or are writing an algorithm which otherwise exceeds the limitations of these applications (or if you're a programming geek who prefers a "real" programming language), then you'll need to look at the available toolkits. These are software libraries which are used as part of a larger system instead of the other way around. You can buy some from a commercial vendor, like Daylight (and PyDaylight, my Python API to it) or OpenEye, or get one of the open source variants, like Frowns, Open Babel, or JOELib.

As an interesting note, chemical informatics is a small field and these are all interrelated projects. The main parent is Daylight, which started in the 1980s with the Weininger family at Pomona College. I started PyDaylight at Bioreason in 1998. Brian Kelley worked with me on it. He wrote Frowns after leaving Bioreason and based its API on PyDaylight. Matt Stahl and Pat Walters wrote Babel in the mid-1990s at the University of Arizona as a molecular converter program. They used that code for various projects, and when Matt started at OpenEye in 1999 (or late 1998?) they decided that OpenEye would develop an open source version based on Babel, which became OELib. After a few years, they decided that a rewrite was in order and that the new version, called OEChem, would be closed source. OELib got picked up by others and turned into Open Babel. JOELib is a Java version of OELib, modeled on that library.

So where are the ties between the Daylight- and Babel-derived threads? Daylight, OpenEye, and Bioreason are all located in Santa Fe, NM. Dave Weininger over at Daylight encouraged Anthony Nicholls to start OpenEye, and Anthony liked the Santa Fe area. Part of the encouragement was, I think, to show that a company could make a living from selling a set of chemistry-oriented toolkits. Roger Sayle (of RasMol fame) was a VP at Metaphorics, a sister company to Daylight. C programmer that he is, he also helped out a lot with the Daylight toolkits and provided some algorithm suggestions for OELib, and I think he contributed some SMILES and SMARTS parsing code he was using for some of his own projects. (Given the different edge cases in the two code bases, I know Daylight and OELib use different parsers :)). In mid-2000 he decided to work down the street, as it were, with OpenEye. In addition, Pat at Vertex used the Daylight toolkit for projects while also contributing code and support to OpenEye. And there's me, who uses both toolkits and submits obscure bug reports, to the combined thankfulness and annoyance of both companies.

Are there others? Certainly, although I don't know how accessible they are as toolkits. CACTVS must have some chemical informatics libraries and I know people have paid Wolf-D. Ihlenfeldt for some of the components, but I don't think it has a low-level API for all those libraries. Pipeline Pilot has SMILES and SMARTS capabilities, but they are accessed through SOAP server calls rather than through a more traditional library interface.

In addition to the publicly known projects, many companies have in-house projects. In my research for the essays on nomenclature history I read that many of the large pharmas wrote their own systems in the mid-1900s and I've heard that some of those still exist. I consulted with Combichem and helped them with their in-house chemical informatics toolkit. Combichem was bought first by DuPont Pharma, then eventually resold to Deltagen, which proceeded to go under. People who left Combichem convinced their new employers to buy a copy of that codebase.

Or, you can write your own.


Andrew Dalke is an independent consultant focusing on software development for computational chemistry and biology. Need contract programming, help, or training? Contact me



Copyright © 2001-2013 Andrew Dalke Scientific AB