Dalke Scientific Software: More science. Less time.
/home/writings/diary/archive/2010/10/03/cyclops_mysql_jquery_and_marvin

cyclops_mysql and jquery-marvin

It's been many months since I last posted anything. My fiancee came back from a tour of duty in Iraq and we did a driving tour for almost three months. Turns out, driving, travel, visiting friends and family, and getting things prepared for a wedding take a lot of work. I'm sad to say that I put off work and other external issues during most of this time.

Going back to work after such a long break proved hard, as many of you can well imagine. I needed something to get back into the swing of things before tackling client projects, so I dusted off an old project and played around with a new one.


In 2009 at OpenEye's CUP X, I presented some work I did on writing MySQL user-defined functions ("UDF"s) for cheminformatics using OEChem.

There's some history here. A client of mine was developing a web application and needed just a couple of cheminformatics extensions. They were already using MySQL and OEChem, so I think I billed them about 40 hours and wrote some UDFs for them. My contract with them says they get the copyright to what I do for them, so that was the end of that code.

But I wanted to present something for CUP, so I rewrote everything and improved on it. For example, I looked at other cheminformatics extensions for MySQL and noticed they didn't take advantage of a couple of ways to get higher performance out of a MySQL UDF. The two main ones are to allocate objects in the *_init function and reuse them for the rest of the query, rather than reallocating them for each row, and to check for constant input values in *_init instead of reevaluating them for each row.
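Here's a minimal sketch of those two ideas. It is not the cyclops_mysql code. The function demo_count(text, pattern) is hypothetical, copying the pattern into a buffer stands in for an expensive step such as parsing a SMARTS string, and the UDF_INIT and UDF_ARGS structs are cut-down stand-ins for the real ones in mysql.h so that the sketch compiles on its own. It relies on one documented behavior of the UDF interface: inside the init function, args->args[i] is non-NULL only when that argument is a constant.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for MySQL's UDF_INIT and UDF_ARGS (normally from
   <mysql.h>); only the fields this sketch uses are included. */
typedef struct {
    char *ptr;                 /* per-query state owned by the UDF */
} UDF_INIT;

typedef struct {
    unsigned int arg_count;
    char **args;               /* in _init: non-NULL only for constant arguments */
    unsigned long *lengths;    /* string arguments are not NUL-terminated */
} UDF_ARGS;

/* Hypothetical per-query state, allocated once and reused for every row. */
typedef struct {
    char *pattern;
    unsigned long pattern_len;
    int pattern_is_constant;
    unsigned long prepare_calls;   /* how many times the pattern was prepared */
} demo_state;

/* Stand-in for an expensive "compile" step; returns 0 on success. */
static int prepare_pattern(demo_state *st, const char *p, unsigned long len) {
    char *buf = realloc(st->pattern, len + 1);
    if (buf == NULL) return 1;
    memcpy(buf, p, len);
    buf[len] = '\0';
    st->pattern = buf;
    st->pattern_len = len;
    st->prepare_calls++;
    return 0;
}

/* Called once per query. Returns 0 on success, 1 on error
   (the real API uses a boolean type here). */
int demo_count_init(UDF_INIT *initid, UDF_ARGS *args, char *message) {
    if (args->arg_count != 2) {
        strcpy(message, "demo_count(text, pattern) takes two arguments");
        return 1;
    }
    demo_state *st = calloc(1, sizeof *st);
    if (st == NULL) {
        strcpy(message, "out of memory");
        return 1;
    }
    if (args->args[1] != NULL) {
        /* Constant pattern: prepare it once here instead of once per row. */
        if (prepare_pattern(st, args->args[1], args->lengths[1])) {
            free(st);
            strcpy(message, "out of memory");
            return 1;
        }
        st->pattern_is_constant = 1;
    }
    initid->ptr = (char *)st;
    return 0;
}

/* Called once per row: counts (possibly overlapping) matches of the pattern. */
long long demo_count(UDF_INIT *initid, UDF_ARGS *args,
                     char *is_null, char *error) {
    demo_state *st = (demo_state *)initid->ptr;
    assert(st != NULL);
    if (args->args[0] == NULL || args->args[1] == NULL) {
        *is_null = 1;
        return 0;
    }
    /* Only a non-constant pattern has to be prepared again for each row. */
    if (!st->pattern_is_constant &&
        prepare_pattern(st, args->args[1], args->lengths[1])) {
        *error = 1;
        return 0;
    }
    if (st->pattern_len == 0) return 0;
    long long count = 0;
    const char *text = args->args[0];
    unsigned long n = args->lengths[0];
    for (unsigned long i = 0; i + st->pattern_len <= n; i++) {
        if (memcmp(text + i, st->pattern, st->pattern_len) == 0) count++;
    }
    return count;
}

/* Called once when the query finishes: release the per-query state. */
void demo_count_deinit(UDF_INIT *initid) {
    demo_state *st = (demo_state *)initid->ptr;
    if (st != NULL) {
        free(st->pattern);
        free(st);
        initid->ptr = NULL;
    }
}
```

With a query like SELECT demo_count(col, 'ab') FROM t, the server would call demo_count_init once, demo_count for each row, and demo_count_deinit at the end, so the pattern is prepared a single time no matter how many rows are scanned.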

I got some pretty good numbers, and reported them in my CUP X presentation titled "Database extensions for fun and profit." But I didn't release the code because it was conference-ware and not usable.

Later that year, OpenEye released their OEGraphSim toolkit with fingerprint support, including code for the 166 MACCS keys and for path-based hash fingerprints. I decided to update my code to take advantage of this new toolkit.

The result is cyclops_mysql-1.0.tar.gz. The supported commands are:

For more details see the README. The "oe_*" functions map almost directly to OEChem or OEGraphSim functions. The "fp_*" functions work on fingerprints, expressed as hex-encoded strings. MySQL UDFs, unlike the equivalent technology in PostgreSQL, are not object based, so these functions only work with strings or numbers.
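To give a feel for what working on hex-encoded fingerprints involves, here is a small sketch of a Tanimoto similarity computed directly on two hex strings. It is an illustration only, not the package's implementation. The name hex_tanimoto, the -1.0 error value, and the choice to return 0.0 for two all-zero fingerprints are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit (either case), or -1 if it is not a hex digit. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Number of set bits in each possible 4-bit value. */
static const int NIBBLE_POPCOUNT[16] = {
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
};

/* Tanimoto similarity of two hex-encoded fingerprints:
   (bits set in both) / (bits set in either).
   Returns -1.0 if the lengths differ or a character is not a hex digit,
   and 0.0 if neither fingerprint has any bits set (an assumed convention). */
double hex_tanimoto(const char *a, const char *b) {
    assert(a != NULL && b != NULL);
    size_t n = strlen(a);
    if (strlen(b) != n) return -1.0;
    int both = 0, either = 0;
    for (size_t i = 0; i < n; i++) {
        int x = hex_value(a[i]);
        int y = hex_value(b[i]);
        if (x < 0 || y < 0) return -1.0;
        both += NIBBLE_POPCOUNT[x & y];
        either += NIBBLE_POPCOUNT[x | y];
    }
    return either == 0 ? 0.0 : (double)both / either;
}
```

Because each hex digit is handled as an independent 4-bit chunk, the fingerprints never need to be decoded into a separate binary buffer, which suits an interface where a function only ever sees strings and numbers.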

The package also includes a pretty comprehensive test suite and a benchmarking tool. The OpenEye tools are fast. On my laptop I can generate about 7,000 canonical SMILES per second in a database query, and about 16,000 SMARTS matches per second. See "macbook_pro.bench" for details, and "README.benchmark" for more information about the benchmarking tools.


The other tool I worked on was jquery-marvin-0.8.tar.gz, an improved interface for working with Marvin, which is a Java-based chemical structure viewer and editor. Marvin comes with a "marvin.js" script to help integrate Marvin into a web page, but as I write in the README:

ChemAxon's Javascript code for Marvin is 12 years old, according to the copyright statement. It uses a Javascript programming style which is now considered obsolete and it depends on a lot of browser sniffing which is no longer needed in modern browsers.
One of the obsolete things it does is "document.write". I didn't want to do that.

Now, there are reasons for ChemAxon to keep the code they have, and I mention possibilities in the README. But I don't have the same constraints, so I wrote a brand new interface, based on jQuery UI. Some of the advantages to this package are:

It has some disadvantages as well:

This package is not production-quality code, but it is complete enough that the adventuresome and curious shouldn't have a problem, at least not with the core functionality. To see how to use it, try some of the demos, then look at the source.

Work, marriage, and honeymoon

Okay, it's time for me to get back to paying work, and to apologize to my clients for putting them off so long. Then again, I'm getting married soon, with a honeymoon immediately after, so I won't be able to get much done. Hmm...

Andrew Dalke is an independent consultant focusing on software development for computational chemistry and biology. Need contract programming, help, or training? Contact me

Copyright © 2001-2013 Andrew Dalke Scientific AB