cyclops_mysql and jquery-marvin
It's been many months since I last posted anything. My fiancee came back from a tour of duty in Iraq and we did a driving tour for almost three months. Turns out, driving, travel, visiting friends and family, and getting things prepared for a wedding take a lot of work. I'm sad to say that I put off work and other external issues during most of this time.
Going back to work after such a long break proved hard, as many of you can well imagine. I needed something to get back into the swing of things before tackling client projects, so I dusted off an old project and played around with a new one.
There's some history here. A client of mine was developing a web application and needed just a couple of cheminformatics extensions. They were already using MySQL and OEChem, so I think I billed them about 40 hours and wrote some UDFs for them. My contract with them says they get the copyright to what I do for them, so that was the end of that code.
But I wanted to present something for CUP, so I rewrote everything and improved on it. For example, I looked around at other cheminformatics extensions for MySQL and noticed they didn't make effective use of some ways to get higher performance out of a MySQL UDF. The two main examples are allocating objects in the *_init function and reusing them for the rest of the search, rather than reallocating them for each row, and checking for static (constant) input values in *_init instead of reevaluating them for each row.
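Those two optimizations can be sketched outside of MySQL. What follows is a minimal Python model of the UDF lifecycle, not the cyclops_mysql code: the class name MatchUDF, its init() and row() methods, the run_query() helper, and the use of Python's re module as a stand-in for an expensive SMARTS parse are all assumptions made for illustration. In a real MySQL UDF, the init function sees a constant argument as a non-NULL entry in args->args[], which is what the None check below mimics.

```python
import re


class MatchUDF:
    """Toy model of a UDF with init and per-row phases (hypothetical names)."""

    def __init__(self):
        self.compile_count = 0   # how many times the expensive step ran
        self.cached = None       # pattern compiled once in init(), if possible

    def _compile(self, pattern):
        # Stand-in for an expensive parse, such as building a SMARTS query.
        self.compile_count += 1
        return re.compile(pattern)

    def init(self, constant_pattern):
        # Called once per query. constant_pattern is None when the argument
        # varies from row to row (for example, when it comes from a column).
        self.cached = None
        if constant_pattern is not None:
            self.cached = self._compile(constant_pattern)

    def row(self, text, pattern):
        # Called once per row. Reuse the cached object when there is one.
        if self.cached is not None:
            matcher = self.cached
        else:
            matcher = self._compile(pattern)
        return 1 if matcher.search(text) else 0


def run_query(rows, pattern, pattern_is_constant):
    """Run the toy UDF over rows; return the results and the compile count."""
    udf = MatchUDF()
    udf.init(pattern if pattern_is_constant else None)
    results = [udf.row(text, pattern) for text in rows]
    return results, udf.compile_count
```

With a constant pattern the expensive step runs once per query. Without the check it runs once per row, which is the cost the init-time test avoids.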
I got some pretty good numbers, and reported them in my CUP X presentation titled "Database extensions for fun and profit." But I didn't release the code because it was conference-ware and not usable.
Later that year, OpenEye released their OEGraphSim toolkit with fingerprint support, including code for the 166 MACCS keys and for path-based hash fingerprints. I decided to update my code to take advantage of this new toolkit.
The result is cyclops_mysql-1.0.tar.gz. The supported commands are:
- oe_matches(smiles, smarts)
- oe_count_matches(smiles, smarts)
- oe_count_umatches(smiles, smarts)
- oe_lingosim(str1, str2)
- oe_path_fp(smiles, num_bits=4096, min_bonds=0, max_bonds=5, atom_type=191, bond_type=3)
- fp_contains(superstructure_fp, substructure_fp)
- fp_tanimoto(fp1, fp2)
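For readers unfamiliar with the two fingerprint functions, here is a minimal Python sketch of what a containment test and a Tanimoto score compute on bit fingerprints. It is not the cyclops_mysql implementation. It assumes fingerprints are equal-length byte strings, and returning 0.0 for two all-zero fingerprints is a choice made for this sketch, not documented behavior of the package.

```python
def _check_lengths(fp1, fp2):
    # Both fingerprints must cover the same number of bits.
    if len(fp1) != len(fp2):
        raise ValueError("fingerprints must have the same length")


def _popcount(fp):
    # Number of bits set to 1 in a byte string.
    return sum(bin(byte).count("1") for byte in fp)


def fp_contains(superstructure_fp, substructure_fp):
    """True if every bit set in substructure_fp is also set in superstructure_fp."""
    _check_lengths(superstructure_fp, substructure_fp)
    return all((sup & sub) == sub
               for sup, sub in zip(superstructure_fp, substructure_fp))


def fp_tanimoto(fp1, fp2):
    """Tanimoto score: bits set in both divided by bits set in either."""
    _check_lengths(fp1, fp2)
    common = _popcount(bytes(a & b for a, b in zip(fp1, fp2)))
    either = _popcount(bytes(a | b for a, b in zip(fp1, fp2)))
    # Assumption for this sketch: two empty fingerprints score 0.0.
    return common / either if either else 0.0
```

The containment test is the usual screening step before a full substructure match: if the candidate's fingerprint is missing any bit set in the query's fingerprint, the candidate cannot contain the query.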
The package also includes a pretty comprehensive test suite and a benchmarking tool. The OpenEye tools are fast. On my laptop I can generate about 7,000 canonical SMILES per second in a database query, and about 16,000 SMARTS matches per second. See "macbook_pro.bench" for details, and "README.benchmark" for more information about the benchmarking tools.
The other tool I worked on was jquery-marvin-0.8.tar.gz, an improved interface for working with Marvin, which is a Java-based chemical structure viewer and editor. Marvin comes with a "marvin.js" script to help integrate Marvin into a web page, but as I write in the README:
Now, there are reasons for ChemAxon to keep the code they have, and I mention some possibilities in the README. But I don't have the same constraints, so I wrote a brand new interface based on jQuery UI. Some of the advantages of this package are:
- You can use $().marvinview() and $().marvinsketch() to put a viewer or sketcher at any place in the DOM tree and remove it.
- It uses only three global variables. (Two are needed to capture property and mouse events.)
- It includes regression tests (based on qunit.js)
- If you want the property change and mouse events then you can use jQuery's normal event mechanism instead of going through the global functions. (It even does the right thing with mouse events, although I can't figure out why someone would use those.)
It has some disadvantages as well:
- I've only tested it on my Mac with Safari, Opera, and Firefox. It's entirely possible that it won't work on IE or Windows, and I have no intention of supporting old browsers.
- I don't support the entire set of Marvin parameter APIs. For example, this demo shows that MarvinView can take MarvinSketch options, which are used if 'editable' is true, which lets people open a MarvinSketch window for further editing.
- There's no documentation.
- The self-tests are incomplete.
This package is not production quality code but it is complete enough that the adventuresome and curious shouldn't have a problem, at least, not with the core functionality. To see how to use it, try some of the demos then look at the source.
Work, marriage, and honeymoon
Okay, it's time for me to get back to paying work, and to apologize to my clients for putting them off so long. Then again, I'm getting married soon, with a honeymoon immediately after, so I won't be able to get much done. Hmm...
Andrew Dalke is an independent consultant focusing on software development for computational chemistry and biology. Need contract programming, help, or training? Contact me
Copyright © 2001-2010 Dalke Scientific Software, LLC.