/home/writings/diary/archive/2005/07/24/PyDaylight_release

PyDaylight 1.0 released

I just updated PyDaylight with support for v4.91 of the Daylight toolkit. It ships with backwards-compatible support for v4.8x. I tested it with all 4 toolkit versions under Linux, Solaris and IRIX, using Python 2.4. Thanks to Daylight for letting me use their machines for the upgrade and testing.

[Download PyDaylight-1.0.tar.gz]

The new version is called 1.0. I was about to call this version 0.91. The previous ones were 0.9, 0.85, 0.8, and so on. I was holding off on using the 1.0 name until someone tested the Thor and Merlin support, but as most people are migrating to the Daylight Cartridge that isn't really important. PyDaylight supports pretty much all of the toolkit and its core is now some 7 years old so it's time to commit and not hide behind the "still under development" numbering.

The upgrade started with me compiling Python 2.4 for Daylight's machines. I've tried for years but they are a C shop and don't even use Python for in-house use, so their machines had some rather old Python versions, if Python was even present. There weren't any problems with the builds, though I didn't run the regression self-tests. (Once upon a time the SGI optimizer couldn't handle the regular expression module.)

PyDaylight uses a modified version of dayswig to build the C extension for Python. This uses SWIG, which also wasn't installed on the Daylight machines. NOTE: the distribution includes pre-swig'ed files for versions 4.8x and 4.91 of the toolkit so you likely don't need SWIG on your machines.

I had to change dayswig slightly to support DX_API_PUBLIC in the function signature. This is __stdcall when the toolkit is compiled for MS Windows, empty on Unix machines. Daylight no longer supports the monomer toolkit so dayswig only includes that interface if dt_monomer.h exists under $DY_ROOT/include. This is perhaps doing too much work for myself since after all no one I know uses the monomer toolkit - that's why Daylight's no longer supporting it!
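The header check can be sketched in a few lines of Python. This is only an illustration of the idea, not the code in dayswig; the function name has_monomer_toolkit is made up for this example.

```python
import os

def has_monomer_toolkit(dy_root):
    """Return True if dt_monomer.h exists under <dy_root>/include.

    Hypothetical helper for illustration; dayswig's own check may differ.
    """
    header = os.path.join(dy_root, "include", "dt_monomer.h")
    return os.path.isfile(header)

# Typical use: read the toolkit location from the DY_ROOT environment variable.
dy_root = os.environ.get("DY_ROOT", "")
include_monomer = bool(dy_root) and has_monomer_toolkit(dy_root)
```

With a check like this the monomer interface is simply left out on installations that no longer ship the header.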

The latest SWIG release is 1.3.25 but that didn't work quite right. It didn't like the wrapped Daylight handles. The SWIGged interface now requires that the objects be derived from Python integers, when once upon a time it did coercion via int(obj). A nice feature of PyDaylight is that its objects and integer handles are intermixable, and much of the code depends on that flexibility.
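The difference between the two coercion styles can be shown with a toy example. The class and function names below are invented for this sketch and are not PyDaylight's or SWIG's; the point is only that an object with __int__ works wherever int(obj) is called, while a stricter wrapper that demands a real integer rejects it.

```python
class Handle:
    """Toy wrapper around an integer toolkit handle (hypothetical)."""
    def __init__(self, handle):
        self.handle = handle

    def __int__(self):
        return self.handle

class IntHandle(int):
    """Toy wrapper that really is an integer (hypothetical)."""

def lenient_call(obj):
    # Old-style coercion: anything that int() accepts is fine.
    return int(obj)

def strict_call(obj):
    # New-style check: only real integers (or subclasses) are accepted.
    if not isinstance(obj, int):
        raise TypeError("expected an integer handle")
    return int(obj)
```

Here lenient_call accepts a Handle, an IntHandle, or a plain integer, while strict_call raises TypeError for a Handle.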

After digging for a while I couldn't find an easy fix so I did what I should have done a couple of hours earlier and used an older version of SWIG. For the 0.9 release I used 1.3.11 which is still available. I switched to that and a few minutes later dayswig_python worked as expected.

Next I made sure that each of the new functions was supported. I wanted the PyDaylight code (on top of the dayswig_python level) to support 4.8x and 4.9x transparently so I created a new internal variable, daylight._toolkit_version, which can have integer values like 4830 (for 4.83) and 4910 (for 4.91). I decided to make a new variable, different than but based on the value of DX_TOOLKIT_VERSION, because then I could control its value and ensure it was appropriately comparable.

The README for the release lists the API features. The functions dt_molgraph(3), dt_addh(3), and dt_suppressh(3) are available as the new Molecule and Reaction methods molgraph(), addh(), suppressh(). I added a default so the last two are applied to all atoms and not just chiral ones.

In my testing I found there was a bug in dt_addh and reported it to Jack, who tracked it down and fixed it. The bug looked something like this, though the following comes from memory:

>>> from daylight import Smiles
>>> mol = Smiles.smilin("C")
>>> mol.addh()
>>> mol.cansmiles()

Apparently the count used to figure out the number of branches used data fields which weren't set right for the newly created hydrogens.

The new dt_molgraph is kind of strange. It's the only toolkit function which modifies a molecule but doesn't check the mod flag. It always sets the mod bit, does its work, and turns the mod bit off. I talked with Jack and I think that will be changed in a future toolkit release so molgraph only works when the mod bit is set, and where it doesn't set the bit. Watch the release notes! :)

As a side effect of my testing I found a fun new way to wreak havoc on the toolkit. Consider this:

>>> from daylight import Smiles, Bond
>>> mol = Smiles.smilin("C"*200)
>>> atoms = mol.atoms
>>> N = 100  # try 100 or 128
>>> for atom in atoms[2:3+N]:
...     Bond.add(atoms[0], atom)
>>> mol.mod = False
>>> mol.cansmiles()

There are two failure modes. If N == 100 (so there are 100 bonds to the first atom) then the toolkit hangs in the cansmiles() code. It's most likely looking for an available ring closure number, but none are available. If N == 128 then dt_mod_off fails. It looks like there's an internal table that expects no more than 128 bonds for an atom. Interestingly, if dt_mod_off fails then as a side effect it deletes the molecule or reaction object. PyDaylight doesn't catch that error condition so it keeps the now-dead handle around. Future use of the object will have strange side effects and the garbage collection will say something about an uncaught exception because the dt_dealloc fails.

The function dt_smilin_addh(3) is available as daylight.Smiles.smilin_addh(). SMILES errors are now appended to the regular error queue instead of the special SMILES error queue (finally!) so I tweaked the code used to get the last error so it does the right thing. There's still a toolkit bug where some SMILES errors don't cause an error. The one example I found was ">". Jack's going to look into it.

There's a new fingerprint function in v4.91, dt_fp_similarity(3). This takes two fingerprint handles and an expression(5). The expression strings look like "c/sqrt((a+c)*(b+c))" where the variables a, b, c, and d are the number of bits only on in fp1, only on in fp2, on in both, and off in both, respectively. There are a few functions like sqrt(), min() and max() and the normal operators "+-/*^". Constants can be written as integers or simple floats (exponential notation like 2.3E-09 isn't supported). The internal evaluation uses doubles but the result is returned via a dt_Real which is only a float. A float seems rather small these days, since it only has 6 base-10 digits of precision.

>>> i=40999900; float(i) == daylight.dt_fp_similarity(fp, fp, str(i))
>>> i=40999901; float(i) == daylight.dt_fp_similarity(fp, fp, str(i))
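The same rounding can be reproduced without the toolkit by forcing a value through a 32-bit float with the standard struct module. This sketch is independent of Daylight; it only shows why the second number cannot survive the trip through a C float.

```python
import struct

def as_c_float(x):
    """Round a Python float (a C double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Near 4.1e7 adjacent 32-bit floats are 4 apart, so only multiples of 4
# are exactly representable.
print(as_c_float(40999900.0) == 40999900.0)
print(as_c_float(40999901.0) == 40999901.0)
```

The first comparison holds and the second does not, because 40999901 is rounded to 40999900 when stored as a float.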

The new PyDaylight function for this is Fingerprint.similarity. The Daylight function supports a few hard-coded expression strings like "COSINE" and "TANIMOTO". These are tested via exact string matches and cannot be used as variables in the expressions. If you want the latter I copied the definitions from the documentation into the table Fingerprint.expressions.
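To make the four variables concrete, here is a small sketch that computes them for two fingerprints stored as Python integers used as bit strings. The function names and the integer representation are choices made for this example, not PyDaylight's API. The cosine formula is the one quoted above; the Tanimoto formula c/(a+b+c) is the standard definition rather than text copied from the Daylight documentation.

```python
from math import sqrt

def bit_counts(fp1, fp2, nbits):
    """Return (a, b, c, d) for two fingerprints stored as integers.

    a: bits on only in fp1, b: bits on only in fp2,
    c: bits on in both,     d: bits off in both.
    """
    mask = (1 << nbits) - 1
    a = bin(fp1 & ~fp2 & mask).count("1")
    b = bin(fp2 & ~fp1 & mask).count("1")
    c = bin(fp1 & fp2 & mask).count("1")
    d = nbits - a - b - c
    return a, b, c, d

def cosine(a, b, c, d):
    return c / sqrt((a + c) * (b + c))

def tanimoto(a, b, c, d):
    return c / (a + b + c)

a, b, c, d = bit_counts(0b1100, 0b1010, nbits=4)
```

Neither formula guards against empty fingerprints, so a real implementation would need to decide what to return when the denominator is zero.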

One problem I have with dt_fp_similarity is that I can't tell if there was a syntax error in the expression. The function will return -1.0 in that case but it's possible that the expression is supposed to return -1.0. There are a couple of other places where it's hard to tell if a return value is an error indicator or not, but in those cases I can check if there's a new message in the error queue. Not here. The error message goes to the terminal and not to the error queue. I hope they fix this for the next release.

I suppose you could implement your own parser in Python. It shouldn't be too hard, either using eval (despite the potential security problems) or using the parser generator included with PyDaylight for the MCL support.
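A bare-bones version of the eval approach might look like the following. It is a sketch only: the function name is invented, it assumes that "^" means exponentiation, and an emptied __builtins__ is not a real sandbox, so it should not be fed untrusted strings. Unlike dt_fp_similarity, a malformed expression raises an exception instead of returning -1.0.

```python
from math import sqrt

# Names an expression is allowed to use, besides a, b, c and d.
_FUNCTIONS = {"sqrt": sqrt, "min": min, "max": max}

def evaluate_similarity(expression, a, b, c, d):
    """Evaluate a similarity expression such as "c/sqrt((a+c)*(b+c))".

    Hypothetical helper. Raises SyntaxError for a malformed expression.
    """
    # Assumption: "^" is the power operator, which Python spells "**".
    code = compile(expression.replace("^", "**"), "<expression>", "eval")
    namespace = dict(_FUNCTIONS, a=a, b=b, c=c, d=d)
    return float(eval(code, {"__builtins__": {}}, namespace))
```

Because the syntax check happens in compile(), the caller can tell a bad expression apart from one that legitimately evaluates to -1.0.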

The new function dt_ischiral(3) tests whether or not the given atom is chiral. This is available as a new read-only attribute of Atom instances named ischiral.

>>> mol = Smiles.smilin("OC(Cl)=[C@]=C(C)F")
>>> [atom.ischiral for atom in mol.atoms]
[0, 0, 0, 1, 0, 0, 0]

The new function dt_setbondstyle(3) is available as the setbondstyle method of Depiction instances.

I moved some of the module self-tests into the test/ directory and created a new test file test_v491.py for each of the new features. You can look at that file for examples of use.

Finally, there was a bug fix to handle a case found by Terry Brunck. I didn't deallocate temporary streams for looping over atoms. Normally that's okay because the toolkit deallocates those when the molecule is deallocated. But his algorithm made thousands of streams and the Daylight deallocator used a garbage collection algorithm which doesn't scale well for that case. The fix was to delete the temporary stream once it's no longer needed. There was even a comment in the code questioning why there wasn't a dealloc.
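The pattern behind the fix is ordinary try/finally cleanup. The sketch below uses a stand-in toolkit object with made-up method names so that it runs without Daylight; it is not the PyDaylight source, only the shape of the change.

```python
class FakeToolkit:
    """Stand-in for the toolkit that tracks live stream handles (hypothetical)."""
    def __init__(self):
        self.live_streams = set()
        self._next_handle = 1

    def new_stream(self, items):
        handle = self._next_handle
        self._next_handle += 1
        self.live_streams.add(handle)
        return handle, iter(items)

    def dealloc(self, handle):
        self.live_streams.discard(handle)

def collect_atoms(toolkit, atoms):
    """Loop over a temporary stream and free it as soon as the loop ends."""
    handle, stream = toolkit.new_stream(atoms)
    try:
        return list(stream)
    finally:
        # Release the stream right away instead of waiting for the
        # parent molecule to be deallocated.
        toolkit.dealloc(handle)
```

Calling collect_atoms thousands of times leaves no live streams behind, which is the property the original code lacked.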

"Daylight", "Daylight toolkit", "Thor" and "Merlin" are registered trademarks of Daylight Chemical Information Systems, Inc. Daylight C.I.S. is neither affiliated with nor responsible for PyDaylight. Are you kidding? They don't think that Python can be used for real programming. (Hi Daylight krewe! :)

Andrew Dalke is an independent consultant focusing on software development for computational chemistry and biology. Need contract programming, help, or training? Contact me

Copyright © 2001-2013 Andrew Dalke Scientific AB