
EuroCUP 2008 presentation

The following is text to accompany my presentation for EuroCUP 2008. I do not have a license for OEChem on my public-facing web server, so I cannot have a live demo for any of the code examples.

Download the presentation as PDF.

AJAX and the OpenEye Tools

My name is Andrew Dalke. I'm an independent software consultant and instructor based in Göteborg (Gothenburg), Sweden. I mostly focus on developing computational chemistry tools and helping scientists become more capable in using computers to do their research.

[page 2]

Suppose you want a web page that shows a graphical 2D depiction of a compound given its SMILES. One very traditional way to do this - the Daylight libraries have supported it for over 10 years - is with a CGI script serving images based on the GET query parameters. The HTML might look like

<html>
...
<img src="/depict.cgi?smiles=CC(=O)Oc1ccccc1C(=O)O" />
...
</html>

The web browser gets the HTML, figures out it needs an image, and makes an HTTP request to the src URL. The web server, which is usually Apache, gets the request, converts it into a CGI request, and runs the program named "depict.cgi". This program uses the CGI parameter to create the requested depiction. In real life the CGI script may in turn call another program to do the actual depiction.
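As a rough illustration of that last step, here is how a handler might recover the SMILES from the request URL. This is only a sketch in Javascript using the standard URL class. The function name and the "http://localhost" base are placeholders of mine, and the real "depict.cgi" is not shown in this essay.

```javascript
// Sketch only: recover the "smiles" query parameter from the image URL.
// "http://localhost" is a placeholder base so the relative URL can be parsed.
function smilesFromRequest(requestUrl) {
  const url = new URL(requestUrl, "http://localhost");
  // Returns null when the parameter is missing.
  return url.searchParams.get("smiles");
}

console.log(smilesFromRequest("/depict.cgi?smiles=CC(=O)Oc1ccccc1C(=O)O"));
```

The value only contains the text after the first "=", so the "=" characters inside the SMILES itself survive the round trip.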

[page 3]

This interface was developed about 15 years ago and is still a valid way to write web applications. There are many other ways to handle the interface between the outside world and the actual work which needs to be done. The different layers, which can include database access, session maintenance, and output templates, are now collectively called the "web application stack." Ruby on Rails is a popular "full stack" system developed over the last 4 years, and Django and TurboGears are roughly similar systems for Python. All my examples are based on TurboGears.

[page 4]

The web server implementation should not affect how the web interface works. That is, there should be no reason to change any of the URLs or get different HTML back from the server. In practice, though, a few things do change. For example, using the extension ".cgi" in the URL is a bit of a cheat. It's there because that's one way Apache can tell if a file is a data file or an executable CGI script. In use it's a "leaky abstraction" because it lets some of the internal implementation decisions leak into the public. This can make it harder to port to other systems.

In my case I'm using TurboGears, which by default doesn't do well with periods in the URL, so for my examples I'll remove the ".cgi" from the URL.

The TurboGears code is structured very similarly to the Apache code. An HTTP request comes in, TurboGears converts that into a Python function call (instead of a CGI request), and calls the function that handles the request. In this case that Python function doesn't know anything about chemistry. It leaves the details up to OpenEye's Ogham toolkit for 2D structure depiction.

The biggest architectural difference is that everything is done through Python and Python libraries, and everything occurs in the same process space. I don't have to start up a new program for every request.

By the way, if you're curious about how I get Ogham to generate a PNG output as a string, rather than as a GIF or other non-PNG file, see my earlier essay on "OE8BitImage to PNG." It was a fun bit of reverse engineering.

[page 5]

My web page example had a single hard-coded SMILES. What if I want something more interactive, where the user can input a SMILES and see the depiction image? I'll do this with an HTML form, which sends the "smiles" parameter to the "/depict" service on the web server. This is the same service I used for the HTML image.

[page 6]

Viewing just the image is very static. The image just sits there. I would rather see the structure I submitted and also have a form for submitting a new SMILES to depict. In this case I'll submit the form to a new "/show_depict" handler, which will respond with HTML that includes an img element for the requested SMILES and includes the form for doing a new "/show_depict" depiction. Note that this requires two requests to the server: the first to "/show_depict" for the HTML and the second to "/depict" to get the depiction image.

[page 7]

By using HTML forms I've now advanced to HTML 2.0, which was formally specified in 1995. At the end of that year, Netscape Navigator introduced Javascript, which people originally used for doing form input validation. People make mistakes, and while hopefully the server is doing a layer of sanity checking, it still may take some time for the sent form to go to the server and come back again. A Javascript program can make things feel more interactive by sitting inside of the web page where it can access HTML and form elements and handle events like "form submitted."

The only difference in the HTML is the img src URL, so instead of submitting the form each time to the server, I'm going to listen for the "submit" event using Javascript. When that occurs I'll reach into the document (the formal term is the "Document Object Model", or "DOM") and change the URL, then tell the browser that there's no need to do the actual submission.

[page 8]

Here's what the HTML form looks like. I've added an "onsubmit" handler to the form, which is a bit of Javascript to call on form submission, and I've given identifiers to the SMILES input text box and to the depiction image, to make them easier to find later on.

[pages 9 and 10]

Here's the HTML fully fleshed out. The "onsubmit" handler calls the Javascript function "update_image()". This gets the text from the "smiles" field and finds the "depiction" image element. It then sets the depiction's "src" field to a URL based on the SMILES. I use the "escape" function because the user input may contain characters that have special meaning in URLs, like "#", which SMILES uses for a triple bond. The "return false" tells the browser that it does not need to send the form to the server.
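To make the escaping step concrete, here is a minimal sketch of building the depiction URL. It uses the standard "encodeURIComponent" function rather than "escape", because "escape" leaves "+" and "/" untouched and both appear in SMILES strings. The function name "depictUrl" is made up for this illustration.

```javascript
// Sketch only: build the image URL for a SMILES string.
// depictUrl is a hypothetical helper; "/depict" is the service used in the text.
function depictUrl(smiles) {
  return "/depict?smiles=" + encodeURIComponent(smiles);
}

console.log(depictUrl("C#N"));     // "#" would otherwise start a URL fragment
console.log(depictUrl("[NH4+]"));  // "+" is often decoded as a space in queries
```

The result can be assigned directly to the "src" attribute of the depiction image.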

This is out-dated!

[page 11]

I could go into more details but I won't, because what you see here is out-dated. This was state of the art about 6-8 years ago, but in practice there are problems with it. For example, it's hard to make components with it, like the ability to have multiple depictors in the same page. It mixes HTML and Javascript in the same file, which is harder to develop with and it confuses text editors. There's also the unfortunate problem that browsers have bugs. IE is known for its memory leaks in the face of circular references. There are workarounds, but learning them all takes time.

Thankfully there are better ways to develop Javascript tools. Most of the best practices and workarounds are available through Javascript libraries like jQuery, YUI, and MochiKit.

[page 12]

Here's the same form rewritten for use with jQuery. You see at the top I include the jQuery code, which is available as a single file from this URL. I then have a script block that sets up the interactive page. What this is saying is:

  When the document is fully loaded (that is, all the HTML has been parsed),
    Find the elements with tag name "form" (there is only one)
      When its submit button is pressed ...
        call this anonymous function.  ("anonymous" means "does not have a name")

Javascript allows "$" in a variable name. jQuery defines a special function named just "$" that combines a selection language and a wrapper object. "$(document)" means "select the document object from Javascript and wrap it inside of a jQuery context." That context is what lets you do ".ready()" and ".submit()". If the function call gets a string then jQuery uses a selector language, based on CSS selectors and a bit like XPath, to select elements from the DOM. '$("form")' means "select all HTML elements named 'form'" while '$("#smiles")' means "select all HTML elements where the 'id' is 'smiles'."

The anonymous submit function does the following:

Select the "#smiles" element (that's the element with id 'smiles').
Call its "val()" method, which in this case returns the input text for that field.
Escape it to make a depiction URL.
Assign the URL to the "src" attribute of the "#depiction" element (the element
   with id 'depiction')
Finally, "return false" to tell the browser it does not need to send the form.

This code is a bit longer than the preceding Javascript example, but that's only going to be the case for very simple examples like this. Otherwise the jQuery code is usually shorter, more succinct, and easier to understand, once you understand how jQuery works. It also separates the Javascript code completely from the HTML.

[pages 13 and 14]

I can make the interface still more interactive. The OpenEye depiction code is quite fast. Instead of waiting for the form submission I could update the image src URL after every keystroke. Sadly, this turns out to be complicated to do correctly. Javascript supports "keydown", "keyup", and "keypress" events, which sound like the right things. The problem is, the text isn't updated until after the event succeeds. Why? Because it's used for key input filtering. The Javascript handler can "return false" to tell the browser to ignore a given key.

It's also complicated because things like "control-v" for "paste", and "home" for "go to start of the input", and the backspace key are also handled as key input, but aren't simple changes to the text field. The easiest solution I found was to wait until after the event happens, let the browser do whatever is appropriate to the key input, and only then examine the contents of the text field.

I'm going to use MochiKit for this, which is another Javascript library. MochiKit is great for Python programmers like me because it makes Javascript feel more like Python. It adds mostly core-level libraries to simplify event handling, iteration, and DOM manipulation. There is some functionality overlap with jQuery, but they do work pretty well together. The only thing to watch out for is that by default both want to define the '$' function.

Don't be put off by seeing that MochiKit's last release was in 2006. It's a stable, well-developed and mature library.

I import the MochiKit functionality with the usual <script> tag. Once the document is loaded and ready, I add a "keydown" handler on the "#smiles" element. This anonymous function will be called after every key press. But all the function does is ask the browser to call another function, "update_image", 0 seconds later. The browser adds it to the wait queue of function calls to be done at some time in the future. These calls are only done when no event handler is being processed. (The Javascript code in a page is single-threaded by design.) The result is that "update_image" will be called most likely as soon as the keydown event is processed.
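The ordering is easier to see in a small simulation. This is a sketch only: there is no real browser or text box here, "fieldText" stands in for the contents of the "#smiles" field, and the standard "setTimeout" stands in for whichever timer call the slide uses.

```javascript
// Sketch only: why update_image is deferred with a 0 second timeout.
// fieldText stands in for the text box contents; no real browser is involved.
const log = [];
let fieldText = "CC";

function updateImage() {
  log.push("update_image sees " + fieldText);
}

function onKeydown() {
  // The text box has not been updated yet at this point.
  log.push("keydown sees " + fieldText);
  setTimeout(updateImage, 0);
}

onKeydown();       // the user presses "O"
fieldText += "O";  // the browser applies the key after the handler returns

// Queued after updateImage, so it prints both log entries.
setTimeout(function () { console.log(log.join("\n")); }, 0);
```

The handler itself only ever sees the old text. The deferred call runs once the handler has finished and the key has been applied, so it sees the new text.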

The "update_image" function should be very familiar. It's the code that extracts the text value from the "#smiles" element, constructs the image URL, and assigns it to the "#depiction" element's "src" field.

One of the many nice things about the OpenEye toolkit is it will handle partial SMILES strings as input. OEParseSmiles parses as much as it can understand and returns True on success. If it returns False then the SMILES was not correct or was incomplete, but the molecule object will contain as much of the molecule as it was able to parse. It's a valid molecule object, and the depiction code has no problems laying it out.

JSON request

[page 15]

The previous example depicts the molecule while I type in the SMILES string. I'm going to change it a bit and also display the IUPAC name for the SMILES string using OpenEye's naming code on the server. Again, this will be a highly interactive service where I can see the name while I am typing the SMILES.

This is a bit more complex than the image example because I need data from the server. I want to know if the SMILES string is a valid SMILES string (it could be an incomplete input) and the IUPAC name for the molecule, or at least as much of the input as OEParseSmiles could understand.

I'll do this by creating a new web service called "smi2name." It's a normal GET request that takes a "smiles" as its only input parameter and returns a "JSON" document. JSON, short for "JavaScript Object Notation", is a data format which is very fast for web browsers to handle as they already have code for dealing with Javascript code. This is a common technique in modern Javascript code and most libraries, including MochiKit, have code to make it easy to do.

At the bottom you can see an example JSON document that would be returned by this service. It's a Javascript dictionary containing a "status" field, which is either "valid" or "invalid", and a "name" field, containing OpenEye's IUPAC name assignment.
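For readers without the slides, a document of that shape might look like the string in this sketch. The field names come from the description above, but the values are only illustrative and were not captured from OpenEye's naming code.

```javascript
// Sketch only: the kind of document "smi2name" might return for "CCO".
// The name value is illustrative, not captured OpenEye output.
const responseText = '{"status": "valid", "name": "ethanol"}';

// JSON.parse turns the text into an ordinary Javascript object.
const result = JSON.parse(responseText);
console.log(result.status);
console.log(result.name);
```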

[page 16]

My one change to the HTML is to include a "Name: " field below the image, which is where the IUPAC name will go. That's a label and an empty text span element, with the id "compound_name."

[page 17]

Here's the modified Javascript code for that case. You'll recognize the first half of the code. The "loadJSONDoc" is the MochiKit call to simplify making a JSON request. I give it the URL to call and an optional dictionary of query arguments. Requests like this are asynchronous, meaning that the Javascript has asked the browser to fetch the URL but it's not going to get the result right away.

Instead, MochiKit returns what's called a "Deferred" object. I can configure it to call "show_compound_name" once the JSON document has been fetched and parsed into a normal Javascript data structure.
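A Deferred plays much the same role as the standard Javascript Promise. The sketch below uses a Promise as a stand-in so it can run without MochiKit or a server. "fakeLoadJSONDoc" is a hypothetical helper that returns canned, illustrative data instead of making a request.

```javascript
// Sketch only: a Promise standing in for MochiKit's Deferred.
// fakeLoadJSONDoc is hypothetical and returns canned data, not a real request.
function fakeLoadJSONDoc(url, query) {
  const canned = { status: "valid", name: "ethanol" };  // illustrative values
  return Promise.resolve(canned);
}

const shown = [];
function showCompoundName(smi2nameResult) {
  shown.push(smi2nameResult.name);
}

const pending = fakeLoadJSONDoc("/smi2name", { smiles: "CCO" });
pending.then(showCompoundName);

// The callback has not run yet; it fires only after this code finishes.
console.log(shown.length);
```

With MochiKit itself the registration step would be a call to the Deferred's "addCallback" method rather than "then".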

The callback function is named "show_compound_name". The JSON document contains a Javascript dictionary, so I can figure out if the input SMILES was valid or not and color the result black if it was valid or red if it was invalid.

The last line of real code shows jQuery's function call chaining. The '$("#compound_name")' selects the element with id "compound_name", which is the text span. The ".text(smi2name_result.name)" gets the "name" from the results dictionary and assigns it to the text content of the span element. This is what displays the name to the user.

The result of calling ".text(...)" is the same query object. I can use it to change other properties of my selection. So I'll change the CSS "color" property so it shows the red or black status value.
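Chaining works because each method returns the object it was called on. The sketch below imitates that with a tiny stand-in class so it can run without jQuery or a web page. "FakeSelection" is hypothetical, and the result values are illustrative rather than real server output.

```javascript
// Sketch only: a tiny stand-in for a jQuery selection, to show chaining.
// FakeSelection is hypothetical; real jQuery objects wrap DOM elements.
class FakeSelection {
  constructor() {
    this.content = "";
    this.style = {};
  }
  text(value) {
    this.content = value;
    return this;  // returning the same object is what allows chaining
  }
  css(property, value) {
    this.style[property] = value;
    return this;
  }
}

function showCompoundName(selection, smi2nameResult) {
  // Black for a valid SMILES, red for an invalid or incomplete one.
  const color = smi2nameResult.status === "valid" ? "black" : "red";
  return selection.text(smi2nameResult.name).css("color", color);
}

const span = showCompoundName(new FakeSelection(),
                              { status: "invalid", name: "ethane" });
console.log(span.content, span.style.color);
```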

[page 18]

In case you're curious, here's most of the code on the server to implement "smi2name" using TurboGears. I left out only the scaffolding code that TurboGears writes for you and the lines to import the right OpenEye libraries into the Python module.

Demo

[page 19]

Last summer I spent a month learning how to use modern Javascript tools. My experimental test case was a 2D structure viewer widget. I developed a demo for it, and recorded a screencast.

[page 20]

The hardest part to get working was the mouseover support for the depiction. I ended up making extensive use of CSS, which tells the web page how to lay out a page. I used 4 layers on top of each other to get things working. The bottom layer is the Ogham depiction, and is the PNG image you've seen elsewhere. This is generated on the web server but only needs to change if the SMILES or the image size changes.

On top of that, the third layer is a semi-transparent image showing which atoms have been selected, either from mouse selection or from the SMARTS/atom index selection. This must occur on the server because that's what understands SMARTS, and must be recreated if the size or SMILES changes.

The top two layers are for mouseover support. The top layer is a transparent image containing only an image map. Each hotspot on the map is a circle, centered on the center of an atom. I use this to tell if the mouse is over an atom. If the image size changes then I make a JSON request to the server to get the new atom locations and scaled atom radius.

The second layer contains a small PNG with a circle and a transparent background. There's a bit of Javascript which connects the "mouseover"/"mouseout" events from the first layer to move the circle around in the second layer. The result is a fast, client-side highlighting of the atom the mouse is over.

The four layers are aligned so that to the user it looks like one coherent view, despite the implementation complexity.
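As a rough sketch of the two mouseover layers, the code below turns a list of atom centers into circular image map hotspots and works out where to place the highlight image for a given atom. The atom objects, the function names, and the numbers are all hypothetical. The demo's actual JSON format is not shown in this essay.

```javascript
// Sketch only: turn atom centers into circular image map hotspots.
// The atom objects ({x, y}) and the function names are hypothetical.
function atomAreaTags(atoms, radius) {
  return atoms.map(function (atom, index) {
    // A circular hotspot is described as "center x, center y, radius".
    const coords = [atom.x, atom.y, radius].join(",");
    return '<area shape="circle" coords="' + coords + '" alt="atom ' + index + '">';
  });
}

// Top-left corner for a highlight image of the given size centered on an atom.
function highlightPosition(atom, highlightSize) {
  return { left: atom.x - highlightSize / 2, top: atom.y - highlightSize / 2 };
}

const atoms = [{ x: 40, y: 25 }, { x: 70, y: 42 }];
console.log(atomAreaTags(atoms, 8).join("\n"));
console.log(highlightPosition(atoms[1], 20));
```

A "mouseover" handler on a hotspot would then move the highlight image to the computed position, and "mouseout" would hide it again.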

