Suppose you want to make the systematic naming function available to other programs over the network. There is a huge number of ways to do it. You can program directly to the socket layer or use one of the many communications packages. A short list of the language-independent libraries includes CORBA, SOAP, PVM, MPI and XML-RPC. The Python-specific ones include Pyro and Twisted's Perspective Broker, or you can roll your own with Python's pickle or marshal protocols.
For most things I suggest using XML-RPC. It's a straightforward spec and it's been around for a while, so the various bugs have been worked out, making it stable and relatively language neutral. As a big plus, Python ships with client and server XML-RPC libraries, making it very simple to use.
For reference, here's the smi2name.py module I'll use for this essay. It's my working version of the subprocess-based version I developed in an earlier essay. The major difference is I decided to add the check for known-to-be-illegal characters as part of the code. You may recall that I go back and forth on where to put that test. It depends on where and how the code is going to be used. I've decided it's going to be close enough to untrusted input that the extra test is appropriate. I've also added code to detect a few new error messages that might arise from bad SMILES strings:
    import re, select
    import subprocess
    import os, signal

    MOL2NAM = "/Users/dalke/tmp/ogham/mol2nam"

    class NamingError(Exception):
        pass

    # Used to find the character position that cause the problem
    _error_pos_pat = re.compile(r"^Warning: ( *)\^", re.MULTILINE)

    # Check for characters other than printable ASCII
    # (octal 040 is the space character, octal 176 is "~")
    _unexpected_char_pat = re.compile(r"[^\040-\176]")

    def _find_error(text):
        errmsg = "Cannot parse SMILES"
        if "\nWarning: Unclosed branch." in text:
            errmsg = "Unclosed branch"
        elif "\nWarning: Unclosed ring." in text:
            errmsg = "Unclosed ring"
        elif text.startswith("Warning: Unable to Kekulize SMILES"):
            # Strange: it's the first line of the error message ...
            errmsg = "Unable to Kekulize SMILES"
        elif "\nWarning: Incorrect reaction role" in text:
            errmsg = "Incorrect reaction role"

        m = _error_pos_pat.search(text)
        if m:
            errpos = len(m.group(1)) + 1
            errmsg = errmsg + " at position %d" % errpos
        return errmsg

    class Smi2Name:
        def __init__(self, executable = None, timeout = None):
            # a subprocess.Popen connected to mol2nam
            self._mol2nam = None
            if executable is None:
                executable = MOL2NAM
            self.executable = executable
            self.timeout = timeout

        def _get_mol2nam(self):
            if self._mol2nam is None:
                mol2nam = subprocess.Popen(
                    (self.executable, "-"),
                    stdin = subprocess.PIPE,
                    stdout = subprocess.PIPE,
                    stderr = subprocess.PIPE,
                    close_fds = True)
                # skip the three header lines
                mol2nam.stderr.readline()
                mol2nam.stderr.readline()
                mol2nam.stderr.readline()
                self._mol2nam = mol2nam
            return self._mol2nam

        def smi2name(self, smiles):
            """convert a SMILES string into an IUPAC name"""
            if smiles == "":
                return "vacuum"
            m = _unexpected_char_pat.search(smiles)
            if m:
                raise NamingError("Unexpected character at position %d" %
                                  (m.start(0)+1,))

            mol2nam = self._get_mol2nam()
            try:
                mol2nam.stdin.write(smiles + "\n")
            except IOError:
                # coprocess died since the last call?  Restart the connection
                self._mol2nam = None
                mol2nam = self._get_mol2nam()
                mol2nam.stdin.write(smiles + "\n")
            mol2nam.stdin.flush()

            rlist, _, _ = select.select([mol2nam.stdout, mol2nam.stderr],
                                        [], [], self.timeout)

            if mol2nam.stderr in rlist:
                # Tells mol2nam to quit
                mol2nam.stdin.close()
                stderr_text = mol2nam.stderr.read()
                # Doing this will restart the subprocess the next time through
                self._mol2nam = None
                raise NamingError(_find_error(stderr_text))

            if mol2nam.stdout in rlist:
                name = mol2nam.stdout.readline().rstrip()
                if "BLAH" in name:
                    raise NamingError("Unsupported structure")
                return name

            # Timeout reached.  Kill the child and restart.
            try:
                os.kill(mol2nam.pid, signal.SIGTERM)
            except OSError:
                # Already died?
                pass
            self._mol2nam = None
            raise NamingError("timeout reached")

    # Defer instantiation of the wrapper until it's needed.
    # This lets other code change MOL2NAM if needed, but changes
    # will only work if done before calling this function.
    _smi2name = None
    def smi2name(smiles):
        """convert a SMILES string into an IUPAC name"""
        global _smi2name
        if _smi2name is None:
            _smi2name = Smi2Name().smi2name
        return _smi2name(smiles)

    def test():
        for smi, name, errmsg in (
            ("C", "methane", None),
            ("C"+chr(127)+"S", None, "Unexpected character at position 2"),
            ("CC"+chr(3), None, "Unexpected character at position 3"),
            ("S", "hydrogen sulfide", None),
            ("U", None, "Cannot parse SMILES at position 1"),
            ("CC1", None, "Unclosed ring at position 3"),
            ("C", "methane", None),
            ("C"*1000 , "kiliane", None),
            ("C"*32764 + "(C)", None, "Unclosed branch"),
            ("C\nC", None, "Unexpected character at position 2"),
            ("CCCC(C", None, "Unclosed branch at position 6"),
            ("CCCCCC)C", None, "Cannot parse SMILES at position 7"),
            ("[U]", "uranium", None),
            ("", "vacuum", None),
            ("c1ccccc1", "benzene", None),
            ("c1cccccc1", None, "Unable to Kekulize SMILES"),
            ("O>C>N", "oxidane; carbane; azane", None),
            ("OC>C.C", None, "Incorrect reaction role at position 6"),
            ("O>CC>N>U", None, "Cannot parse SMILES at position 7"),
            ("C1CC23CC4CC3C1C(C2)CC4", None, "Unsupported structure"),
            ("C#N", "hydrogen cyanide", None)):
            computed_name = computed_errmsg = None
            try:
                computed_name = smi2name(smi)
            except NamingError, err:
                computed_errmsg = str(err)
            if (name != computed_name or errmsg != computed_errmsg):
                raise AssertionError("SMILES: %r expected (%r %r) got (%r %r)" %
                                     (smi, name, errmsg,
                                      computed_name, computed_errmsg))
        print "All tests passed."

    if __name__ == "__main__":
        test()
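To see the printable-ASCII check on its own, here is a minimal sketch of the same idea. The helper name `first_bad_position` is made up for this illustration and is not part of smi2name.py. The snippet uses `print()` as a function so it also runs under Python 3.

```python
import re

# Reject anything outside space (octal 040) through tilde (octal 176).
_unexpected_char_pat = re.compile(r"[^\040-\176]")

def first_bad_position(smiles):
    """Return the 1-based position of the first unexpected character, or None.

    Hypothetical helper for illustration only.
    """
    m = _unexpected_char_pat.search(smiles)
    if m is None:
        return None
    return m.start(0) + 1

print(first_bad_position("c1ccccc1O"))           # None
print(first_bad_position("C" + chr(127) + "S"))  # 2
print(first_bad_position("C\nC"))                # 2
```

Reporting a 1-based position keeps the message consistent with the positions that `_find_error` derives from the caret in mol2nam's warning text.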
And here's the first version of an XML-RPC server for it, using the standard SimpleXMLRPCServer module. Note that it's listening on port 8000 of the local machine.
    import SimpleXMLRPCServer
    import smi2name

    server = SimpleXMLRPCServer.SimpleXMLRPCServer(("localhost", 8000))
    server.register_function(smi2name.smi2name, "smi2name")
    server.serve_forever()

I called it smi2name_server.py because I'm creative that way. Run it from the command-line like this:

    % python smi2name_server.py

It'll just sit there waiting for requests.
In another shell window start Python and import the XML-RPC client library. I'll make a Server instance, which makes a wrapper to the XML-RPC server on the given URL.
    >>> import xmlrpclib
    >>> server = xmlrpclib.Server("http://localhost:8000/")
    >>> server.smi2name("C")
    'methane'
    >>>

If it worked for you then the server window will print a statement like this:

    localhost - - [21/Apr/2005 10:25:03] "POST / HTTP/1.0" 200 -

Because I don't find this message all that useful, later on I'll show how to disable it.
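If you want to try the round trip without two shell windows, or without mol2nam installed, the following sketch runs the server in a background thread and calls it from the same process. It rests on several assumptions: it uses the Python 3 module names (`xmlrpc.server` and `xmlrpc.client` rather than `SimpleXMLRPCServer` and `xmlrpclib`), the `fake_smi2name` lookup table is a hypothetical stand-in for the real naming function, and it binds to port 0 so the operating system picks a free port instead of 8000.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical stand-in for smi2name.smi2name, so no mol2nam is needed.
_NAMES = {"C": "methane", "CC": "ethane"}

def fake_smi2name(smiles):
    return _NAMES[smiles]

def start_server():
    # Port 0 asks the operating system for any free port.
    # logRequests=False only keeps this example's output quiet.
    server = SimpleXMLRPCServer(("localhost", 0), logRequests=False)
    server.register_function(fake_smi2name, "smi2name")
    thread = threading.Thread(target=server.serve_forever)
    thread.daemon = True
    thread.start()
    return server

def demo():
    server = start_server()
    host, port = server.server_address
    try:
        proxy = ServerProxy("http://%s:%d/" % (host, port))
        return proxy.smi2name("C")
    finally:
        # Stop the serve_forever() loop, then release the socket.
        server.shutdown()
        server.server_close()

print(demo())
```

The client side is the same as in the interactive session: build a proxy for the URL, then call the registered name as if it were a local method.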
If you didn't set MOL2NAM to the right location then you probably got an exception on the client-side like this:

    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/xmlrpclib.py", line 1029, in __call__
        return self.__send(self.__name, args)
      File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/xmlrpclib.py", line 1316, in __request
        verbose=self.__verbose
      File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/xmlrpclib.py", line 1080, in request
        return self._parse_response(h.getfile(), sock)
      File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/xmlrpclib.py", line 1219, in _parse_response
        return u.close()
      File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/xmlrpclib.py", line 742, in close
        raise Fault(**self._stack)
    xmlrpclib.Fault: <Fault 1: 'exceptions.OSError:[Errno 2] No such file or directory'>

By default, exceptions on the XML-RPC server get sent back to the client and converted into a local exception.
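A client that wants to handle such a failure can catch the `Fault` exception and look at its `faultCode` and `faultString` attributes. Here is a small sketch under the same assumptions as any self-contained demo: it uses the Python 3 module names (`xmlrpc.client`, `xmlrpc.server`), an ephemeral port, and a hypothetical `always_fails` function that stands in for a misconfigured naming function.

```python
import threading
from xmlrpc.client import Fault, ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

def always_fails(smiles):
    # Hypothetical stand-in for a naming function with a bad MOL2NAM path.
    raise OSError(2, "No such file or directory")

def call_and_report():
    server = SimpleXMLRPCServer(("localhost", 0), logRequests=False)
    server.register_function(always_fails, "smi2name")
    thread = threading.Thread(target=server.serve_forever)
    thread.daemon = True
    thread.start()
    host, port = server.server_address
    try:
        proxy = ServerProxy("http://%s:%d/" % (host, port))
        try:
            return proxy.smi2name("C")
        except Fault as err:
            # The server-side exception arrives as a Fault on the client.
            return "Fault %d: %s" % (err.faultCode, err.faultString)
    finally:
        server.shutdown()
        server.server_close()

print(call_and_report())
```

The exact text of the fault string depends on the Python version on the server, so client code should not try to parse it for anything beyond display.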
To kill the server hit control-C in its window. You may need to hit it twice; I don't know why. You do not need to exit the client because xmlrpclib uses a new HTTP connection for every request. It can't tell the difference if the server shuts down then restarts, though code that uses transferred data may be able to tell the difference.
The problem I showed is actually two problems. The first is the misconfiguration of MOL2NAM but the second is that the error isn't discovered until someone uses the service. It's best to fail early, but not too early. As a library it's best to fail at the first use, because the library might not be used. But in this server where everything is meant to be used it's best to fail when the server starts, to indicate that it's not functional.
I considered just checking if the executable file existed but decided it was best to just call the function and see if it returns a correct value. Here's the new version of the server code.
    import SimpleXMLRPCServer
    import smi2name

    # Test that the library works
    name = smi2name.smi2name("C")
    if name != "methane":
        raise AssertionError("'C' returns %r" % (name,))

    server = SimpleXMLRPCServer.SimpleXMLRPCServer(("localhost", 8000))
    server.register_function(smi2name.smi2name, "smi2name")
    server.serve_forever()

and when it's run with a misconfigured MOL2NAM setting:

    % python smi2name_server.py
    Traceback (most recent call last):
      File "smi2name_server.py", line 5, in ?
        name = smi2name.smi2name("C")
      File "/Users/dalke/novartis/smi2name.py", line 107, in smi2name
        return _smi2name(smiles)
      File "/Users/dalke/novartis/smi2name.py", line 62, in smi2name
        mol2nam = self._get_mol2nam()
      File "/Users/dalke/novartis/smi2name.py", line 45, in _get_mol2nam
        close_fds = True)
      File "/Users/dalke/novartis/subprocess.py", line 600, in __init__
        errread, errwrite)
      File "/Users/dalke/novartis/subprocess.py", line 1053, in _execute_child
        raise child_exception
    OSError: [Errno 2] No such file or directory

I could also have used smi2name.test() but decided that that would be overkill. Also, test code like that is usually not meant to be part of the public API to a module. Perhaps I should have named it _test().
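For comparison, here is a rough sketch of the two kinds of startup check side by side: the cheaper "does the file exist" test and the "call it and compare the answer" test. Both function names are invented for this illustration. The existence check would happily accept something like /bin/cat, which is one reason the functional check catches more misconfigurations.

```python
import os

def check_executable(path):
    """Raise OSError unless path names an existing, executable file.

    Hypothetical helper: it says nothing about whether the program
    actually produces names.
    """
    if not os.path.isfile(path):
        raise OSError("no such file: %r" % (path,))
    if not os.access(path, os.X_OK):
        raise OSError("not executable: %r" % (path,))

def check_function(func):
    """Call the naming function once and verify a known answer."""
    name = func("C")
    if name != "methane":
        raise AssertionError("'C' returns %r" % (name,))

# Usage with a stand-in naming function instead of the real mol2nam wrapper.
check_function(lambda smiles: "methane")
print("startup check passed")
```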
After fixing the MOL2NAM setting and starting the server I went back to the Python interactive window with the xmlrpclib client already running:

    >>> server.smi2name("CC")
    'ethane'
    >>> server.smi2name("c1ccccc1O")
    'phenol'
    >>>

Congratulations, you have a working server.
Sometimes if you quit the server and restart it you'll get a message like the following:
    % python smi2name_server.py
    Traceback (most recent call last):
      File "smi2name_server.py", line 9, in ?
        server = SimpleXMLRPCServer.SimpleXMLRPCServer(("localhost", 8000))
      File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/SimpleXMLRPCServer.py", line 450, in __init__
        SocketServer.TCPServer.__init__(self, addr, requestHandler)
      File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/SocketServer.py", line 330, in __init__
        self.server_bind()
      File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/SocketServer.py", line 341, in server_bind
        self.socket.bind(self.server_address)
      File "<string>", line 1, in bind
    socket.error: (48, 'Address already in use')
    %

This happens because of certain guarantees made by the TCP specification. Even after the connection is closed the operating system keeps it open for a bit longer in case, for instance, the client asks the server to resend the close message. The operating system will release the socket after a short time, from about 30 seconds to 4 minutes, depending on various settings.
I've tried to figure out just why things didn't close nicely but haven't managed to track it down. I think it's a timing problem when the server closes the connection before the client.
If you need the ability to restart you should do a few things. First, always shut down the server. This won't fix the problem but it's a good practice. I'll make the call in a try/finally block to ensure that it's always called.
    try:
        server.serve_forever()
    finally:
        server.server_close()
Second, there's a socket option called SO_REUSEADDR which tells the operating system to allow code to bind to an address even if it's still waiting for other potential packets from an earlier connection. The SimpleXMLRPCServer class has a class variable named allow_reuse_address which when True tells the instance to set that option. The value is used during the constructor and there's no constructor argument for it, so there are two options: implement a new class whose constructor sets that value first and then calls the base class constructor, or implement a new class which sets that class variable. I chose the second of these. Note also that I disable the logging because I didn't find the information useful.

    import SimpleXMLRPCServer
    import smi2name

    class Server(SimpleXMLRPCServer.SimpleXMLRPCServer):
        allow_reuse_address = True

    # Test that the library works
    name = smi2name.smi2name("C")
    if name != "methane":
        raise AssertionError("'C' returns %r" % (name,))

    server = Server(("localhost", 8000), logRequests = False)
    server.register_function(smi2name.smi2name, "smi2name")
    try:
        server.serve_forever()
    finally:
        server.server_close()
Using SO_REUSEADDR does have its downsides. As that page I mentioned earlier points out, it can cause other sorts of errors when trying to reconnect from the same machine and can cause security problems on some operating systems.
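To make the effect of `allow_reuse_address` concrete, the sketch below creates two plain TCP servers and asks each listening socket whether SO_REUSEADDR is set. It assumes Python 3, where the module is named `socketserver` (it was `SocketServer` in Python 2), and the class and function names are made up for the example. It uses `TCPServer` directly rather than the XML-RPC server so that the only difference between the two classes is the class variable.

```python
import socket
import socketserver

class ReusableTCPServer(socketserver.TCPServer):
    # Checked inside server_bind(), which the constructor calls.
    allow_reuse_address = True

class DefaultTCPServer(socketserver.TCPServer):
    allow_reuse_address = False

def reuse_flag(server_class):
    """Report whether the server's listening socket has SO_REUSEADDR set."""
    server = server_class(("localhost", 0), socketserver.BaseRequestHandler)
    try:
        value = server.socket.getsockopt(socket.SOL_SOCKET,
                                         socket.SO_REUSEADDR)
        # The raw value differs between operating systems; only zero
        # versus non-zero is meaningful.
        return value != 0
    finally:
        server.server_close()

print(reuse_flag(ReusableTCPServer))
print(reuse_flag(DefaultTCPServer))
```

Because the flag has to be in place before `bind()` is called, setting it on an instance after construction would be too late, which is why it is a class variable.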
The above code is enough for personal use. Configuration changes require editing code. If it's used by more people and on different machines then it should be a bit more configurable on the command-line. To parse the command-line options use the optparse module from Python's standard library. Here's a version that lets users pick which host interface, port number, and mol2nam executable to use. To implement that last one I create a new Smi2Name instance, which is preferred over changing smi2name.MOL2NAM.

    import SimpleXMLRPCServer
    import optparse

    import smi2name

    class Server(SimpleXMLRPCServer.SimpleXMLRPCServer):
        allow_reuse_address = True

    def run_server(addr, executable):
        smi2name_func = smi2name.Smi2Name(executable).smi2name

        # Test that the library works
        name = smi2name_func("C")
        if name != "methane":
            raise AssertionError("'C' returns %r" % (name,))

        server = Server(addr, logRequests = False)
        server.register_function(smi2name_func, "smi2name")
        print "Starting smi2nam XML-RPC server at",
        print repr("http://%s:%d/" % (addr[0], addr[1]))
        try:
            server.serve_forever()
        finally:
            server.server_close()

    def main():
        parser = optparse.OptionParser(conflict_handler="resolve")
        parser.add_option("-h", "--host", dest="host", default="localhost",
                          help="host name of network interface")
        parser.add_option("-p", "--port", dest="port", default=8000, type="int",
                          help="port number to use")
        parser.add_option("-e", "--executable", dest="executable",
                          default=smi2name.MOL2NAM,
                          help="path to mol2nam executable")
        (options, args) = parser.parse_args()
        if args:
            parser.error("unknown option %r" % (args,))

        run_server( (options.host, options.port), options.executable )

    if __name__ == "__main__":
        main()

and here is the help text from using --help.

    % python smi2name_server.py --help
    usage: smi2name_server.py [options]

    options:
      --help                show this help message and exit
      -hHOST, --host=HOST   host name of network interface
      -pPORT, --port=PORT   port number to use
      -eEXECUTABLE, --executable=EXECUTABLE
                            path to mol2nam executable
    %

The conflict_handler="resolve" is needed because by default "-h" is another command-line option for help.
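Here is a small sketch of what the conflict handler changes. With the default handler, adding an option named "-h" raises `optparse.OptionConflictError`; with "resolve", the new option takes over "-h" and the built-in help keeps only "--help". The function names and the host name "example.org" are made up for the illustration, and the argument list is passed in directly so nothing is read from the real command line.

```python
import optparse

def make_parser():
    # "resolve" lets the -h defined below replace the built-in -h for help.
    parser = optparse.OptionParser(conflict_handler="resolve")
    parser.add_option("-h", "--host", dest="host", default="localhost")
    parser.add_option("-p", "--port", dest="port", default=8000, type="int")
    return parser

def default_parser_conflicts():
    """Return True if the default conflict handler rejects a second -h."""
    parser = optparse.OptionParser()
    try:
        parser.add_option("-h", "--host", dest="host")
    except optparse.OptionConflictError:
        return True
    return False

options, args = make_parser().parse_args(["-h", "example.org", "-p", "9000"])
print(options.host, options.port)   # example.org 9000
print(default_parser_conflicts())   # True
```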
This is the starting-off point for many features. It's easy to see how to add new services. If the server gets heavily used, though, then there will be a problem. It is implemented with a single thread, which means it can only process one request at a time. The operating system will queue up a small number of requests (about three) but at some point that will get filled up as well.
There are several ways to handle that. You can use multithreading, you can spawn off a new process to handle each request, or you can use a reactor-style framework like Twisted. If it's a multiprocessor box you might want to start several instances of mol2nam all used by one server. Or you can shift the problem upstream and have something like pythondirector. Clients point to the pythondirector instance which forwards the request to the next available server. If that server fails or is busy it tries the next server until one is available or there aren't any servers left to try.
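As an illustration of the multithreading option only, here is a minimal sketch that mixes `ThreadingMixIn` into the XML-RPC server so each request is handled in its own thread. The assumptions: Python 3 module names (`socketserver`, `xmlrpc.server`, `xmlrpc.client`), an ephemeral port, and a hypothetical `slow_name` function standing in for the real naming call. Note that a single Smi2Name wrapper talks to one mol2nam co-process, so a threaded server built on it would also need a lock around each call, or one wrapper per thread; this sketch sidesteps that by not using mol2nam at all.

```python
import socketserver
import threading
import time
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

class ThreadedServer(socketserver.ThreadingMixIn, SimpleXMLRPCServer):
    # Handler threads should not keep the process alive at exit.
    daemon_threads = True

def slow_name(smiles):
    # Hypothetical stand-in for a naming call that takes a while.
    time.sleep(0.1)
    return "name of " + smiles

def run_clients(smiles_list):
    server = ThreadedServer(("localhost", 0), logRequests=False)
    server.register_function(slow_name, "smi2name")
    server_thread = threading.Thread(target=server.serve_forever)
    server_thread.daemon = True
    server_thread.start()
    url = "http://%s:%d/" % server.server_address
    results = {}

    def worker(smiles):
        # One proxy per thread; a ServerProxy should not be shared.
        results[smiles] = ServerProxy(url).smi2name(smiles)

    threads = [threading.Thread(target=worker, args=(smiles,))
               for smiles in smiles_list]
    try:
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()
    finally:
        server.shutdown()
        server.server_close()
    return results

print(run_clients(["C", "CC", "CCC"]))
```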
The choice of approach is complicated and depends on many factors. But don't worry about deciding upon the solution until you're sure you'll have a problem.
By the way, even this code will hang in a few strange ways. Suppose the executable points to the original version of mol2nam (without the flush) or to something like /bin/cat which accepts the given input but buffers its output. The wrapper will sit, blocked, waiting in _get_mol2nam() to read the header line from stderr. That can be fixed with careful use of select, but I don't think it's important enough to worry about.
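For the curious, a select-based read with a timeout might look something like the sketch below. It is not taken from smi2name.py: the helper name `readline_with_timeout` is invented, it is written for Python 3, and it uses an `os.pipe()` pair in place of a real child process so it can run anywhere that `select` works on pipes (Unix-like systems, not Windows).

```python
import os
import select

def readline_with_timeout(fileobj, timeout):
    """Return one line from fileobj, or None if nothing arrives in time.

    select() only says that some data is ready. If the writer sends a
    partial line and then stalls, readline() can still block.
    """
    rlist, _, _ = select.select([fileobj], [], [], timeout)
    if not rlist:
        return None
    return fileobj.readline()

# Demonstration with a pipe standing in for the child's stderr.
read_fd, write_fd = os.pipe()
reader = os.fdopen(read_fd, "rb", 0)   # unbuffered, so select() sees the data

print(readline_with_timeout(reader, 0.1))   # None: nothing written yet
os.write(write_fd, b"header line\n")
print(readline_with_timeout(reader, 0.1))   # b'header line\n'

os.close(write_fd)
reader.close()
```

A wrapper using this approach could raise an error when the header read returns None instead of blocking forever on a program that never writes to stderr.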
Andrew Dalke is an independent consultant focusing on software development for computational chemistry and biology.
Copyright © 2001-2013 Andrew Dalke Scientific AB