Dalke Scientific Software: More science. Less time.

XML-RPC

There are several different architectures for web applications. The most common goes by the name ReST, short for the ungainly phrase "Representational State Transfer". This is the style I like the best. (The last link has the best overview.)

Another style is based around RPCs, short for "Remote Procedure Calls". SOAP is one such example. That used to be short for "Simple Object Access Protocol", but it's no longer simple. It's complicated, which means people end up needing tools to help build SOAP-based systems, and differences between the implementations cause interoperability problems.

XML-RPC is a simple RPC style. (BTW, read about The Right Thing vs. Worse is Better as two contrasting development philosophies.) You should think of XML-RPC as something similar to a form POST request using HTTP, but with the ability to send somewhat more complex data. For example, you can send integers and lists, and not only strings. This makes it easier to do machine-to-machine communications.

CherryPy supports XML-RPC, so it's easy to write an XML-RPC server, and the Python standard library includes an XML-RPC client library (xmlrpclib), making XML-RPC the easiest RPC to set up. Even without CherryPy you could use the built-in SimpleXMLRPCServer library.
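
To make the "even without CherryPy" remark concrete, here is a minimal sketch that uses only the standard library. It assumes Python 3, where SimpleXMLRPCServer lives in xmlrpc.server and xmlrpclib is named xmlrpc.client. The helper name start_add_server and the use of port 0 (let the operating system pick a free port) are my own choices for illustration, not something from CherryPy.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


def add(a, b):
    # The same boring function used throughout this page.
    return a + b


def start_add_server(host="127.0.0.1", port=0):
    # Hypothetical helper: port=0 asks the OS for any free port.
    server = SimpleXMLRPCServer((host, port), logRequests=False)
    server.register_function(add, "add")
    # Serve in a background thread so this script can also act as the client.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server


server = start_add_server()
host, port = server.server_address
proxy = xmlrpc.client.ServerProxy("http://%s:%d/" % (host, port))
result = proxy.add(5, 6)
print(result)

# Stop the background server cleanly.
server.shutdown()
server.server_close()
```

A real server would normally call serve_forever() in the main thread; the background thread is only there so that one file can show both ends of the conversation.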

Why?

Why (or when) would you want to set up an XML-RPC server? I recommend it only for internal/in-house servers where you have influence over both the client and server code. The most frequent use for an XML-RPC server is to forward services which only exist on a few machines. For example, forwarding requests to a central machine can minimize the number of licenses or the support overhead needed, or it can provide access to software that is specific to another operating system (e.g., Linux programs that need to use an old program only available on SGIs). The other case is to reduce startup costs.

A simple example

I'll start with a simple (and boring) function which adds two numbers. Here's the CherryPy server for it.

from cherrypy.lib.filter.xmlrpcfilter import XmlRpcFilter
from cherrypy import cpg

class Root:
    _cpFilterList = [XmlRpcFilter()]
    def add(self, a, b):
        return a+b
    add.exposed = True
 
cpg.root = Root()
cpg.server.start()
The major change was the addition of the XmlRpcFilter, which tells CherryPy how to convert an XML-RPC request into a normal method lookup.

I'll save this code to the file named addserver.py and start it up in one shell window:

% python addserver.py
2005/09/14 01:49:33 CONFIG INFO Server parameters:
2005/09/14 01:49:33 CONFIG INFO   logToScreen: 1
2005/09/14 01:49:33 CONFIG INFO   logFile: 
2005/09/14 01:49:33 CONFIG INFO   protocolVersion: HTTP/1.0
2005/09/14 01:49:33 CONFIG INFO   socketHost: 
2005/09/14 01:49:33 CONFIG INFO   socketPort: 8080
2005/09/14 01:49:33 CONFIG INFO   socketFile: 
2005/09/14 01:49:33 CONFIG INFO   reverseDNS: 0
2005/09/14 01:49:33 CONFIG INFO   socketQueueSize: 5
2005/09/14 01:49:33 CONFIG INFO   threadPool: 0
2005/09/14 01:49:33 CONFIG INFO   sslKeyFile: 
2005/09/14 01:49:33 CONFIG INFO   sessionStorageType: 
2005/09/14 01:49:33 CONFIG INFO   staticContent: []
2005/09/14 01:49:33 HTTP INFO Serving HTTP on socket: ('', 8080)
In another window I'll start the Python client and use the XML-RPC client library to first make a proxy to the server at the given URL. From that I can call the add method. The proxy wraps up the function parameters and forwards them to the server. The server's XmlRpcFilter gets the request, unpacks the values, and forwards the function parameters to the add method.
>>> import xmlrpclib
>>> server = xmlrpclib.Server("http://localhost:8080/")
>>> server.add(5,6)
11
>>> server.add("Hello, ", "NBN!")
'Hello, NBN!'
>>> 
Note that Python's XML-RPC library passes the numbers in as numbers and the strings in as strings. The server code doesn't care, so long as "+" works on the input values.
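
You can see how the types survive the trip by serializing the two calls without sending them. This is a small sketch assuming Python 3's xmlrpc.client (the renamed xmlrpclib); the variable names are only for illustration.

```python
import xmlrpc.client

# Serialize the two calls from the session above without contacting a server.
int_call = xmlrpc.client.dumps((5, 6), methodname="add")
str_call = xmlrpc.client.dumps(("Hello, ", "NBN!"), methodname="add")

# Each value carries its own type tag in the XML.
print(int_call)
print(str_call)

# Deserializing restores the method name and the original Python types.
params, method = xmlrpc.client.loads(int_call)
print(method, params)
```

The integers are wrapped in <int> elements and the strings in <string> elements, which is how the server side knows to rebuild numbers rather than text.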

Errors

What happens if the server doesn't like the XML-RPC request? The default, at least for CherryPy-2.0, is to return a plain text document with the program traceback. I'll use the "verbose" option so you can see what's going on, along with some reformatting:

>>> server = xmlrpclib.Server("http://localhost:8080", verbose=1)
>>> server.add(5,"six")
connect: (localhost, 8080)
connect fail: ('localhost', 8080)
connect: (localhost, 8080)
send: 'POST /RPC2 HTTP/1.0\r\nHost: localhost:8080\r\n
User-Agent: xmlrpclib.py/1.0.1 (by www.pythonware.com)\r\n
Content-Type: text/xml\r\nContent-Length: 195\r\n\r\n'
send: "<?xml version='1.0'?>\n<methodCall>\n
<methodName>add</methodName>\n<params>\n
<param>\n<value><int>5</int></value>\n
</param>\n<param>\n<value><string>six
</string></value>\n</param>\n</params>\n
</methodCall>\n"
reply: 'HTTP/1.0 200 OK\r\n'
header: Content-Length: 564
header: Server: CherryPy/2.0.0
header: Date: Wed, 14 Sep 2005 00:03:20 GMT
header: Content-Type: text/plain
body: 'Traceback (most recent call last):\n  File "/System/Library/Frameworks/
Python.framework/Versions/2.3/lib/python2.3/site-packages/cherrypy/_cphttptools.py", 
line 211, in doRequest\n    handleRequest(cpg.response.wfile)\n  File 
"/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/
site-packages/cherrypy/_cphttptools.py", line 405, in handleRequest\n
    body = func(*(virtualPathList + cpg.request.paramList), **(cpg.request.paramMap))\n
  File "addserver.py", line 7, in add\n
    return a+b\nTypeError: unsupported operand type(s) for +: \'int\' and \'str\'\n'
Traceback (most recent call last):

  File "<stdin>", line 1, in ?
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/xmlrpclib.py", line 1029, in __call__
    return self.__send(self.__name, args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/xmlrpclib.py", line 1316, in __request
    verbose=self.__verbose
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/xmlrpclib.py", line 1080, in request
    return self._parse_response(h.getfile(), sock)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/xmlrpclib.py", line 1214, in _parse_response
    p.feed(response)
  File "/System/Library/Frameworks/Python.framework/Versions/2.3/lib/python2.3/xmlrpclib.py", line 528, in feed

    self._parser.Parse(data, 0)
xml.parsers.expat.ExpatError: syntax error: line 1, column 0

>>> 
This error isn't very helpful. The XML-RPC client expects the server to return an XML-RPC error message when something is wrong, and not some strange text file. To change the default, read the documentation and replace the default implementation of _cpOnError. For now don't worry about it.
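
For reference, the error message the client is looking for is an XML-RPC "fault". The sketch below shows what one looks like and how the client library turns it back into an exception. It assumes Python 3's xmlrpc.client, and the fault code 1 is an arbitrary value I picked; this is not what CherryPy's _cpOnError does, only the wire format a replacement would need to produce.

```python
import xmlrpc.client

# Build the kind of response a well-behaved server would send on failure.
fault = xmlrpc.client.Fault(
    1, "unsupported operand type(s) for +: 'int' and 'str'")
fault_xml = xmlrpc.client.dumps(fault)
print(fault_xml)

# On the client side, parsing a fault response raises Fault
# instead of failing with an XML syntax error.
try:
    xmlrpc.client.loads(fault_xml)
    caught = None
except xmlrpc.client.Fault as err:
    caught = (err.faultCode, err.faultString)

print(caught)
```

With a fault response the caller can catch xmlrpc.client.Fault and read faultCode and faultString rather than puzzling over an ExpatError.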

Residue counts

Here's something a bit more sequence-specific:

from cherrypy.lib.filter.xmlrpcfilter import XmlRpcFilter
from cherrypy import cpg

class Root:
    _cpFilterList = [XmlRpcFilter()]

    def counts(self, seq):
        d = {}
        for c in seq:
            d[c] = d.get(c, 0) + 1
        return d
    counts.exposed = True
 
cpg.root = Root()
cpg.server.start()
and here's what happened when I called it.
>>> server = xmlrpclib.Server("http://localhost:8080")
>>> server.counts("ATATCG")
{'A': 2, 'C': 1, 'T': 2, 'G': 1}
>>> 

CherryPy's regular and XML-RPC interfaces both map the calls into attribute lookups. I can get to the counts() method directly as a hyperlink. When I do that I see:

{'A': 2, 'C': 1, 'T': 2, 'G': 1}
What's going on here? Some nomenclature helps. Serialization is the process of turning a data structure into a string, or more technically a sequence of bytes. Only bytes can be passed over the network so the XML-RPC request was serialized into a format (based on XML) and sent to the server. The server received the request and converted it back into a data structure. This is called deserialization. The deserialized data structure is used to get a CherryPy handler method and call it with the requested parameters. The return value of the method is serialized and sent back to the client.
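
The whole round trip can be sketched in a few lines without any web server. This assumes Python 3's xmlrpc.client, and handle_request is a hypothetical stand-in for what a filter like XmlRpcFilter has to do; it is not CherryPy's actual implementation.

```python
import xmlrpc.client


class Root:
    def counts(self, seq):
        d = {}
        for c in seq:
            d[c] = d.get(c, 0) + 1
        return d
    counts.exposed = True


def handle_request(root, request_body):
    # Deserialize: XML text -> (parameters, method name).
    params, method_name = xmlrpc.client.loads(request_body)
    # Look up the handler, honoring the "exposed" convention.
    handler = getattr(root, method_name, None)
    if handler is None or not getattr(handler, "exposed", False):
        fault = xmlrpc.client.Fault(1, "no such method: %s" % method_name)
        return xmlrpc.client.dumps(fault)
    result = handler(*params)
    # Serialize: return value -> XML text.
    return xmlrpc.client.dumps((result,), methodresponse=True)


# Client side: serialize the call ...
request_body = xmlrpc.client.dumps(("ATATCG",), methodname="counts")
# ... "send" it to the server ...
response_body = handle_request(Root(), request_body)
# ... and deserialize the reply.
(result,), _ = xmlrpc.client.loads(response_body)
print(result)
```

Only request_body and response_body would ever cross the network; everything else is ordinary Python objects on one side or the other.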

When you follow the hyperlink you use the default serializations and not the XML-RPC serializations. The default gets the GET or POST parameters and assumes any values are strings. (Form requests don't support any other data type.) For the counts() method this is fine because it only expects a string. For the return the default CherryPy serialization converts the returned data structure into a string using the str() built-in function.

Hence what you see is the dictionary converted and displayed as a string.
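
A short sketch of both halves of that default behavior, assuming Python 3's urllib.parse (none of this is CherryPy code): form parameters arrive as strings, and str() is what turns the returned dictionary into the page text.

```python
from urllib.parse import parse_qs

# Form parameters are always strings, even when they look like numbers.
form = parse_qs("a=5&b=6")
a = form["a"][0]
b = form["b"][0]
joined = a + b             # string concatenation, not addition
summed = int(a) + int(b)   # the handler must convert explicitly
print(repr(joined), summed)

# The default output step is just str() applied to the return value.
page = str({"A": 2, "C": 1})
print(page)
```

This is why the add() example behaves differently through a form request than through XML-RPC: "+" still works, but on strings.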

Different results depending on the request

Sometimes you may want the same method to return different things depending on how it was called. Suppose I want to display the counts in an HTML table if the client does a normal GET or POST request and return a dictionary if it's an XML-RPC request. I can distinguish between the two by checking the "isRPC" attribute.

from cherrypy.lib.filter.xmlrpcfilter import XmlRpcFilter
from cherrypy import cpg
from cgi import escape

# Given a sequence return a dictionary where the keys are the
# characters in the sequence and the values the number of times that
# character exists in the sequence.
def compute_counts(seq):
    d = {}
    for c in seq:
        d[c] = d.get(c, 0) + 1
    return d

# Convert a dictionary into an HTML table.
def format_dictionary(key_header, value_header, seq):
    yield "<table border='1'>\n"
    yield "<tr><th>%s</th><th>%s</th></tr>\n" % (escape(key_header),
                                                 escape(value_header))
    items = seq.items()
    items.sort()
    for k, v in items:
        yield "<tr><td>%s</td><td>%s</td></tr>\n" % (escape(str(k)),
                                                     escape(str(v)))
    yield "</table>\n"

class Root:
    _cpFilterList = [XmlRpcFilter()]

    def counts(self, seq):
        d = compute_counts(seq)
        if cpg.request.isRPC:
            # XML-RPC request
            return d
        else:
            # normal request
            return format_dictionary("symbol", "count", d)
        
    counts.exposed = True
 
cpg.root = Root()
cpg.server.start()
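
If you want to try the two helper functions on their own under Python 3, here is a sketch of a port. It assumes html.escape as the replacement for cgi.escape (which newer Python versions no longer provide) and uses sorted() because items() no longer returns a list. I also renamed the last parameter to d, since it is a dictionary and not a sequence.

```python
from html import escape


def compute_counts(seq):
    # Map each character in seq to the number of times it occurs.
    d = {}
    for c in seq:
        d[c] = d.get(c, 0) + 1
    return d


def format_dictionary(key_header, value_header, d):
    # Yield the pieces of an HTML table, one row per dictionary item.
    yield "<table border='1'>\n"
    yield "<tr><th>%s</th><th>%s</th></tr>\n" % (escape(key_header),
                                                 escape(value_header))
    for k, v in sorted(d.items()):
        yield "<tr><td>%s</td><td>%s</td></tr>\n" % (escape(str(k)),
                                                     escape(str(v)))
    yield "</table>\n"


table = "".join(format_dictionary("symbol", "count",
                                  compute_counts("TGTCATAAG")))
print(table)
```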

Opening a URL

Python's urllib2 module (and the older urllib module) lets you connect to a given URL and fetch the data through a file-like interface. Fuzzyman has a great tutorial. Here's a GET request:

>>> import urllib2
>>> f = urllib2.urlopen("http://localhost:8080/counts?seq=TGTCATAAG")
>>> print f.read()
<table border='1'>
<tr><th>symbol</th><th>count</th></tr>
<tr><td>A</td><td>3</td></tr>
<tr><td>C</td><td>1</td></tr>
<tr><td>G</td><td>2</td></tr>
<tr><td>T</td><td>3</td></tr>
</table>

>>> 
Doing a POST request is a bit more complicated, but not hard. You need to serialize your parameters into the correct form, like this:
>>> import urllib
>>> import urllib2
>>> values = {"seq": "ATCTCTCAACCGT"}
>>> data = urllib.urlencode(values)
>>> data
'seq=ATCTCTCAACCGT'
>>> f = urllib2.urlopen("http://localhost:8080/counts", data)
>>> print f.read()
<table border='1'>
<tr><th>symbol</th><th>count</th></tr>
<tr><td>A</td><td>3</td></tr>
<tr><td>C</td><td>5</td></tr>
<tr><td>G</td><td>1</td></tr>
<tr><td>T</td><td>4</td></tr>
</table>

>>> 
(If you need to upload a file take a look at Fuzzyman's upload.py and upload_test.py utilities.)
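
In Python 3 the same pieces live in urllib.parse and urllib.request. The sketch below, which assumes Python 3, builds the GET and POST requests without sending them, so it runs even when no server is listening; passing either request to urllib.request.urlopen() would actually send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

base = "http://localhost:8080/counts"

# GET: the serialized parameters go in the URL.
get_request = Request(base + "?" + urlencode({"seq": "TGTCATAAG"}))

# POST: the serialized parameters go in the body, which must be bytes.
post_data = urlencode({"seq": "ATCTCTCAACCGT"}).encode("ascii")
post_request = Request(base, data=post_data)

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.data)
```

As with urllib2, supplying a data argument is what switches the request from GET to POST.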

Different formats

I said ReST was a different approach than RPC, though at this level that difference is little more than a matter of how the serialization is done. If I didn't have (or want) XML-RPC then it might be harder for a program to get the data from a given service. For example, in this case I would need to extract the correct fields from the HTML table.

Tools like BeautifulSoup (see also my example) simplify data extraction from HTML, but it still requires some effort.
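
To give a feel for that effort, here is a sketch that pulls the counts back out of the HTML table. It uses the standard library's html.parser (Python 3) rather than BeautifulSoup, and the class and function names are made up for this example. It also quietly assumes the first row is the header and that every other row has exactly two cells, which is the kind of assumption that breaks when the page layout changes.

```python
from html.parser import HTMLParser


class TableCells(HTMLParser):
    # Collect the text of every <td>/<th> cell, grouped by <tr> row.
    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:          # tolerate a cell outside any row
                self.rows.append([])
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def counts_from_html(page):
    parser = TableCells()
    parser.feed(page)
    parser.close()
    # Skip the header row; convert the count column back to integers.
    return {symbol: int(count) for symbol, count in parser.rows[1:]}


page = ("<table border='1'>\n"
        "<tr><th>symbol</th><th>count</th></tr>\n"
        "<tr><td>A</td><td>3</td></tr>\n"
        "<tr><td>C</td><td>1</td></tr>\n"
        "<tr><td>G</td><td>2</td></tr>\n"
        "<tr><td>T</td><td>3</td></tr>\n"
        "</table>\n")
counts = counts_from_html(page)
print(counts)
```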

The problem is that the HTML output is meant for people and not computers. People are flexible: if the service provider decides to change a few fields, perhaps to make a page more readable or to add more external links, people can figure out the changes while software cannot.

To solve this I need some way for the two target "users" to each get the right format: people (web browsers) get the HTML while programs get some other, more easily parsed format. That could still be the XML-RPC format, using just the serialization part of XML-RPC and not the communications part. Or I could decide that a tab-delimited file is better.

I'll go the tab-delimited text file route. One solution is to add a "format=..." option to the URL. For example:

http://localhost:8080/counts?seq=ATTGGCCCA&format=text
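
A sketch of the "format=..." idea, assuming Python 3 and written without any web framework. The function names requested_format and format_counts_text, and the choice of "html" as the default, are assumptions made for this example.

```python
from urllib.parse import urlsplit, parse_qs


def requested_format(url, default="html"):
    # Read the "format" query parameter, falling back to the default.
    query = parse_qs(urlsplit(url).query)
    return query.get("format", [default])[0]


def format_counts_text(counts):
    # One "symbol<TAB>count" line per entry, in sorted order.
    return "".join("%s\t%s\n" % (k, v) for k, v in sorted(counts.items()))


url = "http://localhost:8080/counts?seq=ATTGGCCCA&format=text"
chosen = requested_format(url)
text = format_counts_text({"T": 2, "A": 2})
print(chosen)
print(text, end="")
```

The nice property of this approach is that the alternate format has its own URL, which can be bookmarked or pasted into an email.
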
Another is to use the "Accept" header sent by the client.
from cherrypy.lib.filter.xmlrpcfilter import XmlRpcFilter
from cherrypy import cpg
from cgi import escape

# Given a sequence return a dictionary where the keys are the
# characters in the sequence and the values the number of times that
# character exists in the sequence.
def compute_counts(seq):
    d = {}
    for c in seq:
        d[c] = d.get(c, 0) + 1
    return d

# Convert a dictionary into an HTML table.
def format_dictionary_html(key_header, value_header, seq):
    yield "<table border='1'>\n"
    yield "<tr><th>%s</th><th>%s</th></tr>\n" % (escape(key_header),
                                                 escape(value_header))
    items = seq.items()
    items.sort()
    for k, v in items:
        yield "<tr><td>%s</td><td>%s</td></tr>\n" % (escape(str(k)),
                                                     escape(str(v)))
    yield "</table>\n"

def format_dictionary_plain(seq):
    items = seq.items()
    for k, v in items:
        yield "%s\t%s\n" % (k, v)


class Root:
    _cpFilterList = [XmlRpcFilter()]

    def counts(self, seq):
        d = compute_counts(seq)
        if cpg.request.isRPC:
            # XML-RPC request
            return d
        elif cpg.request.headerMap["Accept"] == "text/plain":
            return format_dictionary_plain(d)
        else:
            # normal request
            return format_dictionary_html("symbol", "count", d)
        
    counts.exposed = True
 
cpg.root = Root()
cpg.server.start()
There's a certain elegance to having the client and the server negotiate which format to use. On the other hand, you can't describe the alternate form as a simple URL. You also need to say which "Accept" header to send, for example:
% curl 'http://localhost:8080/counts?seq=AATT'
<table border='1'>
<tr><th>symbol</th><th>count</th></tr>
<tr><td>A</td><td>2</td></tr>
<tr><td>T</td><td>2</td></tr>
</table>

% curl -H "Accept: text/plain" 'http://localhost:8080/counts?seq=AATT'
A       2
T       2
% 
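
The exact comparison against "text/plain" works for the curl example, but clients may send a list of types with quality values. Here is a slightly more tolerant sketch in Python 3, along with the client-side parsing of the tab-delimited reply. The function names are hypothetical, and the matching ignores quality values and wildcards such as "*/*", so treat it as a starting point and not as full content negotiation.

```python
def accepted_types(accept_header):
    # Split "text/plain; q=0.8, text/html" into ["text/plain", "text/html"].
    types = []
    for part in accept_header.split(","):
        media_type = part.split(";", 1)[0].strip().lower()
        if media_type:
            types.append(media_type)
    return types


def wants_plain_text(accept_header):
    # True only when text/plain is listed explicitly.
    return "text/plain" in accepted_types(accept_header)


def parse_counts(text):
    # Turn "A<TAB>2" lines from the server back into a dictionary.
    counts = {}
    for line in text.splitlines():
        symbol, count = line.split("\t")
        counts[symbol] = int(count)
    return counts


print(wants_plain_text("text/plain"))
print(wants_plain_text("text/plain; q=0.8, text/html"))
print(wants_plain_text("*/*"))
print(parse_counts("A\t2\nT\t2\n"))
```

Compare parse_counts with what it takes to dig the same numbers out of the HTML table: the tab-delimited form needs only a split per line.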

Service discovery

I'll mention BioMOBY and Zeroconf.



Copyright © 2001-2013 Andrew Dalke Scientific AB