Dalke Scientific Software: More science. Less time.

ElementTree and SBML

I'll start this by going to my "Useful and New Modules" talk [key.tgz| PPT| PDF], which has a section on using ElementTree to parse XML files. ElementTree is a pythonic XML parser. The word "pythonic" is like the word "beautiful". It's hard to quantify and is mostly a matter of personal esthetics. The general idea is that pythonic APIs should feel like other pythonic APIs; experience with one should help guide you in how to use the next. Like art, it's hard to teach. There are several pythonic XML parsers besides ElementTree. ET was the most widely used of these and is now part of Python 2.5.

I talked about ElementTree a few months ago and last year so I'll not go into details here. Instead I'll show how to use ElementTree to parse an SBML file I got from Brett.

I'll start by parsing the file into an ElementTree object. The tree has a method called "getroot()" which returns the root node of the document.

>>> from xml.etree import ElementTree
>>> ElementTree.parse(open("BIOMD0000000023.xml"))
<xml.etree.ElementTree.ElementTree instance at 0x585aa8>
>>> tree = _
>>> tree.getroot()
<Element {http://www.sbml.org/sbml/level2}sbml at 595490>
>>> 
Each node in the tree has a few properties. "tag" is the element tag, in Clark notation. (That's the {curly braces} part, containing the namespace.) "attrib" is a dictionary containing the attributes of the start element. It's the empty dictionary if there are no attributes.
>>> root = tree.getroot()
>>> root.tag
'{http://www.sbml.org/sbml/level2}sbml'
>>> root.attrib
{'version': '1', 'metaid': 'metaid_0000001', 'level': '2'}
>>> 
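Clark notation is easy to take apart by hand. The following is a minimal sketch, written for current Python 3 rather than the Python 2.5 of the session above. It parses a tiny made-up SBML fragment from a string (a stand-in, not the BIOMD0000000023.xml file) and splits a tag into its namespace and local name. The helper name split_clark is hypothetical and is not part of ElementTree.

```python
from xml.etree import ElementTree

# A tiny stand-in document, not the real BIOMD0000000023.xml.
SBML_TEXT = """<sbml xmlns="http://www.sbml.org/sbml/level2" level="2" version="1">
  <model id="demo"/>
</sbml>"""


def split_clark(tag):
    """Split '{namespace}name' into (namespace, name).

    Hypothetical helper for illustration. Tags without a
    namespace come back as (None, tag).
    """
    if tag.startswith("{"):
        namespace, _, local_name = tag[1:].partition("}")
        return namespace, local_name
    return None, tag


root = ElementTree.fromstring(SBML_TEXT)
namespace, local_name = split_clark(root.tag)
print(namespace)
print(local_name)
# attrib is a plain dictionary, so .get() works for optional attributes
print(root.attrib.get("level"), root.attrib.get("metaid"))
```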

An SBML file contains models. I can iterate over models in a couple of ways. If the root node only has models then I can iterate over all the children of the root. ElementTree maps this to Python's list and iteration protocols, so the following work:

>>> len(root)
1
>>> for node in root:
...   print node.tag
... 
{http://www.sbml.org/sbml/level2}model
>>> root[0]
<Element {http://www.sbml.org/sbml/level2}model at 595468>
>>> del root[0]
>>> len(root)
0
>>> 
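Because an element behaves like a list of its children, deleting from it changes the tree in place. One way to experiment without losing the parsed data is to work on a copy. This is only a sketch in current Python 3, using a made-up one-model document in place of the real file; the variable names are illustrative.

```python
import copy
from xml.etree import ElementTree

# Stand-in document with a single model, not the real SBML file.
SBML_TEXT = """<sbml xmlns="http://www.sbml.org/sbml/level2">
  <model id="demo"/>
</sbml>"""

root = ElementTree.fromstring(SBML_TEXT)

# Deep-copy the tree so the destructive experiment leaves root alone.
scratch = copy.deepcopy(root)
del scratch[0]

# The original still supports len(), indexing and iteration.
child_ids = [child.get("id") for child in root]
print(len(root), len(scratch), child_ids)
```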
Of course, since I deleted the model from the root, I'll need to reload the file. The other way to get child elements is to "find" them. This is a method on every node (and the tree) which implements a small subset of the XPath language.
>>> root = ElementTree.parse(open("BIOMD0000000023.xml")).getroot()
>>> root.find("{http://www.sbml.org/sbml/level2}model")
<Element {http://www.sbml.org/sbml/level2}model at 739e18>
>>> root.findall("{http://www.sbml.org/sbml/level2}model")
[<Element {http://www.sbml.org/sbml/level2}model at 739e18>]
>>> 
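Spelling out the full namespace in every path gets long. In current Python 3 the find and findall methods also accept a prefix-to-namespace mapping as a second argument, which was not available in the Python 2.5 version used above. A small sketch, assuming a made-up one-model document and a prefix name ("sbml") chosen purely for illustration:

```python
from xml.etree import ElementTree

# Stand-in document, not the real BIOMD0000000023.xml.
SBML_TEXT = """<sbml xmlns="http://www.sbml.org/sbml/level2">
  <model id="demo"/>
</sbml>"""

# The prefix is local to this mapping; it need not match the file.
NS = {"sbml": "http://www.sbml.org/sbml/level2"}

root = ElementTree.fromstring(SBML_TEXT)
model = root.find("sbml:model", NS)            # first match
models = root.findall("sbml:model", NS)        # list of all matches
missing = root.find("sbml:noSuchElement", NS)  # no match

print(model.get("id"), len(models), missing)
```

Note that find returns None when nothing matches, so the result is worth checking before using it.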
That 'findall' wasn't very interesting. Here I'll get the species elements:
>>> root.findall("*/*/{http://www.sbml.org/sbml/level2}species")
[<Element {http://www.sbml.org/sbml/level2}species at 741238>,
<Element {http://www.sbml.org/sbml/level2}species at 7415f8>,
<Element {http://www.sbml.org/sbml/level2}species at 741918>,
<Element {http://www.sbml.org/sbml/level2}species at 741e68>,
<Element {http://www.sbml.org/sbml/level2}species at 10081c0>,
<Element {http://www.sbml.org/sbml/level2}species at 1008580>,
<Element {http://www.sbml.org/sbml/level2}species at 1008828>,
<Element {http://www.sbml.org/sbml/level2}species at 1008af8>,
<Element {http://www.sbml.org/sbml/level2}species at 1008e18>,
<Element {http://www.sbml.org/sbml/level2}species at 100f170>,
<Element {http://www.sbml.org/sbml/level2}species at 100f490>,
<Element {http://www.sbml.org/sbml/level2}species at 100f7b0>,
<Element {http://www.sbml.org/sbml/level2}species at 100fad0>]
>>> for ele in root.findall("*/*/{http://www.sbml.org/sbml/level2}species"):
...   print "%s has initial concentration %s" % (ele.attrib["id"], 
...                                     ele.attrib["initialConcentration"])
... 
Fru has initial concentration 1
Glc has initial concentration 1
HexP has initial concentration 1
Suc6P has initial concentration 1
Suc has initial concentration 1
Sucvac has initial concentration 0
glycolysis has initial concentration 0
phos has initial concentration 5.1
UDP has initial concentration 0.2
ADP has initial concentration 0.2
ATP has initial concentration 1
Glcex has initial concentration 5
Fruex has initial concentration 5
>>> 
I prefer using the correct path rather than the wildcard "*"
>>> root.find("{http://www.sbml.org/sbml/level2}model/"
...           "{http://www.sbml.org/sbml/level2}listOfSpecies/"
...           "{http://www.sbml.org/sbml/level2}species" )
<Element {http://www.sbml.org/sbml/level2}species at 741238>
>>> len(root.findall(
...   "{http://www.sbml.org/sbml/level2}model/"
...   "{http://www.sbml.org/sbml/level2}listOfSpecies/"
...   "{http://www.sbml.org/sbml/level2}species"))
13
>>> 
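The explicit path can be assembled from a single namespace constant, and the attribute strings converted to numbers along the way. This is a sketch for current Python 3 using a two-species stand-in document (ids and concentrations borrowed from the output above); the names SBML_NS and SPECIES_PATH are hypothetical.

```python
from xml.etree import ElementTree

# Stand-in document with two of the species shown above.
SBML_TEXT = """<sbml xmlns="http://www.sbml.org/sbml/level2">
  <model id="demo">
    <listOfSpecies>
      <species id="Fru" initialConcentration="1"/>
      <species id="phos" initialConcentration="5.1"/>
    </listOfSpecies>
  </model>
</sbml>"""

SBML_NS = "{http://www.sbml.org/sbml/level2}"
# Join the Clark-notation tags into model/listOfSpecies/species.
SPECIES_PATH = "/".join(
    SBML_NS + name for name in ("model", "listOfSpecies", "species"))

root = ElementTree.fromstring(SBML_TEXT)

# Attribute values are strings, so convert them explicitly.
concentrations = {}
for ele in root.findall(SPECIES_PATH):
    concentrations[ele.attrib["id"]] = float(ele.attrib["initialConcentration"])

print(concentrations)
```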

Another option is to use the getiterator method, which returns all elements which match the given tag.

>>> len(root.getiterator("{http://www.sbml.org/sbml/level2}species"))
13
>>> 
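For readers on a recent Python: getiterator was removed in Python 3.9, and the replacement is the iter method. It returns an iterator rather than a list, so it needs to be wrapped in list() before taking its length. A brief sketch with a stand-in document:

```python
from xml.etree import ElementTree

# Stand-in document, not the real SBML file.
SBML_TEXT = """<sbml xmlns="http://www.sbml.org/sbml/level2">
  <model id="demo">
    <listOfSpecies>
      <species id="Fru"/>
      <species id="Glc"/>
    </listOfSpecies>
  </model>
</sbml>"""

root = ElementTree.fromstring(SBML_TEXT)

# iter() walks the whole subtree and yields elements with this tag.
species = list(root.iter("{http://www.sbml.org/sbml/level2}species"))
print(len(species))
```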
For a more complicated example, the kineticLaw section has an embedded MathML element which looks like this
<math xmlns="http://www.w3.org/1998/Math/MathML">
  <apply>
    <times/>
    <ci> compartment </ci>
    <ci> compartment </ci>
    <apply>
      <divide/>
      <apply>
        <times/>
        <ci> Vmax11 </ci>
        <ci> Suc </ci>
      </apply>
      <apply>
        <plus/>
        <ci> Km11Suc </ci>
        <ci> Suc </ci>
      </apply>
    </apply>
  </apply>
</math>
Here's a bit of code to get the text of the "ci" elements, uniquely. It uses a generator expression (added in Python 2.4) and the built-in set type.
>>> set(ele.text for ele in root.getiterator("{http://www.w3.org/1998/Math/MathML}ci"))
set([' Keq6 ', ' Vmax8r ', ' Km4Fru ', ' ADP ', ' Ki8UDP ', ' Km6UDP ',
' Km9Suc ', ' Vmax2 ', ' Km10F6P ', ' Ki8Fru ', ' Km5Fru ', ' Km3Glc ',
' Vmax9 ', ' Km11Suc ', ' Km8Fru ', ' Km1Fruex ', ' Ki1Fru ', ' UDP ',
' Glcex ', ' Vmax4 ', ' Vmax11 ', ' Km6F6P ', ' compartment ', ' Vmax1 ',
' Ki6Pi ', ' phos ', ' ATP ', ' Ki3G6P ', ' Km7Suc6P ', ' Ki5ADP ',
' Km3ATP ', ' Km8Suc ', ' Km6Suc6P ', ' Vmax5 ', ' Km8UDP ', ' Ki4F6P ',
' Km5ATP ', ' Vmax7 ', ' Suc ', ' Km2Glcex ', ' Vmax3 ', ' Vmax6r ',
' Vmax8f ', ' Km6UDPGlc ', ' Glc ', ' Ki5Fru ', ' Ki9Glc ', ' Fru ',
' Fruex ', ' Vmax6f ', ' Ki2Glc ', ' Keq8 ', ' Suc6P ', ' Ki9Fru ',
' Ki8Suc ', ' Ki6UDPGlc ', ' HexP ', ' Vmax10 ', ' Km8UDPGlc ', ' Km4ATP ',
' Ki6F6P ', ' Ki6Suc6P '])
>>> 
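The names above keep the whitespace that surrounds them in the MathML. A possible refinement, sketched here for current Python 3 against just the single MathML fragment quoted earlier rather than the whole file, strips that whitespace and sorts the result:

```python
from xml.etree import ElementTree

# The MathML fragment quoted above, as a standalone document.
MATHML_TEXT = """<math xmlns="http://www.w3.org/1998/Math/MathML">
  <apply>
    <times/>
    <ci> compartment </ci>
    <ci> compartment </ci>
    <apply>
      <divide/>
      <apply><times/><ci> Vmax11 </ci><ci> Suc </ci></apply>
      <apply><plus/><ci> Km11Suc </ci><ci> Suc </ci></apply>
    </apply>
  </apply>
</math>"""

CI_TAG = "{http://www.w3.org/1998/Math/MathML}ci"

math = ElementTree.fromstring(MATHML_TEXT)

# A set comprehension removes duplicates; strip() drops the padding.
names = sorted({ele.text.strip() for ele in math.iter(CI_TAG)})
print(names)
```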

For the last example, I'll export the notes body as XHTML. First I'll make a new Element containing the XHTML "html" element, and put in a head and title.

>>> html = ElementTree.Element("{http://www.w3.org/1999/xhtml}html")
>>> head = ElementTree.SubElement(html, "{http://www.w3.org/1999/xhtml}head")
>>> title = ElementTree.SubElement(head, "{http://www.w3.org/1999/xhtml}title")
>>> title.text = "This is a note"
>>> head.tail = "\n"
>>> 
Next I'll get the body element from the existing root element and stick it into the new HTML document. The 'tail = "\n"' assignments above and below are there to make the output look nice.
>>> body = root.getiterator("{http://www.w3.org/1999/xhtml}body")[0]
>>> body
<Element {http://www.w3.org/1999/xhtml}body at 739f08>
>>> html.append(body)
>>> import sys
>>> html.tail = "\n"
>>> 
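The "tail" attribute may need a word of explanation: an element's "text" is the character data between its start tag and its first child, while "tail" is the character data that follows its end tag, up to the next sibling or the parent's end tag. A minimal sketch in current Python 3, using made-up elements without namespaces:

```python
from xml.etree import ElementTree

p = ElementTree.Element("p")
p.text = "before "           # text inside <p>, ahead of the first child

b = ElementTree.SubElement(p, "b")
b.text = "bold"              # text inside <b>
b.tail = " after"            # text following </b>, still inside <p>

# encoding="unicode" returns a str without an XML declaration.
serialized = ElementTree.tostring(p, encoding="unicode")
print(serialized)
```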
The last step is to wrap the new html element in an ElementTree and write the result to stdout.
>>> tree = ElementTree.ElementTree(html)
>>> tree.write(sys.stdout)
<html:html xmlns:html="http://www.w3.org/1999/xhtml"><html:head><html:title>This is a note</html:title></html:head>
<html:body>
<html:p>
 <html:font face="Arial, Helvetica, sans-serif">
 <html:b>
<html:a href="http://www.sbml.org/">SBML</html:a> level 2 code generated for the JWS Online project by Jacky Snoep using <html:a href="http://pysces.sourceforge.net/">PySCeS</html:a>
<html:br />
Run this model online at <html:a href="http://jjj.biochem.sun.ac.za/">http://jjj.biochem.sun.ac.za</html:a>
<html:br />
To cite JWS Online please refer to: Olivier, B.G. and Snoep, J.L. (2004) <html:a href="http://bioinformatics.oupjournals.org/cgi/content/abstract/20/13/2143">Web-based 
modelling using JWS Online</html:a>, Bioinformatics, 20:2143-2144
 </html:b>
 </html:font>
</html:p>

</html:body>
    </html:html>
>>> 
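The "html:" prefixes are valid XML but may not be what a browser-oriented tool expects. In current Python 3, one way to get un-prefixed output is to register the XHTML namespace under the empty prefix before serializing. This is a sketch rather than a recommendation: register_namespace changes a module-wide mapping, so it affects every later serialization in the same process. The small document here is built by hand instead of using the notes body from the SBML file.

```python
from xml.etree import ElementTree

XHTML = "http://www.w3.org/1999/xhtml"

# Module-wide setting: serialize this namespace as the default one.
ElementTree.register_namespace("", XHTML)

html = ElementTree.Element("{%s}html" % XHTML)
head = ElementTree.SubElement(html, "{%s}head" % XHTML)
title = ElementTree.SubElement(head, "{%s}title" % XHTML)
title.text = "This is a note"

# A hand-made body standing in for the one taken from the SBML notes.
body = ElementTree.SubElement(html, "{%s}body" % XHTML)
link = ElementTree.SubElement(body, "{%s}a" % XHTML, href="http://www.sbml.org/")
link.text = "SBML"

xhtml_text = ElementTree.tostring(html, encoding="unicode")
print(xhtml_text)
```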



Copyright © 2001-2020 Andrew Dalke Scientific AB