Dalke Scientific Software: More science. Less time.
/home/writings/diary/archive/2006/08/23/element_tree_and_with_statements

Building an ElementTree using with statements

In my previous essay on the with statement I developed two context managers that work together to simplify XML output generation. For fun I developed an API-compatible system which builds an in-memory ElementTree instead of writing the output.

Before going any further, be aware that the goal of this example is API compatibility with the previous XMLGenerator solution, except for the top-level constructor. If you don't need that compatibility, other solutions may be better, like using the ElementTree API directly without any context managers.
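
For comparison, here is a minimal sketch of the "ElementTree API directly" alternative. It assumes Python 3 and the standard-library xml.etree.ElementTree module rather than the standalone elementtree package used in this essay. The names list is hypothetical stand-in data, and the per-author indentation is omitted to keep the sketch short.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in data: (forename, surname) pairs.
names = [("Andrew", "Dalke"), ("John", "Smith")]

# Build the tree by hand; no context managers involved.
root = ET.Element("AuthorList")
root.text = "\n"
for forename, surname in names:
    author = ET.SubElement(root, "Author",
                           {"forename": forename, "surname": surname})
    author.tail = "\n"   # newline after each <Author/> element

# encoding="unicode" asks for a str instead of bytes (Python 3).
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Every element is placed through an explicit variable (root, author), so there is no "current node" to track.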

Here's what the final driver code looks like:

from elementtree import ElementTree as etree

...

#with DocumentManager(encoding="utf-8") as doc:
with ETreeDocumentManager() as doc:
    with doc.element("AuthorList", text="\n", tail="\n"):
        for name in names:
            doc.characters("  ")
            doc.simple_element("Author",
                               {"forename": name.forename,
                                "surname": name.surname},
                               tail="\n")
                               
etree.ElementTree(doc.root).write(sys.stdout)

The only difference from the XMLGenerator version of the driver is the top-level context manager.

In my previous case the DocumentManager's __enter__ returned self because it wasn't a factory. There was no way to get a new output file handle for each context. That's not a problem here. I'll have an ETreeDocumentManager which creates a new ETreeDocument for every context.

class ETreeDocumentManager(object):
    def __enter__(self):
        return ETreeDocument()
    def __exit__(self, type, value, traceback):
        pass

The only reason for this class is API compatibility with the older class. I could have said "doc = ETreeDocument()" and not had the top-level "with" statement.

Adding characters or an element to the XMLGenerator document manager was easy because the output file was always at the right place to write the next piece of output. I didn't need to keep track of where to put things. With ElementTree I do, and that includes moving up and down the tree as elements are entered and exited.

The implementation was simple. Keep track of all parent nodes using a stack. Use __enter__ and __exit__ to push an element on the stack and remove it. The with statement context logic keeps track of the currently active node in the tree.
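
The stack technique on its own can be sketched as follows. Scope is a hypothetical stand-in class that holds only a name instead of a tree node, so the push and pop behavior is easy to see in isolation.

```python
class Scope(object):
    """Hypothetical stand-in for a node manager; holds only a name."""
    def __init__(self, name, stack):
        self.name = name
        self.stack = stack

    def __enter__(self):
        self.stack.append(self)   # this scope becomes the current one
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.stack.pop()          # the previous scope is current again
        # Returning None lets any exception propagate.

stack = []
seen = []
with Scope("outer", stack):
    seen.append(stack[-1].name)
    with Scope("inner", stack):
        seen.append(stack[-1].name)
    seen.append(stack[-1].name)

print(seen)        # ['outer', 'inner', 'outer']

# __exit__ also runs when the block raises, so the stack stays balanced.
try:
    with Scope("failing", stack):
        raise ValueError("boom")
except ValueError:
    pass

print(len(stack))  # 0
```

The top of the stack is always the innermost active with block, which is the property the document object relies on when it dispatches calls.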

It's a bit harder figuring out what to do with the text and tail fields, mostly because they can be None on a node. But it's really just a matter of handling all the cases.
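
To make the cases concrete: in ElementTree, text is the character data between an element's start tag and its first child, and tail is the character data that follows an element's end tag. Here is a minimal sketch of appending character data at the end of a node's content. It assumes Python 3 and the standard-library xml.etree.ElementTree module, and append_text is a hypothetical helper name, not part of ElementTree.

```python
import xml.etree.ElementTree as ET

def append_text(node, text):
    """Append character data at the end of node's content."""
    if len(node):
        # There are children, so the data belongs after the last one.
        last = node[-1]
        last.tail = (last.tail or "") + text   # tail may be None
    else:
        # No children yet, so the data belongs in the node's own text.
        node.text = (node.text or "") + text   # text may be None

root = ET.Element("a")
append_text(root, "one")    # no children: stored in root.text
ET.SubElement(root, "b")
append_text(root, "two")    # stored in the tail of <b>
append_text(root, "!")      # appended to that same tail

print(ET.tostring(root, encoding="unicode"))
```

Treating None as the empty string folds the four cases (text or tail, None or not) into two branches.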

Here's the complete code, including the driver:

from __future__ import with_statement

import sys
from elementtree import ElementTree as etree


# Here for API compatibility with the XMLGenerator-based DocumentManager
class ETreeDocumentManager(object):
    def __enter__(self):
        return ETreeDocument()
    def __exit__(self, type, value, traceback):
        pass


class ETreeDocument(object):
    def __init__(self):
        self.root = None
        self.parent_stack = []
        
    def characters(self, text):
        # dispatch to the current element
        if self.parent_stack:
            self.parent_stack[-1].characters(text)
        else:
            raise TypeError("No active nodes")

    def element(self, tag, attrib={}, text=None, tail=None):
        # dispatch to the current element
        if self.parent_stack:
            return self.parent_stack[-1].element(tag, attrib, text, tail)
        if self.root is not None:
            raise TypeError("There can be only one root element")

        self.root = root = etree.Element(tag, attrib)
        root.text = text
        root.tail = tail
        return ETreeNodeManager(root, self.parent_stack)

    def simple_element(self, tag, attrib={}, text=None, tail=None):
        if self.parent_stack:
            return self.parent_stack[-1].simple_element(
                tag, attrib, text, tail)
        # reuse the existing code
        return self.element(tag, attrib, text, tail)

class ETreeNodeManager(object):
    def __init__(self, node, parent_stack):
        self.node = node
        self.parent_stack = parent_stack
    def __enter__(self):
        self.parent_stack.append(self)
        return self
    def __exit__(self, type, value, traceback):
        self.parent_stack.pop()

    def characters(self, text):
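        # This is a bit complicated. New text goes in the tail of the
        # last child if there is one, otherwise in the node's own text.
        # Either field may still be None.
        node = self.node
        if len(node):
            last_child = node[-1]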
            if last_child.tail is None:
                last_child.tail = text
            else:
                # Luckily in recent Pythons this is not O(N**2)
                last_child.tail += text
        else:
            if node.text is None:
                node.text = text
            else:
                node.text += text

    def element(self, tag, attrib={}, text=None, tail=None):
        child = etree.SubElement(self.node, tag, attrib=attrib)
        child.text = text
        child.tail = tail
        return ETreeNodeManager(child, self.parent_stack)

    # No specialized implementation for this
    simple_element = element

#####


class Name(object):
    def __init__(self, forename, surname):
        self.forename = forename
        self.surname = surname

names = [Name("Andrew", "Dalke"),
         Name("John", "Smith"),
         Name(u"\N{LATIN CAPITAL LETTER A WITH RING ABOVE}sa", "Svensson"),]

            
#with DocumentManager(encoding="utf-8") as doc:
with ETreeDocumentManager() as doc:
    with doc.element("AuthorList", text="\n", tail="\n"):
        for name in names:
            doc.characters("  ")
            doc.simple_element("Author",
                               {"forename": name.forename,
                                "surname": name.surname},
                               tail="\n")
                               
etree.ElementTree(doc.root).write(sys.stdout)

Remember how I said the goal was API compatibility with the older system? Without that requirement I could just as easily have written:

doc = ETreeDocument()
author_list = doc.element("AuthorList", text="\n", tail="\n")
for name in names:
    author_list.characters("  ")
    author_list.simple_element("Author",
                               {"forename": name.forename,
                                "surname": name.surname},
                               tail="\n")

All the with statement does is keep track of the current node stack so new characters and elements are added in the right place. Granted, without that I need to use explicit variables ("author_list" in this case) to specify where the new elements are added, but that can be considered a good thing. ("Explicit is better than implicit.")

One minor thing to note: I had the ETreeNodeManager's __enter__ return itself. Combined with the previous point, this means you can choose which nodes go on the active stack, and you can modify elements which are not current, as in the following example:

with ETreeDocumentManager() as doc:
    with doc.element("AuthorList", text="\n", tail="\n") as author_list:
        forenames = author_list.element("Forenames", text="\n", tail="\n")
        surnames = author_list.element("Surnames", text="\n", tail="\n")
        for name in names:
            forenames.characters("  ")
            forenames.simple_element("Forename", text=name.forename, tail="\n")
            surnames.characters("  ")
            surnames.simple_element("Surname", text=name.surname, tail="\n")
            fullname = "%s, %s" % (name.surname, name.forename)
            doc.simple_element("Person", text=fullname, tail="\n")

Writing doc.root to sys.stdout as before gives:

<AuthorList>
<Forenames>
  <Forename>Andrew</Forename>
  <Forename>John</Forename>
  <Forename>&#197;sa</Forename>
</Forenames>
<Surnames>
  <Surname>Dalke</Surname>
  <Surname>Smith</Surname>
  <Surname>Svensson</Surname>
</Surnames>
<Person>Dalke, Andrew</Person>
<Person>Smith, John</Person>
<Person>Svensson, &#197;sa</Person>
</AuthorList>


Andrew Dalke is an independent consultant focusing on software development for computational chemistry and biology.



Copyright © 2001-2013 Andrew Dalke Scientific AB