Building an ElementTree using with statements
In my previous essay on the with statement I developed two context managers which work together to simplify XML output generation. For fun I developed an API-compatible system which generates an in-memory ElementTree instead of writing the output.
Before going any further, be aware that the goal of this example is API compatibility with the previous XMLGenerator solution, except for the first constructor. Without that goal other solutions may be better, like using the ElementTree API directly and not using any context managers.
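For comparison, here is a minimal sketch of what "using the ElementTree API directly" might look like for the author list built later in this essay. It assumes the standard library's xml.etree.ElementTree module rather than the standalone elementtree package, and build_author_list is a hypothetical helper name, not something from the previous essay. It also leaves out the leading indentation characters to stay short.

```python
import xml.etree.ElementTree as ET


def build_author_list(names):
    """Build an AuthorList tree from (forename, surname) pairs.

    Hypothetical helper for illustration only.
    """
    root = ET.Element("AuthorList")
    root.text = "\n"
    root.tail = "\n"
    for forename, surname in names:
        # SubElement creates the child and appends it to root in one step.
        author = ET.SubElement(root, "Author",
                               {"forename": forename, "surname": surname})
        author.tail = "\n"
    return root


root = build_author_list([("Andrew", "Dalke"), ("John", "Smith")])
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

There is no stack and no context manager here. The price is that the caller sets text and tail by hand on every node.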
Here's what the final driver code looks like:
from elementtree import ElementTree as etree
...
#with DocumentManager(encoding="utf-8") as doc:
with ETreeDocumentManager() as doc:
    with doc.element("AuthorList", text="\n", tail="\n"):
        for name in names:
            doc.characters(" ")
            doc.simple_element("Author",
                               {"forename": name.forename,
                                "surname": name.surname},
                               tail="\n")
etree.ElementTree(doc.root).write(sys.stdout)
The only code difference is the top-level context manager.
In my previous case the DocumentManager's __enter__ returned self because it wasn't a factory. There was no way to get a new output file handle for each context. That's not a problem here. I'll have an ETreeDocumentManager which creates a new ETreeDocument for every context.
class ETreeDocumentManager(object):
    def __enter__(self):
        return ETreeDocument()
    def __exit__(self, type, value, traceback):
        pass
The only reason for this class is API compatibility with the older
class. I could have said "doc = ETreeDocument()" and not had
the top-level "with" statement.
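As an aside, the same "new document for every context" behavior can be sketched with the standard library's contextlib.contextmanager decorator instead of a class. This is only an illustration, not the code used in this essay. Document is a hypothetical stand-in for ETreeDocument, and document_manager is an assumed name.

```python
from contextlib import contextmanager


class Document(object):
    """Hypothetical stand-in for ETreeDocument."""
    def __init__(self):
        self.root = None


@contextmanager
def document_manager():
    # Each call yields a brand-new document. There is nothing to
    # clean up on exit, so no code follows the yield.
    yield Document()


with document_manager() as first:
    pass
with document_manager() as second:
    pass

print(first is second)
```

Each with block gets its own document, which is the property the class-based manager provides through its __enter__.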
Adding characters or an element to the XMLGenerator document manager was easy because the output file was always in the right place to write the output. I didn't need to keep track of where to put things. But for ElementTree I do, including going down and up the tree as elements are opened and closed.
The implementation was simple. Keep track of all parent nodes using a stack. Use __enter__ and __exit__ to push an element on the stack and remove it. The with statement's context logic keeps track of the currently active node in the tree.
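The push and pop behavior is easier to see in isolation. The following is a minimal sketch that tracks plain labels instead of tree nodes. Scope is a hypothetical class used only for this illustration.

```python
class Scope(object):
    """Hypothetical helper: push a label on entry, pop it on exit."""
    def __init__(self, label, stack):
        self.label = label
        self.stack = stack

    def __enter__(self):
        self.stack.append(self.label)
        return self

    def __exit__(self, exc_type, exc_value, tb):
        # Runs even when the body raises, so the stack stays balanced.
        self.stack.pop()
        return False


stack = []
seen = []  # snapshots of the stack at each step
with Scope("AuthorList", stack):
    seen.append(list(stack))
    with Scope("Author", stack):
        seen.append(list(stack))
    seen.append(list(stack))
seen.append(list(stack))

for snapshot in seen:
    print(snapshot)
```

The last item on the stack is always the innermost open with block, which is how the document knows where new characters and elements belong.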
It's a bit harder figuring out what to do with the text and tail fields, mostly because they can be None on a node. But it's really just a matter of handling all the cases.
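The cases are: if the node already has children, new characters extend the tail of the last child, otherwise they extend the node's own text, and in both places the existing value may be None. Here is a compact sketch of that logic, assuming the standard library's xml.etree.ElementTree. The append_characters function is a hypothetical name, and the "or" idiom is a shortcut I'm using for the sketch rather than the spelled-out branches in the full code.

```python
import xml.etree.ElementTree as ET


def append_characters(node, text):
    """Append text at the current end of node's content (hypothetical helper)."""
    if len(node):
        # Text after a child element is stored in that child's tail.
        last_child = node[-1]
        last_child.tail = (last_child.tail or "") + text
    else:
        # No children yet, so the text belongs to the node itself.
        node.text = (node.text or "") + text


root = ET.Element("AuthorList")
append_characters(root, "\n")   # text was None
append_characters(root, " ")    # text was "\n"
ET.SubElement(root, "Author")
append_characters(root, "\n")   # goes into the Author's tail
result = ET.tostring(root, encoding="unicode")
print(result)
```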
Here's the code, including the driver code:
from __future__ import with_statement
import sys
from elementtree import ElementTree as etree

# Here for API compatibility with the XMLGenerator-based DocumentManager
class ETreeDocumentManager(object):
    def __enter__(self):
        return ETreeDocument()
    def __exit__(self, type, value, traceback):
        pass

class ETreeDocument(object):
    def __init__(self):
        self.root = None
        self.parent_stack = []

    def characters(self, text):
        # dispatch to the current element
        if self.parent_stack:
            self.parent_stack[-1].characters(text)
        else:
            raise TypeError("No active nodes")

    def element(self, tag, attrib={}, text=None, tail=None):
        # dispatch to the current element
        if self.parent_stack:
            return self.parent_stack[-1].element(tag, attrib, text, tail)
        if self.root is not None:
            raise TypeError("There can be only one root element")
        self.root = root = etree.Element(tag, attrib)
        root.text = text
        root.tail = tail
        return ETreeNodeManager(root, self.parent_stack)

    def simple_element(self, tag, attrib={}, text=None, tail=None):
        if self.parent_stack:
            return self.parent_stack[-1].simple_element(
                tag, attrib, text, tail)
        # reuse the existing code
        return self.element(tag, attrib, text, tail)

class ETreeNodeManager(object):
    def __init__(self, node, parent_stack):
        self.node = node
        self.parent_stack = parent_stack

    def __enter__(self):
        self.parent_stack.append(self)
        return self

    def __exit__(self, type, value, traceback):
        self.parent_stack.pop()

    def characters(self, text):
        # This is a bit complicated
        node = self.node
        if len(node):
            # Text after a child goes in that child's tail.
            # (Indexing instead of the deprecated getchildren().)
            last_child = node[-1]
            if last_child.tail is None:
                last_child.tail = text
            else:
                # Luckily in recent Pythons this is not O(N**2)
                last_child.tail += text
        else:
            if node.text is None:
                node.text = text
            else:
                node.text += text

    def element(self, tag, attrib={}, text=None, tail=None):
        child = etree.SubElement(self.node, tag, attrib=attrib)
        child.text = text
        child.tail = tail
        return ETreeNodeManager(child, self.parent_stack)

    # No specialized implementation for this
    simple_element = element

#####

class Name(object):
    def __init__(self, forename, surname):
        self.forename = forename
        self.surname = surname

names = [Name("Andrew", "Dalke"),
         Name("John", "Smith"),
         Name(u"\N{LATIN CAPITAL LETTER A WITH RING ABOVE}sa", "Svensson"),]

#with DocumentManager(encoding="utf-8") as doc:
with ETreeDocumentManager() as doc:
    with doc.element("AuthorList", text="\n", tail="\n"):
        for name in names:
            doc.characters(" ")
            doc.simple_element("Author",
                               {"forename": name.forename,
                                "surname": name.surname},
                               tail="\n")
etree.ElementTree(doc.root).write(sys.stdout)
Remember how I said the goal was API compatibility with the older system? Without that requirement I could about as easily have written
doc = ETreeDocument()
author_list = doc.element("AuthorList", text="\n", tail="\n")
for name in names:
    author_list.characters(" ")
    author_list.simple_element("Author",
                               {"forename": name.forename,
                                "surname": name.surname},
                               tail="\n")
All the with statement does is keep track of the current node stack so
new characters and elements are added in the right place. Granted,
without that I need to use explicit variables ("author_list" in this
case) to specify where the new elements are added, but that can be
considered a good thing. ("Explicit is better than implicit.")
One minor thing to note: I had the ETreeNodeManager's __enter__ return itself. Combined with the explicit style just described, this means you can choose which nodes go on the active stack, and you can modify elements which are not the current node, as in the following example:
with ETreeDocumentManager() as doc:
    with doc.element("AuthorList", text="\n", tail="\n") as author_list:
        forenames = author_list.element("Forenames", text="\n", tail="\n")
        surnames = author_list.element("Surnames", text="\n", tail="\n")
        for name in names:
            forenames.characters(" ")
            forenames.simple_element("Forename", text=name.forename, tail="\n")
            surnames.characters(" ")
            surnames.simple_element("Surname", text=name.surname, tail="\n")
            fullname = "%s, %s" % (name.surname, name.forename)
            doc.simple_element("Person", text=fullname, tail="\n")
the output of which is
<AuthorList>
<Forenames>
 <Forename>Andrew</Forename>
 <Forename>John</Forename>
 <Forename>Åsa</Forename>
</Forenames>
<Surnames>
 <Surname>Dalke</Surname>
 <Surname>Smith</Surname>
 <Surname>Svensson</Surname>
</Surnames>
<Person>Dalke, Andrew</Person>
<Person>Smith, John</Person>
<Person>Svensson, Åsa</Person>
</AuthorList>
Andrew Dalke is an independent consultant focusing on software development for computational chemistry and biology.
Copyright © 2001-2010 Dalke Scientific Software, LLC.


