with statement
with statement considered hard?
Python 2.5 introduces the with statement. The documentation there and in PEP 343 refer to context managers. The PEP says the term comes from the draft documentation but doesn't explain it further. From that draft:
"Context managers ensure a particular action is taken to establish the context before the contained suite is entered, and a second action to clean up the context when the suite is exited."

and from the PEP

"the actual state modifications made by the context manager"

All of the examples in the PEP and the what's new document use the with statement for things like locks, ensuring a socket or file is closed, database transactions and temporarily modifying a system setting. These are heavy duty things. Deep things. Things most people don't want to mess with. The what's new document even says
Under the hood, the 'with' statement is fairly complicated. Most people will only use 'with' in company with existing objects and don't need to know these details
I think this viewpoint is wrong, or at least overly limited. I think the PEP is more generally useful than those examples show and the term "context manager" is too abstract. I also conjecture that the people working on the PEP were systems developers and not applications developers, hence the bias towards system/state modification examples. ;)
Using XMLGenerator for XML output
I like using XMLGenerator to make XML output. Here's an example.
    import sys
    from xml.sax import saxutils

    class Name(object):
        def __init__(self, forename, surname):
            self.forename = forename
            self.surname = surname

    names = [Name("Andrew", "Dalke"),
             Name("John", "Smith"),
             Name(u"\N{LATIN CAPITAL LETTER A WITH RING ABOVE}sa", "Svensson"),]

    gen = saxutils.XMLGenerator(sys.stdout, "utf-8")
    gen.startDocument()
    gen.startElement("NameList", {})
    for name in names:
        gen.startElement("Name", {"forename": name.forename,
                                  "surname": name.surname})
        gen.endElement("Name")
    gen.endElement("NameList")
    gen.endDocument()

The output looks like
    <?xml version="1.0" encoding="utf-8"?>
    <NameList><Name surname="Dalke" forename="Andrew"></Name><Name surname="Smith" forename="John"></Name><Name surname="Svensson" forename="Åsa"></Name></NameList>

although I've added a newline for clarity. If I want people to read the output then I can add some indentation in the XML output like this:
    gen = saxutils.XMLGenerator(sys.stdout, "utf-8")
    gen.startDocument()
    gen.startElement("NameList", {})
    gen.characters("\n")
    for name in names:
        gen.characters(" ")
        gen.startElement("Name", {"forename": name.forename,
                                  "surname": name.surname})
        gen.endElement("Name")
        gen.characters("\n")
    gen.endElement("NameList")
    gen.characters("\n")
    gen.endDocument()

with the output in this case being
    <?xml version="1.0" encoding="utf-8"?>
    <NameList>
     <Name surname="Dalke" forename="Andrew"></Name>
     <Name surname="Smith" forename="John"></Name>
     <Name surname="Svensson" forename="Åsa"></Name>
    </NameList>
I like using XMLGenerator for this because there is no data structure overhead. The output goes directly to the output file with no additional buffering. I don't like the verbosity and I don't like the high likelihood of errors. (Even knowing I make mistakes when I write this code, my original code omitted the endElement("Name") call and I didn't catch it on first view of the output. Luckily those errors are easy to catch automatically.)
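One way to catch such errors automatically is to write the XMLGenerator output to an in-memory buffer and feed it back through a SAX parser, which raises an exception when tags don't match. The following is only a sketch, written in current Python 3 syntax rather than the Python 2.5 used elsewhere in this essay; the helper names is_well_formed and write_names are hypothetical and exist only for illustration.

```python
import io
import xml.sax
from xml.sax import saxutils


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML (hypothetical helper)."""
    try:
        # A do-nothing ContentHandler is enough; only parse errors matter.
        xml.sax.parseString(xml_text.encode("utf-8"), xml.sax.ContentHandler())
    except xml.sax.SAXParseException:
        return False
    return True


def write_names(forget_end_element):
    """Emit a tiny NameList, optionally omitting endElement("Name")."""
    out = io.StringIO()
    gen = saxutils.XMLGenerator(out, "utf-8")
    gen.startDocument()
    gen.startElement("NameList", {})
    gen.startElement("Name", {"forename": "Andrew", "surname": "Dalke"})
    if not forget_end_element:
        gen.endElement("Name")
    gen.endElement("NameList")
    gen.endDocument()
    return out.getvalue()


good = write_names(forget_end_element=False)
bad = write_names(forget_end_element=True)
print(is_well_formed(good), is_well_formed(bad))
```

XMLGenerator itself never complains about the missing call; only the round trip through the parser exposes it.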
Simplifying SAX output
The SAX API wasn't designed for humans to call directly. It's a consumer not producer API, and that shows. I'll steal from ElementTree and make a function named "simple_element" which creates elements with no subelement and takes both "text" and "tail" parameters.
    def simple_element(gen, tag, attrib={}, text=None, tail=None):
        gen.startElement(tag, attrib)
        if text is not None:
            gen.characters(text)
        gen.endElement(tag)
        if tail is not None:
            gen.characters(tail)

I can use this function to simplify the output code a bit
    gen = saxutils.XMLGenerator(sys.stdout, "utf-8")
    gen.startDocument()
    gen.startElement("NameList", {})
    gen.characters("\n")
    for name in names:
        gen.characters(" ")
        simple_element(gen, "Name", {"forename": name.forename,
                                     "surname": name.surname},
                       tail="\n")
    gen.endElement("NameList")
    gen.characters("\n")
    gen.endDocument()

but I can't think of a pre-with-statement way to make sure that the startElement and endElement calls always match up. ... Hmm... I take that back. (DIGRESSION WARNING: return to the main topic)
Context managers through decorators; a neat trick
Here's one way, (ab)using decorators, but it's not something I want to see in real code.
    class Document(object):
        def __init__(self, gen):
            self.gen = gen
        def characters(self, text):
            self.gen.characters(text)
        def element(self, tag, attrib={}, text=None, tail=None):
            def call_using_wrapping_element(f=None):
                self.gen.startElement(tag, attrib)
                if text is not None:
                    self.gen.characters(text)
                if f is not None:
                    f()
                self.gen.endElement(tag)
                if tail is not None:
                    self.gen.characters(tail)
            return call_using_wrapping_element

    def document(out, encoding="us-ascii"):
        def call_with_document(f):
            gen = saxutils.XMLGenerator(out, encoding)
            gen.startDocument()
            doc = Document(gen)
            f(doc)
            gen.endDocument()
        return call_with_document

    @document(sys.stdout, "utf-8")
    def use_document(doc):
        @doc.element("NameList", text="\n", tail="\n")
        def add_names():
            for name in names:
                doc.characters(" ")
                doc.element("Name", {"forename": name.forename,
                                     "surname": name.surname},
                            tail="\n")()

The trick in the above code is that the decorator can call the function it's decorating. I've not seen any other decorator which does that, but it is a wide world.
I've been staring at it trying to figure out if it's useful. For example, here's a pseudo context manager to ensure that the opened file is closed once the function is finished.
    def opening_file(filename, flags="r"):
        def with_opening_file(func):
            f = open(filename, flags)
            try:
                func(f)
            finally:
                f.close()
        return with_opening_file

    @opening_file("/etc/passwd")
    def verify_entries(f):
        for line in f:
            if line.strip().startswith("#"):
                continue
            user, pw, uid, gid, name, dir, shell = line[:-1].split(":")
            if user != "root" and shell != "/noshell":
                raise TypeError("%r has a shell %r" % (user, shell))

Oh, and this is even better. I've changed the "opening_file" to return the results of calling the function. Watch this

    def opening_file(filename, flags="r"):
        def with_opening_file(func):
            f = open(filename, flags)
            try:
                return func(f)
            finally:
                f.close()
        return with_opening_file

    @opening_file("/etc/passwd")
    def linecount(f):
        i = 0
        for line in f:
            i += 1
        return i

    print linecount

It prints 16, which is the correct line count for my /etc/passwd. The decorator mechanism is working exactly as documented; I just never suspected you could do this.
Interesting. It looks like this trick easily implements the context protocol.
    import sys
    import inspect

    # This code is straight from the spec with a minor tweak to pass the
    # __enter__ value if the function takes an argument.  Wanted to
    # emulate "as VAR".
    def in_context(mgr):
        exit = mgr.__exit__
        value = mgr.__enter__()
        def call_in_context(f):
            exc = True
            argspec = inspect.getargspec(f)
            if argspec[0] or argspec[1]:
                args = (value,)
            else:
                args = ()
            try:
                return f(*args)
            except:
                exc = False
                if not exit(*sys.exc_info()):
                    raise
            finally:
                if exc:
                    exit(None, None, None)
        return call_in_context

    # Emulate '''with open("/etc/passwd") as f:'''
    @in_context(open("/etc/passwd"))
    def counts(f):
        line_count = char_count = 0
        while 1:
            block = f.read(2048)
            if not block:
                break
            line_count += block.count("\n")
            char_count += len(block)
        return line_count, char_count

    print counts

The only thing it can't do is assign to names in enclosing scopes, and for most purposes that could be done afterwards using the return value from the decorator. It's also a bit more verbose and you have to kinda look through what the syntax suggests to figure out what it means.
This could be useful for someone wanting with-statement-like behavior in a pre-2.5 version of Python.
With an element here and an element there ...
Ignore the abstract concept "context manager." The with statement says "before running code X run this code A. After X is done run this code B. Let B know if X finished normally or with an exception." Got that? In a single spot you can define code to be run both before and after some other block of code. That solves the problem I have where I want to ensure that the startElement and endElement tags match.
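The before/after behavior described above can be sketched with a tiny manager that only records what happens. The class name Around and the events list are hypothetical, chosen for illustration, and the sketch uses current Python 3 syntax.

```python
events = []  # records the order in which things happen


class Around(object):
    """Hypothetical manager: runs code A before the block and code B after."""

    def __enter__(self):
        events.append("A: before")

    def __exit__(self, exc_type, exc_value, traceback):
        # exc_type is None when the block finished normally.
        if exc_type is None:
            events.append("B: after, finished normally")
        else:
            events.append("B: after, got %s" % exc_type.__name__)
        # Returning a false value lets any exception keep propagating.
        return False


with Around():
    events.append("X: normal block")

try:
    with Around():
        events.append("X: failing block")
        raise ValueError("oops")
except ValueError:
    pass

print("\n".join(events))
```

In both runs B executes after X, and B can tell the two cases apart from its first argument.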
I'll show what I mean with a new Element class.
    from __future__ import with_statement
    import sys
    from xml.sax import saxutils

    # ... define the Name class and set 'names' here, as above ...

    class Element(object):
        def __init__(self, gen, tag, attrib={}, text=None, tail=None):
            self.gen = gen
            self.tag = tag
            self.attrib = attrib
            self.text = text
            self.tail = tail
        def __enter__(self):
            self.gen.startElement(self.tag, self.attrib)
            if self.text is not None:
                # Use self.gen here, not the module-level 'gen'
                self.gen.characters(self.text)
        def __exit__(self, type, value, traceback):
            if type is None:
                self.gen.endElement(self.tag)
                if self.tail is not None:
                    self.gen.characters(self.tail)

    gen = saxutils.XMLGenerator(sys.stdout, "utf-8")
    gen.startDocument()
    with Element(gen, "NameList", text="\n", tail="\n"):
        for name in names:
            gen.characters(" ")
            with Element(gen, "Name", {"forename": name.forename,
                                       "surname": name.surname},
                         tail="\n"):
                pass
    gen.endDocument()

Each Element implements __enter__ and __exit__. On entry it writes the start tag and the text, if given. On exit it writes the close tag and the optional tail. My code only inserts the endElement when there was no exception in the wrapped block. (When there was no exception the three arguments are None.)
Some rhetorical questions
One question I've been thinking about is: when should code be in the constructor vs. the __enter__? For example, I could have defined Element as the simpler:
    # This example contains a bug - IT DOES NOT WORK CORRECTLY!
    class Element(object):
        def __init__(self, gen, tag, attrib={}, text=None, tail=None):
            # This is wrong; the next three lines must go in __enter__
            gen.startElement(tag, attrib)
            if text is not None:
                gen.characters(text)
            self.gen = gen
            self.tag = tag
            self.tail = tail
        def __enter__(self):
            pass
        def __exit__(self, type, value, traceback):
            if type is None:
                self.gen.endElement(self.tag)
                if self.tail is not None:
                    self.gen.characters(self.tail)

It works in the example driver code but it doesn't work in general. Consider this case where I want the output to be

    <NameList>
     <Name>Andrew Dalke</Name>
     <Name>John Smith</Name>
     <Name>Åsa Svensson</Name>
    </NameList>

The following code should be reasonable
    gen = saxutils.XMLGenerator(sys.stdout, "utf-8")
    gen.startDocument()
    with Element(gen, "NameList", text="\n", tail="\n"):
        Name = Element(gen, "Name", tail="\n")
        for name in names:
            gen.characters(" ")
            with Name:
                gen.characters(name.forename + " " + name.surname)
    gen.endDocument()

but because it reuses "Name" for all names the start tag will only be generated once, and in the wrong place, while the end tag is generated for every call to __exit__. It instead generates:

    <NameList>
    <Name> Andrew Dalke</Name>
     John Smith</Name>
     Åsa Svensson</Name>
    </NameList>
Another question I've had is: "why does the same __exit__ method get called after both failures and successes?" It's a bit annoying because I don't want to generate the closing tag if there was an exception in the with body, so I need to check that there's no traceback.
I think the answer is that the with statement was modeled after the code in the "finally:" of a try block instead of having code in "except:" and code in "else:". I conjecture the PEP authors likely thought most context managers would have the same code for either case, so having two functions would be a hassle.
My response is to suggest the following context protocol, which uses __exit__ as a backup if __success_exit__ or __error_exit__ are not defined for the respective cases. This would be backwards compatible with the existing protocol.
    missing = object()
    exit = getattr(mgr, "__exit__", missing)
    success_exit = getattr(mgr, "__success_exit__", missing)
    error_exit = getattr(mgr, "__error_exit__", missing)
    # could require that one of "exit" and "success_exit" is defined
    # could require that one of "exit" and "error_exit" is defined
    value = mgr.__enter__()
    exc = True
    try:
        BLOCK
    except:
        exc = False
        if error_exit is not missing:
            if not error_exit(*sys.exc_info()):
                raise
        elif exit is not missing:
            if not exit(*sys.exc_info()):
                raise
        else:
            raise
    finally:
        if exc:
            if success_exit is not missing:
                success_exit()
            elif exit is not missing:
                exit(None, None, None)

With this protocol I could define my Element class as
    class Element(object):
        def __init__(self, gen, tag, attrib={}, text=None, tail=None):
            self.gen = gen
            self.tag = tag
            self.attrib = attrib
            self.text = text
            self.tail = tail
        def __enter__(self):
            self.gen.startElement(self.tag, self.attrib)
            if self.text is not None:
                self.gen.characters(self.text)
        def __success_exit__(self):
            # HYPOTHETICAL API - this will not work
            self.gen.endElement(self.tag)
            if self.tail is not None:
                self.gen.characters(self.tail)

and not have to test "if type is None" in the __exit__.
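The split can be approximated with the existing protocol by putting the dispatch into a shared base class. This is only a sketch of that workaround, not part of PEP 343: the mixin name SplitExit is hypothetical, as are the __success_exit__ and __error_exit__ hooks it looks for, and it is written in current Python 3 syntax.

```python
import io
from xml.sax import saxutils


class SplitExit(object):
    """Hypothetical mixin: routes __exit__ to __success_exit__/__error_exit__."""

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is None:
            handler = getattr(self, "__success_exit__", None)
            if handler is not None:
                handler()
            return False
        handler = getattr(self, "__error_exit__", None)
        if handler is not None:
            # A true return value from the handler swallows the exception.
            return handler(exc_type, exc_value, traceback)
        return False  # no error handler: let the exception propagate


class Element(SplitExit):
    def __init__(self, gen, tag, attrib=None, text=None, tail=None):
        self.gen = gen
        self.tag = tag
        self.attrib = attrib or {}
        self.text = text
        self.tail = tail

    def __enter__(self):
        self.gen.startElement(self.tag, self.attrib)
        if self.text is not None:
            self.gen.characters(self.text)

    def __success_exit__(self):
        # Only reached when the with block finished without an exception.
        self.gen.endElement(self.tag)
        if self.tail is not None:
            self.gen.characters(self.tail)


out = io.StringIO()
gen = saxutils.XMLGenerator(out, "utf-8")
with Element(gen, "NameList", text="\n", tail="\n"):
    with Element(gen, "Name", tail="\n"):
        gen.characters("Andrew Dalke")
print(out.getvalue())
```

The "is there an exception?" test still exists, but it lives in one place instead of in every manager.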
A DocumentManager
With the (valid) Element class defined earlier the current output generation code is
    gen = saxutils.XMLGenerator(sys.stdout, "utf-8")
    gen.startDocument()
    with Element(gen, "NameList", text="\n", tail="\n"):
        for name in names:
            gen.characters(" ")
            with Element(gen, "Name", {"forename": name.forename,
                                       "surname": name.surname},
                         tail="\n"):
                pass
    gen.endDocument()

I want to improve a few things. The PEP uses the phrase "context manager" so I'll rename my Element class to ElementManager. There's a bit of matching code with the startDocument/endDocument pair so I'll make a new DocumentManager to handle those. I also don't like that I pass "gen" around. I can hide that by having the DocumentManager create the ElementManager, passing in the current XMLGenerator.
Oh yeah, and I don't like that empty "with ...: pass". I want a way to make a simple element with no subelements. In addition, XMLGenerator doesn't make self-closed tags like <this/> so I'll fake support for that case by bypassing XMLGenerator and doing lower-level writes.
The code
Here is the full code including my driver
    from __future__ import with_statement
    from xml.sax import saxutils

    class DocumentManager(object):
        def __init__(self, out=None, encoding="us-ascii"):
            self.gen = saxutils.XMLGenerator(out, encoding)
        def __enter__(self):
            self.gen.startDocument()
            return self
        def __exit__(self, type, value, traceback):
            if type is None:
                self.gen.endDocument()
        def characters(self, text):
            self.gen.characters(text)
        def element(self, tag, attrib={}, text=None, tail=None):
            return ElementManager(self.gen, tag, attrib, text, tail)
        def simple_element(self, tag, attrib={}, text=None, tail=None):
            # Special code because I want to allow self-closing elements
            # <this where="here"/> which XMLGenerator does not support.
            if text is None:
                write = self.gen._write
                write('<'+tag)
                for (name, value) in attrib.items():
                    write(' %s=%s' % (name, saxutils.quoteattr(value)))
                write('/>')
            else:
                self.gen.startElement(tag, attrib)
                self.gen.characters(text)
                self.gen.endElement(tag)
            if tail is not None:
                self.gen.characters(tail)

    class ElementManager(object):
        def __init__(self, gen, tag, attrib={}, text=None, tail=None):
            self.gen = gen
            self.tag = tag
            self.tail = tail
            gen.startElement(tag, attrib)
            if text is not None:
                gen.characters(text)
        def __enter__(self):
            return None
        def __exit__(self, type, value, traceback):
            if type is None:
                self.gen.endElement(self.tag)
                if self.tail is not None:
                    self.gen.characters(self.tail)

    # The above in action

    class Name(object):
        def __init__(self, forename, surname):
            self.forename = forename
            self.surname = surname

    names = [Name("Andrew", "Dalke"),
             Name("John", "Smith"),
             Name(u"\N{LATIN CAPITAL LETTER A WITH RING ABOVE}sa", "Svensson"),]

    with DocumentManager(encoding="utf-8") as doc:
        with doc.element("AuthorList", text="\n", tail="\n"):
            for name in names:
                doc.characters(" ")
                doc.simple_element("Author", {"forename": name.forename,
                                              "surname": name.surname},
                                   tail="\n")
The __enter__ method returns the object used by the "as" part of the with statement. In this case "doc" gets the DocumentManager instance because DocumentManager.__enter__ returns self.
The __enter__ does not need to return self. It can be any value. I could have had it return something like "Document(self.gen)" and had the Document class implement "characters", "element" and "simple_element." That makes sense only if the manager is a true factory, returning new objects each time it's used in a with statement. In this case the DocumentManager holds on to the output file object so multiple hypothetical Document objects would overwrite each other.
(Suppose the DocumentManager created a new output file through tempfile every time it's used in a with statement. In that case it could and likely should have the __enter__ return a new Document every time.)
The __enter__ method must exist which is why ElementManager has one even though it does nothing. Well, it returns None but there's no effective difference between "pass" and "return None" in this case. The __exit__ method must also exist but if you don't need __exit__ then there's no reason to use the with statement.
Using the two controllers makes the XML generation code quite short and easier to read. It also makes mismatched tag names impossible. That's why I like this approach and encourage others to use it. When appropriate of course.
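For comparison, the standard library's contextlib.contextmanager decorator (added in Python 2.5 alongside the with statement) builds the same kind of manager from a generator function. The sketch below is an alternative formulation, not the code used above; the function name element is hypothetical and the syntax is current Python 3.

```python
import contextlib
import io
from xml.sax import saxutils


@contextlib.contextmanager
def element(gen, tag, attrib=None, text=None, tail=None):
    """Generator-based sketch of the ElementManager idea."""
    gen.startElement(tag, attrib or {})
    if text is not None:
        gen.characters(text)
    # The body of the with statement runs at this yield. If it raises,
    # the exception is re-raised here and the lines below are skipped,
    # so no end tag is written after a failure.
    yield
    gen.endElement(tag)
    if tail is not None:
        gen.characters(tail)


out = io.StringIO()
gen = saxutils.XMLGenerator(out, "utf-8")
with element(gen, "AuthorList", text="\n", tail="\n"):
    for forename, surname in [("Andrew", "Dalke"), ("John", "Smith")]:
        gen.characters(" ")
        with element(gen, "Author", tail="\n"):
            gen.characters(forename + " " + surname)
print(out.getvalue())
```

The start tag is written when the with block is entered, not when element() is called. Each call produces a single-use manager, so a fresh element(...) is needed for every with statement.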
with statement and OpenGL
When the with statement was under discussion I mentioned the idea that it might be useful for OpenGL programming. That's another case where there is a lot of code of the form "do A", "do X", "do B" where B is the counterpart of A. For example, "push matrix/multmatrix", "draw object", "pop matrix". My comment is still around. I bring it up here to show another example of how cool the with statement is and why you should learn more about it.
OpenGL programmers have complained about using Python because the code indentation doesn't follow the display tree. For an example pulled from one of my projects:
    glBegin(GL_QUAD_STRIP)
    glColor3f(1.0,1.0,1.0) #corner 1
    glNormal3f(0.57735027, 0.57735027, 0.57735027)
    glVertex3f(0.5, 0.5, 0.5)
    glColor3f(1.0,0.0,1.0) #corner 2
    glNormal3f(0.57735027, -0.57735027, 0.57735027)
    glVertex3f(0.5, -0.5, 0.5)
    ...
    glEnd()

To get the indentation right some people write this like

    glBegin(GL_QUAD_STRIP)
    if 1:
        glColor3f(1.0,1.0,1.0) #corner 1
        glNormal3f(0.57735027, 0.57735027, 0.57735027)
        glVertex3f(0.5, 0.5, 0.5)
        glColor3f(1.0,0.0,1.0) #corner 2
        glNormal3f(0.57735027, -0.57735027, 0.57735027)
        glVertex3f(0.5, -0.5, 0.5)
        ...
    glEnd()

Better would be a try/finally so that exceptions don't trash the OpenGL state, like
    glBegin(GL_QUAD_STRIP)
    try:
        glColor3f(1.0,1.0,1.0) #corner 1
        glNormal3f(0.57735027, 0.57735027, 0.57735027)
        glVertex3f(0.5, 0.5, 0.5)
        glColor3f(1.0,0.0,1.0) #corner 2
        glNormal3f(0.57735027, -0.57735027, 0.57735027)
        glVertex3f(0.5, -0.5, 0.5)
        ...
    finally:
        glEnd()

This is exactly what the with statement was designed for - state modification and restoration. Consider this context manager
    class QUAD_STRIP(object):
        @staticmethod
        def __enter__():
            glBegin(GL_QUAD_STRIP)
        @staticmethod
        def __exit__(*args):
            glEnd()

Assuming things work as I expect the GL_QUAD_STRIP code becomes

    with QUAD_STRIP:
        glColor3f(1.0,1.0,1.0) #corner 1
        glNormal3f(0.57735027, 0.57735027, 0.57735027)
        glVertex3f(0.5, 0.5, 0.5)
        glColor3f(1.0,0.0,1.0) #corner 2
        glNormal3f(0.57735027, -0.57735027, 0.57735027)
        glVertex3f(0.5, -0.5, 0.5)
        ...
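A caveat: current Python versions look up __enter__ and __exit__ on the type of the object given to the with statement, not on the object itself, so passing the QUAD_STRIP class directly fails there. A minimal sketch that sidesteps this uses an instance instead. The gl* functions and the GL_QUAD_STRIP constant below are stubs standing in for the real PyOpenGL names so the example runs without OpenGL installed, and the class name Primitive is hypothetical.

```python
calls = []  # stands in for the OpenGL command stream

GL_QUAD_STRIP = "GL_QUAD_STRIP"  # placeholder constant, not the real enum


def glBegin(mode):  # stub standing in for the real glBegin
    calls.append("glBegin(%s)" % mode)


def glEnd():  # stub standing in for the real glEnd
    calls.append("glEnd()")


def glVertex3f(x, y, z):  # stub standing in for the real glVertex3f
    calls.append("glVertex3f(%r, %r, %r)" % (x, y, z))


class Primitive(object):
    """Hypothetical manager wrapping a glBegin/glEnd pair."""

    def __init__(self, mode):
        self.mode = mode

    def __enter__(self):
        glBegin(self.mode)

    def __exit__(self, exc_type, exc_value, traceback):
        glEnd()  # always runs, like the try/finally version
        return False  # do not swallow exceptions


QUAD_STRIP = Primitive(GL_QUAD_STRIP)  # an instance, not a class

with QUAD_STRIP:
    glVertex3f(0.5, 0.5, 0.5)
    glVertex3f(0.5, -0.5, 0.5)

print(calls)
```

Because the manager holds no per-use state, the same QUAD_STRIP instance can be reused for every strip that gets drawn.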
I mentioned this to Mike Fletcher (of PyOpenGL fame) last year when the with statement was under development. His views were (paraphrased):
- OpenGL performance in Python is already slow so this won't make things much worse
- It's deeper layering hence more to learn in order to debug effectively
- Most of his code wouldn't need this, and try/finally isn't hard
- "Fixing" the indentation is pleasant but is this really the intent of the PEP?
Andrew Dalke is an independent consultant focusing on software development for computational chemistry and biology. Need contract programming, help, or training? Contact me
Copyright © 2001-2020 Andrew Dalke Scientific AB