Dalke Scientific Software: More science. Less time.
/home/writings/diary/archive/2006/08/23/with_statement

with statement

with statement considered hard?

Python 2.5 introduces the with statement. The documentation there and in PEP 343 refer to context managers. The PEP says the term comes from the draft documentation but doesn't explain it further. From that draft:

"Context managers ensure a particular action is taken to establish the context before the contained suite is entered, and a second action to clean up the context when the suite is exited."
and from the PEP:
"the actual state modifications made by the context manager"
All of the examples in the PEP and the what's new document use the with statement for things like locks, ensuring a socket or file is closed, database transactions and temporarily modifying a system setting. These are heavy duty things. Deep things. Things most people don't want to mess with. The what's new document even says:
"Under the hood, the 'with' statement is fairly complicated. Most people will only use 'with' in company with existing objects and don't need to know these details"

I think this viewpoint is wrong, or at least overly limited. I think the PEP is more generally useful than those examples show and the term "context manager" is too abstract. I also conjecture that the people working on the PEP were systems developers and not applications developers, hence the bias towards system/state modification examples. ;)

Using XMLGenerator for XML output

I like using XMLGenerator to make XML output. Here's an example.

import sys
from xml.sax import saxutils

class Name(object):
    def __init__(self, forename, surname):
        self.forename = forename
        self.surname = surname

names = [Name("Andrew", "Dalke"),
         Name("John", "Smith"),
         Name(u"\N{LATIN CAPITAL LETTER A WITH RING ABOVE}sa", "Svensson"),]

gen = saxutils.XMLGenerator(sys.stdout, "utf-8")
gen.startDocument()
gen.startElement("NameList", {})
for name in names:
    gen.startElement("Name",
                     {"forename": name.forename,
                      "surname": name.surname})
    gen.endElement("Name")
gen.endElement("NameList")
gen.endDocument()
The output looks like
<?xml version="1.0" encoding="utf-8"?>
<NameList><Name surname="Dalke" forename="Andrew"></Name><Name surname="Smith"
forename="John"></Name><Name surname="Svensson" forename="Åsa"></Name></NameList>
although I've added a newline for clarity. If I want people to read the output then I can add some indentation in the XML output like this:
gen = saxutils.XMLGenerator(sys.stdout, "utf-8")
gen.startDocument()
gen.startElement("NameList", {})
gen.characters("\n")
for name in names:
    gen.characters("  ")
    gen.startElement("Name",
                     {"forename": name.forename,
                      "surname": name.surname})
    gen.endElement("Name")
    gen.characters("\n")
gen.endElement("NameList")
gen.characters("\n")
gen.endDocument()
with the output in this case being
<?xml version="1.0" encoding="utf-8"?>
<NameList>
  <Name surname="Dalke" forename="Andrew"></Name>
  <Name surname="Smith" forename="John"></Name>
  <Name surname="Svensson" forename="Åsa"></Name>
</NameList>

I like using XMLGenerator for this because there is no data structure overhead. The output goes directly to the output file with no additional buffering. I don't like the verbosity and I don't like the high likelihood of errors. (Even knowing I make mistakes when I write this code, my original code omitted the endElement("Name") call and I didn't catch it on first view of the output. Luckily those errors are easy to catch automatically.)
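
One way to catch a missing endElement automatically is to feed the generated text back through a parser. The following is only a sketch, assuming a current Python 3 and an in-memory buffer instead of sys.stdout; make_doc and is_well_formed are helper names invented for this illustration, not part of the code above.

```python
import io
import xml.sax
from xml.sax import saxutils

def make_doc(close_name):
    # Build a tiny document in memory; optionally "forget" the endElement call.
    out = io.StringIO()
    gen = saxutils.XMLGenerator(out, "utf-8")
    gen.startDocument()
    gen.startElement("NameList", {})
    gen.startElement("Name", {"forename": "Andrew", "surname": "Dalke"})
    if close_name:
        gen.endElement("Name")
    gen.endElement("NameList")
    gen.endDocument()
    return out.getvalue()

def is_well_formed(xml_text):
    # Parsing with a do-nothing handler raises if the tags do not match up.
    try:
        xml.sax.parseString(xml_text.encode("utf-8"), xml.sax.ContentHandler())
    except xml.sax.SAXParseException:
        return False
    return True

print(is_well_formed(make_doc(close_name=True)))   # True
print(is_well_formed(make_doc(close_name=False)))  # False
```

XMLGenerator itself does not check that start and end calls pair up, which is why the check has to happen on the output.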

Simplifying SAX output

The SAX API wasn't designed for humans to call directly. It's a consumer not producer API, and that shows. I'll steal from ElementTree and make a function named "simple_element" which creates elements with no subelement and takes both "text" and "tail" parameters.

def simple_element(gen, tag, attrib={}, text=None, tail=None):
    gen.startElement(tag, attrib)
    if text is not None:
        gen.characters(text)
    gen.endElement(tag)
    if tail is not None:
        gen.characters(tail)
I can use this function to simplify the output code a bit
gen = saxutils.XMLGenerator(sys.stdout, "utf-8")
gen.startDocument()
gen.startElement("NameList", {})
gen.characters("\n")
for name in names:
    gen.characters("  ")
    simple_element(gen, "Name",
                   {"forename": name.forename, "surname": name.surname},
                   tail="\n")
gen.endElement("NameList")
gen.characters("\n")
gen.endDocument()
but I can't think of a pre-with-statement way to make sure that the startElement and endElement calls always match up. ... Hmm... I take that back. (DIGRESSION WARNING: return to the main topic)

Context managers through decorators; a neat trick

Here's one way, (ab)using decorators, but it's not something I want to see in real code.

class Document(object):
    def __init__(self, gen):
        self.gen = gen
    def characters(self, text):
        self.gen.characters(text)
    def element(self, tag, attrib={}, text=None, tail=None):
        def call_using_wrapping_element(f=None):
            self.gen.startElement(tag, attrib)
            if text is not None:
                self.gen.characters(text)
            if f is not None:
                f()
            self.gen.endElement(tag)
            if tail is not None:
                self.gen.characters(tail)
        return call_using_wrapping_element

def document(out, encoding="us-ascii"):
    def call_with_document(f):
        gen = saxutils.XMLGenerator(out, encoding)
        gen.startDocument()
        doc = Document(gen)
        f(doc)
        gen.endDocument()
    return call_with_document

@document(sys.stdout, "utf-8")
def use_document(doc):
    @doc.element("NameList", text="\n", tail="\n")
    def add_names():
        for name in names:
            doc.characters("  ")
            doc.element("Name",
                        {"forename": name.forename, "surname": name.surname},
                        tail="\n")()
The trick in the above code is that the decorator can call the function it's decorating. I've not seen any other decorator which does that, but it is a wide world.

I've been staring at it trying to figure out if it's useful. For example, here's a pseudo context manager to ensure that the opened file is closed once the function is finished.

def opening_file(filename, flags="r"):
    def with_opening_file(func):
        f = open(filename, flags)
        try:
            func(f)
        finally:
            f.close()
    return with_opening_file

@opening_file("/etc/passwd")
def verify_entries(f):
    for line in f:
        if line.strip().startswith("#"):
            continue
        user, pw, uid, gid, name, dir, shell = line[:-1].split(":")
        if user != "root" and shell != "/noshell":
            raise TypeError("%r has a shell %r" % (user, shell))
Oh, and this is even better. I've changed the "opening_file" to return the results of calling the function. Watch this
def opening_file(filename, flags="r"):
    def with_opening_file(func):
        f = open(filename, flags)
        try:
            return func(f)
        finally:
            f.close()
    return with_opening_file

@opening_file("/etc/passwd")
def linecount(f):
    i = 0
    for line in f:
        i += 1
    return i

print linecount
It prints 16, which is the correct line count for my /etc/passwd. The decorator mechanism is working exactly as documented, I just never suspected you could do this.
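
Stripped of the file handling, the mechanism is small enough to show in a few lines. This is a minimal sketch; call_now and answer are made-up names. Since "@d" followed by "def f" means "f = d(f)", a decorator that calls the function and returns the result leaves the name bound to that result.

```python
def call_now(func):
    # A decorator is just a callable; whatever it returns is bound to the name.
    return func()

@call_now
def answer():
    return 6 * 7

# 'answer' is now the integer result, not a function.
print(answer)  # 42
```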

Interesting. It looks like this trick easily implements the context protocol.

import sys
import inspect

# This code is straight from the spec with a minor tweak to pass the
# __enter__ value if the function takes an argument.  Wanted to
# emulate "as VAR".
def in_context(mgr):
    exit = mgr.__exit__
    value = mgr.__enter__()
    def call_in_context(f):
        exc = True
        argspec = inspect.getargspec(f)
        if argspec[0] or argspec[1]:
            args = (value,)
        else:
            args = ()
        try:
            return f(*args)
        except:
            exc = False
            if not exit(*sys.exc_info()):
                raise
        finally:
            if exc:
                exit(None, None, None)
    return call_in_context

# Emulate '''with open("/etc/passwd") as f:'''
@in_context(open("/etc/passwd"))
def counts(f):
    line_count = char_count = 0
    while 1:
        block = f.read(2048)
        if not block:
            break
        line_count += block.count("\n")
        char_count += len(block)
    return line_count, char_count

print counts
The only thing it can't do is assign to names in enclosing scopes and for most purposes that could be done afterwards using the return value from the decorator. It's also a bit more verbose and you have to kinda look through what the syntax suggests to figure out what it means.

This could be useful for someone wanting with-statement-like behavior in a pre-2.5 version of Python.
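
To illustrate the point about return values, here is a small sketch. It uses a simplified variant of in_context (it always passes the __enter__ value and enters the context when the function is decorated, not before), and Recorder is a hypothetical manager that only logs what happens. The decorated function cannot rebind names in the surrounding scope, but its return value ends up bound to its own name.

```python
import sys

def in_context(mgr):
    # Simplified variant of in_context above: it always passes the
    # __enter__ value to the function.
    def call_in_context(f):
        value = mgr.__enter__()
        try:
            result = f(value)
        except:
            if not mgr.__exit__(*sys.exc_info()):
                raise
            return None
        mgr.__exit__(None, None, None)
        return result
    return call_in_context

class Recorder(object):
    # Hypothetical manager that records when it is entered and exited.
    def __init__(self):
        self.events = []
    def __enter__(self):
        self.events.append("enter")
        return self.events
    def __exit__(self, type, value, traceback):
        self.events.append("exit")
        return False

recorder = Recorder()

@in_context(recorder)
def total(events):
    events.append("body")
    return 1 + 2

print(total)            # 3
print(recorder.events)  # ['enter', 'body', 'exit']
```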

With an element here and an element there ...

Back to the with statement...

Ignore the abstract concept "context manager." The with statement says "before running code X run this code A. After X is done run this code B. Let B know if X finished normally or with an exception." Got that? In a single spot you can define code to be run both before and after some other block of code. That solves the problem I have where I want to ensure that the startElement and endElement tags match.
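
The A/X/B description can be seen directly with a manager that does nothing but record what it is told. This is a minimal sketch for a Python where the with statement is enabled by default; Announce is a hypothetical class used only for illustration.

```python
class Announce(object):
    # Hypothetical manager: code A is __enter__, code B is __exit__.
    def __init__(self, log):
        self.log = log
    def __enter__(self):
        self.log.append("A: before")
    def __exit__(self, type, value, traceback):
        if type is None:
            self.log.append("B: after, finished normally")
        else:
            self.log.append("B: after, got %s" % type.__name__)
        # Returning a false value lets any exception keep propagating.
        return False

log = []
with Announce(log):
    log.append("X: body")

try:
    with Announce(log):
        raise ValueError("oops")
except ValueError:
    pass

for entry in log:
    print(entry)
```

The second with block never appends "X: body", yet B still runs and is told about the ValueError.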

I'll show what I mean with a new Element class.

from __future__ import with_statement

import sys
from xml.sax import saxutils

# ... define the Name class and set 'names' here, as above ...

class Element(object):
    def __init__(self, gen, tag, attrib={}, text=None, tail=None):
        self.gen = gen
        self.tag = tag
        self.attrib = attrib
        self.text = text
        self.tail = tail

    def __enter__(self):
        self.gen.startElement(self.tag, self.attrib)
        if self.text is not None:
            self.gen.characters(self.text)

    def __exit__(self, type, value, traceback):
        if type is None:
            self.gen.endElement(self.tag)
            if self.tail is not None:
                self.gen.characters(self.tail)

gen = saxutils.XMLGenerator(sys.stdout, "utf-8")
gen.startDocument()
with Element(gen, "NameList", text="\n", tail="\n"):
    for name in names:
        gen.characters("  ")
        with Element(gen, "Name",
                     {"forename": name.forename,
                      "surname": name.surname},
                     tail="\n"):
            pass
gen.endDocument()
Each Element implements __enter__ and __exit__. On entry it writes the start tag and the text, if given. On exit it writes the close tag and the optional tail. My code only inserts the endElement when there was no exception in the wrapped block. (When there was no exception the three arguments are None.)

Some rhetorical questions

One question I've been thinking about is: when should code be in the constructor vs. the __enter__? For example, I could have defined Element as the simpler:

# This example contains a bug - IT DOES NOT WORK CORRECTLY!
class Element(object):
    def __init__(self, gen, tag, attrib={}, text=None, tail=None):
        # This is wrong; the next three lines must go in __enter__
        gen.startElement(tag, attrib)
        if text is not None:
            gen.characters(text)
        self.gen = gen
        self.tag = tag
        self.tail = tail

    def __enter__(self):
        pass

    def __exit__(self, type, value, traceback):
        if type is None:
            self.gen.endElement(self.tag)
            if self.tail is not None:
                self.gen.characters(self.tail)
It works in the example driver code but it doesn't work in general. Consider this case where I want the output to be
<NameList>
 <Name>Andrew Dalke</Name>
 <Name>John Smith</Name>
 <Name>Åsa Svensson</Name>
</NameList>
The following code should be reasonable
gen = saxutils.XMLGenerator(sys.stdout, "utf-8")
gen.startDocument()
with Element(gen, "NameList", text="\n", tail="\n"):
    Name = Element(gen, "Name", tail="\n")
    for name in names:
        gen.characters("  ")
        with Name:
            gen.characters(name.forename + " " + name.surname)
gen.endDocument()
but because it reuses "Name" for all names the start tag will only be generated once, and in the wrong place while the end tag is generated for every call to __exit__. It instead generates:
<NameList>
<Name>  Andrew Dalke</Name>
  John Smith</Name>
  Åsa Svensson</Name>
</NameList>
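
For comparison, here is a sketch of the same reuse pattern with the start tag written in __enter__. It assumes a current Python 3 and writes to an in-memory buffer so the result can be inspected; the variable is called name_element here only to avoid shadowing the Name class.

```python
import io
from xml.sax import saxutils

class Element(object):
    # Same shape as the working Element above: the start tag is written
    # in __enter__, so one instance can be entered many times.
    def __init__(self, gen, tag, attrib={}, text=None, tail=None):
        self.gen = gen
        self.tag = tag
        self.attrib = attrib
        self.text = text
        self.tail = tail

    def __enter__(self):
        self.gen.startElement(self.tag, self.attrib)
        if self.text is not None:
            self.gen.characters(self.text)

    def __exit__(self, type, value, traceback):
        if type is None:
            self.gen.endElement(self.tag)
            if self.tail is not None:
                self.gen.characters(self.tail)

out = io.StringIO()
gen = saxutils.XMLGenerator(out, "utf-8")
with Element(gen, "NameList", text="\n", tail="\n"):
    name_element = Element(gen, "Name", tail="\n")
    for full_name in ["Andrew Dalke", "John Smith"]:
        gen.characters("  ")
        with name_element:
            gen.characters(full_name)

print(out.getvalue())
```

Each pass through the loop now gets its own start tag, in the right place.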

Another question I've had is: "why does the same __exit__ method get called after both failures and successes?" It's a bit annoying because I don't want to generate the closing tag if there was an exception in the with body, so I need to check that there's no traceback.

I think the answer is that the with statement was modeled after the code in the "finally:" of a try block instead of having code in "except:" and code in "else:". I conjecture the PEP authors likely thought most context managers would have the same code for either case so having two functions would be a hassle.

My response is to suggest the following context protocol which uses __exit__ as a backup if __success_exit__ or __error_exit__ are not defined for the respective cases. This would be backwards compatible with the existing protocol.

    missing = object()

    exit = getattr(mgr, "__exit__", missing)
    success_exit = getattr(mgr, "__success_exit__", missing)
    error_exit = getattr(mgr, "__error_exit__", missing)
    # could require that at least one of "exit" and "success_exit" is defined
    # could require that at least one of "exit" and "error_exit" is defined

    value = mgr.__enter__()
    exc = True
    try:
        BLOCK
    except:
        exc = False
        if error_exit is not missing:
            if not error_exit(*sys.exc_info()):
                raise
        elif exit is not missing:
            if not exit(*sys.exc_info()):
                raise
        else:
            raise
    finally:
        if exc:
            if success_exit is not missing:
                success_exit()
            elif exit is not missing:
                exit(None, None, None)
With this protocol I could define my Element class as
class Element(object):
    def __init__(self, gen, tag, attrib={}, text=None, tail=None):
        self.gen = gen
        self.tag = tag
        self.attrib = attrib
        self.text = text
        self.tail = tail

    def __enter__(self):
        self.gen.startElement(self.tag, self.attrib)
        if self.text is not None:
            self.gen.characters(self.text)

    def __success_exit__(self):
        # HYPOTHETICAL API - this will not work
        self.gen.endElement(self.tag)
        if self.tail is not None:
            self.gen.characters(self.tail)
and not have to test "if type is None" in the __exit__.
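
Something close to a success-only exit is available without changing the protocol, through the standard library's contextlib.contextmanager (described in PEP 343 and shipped with Python 2.5). This is a different technique from the class-based managers used in this essay, shown here only as a sketch: with a generator-based manager, an exception in the with body is raised at the yield, so the code after the yield runs only on success. The function name element is an assumption made for this example.

```python
import io
from contextlib import contextmanager
from xml.sax import saxutils

@contextmanager
def element(gen, tag, attrib={}, text=None, tail=None):
    gen.startElement(tag, attrib)
    if text is not None:
        gen.characters(text)
    yield
    # Reached only when the with block finished without an exception;
    # an exception in the block is re-raised at the yield instead.
    gen.endElement(tag)
    if tail is not None:
        gen.characters(tail)

out = io.StringIO()
gen = saxutils.XMLGenerator(out, "utf-8")
try:
    with element(gen, "NameList"):
        with element(gen, "Name"):
            gen.characters("Andrew Dalke")
        raise RuntimeError("stop before closing NameList")
except RuntimeError:
    pass

print(out.getvalue())  # <NameList><Name>Andrew Dalke</Name>
```

The inner element closes normally; the outer one is left open because its body failed.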

A DocumentManager

With the (valid) Element class defined earlier the current output generation code is

gen = saxutils.XMLGenerator(sys.stdout, "utf-8")
gen.startDocument()
with Element(gen, "NameList", text="\n", tail="\n"):
    for name in names:
        gen.characters("  ")
        with Element(gen, "Name",
                     {"forename": name.forename,
                      "surname": name.surname},
                     tail="\n"):
            pass
gen.endDocument()
I want to improve a few things. The PEP uses the phrase "context manager" so I'll rename my Element class to ElementManager. There's a bit of matching code with the startDocument/endDocument pair so I'll make a new DocumentManager to handle those. I also don't like that I pass "gen" around. I can hide that by having the DocumentManager create the ElementManager, passing in the current XMLGenerator.

Oh yeah, and I don't like that empty "with ...: pass". I want a way to make a simple element with no subelements. In addition, XMLGenerator doesn't make self-closed tags like <this/> so I'll fake support for that case by bypassing XMLGenerator and doing lower-level writes.
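
For readers on a newer Python: since Python 3.2, XMLGenerator accepts a short_empty_elements argument that writes empty elements in the self-closed form, so the private-write workaround is only needed on older versions. A minimal sketch, assuming Python 3.2 or later and an in-memory buffer:

```python
import io
from xml.sax import saxutils

out = io.StringIO()
# short_empty_elements=True asks for <this/> instead of <this></this>.
gen = saxutils.XMLGenerator(out, "utf-8", short_empty_elements=True)
gen.startElement("Name", {"forename": "Andrew"})
gen.endElement("Name")

print(out.getvalue())  # <Name forename="Andrew"/>
```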

The code

Here is the full code including my driver

from __future__ import with_statement

from xml.sax import saxutils

class DocumentManager(object):
    def __init__(self, out=None, encoding="us-ascii"):
        self.gen = saxutils.XMLGenerator(out, encoding)
    def __enter__(self):
        self.gen.startDocument()
        return self
    def __exit__(self, type, value, traceback):
        if type is None:
            self.gen.endDocument()

    def characters(self, text):
        self.gen.characters(text)

    def element(self, tag, attrib={}, text=None, tail=None):
        return ElementManager(self.gen, tag, attrib, text, tail)

    def simple_element(self, tag, attrib={}, text=None, tail=None):
        # Special code because I want to allow self-closing elements
        # <this where="here"/> which XMLGenerator does not support.
        if text is None:
            write = self.gen._write
            write('<'+tag)
            for (name, value) in attrib.items():
                write(' %s=%s' % (name, saxutils.quoteattr(value)))
            write('/>')
        else:
            self.gen.startElement(tag, attrib)
            self.gen.characters(text)
            self.gen.endElement(tag)

        if tail is not None:
            self.gen.characters(tail)

class ElementManager(object):
    def __init__(self, gen, tag, attrib={}, text=None, tail=None):
        self.gen = gen
        self.tag = tag
        self.tail = tail

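        # Note: the start tag is written at construction time, so each
        # ElementManager is good for exactly one with statement.
        # DocumentManager.element() makes a fresh one on every call.
        gen.startElement(tag, attrib)
        if text is not None:
            gen.characters(text)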

    def __enter__(self):
        return None

    def __exit__(self, type, value, traceback):
        if type is None:
            self.gen.endElement(self.tag)
            if self.tail is not None:
                self.gen.characters(self.tail)

# The above in action

class Name(object):
    def __init__(self, forename, surname):
        self.forename = forename
        self.surname = surname

names = [Name("Andrew", "Dalke"),
         Name("John", "Smith"),
         Name(u"\N{LATIN CAPITAL LETTER A WITH RING ABOVE}sa", "Svensson"),]


with DocumentManager(encoding="utf-8") as doc:
    with doc.element("AuthorList", text="\n", tail="\n"):
        for name in names:
            doc.characters("  ")
            doc.simple_element("Author",
                               {"forename": name.forename,
                                "surname": name.surname},
                               tail="\n")

The __enter__ method returns the object used by the "as" part of the with statement. In this case "doc" gets the DocumentManager instance because DocumentManager.__enter__ returns self.

The __enter__ does not need to return self. It can be any value. I could have had it return something like "Document(self.gen)" and had the Document class implement "characters", "element" and "simple_element." That makes sense only if the manager is a true factory, returning new objects each time it's used in a with statement. In this case the DocumentManager holds on to the output file object so multiple hypothetical Document objects would overwrite each other.

(Suppose the DocumentManager created a new output file through tempfile every time it's used in a with statement. In that case it could and likely should have the __enter__ return a new Document every time.)
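
A sketch of that factory style follows. Both class names are hypothetical, an in-memory buffer stands in for the tempfile, and a current Python 3 is assumed. The manager keeps only the most recent document, so it is not meant to be nested inside itself.

```python
import io
from xml.sax import saxutils

class Document(object):
    # Hypothetical per-use object handed to the "as" clause.
    def __init__(self, out, encoding):
        self.out = out
        self.gen = saxutils.XMLGenerator(out, encoding)
    def simple_element(self, tag, attrib={}):
        self.gen.startElement(tag, attrib)
        self.gen.endElement(tag)

class FreshDocumentManager(object):
    # Hypothetical factory-style manager: every __enter__ makes a new
    # buffer (standing in for a tempfile) and a new Document.
    def __init__(self, encoding="utf-8"):
        self.encoding = encoding
        self.current = None
    def __enter__(self):
        self.current = Document(io.StringIO(), self.encoding)
        self.current.gen.startDocument()
        return self.current
    def __exit__(self, type, value, traceback):
        if type is None:
            self.current.gen.endDocument()

manager = FreshDocumentManager()
with manager as first:
    first.simple_element("A")
with manager as second:
    second.simple_element("B")

print(first is second)  # False
print(first.out.getvalue())
print(second.out.getvalue())
```

Because each with statement gets its own Document and its own buffer, the two outputs do not overwrite each other.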

The __enter__ method must exist which is why ElementManager has one even though it does nothing. Well, it returns None but there's no effective difference between "pass" and "return None" in this case. The __exit__ method must also exist but if you don't need __exit__ then there's no reason to use the with statement.

Using the two managers makes the XML generation code quite short and easier to read. It also makes mismatched tag names impossible. That's why I like this approach and encourage others to use it. When appropriate of course.

with statement and OpenGL

When the with statement was under discussion I mentioned the idea that it might be useful for OpenGL programming. That's another case where there is a lot of code of the form "do A", "do X", "do B" where B is the counterpart of A. For example, "push matrix/multmatrix", "draw object", "pop matrix". My comment is still around. I bring it up here to show another example of how cool the with statement is and why you should learn more about it.
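
The "do A", "do X", "do B" shape can be sketched without any OpenGL installed. In the example below push_matrix and pop_matrix are stand-ins that only record that they were called (they are not the real OpenGL functions), and Paired is a hypothetical helper class. It runs the closing call even when the body raises, the same way a try/finally would.

```python
calls = []

# Stand-ins for the real OpenGL calls, so the sketch runs without PyOpenGL.
def push_matrix():
    calls.append("push matrix")

def pop_matrix():
    calls.append("pop matrix")

class Paired(object):
    # Hypothetical helper: run 'begin' on entry and 'end' on exit,
    # even if the block raises, like a try/finally.
    def __init__(self, begin, end):
        self.begin = begin
        self.end = end
    def __enter__(self):
        self.begin()
    def __exit__(self, type, value, traceback):
        self.end()
        return False

try:
    with Paired(push_matrix, pop_matrix):
        calls.append("draw object")
        raise RuntimeError("drawing failed")
except RuntimeError:
    calls.append("error seen by caller")

print(calls)
```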

OpenGL programmers have complained about using Python because the code indentation doesn't follow the display tree. For an example pulled from one of my projects:

    glBegin(GL_QUAD_STRIP)
    glColor3f(1.0,1.0,1.0) #corner 1
    glNormal3f(0.57735027, 0.57735027, 0.57735027)
    glVertex3f(0.5, 0.5, 0.5)
    glColor3f(1.0,0.0,1.0) #corner 2
    glNormal3f(0.57735027, -0.57735027, 0.57735027)
    glVertex3f(0.5, -0.5, 0.5)
    ...
    glEnd()
To get the indentation right some people write this like
    glBegin(GL_QUAD_STRIP)
    if 1:
        glColor3f(1.0,1.0,1.0) #corner 1
        glNormal3f(0.57735027, 0.57735027, 0.57735027)
        glVertex3f(0.5, 0.5, 0.5)
        glColor3f(1.0,0.0,1.0) #corner 2
        glNormal3f(0.57735027, -0.57735027, 0.57735027)
        glVertex3f(0.5, -0.5, 0.5)
        ...
    glEnd()
Better would be a try/finally so that exceptions don't trash the OpenGL state, like
    glBegin(GL_QUAD_STRIP)
    try:
        glColor3f(1.0,1.0,1.0) #corner 1
        glNormal3f(0.57735027, 0.57735027, 0.57735027)
        glVertex3f(0.5, 0.5, 0.5)
        glColor3f(1.0,0.0,1.0) #corner 2
        glNormal3f(0.57735027, -0.57735027, 0.57735027)
        glVertex3f(0.5, -0.5, 0.5)
        ...
    finally:
        glEnd()
This is exactly what the with statement was designed for: state modification and restoration. Consider this context manager
class QUAD_STRIP(object):
    @staticmethod
    def __enter__():
        glBegin(GL_QUAD_STRIP)
    @staticmethod
    def __exit__(*args):
        glEnd()
Assuming things work as I expect the GL_QUAD_STRIP code becomes
    # Use an instance: current Pythons look up __enter__/__exit__ on the type.
    with QUAD_STRIP():
        glColor3f(1.0,1.0,1.0) #corner 1
        glNormal3f(0.57735027, 0.57735027, 0.57735027)
        glVertex3f(0.5, 0.5, 0.5)
        glColor3f(1.0,0.0,1.0) #corner 2
        glNormal3f(0.57735027, -0.57735027, 0.57735027)
        glVertex3f(0.5, -0.5, 0.5)
        ...

I mentioned this to Mike Fletcher (of PyOpenGL fame) last year when the with statement was under development. His views were (paraphrased):

All good points and applicable to more than OpenGL code. This is a place where I would like to see more real-world experience.


Andrew Dalke is an independent consultant focusing on software development for computational chemistry and biology.



Copyright © 2001-2020 Andrew Dalke Scientific AB