I/O location and error handling

/home/writings/diary/archive/2014/07/12/io_location_and_errors

I've been interested in a thorny question of how to handle errors and location information in file I/O, especially as it applies to chemical structure file formats. I'll organize my ideas around a parser for the /etc/passwd file on my Mac.

Some of the lines from my passwd file are:

# See the opendirectoryd(8) man page for additional information about
# Open Directory.
##
nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh

Comment lines start with a "#". Each record is on a single line, with colon-separated fields.

A simple parser

This is very easy to parse. Here's an example, along with a driver:

from __future__ import print_function

class PasswdEntry(object):
    def __init__(self, name, passwd, uid, gid, gecos, dir, shell):
        self.name = name
        self.passwd = passwd
        self.uid = uid
        self.gid = gid
        self.gecos = gecos
        self.dir = dir
        self.shell = shell
        
def read_passwd_entries(infile):
    for line in infile:
        line = line.rstrip("\n")  # remove trailing newline
        # Ignore comments and blank lines
        if line[:1] == "#" or line.strip() == "": 
            continue
     
        name, passwd, uid, gid, gecos, dir, shell = line.split(":")
        yield PasswdEntry(name, passwd, uid, gid, gecos, dir, shell)

def main():
    import sys
    filename = "/etc/passwd"
    if sys.argv[1:]:
        filename = sys.argv[1]
    with open(filename) as passwd_file:
        for entry in read_passwd_entries(passwd_file):
            print(entry.name, "=>", entry.shell)

if __name__ == "__main__":
    main()

The output when I run this starts with:

nobody => /usr/bin/false
root => /bin/sh
daemon => /usr/bin/false

Great! I have working code which handles the vast majority of things I want to do with a passwd file.

Errors should match the API level

But it doesn't handle everything. For example, what happens if a file contains a poorly formatted line? When I test with the following line:

dalke:*:-2:-2:Andrew Dalke:/var/empty

I get the output:

% python entries.py bad_passwd 
nobody => /usr/bin/false
Traceback (most recent call last):
  File "entries.py", line 33, in <module>
    main()
  File "entries.py", line 29, in main
    for entry in read_passwd_entries(passwd_file):
  File "entries.py", line 20, in read_passwd_entries
    name, passwd, uid, gid, gecos, dir, shell = line.split(":")
ValueError: need more than 6 values to unpack

This isn't very helpful. Which record failed, and which line do I need to fix?

It isn't hard to modify read_passwd_entries() to make the error message match the API level:

def read_passwd_entries(infile):
    for lineno, line in enumerate(infile, 1):
        line = line.rstrip("\n")  # remove trailing newline
        # Ignore comments and blank lines
        if line[:1] == "#" or line.strip() == "": 
            continue

        try:
            name, passwd, uid, gid, gecos, dir, shell = line.split(":")
        except ValueError:
            name = getattr(infile, "name", None)
            if name is not None:
                where = " of %r" % (name,)
            else:
                where = ""
            raise ValueError("Cannot parse password entry on line %d%s: %r"
                             % (lineno, where, line))
        yield PasswdEntry(name, passwd, uid, gid, gecos, dir, shell)

The error behavior is now:

% python entries.py bad_passwd
nobody => /usr/bin/false
Traceback (most recent call last):
  File "entries.py", line 42, in <module>
    main()
  File "entries.py", line 38, in main
    for entry in read_passwd_entries(passwd_file):
  File "entries.py", line 29, in read_passwd_entries
    % (lineno, where, line))
ValueError: Cannot parse password entry on line 12 of 'bad_passwd': 'dalke:*:-2:-2:Andrew Dalke:/var/empty'
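
The same behavior can be checked without a real file. Here's a minimal sketch, assuming Python 3, which feeds the parser an in-memory io.StringIO. The read_fields() name is my own for this illustration, and it uses an explicit field count instead of tuple unpacking. A StringIO has no "name" attribute, so the message leaves out the "of ..." part.

```python
import io

def read_fields(infile):
    """Yield the list of seven colon-separated fields for each record."""
    for lineno, line in enumerate(infile, 1):
        line = line.rstrip("\n")
        # Ignore comments and blank lines
        if line[:1] == "#" or line.strip() == "":
            continue
        fields = line.split(":")
        if len(fields) != 7:
            # In-memory files have no .name, so fall back to None
            source = getattr(infile, "name", None)
            where = "" if source is None else " of %r" % (source,)
            raise ValueError("Cannot parse password entry on line %d%s: %r"
                             % (lineno, where, line))
        yield fields

sample = io.StringIO(
    u"# a comment\n"
    u"root:*:0:0:System Administrator:/var/root:/bin/sh\n"
    u"dalke:*:-2:-2:Andrew Dalke:/var/empty\n")

message = None
try:
    for fields in read_fields(sample):
        pass
except ValueError as err:
    message = str(err)   # reports line 3, with no file name
```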

Specify error handling policy

That's better, but sometimes I want the parser to ignore an error record and continue processing, instead of stopping. This is frequently needed in chemistry file formats. While the formats are pretty well defined, different chemistry toolkits have different chemistry models and different levels of strictness. RDKit, for example, does not support hexavalent carbon, while OEChem does accept non-realistic valences.

The failure rates in structure files are low, with failures in perhaps 1 in 10,000 structures, depending on the data source and toolkit combination. The usual policy is to report a warning message and continue.

The solution I like is to have a parameter which describes the error handling policy. The "strict" policy, which is the default in the command-line driver, stops processing (the read_passwd_entries() function itself defaults to "warn"). The "warn" policy prints a message to stderr, and "ignore" just skips the record. (It's not hard to think of other policies, or to support user-defined error handler objects in addition to strings.)

I modified the parser code to handle these three policies, and updated the driver so you could specify the policies on the command-line. Here's the complete program:

from __future__ import print_function

import sys
import argparse

class PasswdEntry(object):
    def __init__(self, name, passwd, uid, gid, gecos, dir, shell):
        self.name = name
        self.passwd = passwd
        self.uid = uid
        self.gid = gid
        self.gecos = gecos
        self.dir = dir
        self.shell = shell

# Define different error handlers, and a common error formatter.

def _format_msg(lineno, source, line):
    if source is None:
        where = "on line %d" % (lineno,)
    else:
        where = "on line %d of %r" % (lineno, source)
    return "Cannot parse record %s: %r" % (where, line)

def ignore_handler(lineno, source, line):
    pass

def warn_handler(lineno, source, line):
    msg = _format_msg(lineno, source, line)
    sys.stderr.write(msg + "\n")

def strict_handler(lineno, source, line):
    msg = _format_msg(lineno, source, line)
    raise ValueError(msg)

error_handlers = {
    "ignore": ignore_handler,
    "warn": warn_handler,
    "strict": strict_handler,
}
        
def read_passwd_entries(infile, errors="warn"):
    # Get the error handler for the given policy.
    # (A more sophisticated solution might support a user-defined
    # error handler as well as a string.)
    try:
        error_handler = error_handlers[errors]
    except KeyError:
        raise ValueError("Unsupported errors value %r" % (errors,))
    
    for lineno, line in enumerate(infile, 1):
        line = line.rstrip("\n")  # remove trailing newline
        # Ignore comments and blank lines
        if line[:1] == "#" or line.strip() == "": 
            continue

        try:
            name, passwd, uid, gid, gecos, dir, shell = line.split(":")
        except ValueError:
            # Handle the failure
            source_name = getattr(infile, "name", None)
            error_handler(lineno, source_name, line)
            # If we get here then continue to the next record
            continue
        
        yield PasswdEntry(name, passwd, uid, gid, gecos, dir, shell)

# Driver code to help test the library

parser = argparse.ArgumentParser(
    description = "List username/shell entries in a password file")
parser.add_argument("--errors", choices = ["ignore", "warn", "strict"],
                    default = "strict",
                    help = "Specify the error handling policy")
parser.add_argument("filename", nargs="?", default="/etc/passwd",
                    help = "Password file to parse")

def main():
    args = parser.parse_args()
    # With nargs="?" argparse gives a single string, not a list
    filename = args.filename
    try:
        with open(filename) as passwd_file:
            try:
                for entry in read_passwd_entries(passwd_file, errors=args.errors):
                    print(entry.name, "=>", entry.shell)
            except ValueError as err:
                raise SystemExit("ERROR with password entry: %s" % (err,))
    except IOError as err:
        raise SystemExit("ERROR with password file: %s" % (err,))
        

if __name__ == "__main__":
    main()

Let's see it in action. First, the default, which raises a ValueError. The main driver catches the ValueError, reports the error message, and exits:

% python entries.py bad_passwd 
nobody => /usr/bin/false
ERROR with password entry: Cannot parse record on line 12 of 'bad_passwd': 'dalke:*:-2:-2:Andrew Dalke:/var/empty'

Next, I'll specify the "warn" handler. (The warning message comes first because stderr does not buffer while stdout does.)

% python entries.py bad_passwd --errors warn | head -4
Cannot parse record on line 12 of 'bad_passwd': 'dalke:*:-2:-2:Andrew Dalke:/var/empty'
nobody => /usr/bin/false
root => /bin/sh
daemon => /usr/bin/false
_uucp => /usr/sbin/uucico

Finally, ignore all errors and keep on processing:

% python entries.py bad_passwd --errors ignore | head -4
nobody => /usr/bin/false
root => /bin/sh
daemon => /usr/bin/false
_uucp => /usr/sbin/uucico
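
The program above only accepts the three policy names. As a sketch of the user-defined handler idea mentioned earlier, the lookup could also accept any callable with the same (lineno, source, line) signature. The names get_error_handler() and CollectErrors are hypothetical, made up for this illustration, and the handlers are trimmed-down versions of the ones above.

```python
import sys

def ignore_handler(lineno, source, line):
    pass

def warn_handler(lineno, source, line):
    sys.stderr.write("Cannot parse record on line %d: %r\n" % (lineno, line))

def strict_handler(lineno, source, line):
    raise ValueError("Cannot parse record on line %d: %r" % (lineno, line))

error_handlers = {
    "ignore": ignore_handler,
    "warn": warn_handler,
    "strict": strict_handler,
}

def get_error_handler(errors):
    # Hypothetical helper: accept a policy name or a user-defined callable
    if callable(errors):
        return errors
    try:
        return error_handlers[errors]
    except KeyError:
        raise ValueError("Unsupported errors value %r" % (errors,))

class CollectErrors(object):
    # Hypothetical user-defined policy: remember failures for later
    def __init__(self):
        self.failures = []

    def __call__(self, lineno, source, line):
        self.failures.append((lineno, source, line))

collector = CollectErrors()
handler = get_error_handler(collector)
handler(12, "bad_passwd", "dalke:*:-2:-2:Andrew Dalke:/var/empty")
```

With this in place, read_passwd_entries() would call get_error_handler(errors) instead of indexing the dictionary directly, and the caller could inspect collector.failures once parsing is done.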

API access to location information

Unfortunately, there are still a few things the parser API doesn't support. For example, which line contains the "root" record? The parser tracks the line number, but there's no way for the caller to get access to that information.

My solution was to develop a "Location" object. You might have noticed that source filename, current line number, and line content were all passed around together. This is often a hint that those parameters should be part of the same data type, rather than passed individually. I tried it out, and found the result much easier to understand.

Here's the new Location class:

class Location(object):
    def __init__(self, source=None, lineno=None, line=None):
        self.source = source
        self.lineno = lineno
        self.line = line

    def where(self):
        source = self.source
        if source is None:
            return "line %s" % self.lineno
        else:
            return "line %d of %r" % (self.lineno, source)

It's also nice that I could abstract the "where()" code into a method, so it's the only place which needs to worry about an unknown source name.
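
As a quick usage sketch, here are the two branches of where(), using the Location class as shown above. The example values are made up.

```python
class Location(object):
    def __init__(self, source=None, lineno=None, line=None):
        self.source = source
        self.lineno = lineno
        self.line = line

    def where(self):
        source = self.source
        if source is None:
            return "line %s" % self.lineno
        else:
            return "line %d of %r" % (self.lineno, source)

# A location with a known source name
named = Location("bad_passwd", 12, "dalke:*:-2:-2:Andrew Dalke:/var/empty")
named_where = named.where()      # "line 12 of 'bad_passwd'"

# A location for an unnamed source, such as an in-memory file
unnamed = Location(lineno=3, line="broken")
unnamed_where = unnamed.where()  # "line 3"
```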

The error handlers are now easier, because they only take a single location argument, and because the "where()" method hides some of the complexity:

# Define different error handlers, and a common error formatter.

def _format_msg(location):
    return "Cannot parse record on %s: %r" % (location.where(), location.line)

def ignore_handler(location):
    pass

def warn_handler(location):
    msg = _format_msg(location)
    sys.stderr.write(msg + "\n")

def strict_handler(location):
    msg = _format_msg(location)
    raise ValueError(msg)

The parser code has a few changes. I want people to be able to pass in their own location tracker, or if not specified, to create my own local one. More importantly, I need to set the location's "lineno" and "line" properties before passing control either to the error handler or back to the caller. I also decided that the final value of "location.lineno" should be the number of lines in the file, and if the file is empty then it should be 0.

Here's the updated parser code:

def read_passwd_entries(infile, errors="warn", location=None):        
    # Get the error handler for the given policy.
    # (A more sophisticated solution might support a user-defined
    # error handler as well as a string.)
    try:
        error_handler = error_handlers[errors]
    except KeyError:
        raise ValueError("Unsupported errors value %r" % (errors,))

    if location is None:
        location = Location(getattr(infile, "name", None))
    
    lineno = 0  # define lineno even if the file is empty

    for lineno, line in enumerate(infile, 1):
        line = line.rstrip("\n")  # remove trailing newline
        # Ignore comments and blank lines
        if line[:1] == "#" or line.strip() == "": 
            continue

        # Track where we are
        location.lineno = lineno
        location.line = line

        try:
            name, passwd, uid, gid, gecos, dir, shell = line.split(":")
        except ValueError:
            # Handle the failure
            error_handler(location)
            # If we get here then continue to the next record
            continue
        
        yield PasswdEntry(name, passwd, uid, gid, gecos, dir, shell)

    # Save the final line number
    location.lineno = lineno

It's getting more complex, but I think you can still follow the control flow.
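
To make the timing concrete, here's a reduced sketch of the same idea, where the caller reads location.lineno while iterating. The read_names() function is a cut-down stand-in which only yields the user name, and the sample text is made up.

```python
import io

class Location(object):
    def __init__(self, source=None, lineno=None, line=None):
        self.source = source
        self.lineno = lineno
        self.line = line

def read_names(infile, location):
    """Yield each record's user name, updating `location` along the way."""
    lineno = 0  # stays 0 if the input is empty
    for lineno, line in enumerate(infile, 1):
        line = line.rstrip("\n")
        if line[:1] == "#" or line.strip() == "":
            continue
        # Set the location before handing control back to the caller
        location.lineno = lineno
        location.line = line
        yield line.split(":")[0]
    # Save the final line number
    location.lineno = lineno

text = (u"##\n"
        u"nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false\n"
        u"# note\n"
        u"root:*:0:0:System Administrator:/var/root:/bin/sh\n"
        u"\n")

location = Location("<sample>")
seen = [(location.lineno, name)
        for name in read_names(io.StringIO(text), location)]
total_lines = location.lineno   # number of lines, including comments

empty = Location()
list(read_names(io.StringIO(u""), empty))
empty_lines = empty.lineno      # 0 for an empty input
```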

I made some changes to the driver code to test the new location API. If you specify "--with-lineno" then each output line will start with the line number for the given record.

Here's the updated driver code:

parser = argparse.ArgumentParser(
    description = "List username/shell entries in a password file")
parser.add_argument("--errors", choices = ["ignore", "warn", "strict"],
                    default = "strict",
                    help = "Specify the error handling policy")
parser.add_argument("--with-lineno", action="store_true",
                    help="include line numbers in the output")
parser.add_argument("filename", nargs="?", default="/etc/passwd",
                    help = "Password file to parse")

def main():
    args = parser.parse_args()
    # With nargs="?" argparse gives a single string, not a list
    filename = args.filename
    location = Location(filename)
    if args.with_lineno:
        output_fmt = "{location.lineno}: {entry.name} => {entry.shell}"
    else:
        output_fmt = "{entry.name} => {entry.shell}"
    try:
        with open(filename) as passwd_file:
            try:
                for entry in read_passwd_entries(
                        passwd_file, errors=args.errors, location=location):
                    print(output_fmt.format(entry=entry, location=location))
            except ValueError as err:
                raise SystemExit("ERROR with password entry: %s" % (err,))
    except IOError as err:
        raise SystemExit("ERROR with password file: %s" % (err,))

Now to see if the new code works. First, run it with the defaults to get the same output as before:

% python entries.py | head -4
nobody => /usr/bin/false
root => /bin/sh
daemon => /usr/bin/false
_uucp => /usr/sbin/uucico

and then with the option to show the line numbers:

% python entries.py --with-lineno | head -4
11: nobody => /usr/bin/false
12: root => /bin/sh
13: daemon => /usr/bin/false
14: _uucp => /usr/sbin/uucico

Yay!

A PasswdReader iterator, with location

I think this API is still complicated. The caller must create the Location instance and pass it to read_passwd_entries() in order to get the location information. The thing is, even if the location is None, the function ends up creating a Location in order to have good error reporting. What about making that otherwise internal location public, so I can use it as the default location, instead of always specifying it when I need it?

I did this with a bit of wrapping. The read_passwd_entries() function API returns an iterator of PasswdEntry instances. The iterator is currently implemented using a generator, but it doesn't have to be a generator. I'll instead have it return a PasswdReader instance, where PasswdReader.location gives the location, and iterating over the PasswdReader gives the PasswdEntry elements from the underlying generator.

The PasswdReader iterator class, which takes the location and the underlying generator, is as follows:

class PasswdReader(object):
    def __init__(self, location, reader):
        self.location = location
        self._reader = reader

    def __iter__(self):
        return self._reader

    # For Python 2
    def next(self):
        return self._reader.next()

    # For Python 3
    def __next__(self):
        return next(self._reader)

I also modified read_passwd_entries() so it returns this new iterator instead of the original generator. I've found that the easiest way is to break the original function into two parts. The first does parameter validation and normalization, and the second is the actual generator:

def read_passwd_entries(infile, errors="warn", location=None):
    # Get the error handler for the given policy.
    # (A more sophisticated solution might support a user-defined
    # error handler as well as a string.)
    try:
        error_handler = error_handlers[errors]
    except KeyError:
        raise ValueError("Unsupported errors value %r" % (errors,))
    
    if location is None:
        location = Location(getattr(infile, "name", None))

    return PasswdReader(location,
                        _read_passwd_entries(infile, error_handler, location))
    
def _read_passwd_entries(infile, error_handler, location):
    lineno = 0  # define lineno even if the file is empty
    for lineno, line in enumerate(infile, 1):
        ...

The main reason to split it up this way is to have eager evaluation for the parameter checking, and lazy evaluation for the actual reader.
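
Here's a small sketch of the difference, using made-up function names. When the check lives inside a generator function, nothing runs until the first item is requested. When the check is in a regular function which returns the generator, a bad parameter is reported at call time.

```python
POLICIES = ("ignore", "warn", "strict")

def lazy_reader(lines, errors="warn"):
    # The check is part of the generator body, so it is delayed
    if errors not in POLICIES:
        raise ValueError("Unsupported errors value %r" % (errors,))
    for line in lines:
        yield line

def eager_reader(lines, errors="warn"):
    # The check runs immediately; only the loop is a generator
    if errors not in POLICIES:
        raise ValueError("Unsupported errors value %r" % (errors,))
    return _iter_lines(lines)

def _iter_lines(lines):
    for line in lines:
        yield line

def raises_value_error(func):
    try:
        func()
    except ValueError:
        return True
    return False

# Creating the lazy reader with a bad policy does not fail ...
lazy_fails_on_call = raises_value_error(
    lambda: lazy_reader([], errors="bogus"))
# ... the error only shows up when the first item is requested.
lazy = lazy_reader([], errors="bogus")
lazy_fails_on_next = raises_value_error(lambda: next(lazy))
# The eager version fails right away.
eager_fails_on_call = raises_value_error(
    lambda: eager_reader([], errors="bogus"))
```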

PasswdReader iterator as context manager

The end result is simpler driver code, though it's still not as simple as I would like. The old code was something like:

    location = Location(filename)
    ...
        with open(filename) as passwd_file:
            try:
                for entry in read_passwd_entries(
                        passwd_file, errors=args.errors, location=location):
                    print(output_fmt.format(entry=entry, location=location))
            except ValueError as err:
                raise SystemExit("ERROR with password entry: %s" % (err,))

while the new driver only needs to get the "location" field. The critical core in the new driver is:

        with open(filename) as passwd_file:
            reader = read_passwd_entries(passwd_file, errors=args.errors)
            location = reader.location
            try:
                for entry in reader:
                    print(output_fmt.format(entry=entry, location=location))
            except ValueError as err:
                raise SystemExit("ERROR with password entry: %s" % (err,))

To make it simpler still, I'll include the open() functionality as part of the read_passwd_entries() API. That is, if the first parameter is a string then I'll assume it's a file name and open the file myself, otherwise I'll assume it's a Python file object. (In CS terms, I'll make the function polymorphic on the first parameter.)

That is, I want the driver code to look like this:

        with read_passwd_entries(filename, errors=args.errors) as reader:
            location = reader.location
            try:
                for entry in reader:
                    print(output_fmt.format(entry=entry, location=location))
            except ValueError as err:
                raise SystemExit("ERROR with password entry: %s" % (err,))

That means that the object returned from read_passwd_entries() must also support the context manager API. The simplest (and wrong) version adds dummy __enter__ and __exit__ methods to the PasswdReader, like this:

# WARNING: incomplete code
class PasswdReader(object):
    def __init__(self, location, reader):
        self.location = location
        self._reader = reader

    ...
    def __enter__(self):
        return self

    def __exit__(self, *args):
        pass

This is wrong because if the reader is given a filename and calls open() then the context manager must call the corresponding close() in the __exit__, while if the reader is given a file object then the __exit__ should not do anything to the file.

The more correct class takes a "close" parameter which, if not None, is called during __exit__:

class PasswdReader(object):
    def __init__(self, location, reader, close):
        self.location = location
        self._reader = reader
        self._close = close

    ...

    def __enter__(self):
        return self

    def __exit__(self, *args):
        if self._close is not None:
            self._close()
            self._close = None

I also need to change read_passwd_entries() to do the right thing based on the type of the first parameter. I've renamed the parameter to "source", and added a bit of special checking to handle the correct type check for both Python 2 and Python 3. Here are the core changes:

try:
    _basestring = basestring
except NameError:
    _basestring = str

def read_passwd_entries(source, errors="warn", location=None):
    if isinstance(source, _basestring):
        infile = open(source)
        source_name = source
        close = infile.close
    else:
        infile = source
        source_name = getattr(infile, "name", None)
        close = None
    
    # Get the error handler for the given policy.
    # (A more sophisticated solution might support a user-defined
    # error handler as well as a string.)
    try:
        error_handler = error_handlers[errors]
    except KeyError:
        raise ValueError("Unsupported errors value %r" % (errors,))
    
    if location is None:
        location = Location(source_name)

    return PasswdReader(location,
                        _read_passwd_entries(infile, error_handler, location),
                        close)
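
The ownership rule can be checked with a reduced sketch, assuming Python 3. LineReader and read_lines() are hypothetical stand-ins for PasswdReader and read_passwd_entries(), cut down to the close logic. A file opened by the reader is closed on exit, while a file object passed in by the caller is left open.

```python
import os
import tempfile

class LineReader(object):
    """Hypothetical stand-in for PasswdReader, reduced to the close logic."""
    def __init__(self, infile, close):
        self.infile = infile   # kept only so the example can inspect it
        self._close = close

    def __iter__(self):
        return iter(self.infile)

    def __enter__(self):
        return self

    def __exit__(self, *args):
        if self._close is not None:
            self._close()
            self._close = None

def read_lines(source):
    if isinstance(source, str):
        # The reader opened the file, so the reader must close it
        infile = open(source)
        return LineReader(infile, infile.close)
    # The caller owns the file object; leave it alone on exit
    return LineReader(source, None)

fd, path = tempfile.mkstemp()
with os.fdopen(fd, "w") as f:
    f.write("root:*:0:0:System Administrator:/var/root:/bin/sh\n")

with read_lines(path) as reader:
    owned = reader.infile
    n_owned = sum(1 for _ in reader)
owned_closed = owned.closed

with open(path) as borrowed:
    with read_lines(borrowed) as reader:
        n_borrowed = sum(1 for _ in reader)
    borrowed_still_open = not borrowed.closed

os.remove(path)
```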

The final version

Enough has changed that I'll show the full code, so you don't have to piece the parts together yourself.

from __future__ import print_function

import sys
import argparse

class PasswdReader(object):
    def __init__(self, location, reader, close):
        self.location = location
        self._reader = reader
        self._close = close

    def __iter__(self):
        return self._reader

    # For Python 2
    def next(self):
        return self._reader.next()

    # For Python 3
    def __next__(self):
        return next(self._reader)

    def __enter__(self):
        return self

    def __exit__(self, *args):
        if self._close is not None:
            self._close()
            self._close = None

class PasswdEntry(object):
    def __init__(self, name, passwd, uid, gid, gecos, dir, shell):
        self.name = name
        self.passwd = passwd
        self.uid = uid
        self.gid = gid
        self.gecos = gecos
        self.dir = dir
        self.shell = shell

class Location(object):
    def __init__(self, source=None, lineno=None, line=None):
        self.source = source
        self.lineno = lineno
        self.line = line

    def where(self):
        source = self.source
        if source is None:
            return "line %s" % self.lineno
        else:
            return "line %d of %r" % (self.lineno, source)
            

# Define different error handlers, and a common error formatter.

def _format_msg(location):
    return "Cannot parse record on %s: %r" % (location.where(), location.line)

def ignore_handler(location):
    pass

def warn_handler(location):
    msg = _format_msg(location)
    sys.stderr.write(msg + "\n")

def strict_handler(location):
    msg = _format_msg(location)
    raise ValueError(msg)

error_handlers = {
    "ignore": ignore_handler,
    "warn": warn_handler,
    "strict": strict_handler,
}

# Main API call to read from a passwd file

try:
    _basestring = basestring
except NameError:
    _basestring = str

def read_passwd_entries(source, errors="warn", location=None):
    if isinstance(source, _basestring):
        infile = open(source)
        source_name = source
        close = infile.close
    else:
        infile = source
        source_name = getattr(infile, "name", None)
        close = None
    
    # Get the error handler for the given policy.
    # (A more sophisticated solution might support a user-defined
    # error handler as well as a string.)
    try:
        error_handler = error_handlers[errors]
    except KeyError:
        raise ValueError("Unsupported errors value %r" % (errors,))
    
    if location is None:
        location = Location(source_name)

    return PasswdReader(location,
                        _read_passwd_entries(infile, error_handler, location),
                        close)

# The actual passwd file parser, as a generator used by the PasswdReader.

def _read_passwd_entries(infile, error_handler, location):
    lineno = 0  # define lineno even if the file is empty
    for lineno, line in enumerate(infile, 1):
        line = line.rstrip("\n")  # remove trailing newline
        # Ignore comments and blank lines
        if line[:1] == "#" or line.strip() == "": 
            continue

        # Track where we are
        location.lineno = lineno
        location.line = line

        try:
            name, passwd, uid, gid, gecos, dir, shell = line.split(":")
        except ValueError:
            # Handle the failure
            error_handler(location)
            # If we get here then continue to the next record
            continue
        
        yield PasswdEntry(name, passwd, uid, gid, gecos, dir, shell)

    # Save the final line number
    location.lineno = lineno

# Driver code to help test the library

parser = argparse.ArgumentParser(
    description = "List username/shell entries in a password file")
parser.add_argument("--errors", choices = ["ignore", "warn", "strict"],
                    default = "strict",
                    help = "Specify the error handling policy")
parser.add_argument("--with-lineno", action="store_true",
                    help="include line numbers in the output")
parser.add_argument("filename", nargs="?", default="/etc/passwd",
                    help = "Password file to parse")

def main():
    args = parser.parse_args()
    # With nargs="?" argparse gives a single string, not a list
    filename = args.filename
    if args.with_lineno:
        output_fmt = "{location.lineno}: {entry.name} => {entry.shell}"
    else:
        output_fmt = "{entry.name} => {entry.shell}"
    try:
        with read_passwd_entries(filename, errors=args.errors) as reader:
            location = reader.location
            try:
                for entry in reader:
                    print(output_fmt.format(entry=entry, location=location))
            except ValueError as err:
                raise SystemExit("ERROR with password entry: %s" % (err,))
    except IOError as err:
        raise SystemExit("ERROR with password file: %s" % (err,))

if __name__ == "__main__":
    main()

The result is a lot more complex than the original 8 line parser, because it's a lot more configurable and provides more information. You can also see how to expand it further, like tracking the current record as another location field. In my chemistry parsers, I also track byte positions, so users can determine that record XYZ123 is between bytes M and N.
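
As a rough sketch of the byte position idea, applied to the passwd format rather than a chemistry format, the reader can work on a file opened in binary mode and add up the line lengths. The iter_records_with_offsets() name is made up for this illustration and is not taken from my chemistry parsers.

```python
import io

def iter_records_with_offsets(infile):
    """Yield (start_byte, end_byte, line) for each record.

    `infile` must be in binary mode so that len() counts bytes.
    """
    position = 0
    for raw in infile:
        start = position
        position += len(raw)   # includes the newline, if present
        line = raw.rstrip(b"\n")
        # Ignore comments and blank lines
        if line[:1] == b"#" or line.strip() == b"":
            continue
        yield start, position, line

data = (b"##\n"
        b"nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false\n"
        b"root:*:0:0:System Administrator:/var/root:/bin/sh\n")

records = [(start, end, line.split(b":")[0])
           for start, end, line in iter_records_with_offsets(io.BytesIO(data))]

# Each (start, end) pair can be used to slice the record back out
slices = [data[start:end] for start, end, name in records]
```

With the offsets in hand, a caller can seek() to the start byte of a record and read only that record, without re-parsing the file.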

Location through accessor functions

Unfortunately, this more complex API, with location tracking, is also a bit slower than the simple parser, and trickier to write. For example, the following lines:

        # Track where we are
        location.lineno = lineno
        location.line = line

always occur, even when the location information isn't needed, because the parser doesn't know if the API caller will need the information. In addition to the extra processing overhead, I found that a more complex format might have multiple branches for the different processing and error paths, each of which needs to set location information correctly. This is error-prone.

There are a few ways to minimize or shift that overhead, at the expense of yet more code. In my chemical structure parsers, I expect that most people won't want any of this information, so I made the location code more expensive to get and less expensive to track.

What I did was to change the Location class so I can register a handler for each attribute. To get the value for "lineno", the location calls the associated handler. This function can be defined inside of the scope of the actual reader function, so it has access to the internal "lineno" variable. This means that the reader function doesn't need to set location.lineno, at the expense of an extra function call to get the value.

I'll sketch what I mean, so you can get an idea. This essay is long enough as it is, so I'll only give a flavor and not show the full implementation.

Here's a variation of the Location class, with the ability to register the "lineno" property:

class Location(object):
    def __init__(self, filename=None):
        self.filename = filename
        self._get_lineno = lambda: None
    
    def register(self, **kwargs):
        if "get_lineno" in kwargs:
            self._get_lineno = kwargs["get_lineno"]
    
    def save(self, **kwargs):
        if "lineno" in kwargs:
            self._get_lineno = lambda value=kwargs["lineno"]: value
    
    @property
    def lineno(self):
        return self._get_lineno()

The updated reader code, which registers the handler and cleans up afterwards, is:

def read_passwd_entries(source, errors="warn", location=None):
    ...
    reader = PasswdReader(location,
                        _read_passwd_entries(infile, error_handler, location),
                        close)
    next(reader) # prime the reader
    return reader

def _read_passwd_entries(infile, error_handler, location):
    lineno = 0
    def get_lineno():
        return lineno
    
    location.register(get_lineno = get_lineno)
    yield "ready" # ready to start reading
    
    try:
        for lineno, line in enumerate(infile, 1):
            ....
    finally:
        location.save(lineno=lineno)

Property registration by forcing the generator to start

The biggest complication with this approach is the timing order of location registration. Registration must occur inside of the generator, in order to get access to the local scope. However, the code in the generator won't be executed until the first item is needed.

Consider for example:

with read_passwd_entries(filename) as reader:
    print(reader.location.lineno)

You should expect this to print either 0 or 1 (I think it should print 0). But if the generator hasn't started then the get_lineno isn't registered, so the default value of None will be returned.

Instead, I prime the generator in read_passwd_entries() by doing:

    next(reader)

This forces the generator to execute, up to the first "yield" statement, which is:

    yield "ready" # ready to start reading

(The value of "ready" is thrown away.)
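
The delayed start is easy to see in isolation. In this small sketch the events list stands in for the registration step, and the names are made up. Nothing in the generator body runs until the first next() call.

```python
events = []

def numbers():
    events.append("registered")   # stands in for location.register(...)
    yield "ready"                 # ready to start reading
    yield 1
    yield 2

gen = numbers()
ran_at_creation = list(events)    # still empty: the body has not started

marker = next(gen)                # prime: run up to the first yield
ran_after_priming = list(events)  # now the registration has happened

remaining = list(gen)             # the real items follow the marker
```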

Removing reference cycles

The other complication is cyclical memory references. The _read_passwd_entries() generator holds a reference to the location instance as a local variable, the location instance holds a reference to the get_lineno() function, and get_lineno() refers back to its enclosing scope, which is the _read_passwd_entries() generator. While Python's garbage collector can handle those cycles, some of the location properties can be complex objects, like file handles and molecule objects, which use resources that aren't fully exposed to Python.

The try/finally block exists to break the cycle. The finally removes the link to get_lineno() and replaces it with the final value for lineno.
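
Putting the pieces together, here's a minimal runnable sketch of the registration approach. It is a simplification, not the full implementation: read_lines() is a made-up reader which yields stripped lines instead of PasswdEntry objects, and it returns the location and generator as a plain tuple.

```python
import io

class Location(object):
    def __init__(self, filename=None):
        self.filename = filename
        self._get_lineno = lambda: None

    def register(self, **kwargs):
        if "get_lineno" in kwargs:
            self._get_lineno = kwargs["get_lineno"]

    def save(self, **kwargs):
        if "lineno" in kwargs:
            # Freeze the value; the default argument holds no closure
            self._get_lineno = lambda value=kwargs["lineno"]: value

    @property
    def lineno(self):
        return self._get_lineno()

def _read_lines(infile, location):
    lineno = 0
    def get_lineno():
        return lineno   # reads the generator's local variable on demand
    location.register(get_lineno=get_lineno)
    yield "ready"       # ready to start reading
    try:
        for lineno, line in enumerate(infile, 1):
            yield line.rstrip("\n")
    finally:
        # Break the cycle and keep the final line number
        location.save(lineno=lineno)

def read_lines(infile):
    location = Location(getattr(infile, "name", None))
    reader = _read_lines(infile, location)
    next(reader)        # prime the reader
    return location, reader

location, reader = read_lines(io.StringIO(u"a\nb\nc\n"))
before = location.lineno    # registered, but no lines read yet
first = next(reader)
during = location.lineno    # looked up from the running generator
rest = list(reader)
after = location.lineno     # frozen by save() in the finally block
cycle_broken = location._get_lineno.__closure__ is None
```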

Influence from XML processing

My ideas are influenced by the SAX2 API, which you can see by looking at its Locator class, and to a lesser extent the ErrorHandler class. (I once implemented an entire bioinformatics parsing system called Martel, based on SAX events.)

One of the biggest differences is that XML and SAX deal with potentially deep tree structures, while I only work with a linear sequence of records, turned into molecules. My thoughts about an API which is better for this data were originally guided by Fredrik Lundh's iterparse.
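
For readers who haven't used SAX, here's a small sketch of the Locator idea using Python's standard xml.sax module. The parser hands the handler a locator object once, and the handler can ask it for the current line number during any event. The LineTracker class and the sample document are made up for this illustration.

```python
import xml.sax

class LineTracker(xml.sax.ContentHandler):
    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.locator = None
        self.lines = []

    def setDocumentLocator(self, locator):
        # Called by the parser before any other event
        self.locator = locator

    def startElement(self, name, attrs):
        # Ask the locator where the parser currently is
        self.lines.append((name, self.locator.getLineNumber()))

doc = (b"<users>\n"
       b"  <user name='nobody'/>\n"
       b"  <user name='root'/>\n"
       b"</users>\n")

handler = LineTracker()
xml.sax.parseString(doc, handler)
```

The parallel with the Location object is that the caller holds one long-lived object and queries it when needed, instead of receiving position information with every event.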

Andrew Dalke is an independent consultant focusing on software development for computational chemistry and biology. Need contract programming, help, or training? Contact me