Parser Generators and Decorators
Years ago John Aycock's SPARK introduced the idea of using docstring to store the pattern and rule definitions for a parser generator. Rather, that's where I first heard of it; someone else could have been first.
Here's part of the scanner definition from Biopython for parsing GenBank locations:
def t_caret(self, input):
r" \^ "
self.rv.append(Token("caret"))
def t_comma(self, input):
r" \, "
self.rv.append(Token("comma"))
def t_integer(self, input):
r" -?[0-9]+ "
self.rv.append(Integer(int(input)))
Here's part of the parser
def p_base_position(self, args):
"""
base_position ::= integer
base_position ::= low_base_bound
base_position ::= high_base_bound
base_position ::= two_base_bound
"""
return args[0]
def p_low_base_bound(self, args):
"""
low_base_bound ::= greater_than integer
"""
return LowBound(args[1])
The idea of using the docstring was rather cute but a bit of a hack. That was the easiest way to associate data with a function. Modern Python has decorators and function object attributes are writeable. Has anyone revisted the approach except using a decorator instead of a docstring?
The above two examples would look something like this:
@exact_token("^")
def caret(self, input):
self.rv.append(Token("caret"))
@exact_token(",")
def comma(self, input):
self.rv.append(Token("comma"))
@token(r"-?[0-9]+")
def integer(self, input):
self.rv.append(Integer(int(input)))
where I made things a bit easier for authors by having a decorator for
exact string matches and a decorator for re-based tokenization.
the parser code would look like
@rule("base_position", "integer")
@rule("base_position", "low_base_bound")
@rule("base_position", "high_base_bound")
@rule("base_position", "two_base_bound")
def base_position(self, args):
return args[0]
@rule("low_base_bound", "greater_than integer")
def low_base_bound(self, args):
return LowBound(args[1])
or possibly allow a list for the rhs part of the rule, as in
@rule("base_position", ["integer",
"low_base_bound,
"high_base_bound",
"two_base_bound"])
def base_position(self, args):
return args[0]
@rule("low_base_bound", "greater_than integer")
def low_base_bound(self, args):
return LowBound(args[1])
I looked at SPARK for about 2 minutes to figure out if it was doable. It is, but would take a couple hours to do. SPARK is stable, meaning the code hasn't changed in years. John has updated the page recently but only to add a reference to another parsing project. I don't know if he's interested in this and will take diffs.
PLY has a similar structure to SPARK and decorators would work there. Perhaps that's a better choice.
Then again, I want to try out PyParsing, which looks like it uses ideas similar to that in my Martel project, which I took from Greg Ewing's Plex. I recall that ?!ng was also an early advocate of that style but would need to dig through archives for verification. (In other words, I was a late-comer to the game and make no suggestion that PyParsing based anything on Martel.)
Copyright © 2001-2008 Dalke Scientific Software, LLC.


