import re
import sys
import operator
import pprint as pretty_print
A few years ago, I tried to work my way through Structure and Interpretation of Computer Programs. It’s one of these books every Lisp person will tell you to read. You can actually tell it’s a classic because almost no one read it all the way through.
One thing that stuck with me is the concept of the metacircular evaluator. A metacircular evaluator is a fancy way to call how a Lisp interpreter that evaluates a Lisp expression recursively. There’s a lot of abstruse schemas on the Internet to explain it but for some reason I could never wrap my head around it.
The other day I stumbled upon Peter Norvig’s How to write a Lisp in Python, which took me a few years back. This time, I would figure it out! So, after a bit of Googling and some light reading on Lisp, I started working on a very simple evaluator, in Python.
Python is a great language for trying things out because it gets out of the way and gives us a lot of things out of the box, like memory management – having to write a garbage collector just to implement a basic Lisp doesn’t sound fun!
This is the source for this Lisp interpreter. It’s split in three parts:
import re
import sys
import operator
import pprint as pretty_print
pprint = lambda obj: pretty_print.PrettyPrinter(indent=4).pprint(obj)
def fail(s):
print s
sys.exit(-1)
These classes are mostly containers for our interpreter. They make the code a little clearer to write.
class InterpreterObject(object):
def __init__(self, value):
self.value = value
def __repr__(self):
return self.value
class Symbol(InterpreterObject):
pass
class String(InterpreterObject):
pass
class Lambda(InterpreterObject):
def __init__(self, arguments, code):
self.arguments = arguments
self.code = code
def __repr__(self):
return "(lambda (%s) (%s)" % (self.arguments, self.code)
def tokenize(s):
ret = []
in_string = False
current_word = ''
The algorithm is simple: we scan the string char by char and look if we’re at a word, string or s-expr boundary. Depending on the case, we either add a new char to the current word, create a new word or a new sublist. The algorithm is complicated by the fact that we have several delimiters: spaces, simple quotes and braces, which makes it some sort of weird state machine.
for i, char in enumerate(s):
if char == "'":
if in_string is False:
in_string = True
current_word += char
else:
in_string = False
current_word += char
ret.append(current_word)
current_word = ''
elif in_string is True:
current_word += char
elif char in ['\t', '\n', ' ']:
continue
elif char in ['(', ')']:
ret.append(char)
else:
current_word += char
if i < len(s) - 1 and s[i+1] in ['(', ')', ' ', '\n', '\t']:
ret.append(current_word)
current_word = ''
return ret
Next are a handful of utility functions that will help us convert tokens to their actual values.
def is_integer(s):
try:
int(s)
return True
except ValueError:
return False
def is_float(s):
try:
float(s)
return True
except ValueError:
return False
def is_string(s):
if s[0] == "'" and s[-1] == "'":
return True
return False
The parse
function is actually a wrapper for do_parse
– I wanted
to pass an iterator around because it felt a lot nicer than passing
raw array indices around.
def parse(tokens):
itert = iter(tokens)
token = itert.next()
if token != '(':
fail("Unexpected token {}".format(token))
return do_parse(itert)
def do_parse(tokens):
ret = []
current_sexpr = None
in_sexp = False
for token in tokens:
if token == '(':
ret.append(do_parse(tokens))
elif token == ')':
return ret
elif is_integer(token):
ret.append(int(token))
elif is_float(token):
ret.append(float(token))
elif is_string(token):
ret.append(String(token[1:][0:-1]))
else:
ret.append(Symbol(token))
The interpreter is broken into two functions, eval
and apply
. Both take an s-expression (expr
) and a dictionary of variables in scope (environment
) as parameters.
eval
‘s role is to take an expression and return its value. For example, if you pass a symbol to eval
, it will look up its value in the symbol table and return it.
apply
is reserved for evaluating functions. It takes as parameters a function (written in Lisp or Python), a list of arguments and calls the function. How does it do that? It simply updates the environment
to define the function’s parameters as local variables, and then calls eval
!
def eval(expr, environment):
First, let’s define how to evaluate numbers, strings and Symbols.
if isinstance(expr, int):
return expr
elif isinstance(expr, str):
return expr
elif isinstance(expr, float):
return expr
elif isinstance(expr, String):
return expr.value
elif isinstance(expr, Symbol):
if expr.value not in environment:
fail("Couldn't find symbol {}".format(expr.value))
return environment[expr.value]
elif isinstance(expr, list):
Most of the language’s built-ins are defined as Python code
but we need to handle some language constructs like lambda
or if
directly
in the interpreter, because they require a specific evaluation order.
For example if we defined if
as a function,
this expression (if (= 3 2) (print '3 = 2') (print '3 = 3'))
would print both
3 = 2 and 3 = 3, because the eval function evaluates its arguments in order.
if isinstance(expr[0], Symbol):
if expr[0].value == 'lambda':
arg_names = expr[1]
code = expr[2]
Lambda
is simply an object that holds arguments
(a list of arguments with their names) and code
(a list of Lisp instructions to execute).
return Lambda(arg_names, code)
elif expr[0].value == 'if':
condition = expr[1]
then = expr[2]
_else = None
if len(expr) == 4:
_else = expr[3]
if eval(condition, environment) != False:
return eval(then, environment)
elif _else is not None:
return eval(_else, environment)
elif expr[0].value == 'define':
name = expr[1].value
value = eval(expr[2], environment)
environment[name] = value
elif expr[0].value == 'begin':
for ex in expr[1:]:
eval(ex, environment)
else:
fn = eval(expr[0], environment)
args = [eval(arg, environment) for arg in expr[1:]]
return apply(fn, args, environment)
Apply is pretty simple too. It checks if a function is an interpreter built-in or not.
def apply(fn, args, environment):
If it is, it simply passes the arguments to the built-in.
if callable(fn):
return fn(*args)
Otherwise we have to actually evaluate the function.
if isinstance(fn, Lambda):
new_env = dict(environment)
if len(args) != len(fn.arguments):
fail("Mismatched number of arguments to lambda")
To do this we bind the values of the arguments to the environment.
for i in range(len(fn.arguments)):
new_env[fn.arguments[i].value] = args[i]
And we call eval on the function body. That’s it!
return eval(fn.code, new_env)
Finally we define a handful of system built-ins.
base_environment = {
'+': operator.add,
'-': operator.sub,
'*': operator.mul,
'/': operator.div,
'>': operator.gt,
'>=': operator.ge,
'<': operator.lt,
'<=': operator.le,
'=': operator.eq,
'!=': operator.ne,
'nil': None,
'print': lambda x: sys.stdout.write(str(x) + '\n'),
}
def main():
if len(sys.argv) != 2:
print "usage: python {} <file>".format(sys.argv[0])
sys.exit(-1)
with open(sys.argv[1]) as fd:
contents = fd.read()
parsed = parse(tokenize(contents))
eval(parsed, base_environment)
if __name__ == '__main__':
main()
As you can see, it turns out its pretty easy to make your own interpreter! As a general rule, when you find yourself stumped by a weird concept, the easiest way to figure out how it works is to implement a basic prototype using it.
If you have questions, don’t hesitate to send me an email to “hello” @ this website’s domain name.