lisp.py

lisp.py
¶

A few years ago, I tried to work my way through Structure and Interpretation of Computer Programs. It’s one of these books every Lisp person will tell you to read. You can actually tell it’s a classic because almost no one read it all the way through.
¶

One thing that stuck with me is the concept of the metacircular evaluator. A metacircular evaluator is a fancy way to call how a Lisp interpreter that evaluates a Lisp expression recursively. There’s a lot of abstruse schemas on the Internet to explain it but for some reason I could never wrap my head around it.
¶

The other day I stumbled upon Peter Norvig’s How to write a Lisp in Python, which took me a few years back. This time, I would figure it out! So, after a bit of Googling and some light reading on Lisp, I started working on a very simple evaluator, in Python.
¶

Python is a great language for trying things out because it gets out of the way and gives us a lot of things out of the box, like memory management – having to write a garbage collector just to implement a basic Lisp doesn’t sound fun!
¶

This is the source for this Lisp interpreter. It’s split in three parts:
1. Utility functions and data structures
2. Parser
3. Interpreter
```
import re
import sys
import operator
import pprint as pretty_print
```

Utility functions and data structures

pprint = lambda obj: pretty_print.PrettyPrinter(indent=4).pprint(obj)

def fail(s):
    print s
    sys.exit(-1)

These classes are mostly containers for our interpreter. They make the code a little clearer to write.

class InterpreterObject(object):
    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return self.value

class Symbol(InterpreterObject):
    pass

class String(InterpreterObject):
    pass

class Lambda(InterpreterObject):
    def __init__(self, arguments, code):
        self.arguments = arguments
        self.code = code

    def __repr__(self):
        return "(lambda (%s) (%s)" % (self.arguments, self.code)

Parser

The parser is pretty simple because lisp has an extremely simple grammar.

def tokenize(s):
    ret = []
    in_string = False
    current_word = ''

The algorithm is simple: we scan the string char by char and look if we’re at a word, string or s-expr boundary. Depending on the case, we either add a new char to the current word, create a new word or a new sublist. The algorithm is complicated by the fact that we have several delimiters: spaces, simple quotes and braces, which makes it some sort of weird state machine.

    for i, char in enumerate(s):
        if char == "'":
            if in_string is False:
                in_string = True
                current_word += char
            else:
                in_string = False
                current_word += char
                ret.append(current_word)
                current_word = ''

        elif in_string is True:
            current_word += char

        elif char in ['\t', '\n', ' ']:
            continue

        elif char in ['(', ')']:
            ret.append(char)

        else:
            current_word += char
            if i < len(s) - 1 and s[i+1] in ['(', ')', ' ', '\n', '\t']:
                ret.append(current_word)
                current_word = ''

    return ret

Next are a handful of utility functions that will help us convert tokens to their actual values.

def is_integer(s):
    try:
        int(s)
        return True
    except ValueError:
        return False

def is_float(s):
    try:
        float(s)
        return True
    except ValueError:
        return False


def is_string(s):
    if s[0] == "'" and s[-1] == "'":
        return True

    return False

The parse function is actually a wrapper for do_parse – I wanted to pass an iterator around because it felt a lot nicer than passing raw array indices around.

def parse(tokens):
    itert = iter(tokens)
    token = itert.next()

    if token != '(':
        fail("Unexpected token {}".format(token))

    return do_parse(itert)


def do_parse(tokens):
    ret = []

    current_sexpr = None
    in_sexp = False

    for token in tokens:
        if token == '(':
            ret.append(do_parse(tokens))
        elif token == ')':
            return ret
        elif is_integer(token):
            ret.append(int(token))
        elif is_float(token):
            ret.append(float(token))
        elif is_string(token):
            ret.append(String(token[1:][0:-1]))
        else:
            ret.append(Symbol(token))

¶

Interpreter

The interpreter is broken into two functions, eval and apply. Both take an s-expression (expr) and a dictionary of variables in scope (environment) as parameters.

eval‘s role is to take an expression and return its value. For example, if you pass a symbol to eval, it will look up its value in the symbol table and return it.

apply is reserved for evaluating functions. It takes as parameters a function (written in Lisp or Python), a list of arguments and calls the function. How does it do that? It simply updates the environment to define the function’s parameters as local variables, and then calls eval!
```
def eval(expr, environment):
```

First, let’s define how to evaluate numbers, strings and Symbols.

    if isinstance(expr, int):
        return expr
    elif isinstance(expr, str):
        return expr
    elif isinstance(expr, float):
        return expr
    elif isinstance(expr, String):
        return expr.value
    elif isinstance(expr, Symbol):
        if expr.value not in environment:
            fail("Couldn't find symbol {}".format(expr.value))
        return environment[expr.value]
    elif isinstance(expr, list):

¶

Most of the language’s built-ins are defined as Python code but we need to handle some language constructs like lambda or if directly in the interpreter, because they require a specific evaluation order.

For example if we defined if as a function, this expression (if (= 3 2) (print '3 = 2') (print '3 = 3')) would print both 3 = 2 and 3 = 3, because the eval function evaluates its arguments in order.
```
        if isinstance(expr[0], Symbol):
            if expr[0].value == 'lambda':
                arg_names = expr[1]
                code = expr[2]
```

Lambda is simply an object that holds arguments (a list of arguments with their names) and code (a list of Lisp instructions to execute).

                return Lambda(arg_names, code)

            elif expr[0].value == 'if':
                condition = expr[1]
                then = expr[2]
                _else = None
                if len(expr) == 4:
                    _else = expr[3]

                if eval(condition, environment) != False:
                    return eval(then, environment)
                elif _else is not None:
                    return eval(_else, environment)

            elif expr[0].value == 'define':
                name = expr[1].value
                value = eval(expr[2], environment)
                environment[name] = value

            elif expr[0].value == 'begin':
                for ex in expr[1:]:
                    eval(ex, environment)

            else:
                fn = eval(expr[0], environment)
                args = [eval(arg, environment) for arg in expr[1:]]
                return apply(fn, args, environment)

¶

Apply is pretty simple too. It checks if a function is an interpreter built-in or not.
```
def apply(fn, args, environment):
```

If it is, it simply passes the arguments to the built-in.

    if callable(fn):
        return fn(*args)

Otherwise we have to actually evaluate the function.

    if isinstance(fn, Lambda):
        new_env = dict(environment)
        if len(args) != len(fn.arguments):
            fail("Mismatched number of arguments to lambda")

To do this we bind the values of the arguments to the environment.

        for i in range(len(fn.arguments)):
            new_env[fn.arguments[i].value] = args[i]

¶

And we call eval on the function body. That’s it!
```
        return eval(fn.code, new_env)
```

Finally we define a handful of system built-ins.

base_environment = {
    '+': operator.add,
    '-': operator.sub,
    '*': operator.mul,
    '/': operator.div,
    '>': operator.gt,
    '>=': operator.ge,
    '<': operator.lt,
    '<=': operator.le,
    '=': operator.eq,
    '!=': operator.ne,
    'nil': None,
    'print': lambda x: sys.stdout.write(str(x) + '\n'),
}


def main():
    if len(sys.argv) != 2:
        print "usage: python {} <file>".format(sys.argv[0])
        sys.exit(-1)

    with open(sys.argv[1]) as fd:
        contents = fd.read()
        parsed = parse(tokenize(contents))
        eval(parsed, base_environment)


if __name__ == '__main__':
    main()

¶

Parting words

As you can see, it turns out its pretty easy to make your own interpreter! As a general rule, when you find yourself stumped by a weird concept, the easiest way to figure out how it works is to implement a basic prototype using it.

If you have questions, don’t hesitate to send me an email to “hello” @ this website’s domain name.

lisp.py

Utility functions and data structures

Parser

Interpreter

Parting words