LISP is all about expressions. An expression can be a literal, an identifier, or a list consisting of an operator and one or more arguments.
A literal is an object with an intrinsic value. In our system, that's
either an integer or NIL
(if you consider "nothing" to be
a value).
An identifier is a name for an object. Symbols can be identifiers.
Everything else is a list of the form (operator argument...)
where argument...
means zero or more arguments.
To associate identifiers with objects we need an environment. This is a collection of bindings, each of which consists of an identifier and its corresponding value. For example:
Bindings | |
---|---|
Identifier | Value |
FOO | 42 |
BAR | NIL |
BAZ | (X Y Z) |
BAZ
is a list containing three symbols.
An environment can also have a parent environment. If there is no binding for a particular identifier in the environment, we can check the parent, the parent's parent and so on. In this way we can create a tree of environments which share bindings with their ancestors unless explicit replacements exist.
There is a convenient way of representing environments using our LISP data types:
(parent (identifier . value)...)So the environment above (assuming it has no parent) is:
(NIL (FOO . 42) (BAR . NIL) (BAZ . (X Y Z)))
Here is a function to create an empty environment with a specified
parent (which could be NIL
):
Atom env_create(Atom parent) { return cons(parent, nil); }
Next we have two functions to retrieve and create bindings in an environment.
int env_get(Atom env, Atom symbol, Atom *result) { Atom parent = car(env); Atom bs = cdr(env); while (!nilp(bs)) { Atom b = car(bs); if (car(b).value.symbol == symbol.value.symbol) { *result = cdr(b); return Error_OK; } bs = cdr(bs); } if (nilp(parent)) return Error_Unbound; return env_get(parent, symbol, result); }Disallowing duplicate symbols means that we don't have to call
strcmp
here, which should mean that this lookup function
is not too slow.
int env_set(Atom env, Atom symbol, Atom value) { Atom bs = cdr(env); Atom b = nil; while (!nilp(bs)) { b = car(bs); if (car(b).value.symbol == symbol.value.symbol) { cdr(b) = value; return Error_OK; } bs = cdr(bs); } b = cons(symbol, value); cdr(env) = cons(b, cdr(env)); return Error_OK; }
Only env_get
recursively checks the parent environments.
We don't want to modify the bindings of parents.
Now that we have expressions, we can start to evaluate them. Evalution is a process which takes an expression and an environment, and produces a value (the result). Let's specify the rules.
QUOTE
(QUOTE EXPR)
is
EXPR
, which is returned without evaluating.
DEFINE
(DEFINE SYMBOL EXPR)
creates a binding
for SYMBOL
(or modifies an existing binding) in the
evaluation environment. SYMBOL
is bound to the value
obtained by evaluating EXPR
. The final result is
SYMBOL
.
We will need to check whether an expression is a proper list.
int listp(Atom expr) { while (!nilp(expr)) { if (expr.type != AtomType_Pair) return 0; expr = cdr(expr); } return 1; }
The Error
enumeration needs a few more entires:
Error_Unbound |
Attempted to evaluate a symbol for which no binding exists |
Error_Args |
A list expression was shorter or longer than expected |
Error_Type |
An object in an expression was of a different type than expected |
The function to perform evaluation is now a straightforward translation of the rules into C.
int eval_expr(Atom expr, Atom env, Atom *result) { Atom op, args; Error err; if (expr.type == AtomType_Symbol) { return env_get(env, expr, result); } else if (expr.type != AtomType_Pair) { *result = expr; return Error_OK; } if (!listp(expr)) return Error_Syntax; op = car(expr); args = cdr(expr); if (op.type == AtomType_Symbol) { if (strcmp(op.value.symbol, "QUOTE") == 0) { if (nilp(args) || !nilp(cdr(args))) return Error_Args; *result = car(args); return Error_OK; } else if (strcmp(op.value.symbol, "DEFINE") == 0) { Atom sym, val; if (nilp(args) || nilp(cdr(args)) || !nilp(cdr(cdr(args)))) return Error_Args; sym = car(args); if (sym.type != AtomType_Symbol) return Error_Type; err = eval_expr(car(cdr(args)), env, &val); if (err) return err; *result = sym; return env_set(env, sym, val); } } return Error_Syntax; }
Extending the read-print loop from the previous chapter, we now have a read-eval-print loop (REPL). This is the core of our LISP interpreter.
int main(int argc, char **argv) { Atom env; char *input; env = env_create(nil); while ((input = readline("> ")) != NULL) { const char *p = input; Error err; Atom expr, result; err = read_expr(p, &p, &expr); if (!err) err = eval_expr(expr, env, &result); switch (err) { case Error_OK: print_expr(result); putchar('\n'); break; case Error_Syntax: puts("Syntax error"); break; case Error_Unbound: puts("Symbol not bound"); break; case Error_Args: puts("Wrong number of arguments"); break; case Error_Type: puts("Wrong type"); break; } free(input); } return 0; }
Let's see what it can do.
> foo Symbol not bound > (quote foo) FOO > (define foo 42) FOO > foo 42 > (define foo (quote bar)) FOO > foo BAR
We can now interactively assign names to objects.