We will define four kinds of object to begin with:
FOO
, BAR
, ADD-TWO
.
We will normalize characters to upper-case in this project, but this
is not strictly necessary.NIL
NULL
in C and other
languages.NIL
, or a reference to another pair.
The types of each element may be different.NIL
are called simple data.
The term atom can refer to either a simple datum or a pair
(purists may disagree on this point).
Note that integers and symbols are immutable, so we can think of two integers with the same value as being the same object. This is particularly useful for symbols, because it allows us to test for equality by comparing pointers.
Let's declare some C types to hold our data. There are many clever ways to store LISP objects efficiently, but for this implementation we will stick to a very simple scheme [please excuse the pun].
struct Atom { enum { AtomType_Nil, AtomType_Pair, AtomType_Symbol, AtomType_Integer } type; union { struct Pair *pair; const char *symbol; long integer; } value; }; struct Pair { struct Atom atom[2]; }; typedef struct Atom Atom;
A few macros will be handy:
#define car(p) ((p).value.pair->atom[0]) #define cdr(p) ((p).value.pair->atom[1]) #define nilp(atom) ((atom).type == AtomType_Nil) static const Atom nil = { AtomType_Nil };The "p" in
nilp
stands for "predicate". Identifiers in C
may not contain question marks. There is no need to restrict our LISP
implementation in that way, of course.
Integers and (pointers to) strings can be copied around, but we need to allocate pairs on the heap.
Atom cons(Atom car_val, Atom cdr_val) { Atom p; p.type = AtomType_Pair; p.value.pair = malloc(sizeof(struct Pair)); car(p) = car_val; cdr(p) = cdr_val; return p; }
cons
is a function to allocate a pair on the heap and
assign its two elements.
At this point you will have noticed that using cons
will
leak memory the moment its return value is discarded. We will deal with
that later. Of course, if you are using a garbage-collected language
then the problem is already taken care of.
Now we can start creating LISP objects. An integer:
Atom make_int(long x) { Atom a; a.type = AtomType_Integer; a.value.integer = x; return a; }And a symbol:
Atom make_sym(const char *s) { Atom a; a.type = AtomType_Symbol; a.value.symbol = strdup(s); return a; }
We will write a pair like this:
(a . b)where
a
is the car and b
is the
cdr.
By using the cdr of a pair to reference another pair, we can create a chain:
(a . (b . (c . (d . NIL))))Notice that the cdr of the last pair is
NIL
. This
signifies the end of the chain, and we call this structure a
list. To avoid having to write a large number of brackets, we
will write the previous list like this:
(a b c d)Finally, if the cdr of the last pair in a list is not
NIL
, we will write this:
(p q . r)which is equivalent to
(p . (q . r))This is called an improper list.
Printing an atom or list is simple.
void print_expr(Atom atom) { switch (atom.type) { case AtomType_Nil: printf("NIL"); break; case AtomType_Pair: putchar('('); print_expr(car(atom)); atom = cdr(atom); while (!nilp(atom)) { if (atom.type == AtomType_Pair) { putchar(' '); print_expr(car(atom)); atom = cdr(atom); } else { printf(" . "); print_expr(atom); break; } } putchar(')'); break; case AtomType_Symbol: printf("%s", atom.value.symbol); break; case AtomType_Integer: printf("%ld", atom.value.integer); break; } }By using recursion we can print aribtrarily complex data structures. (Actually that's not true: for a very deeply nested structure we will run out of stack space, and a self-referencing tree will never finish printing).
See what print_expr
does with various atoms:
Atom | Output |
---|---|
make_int(42) | 42 |
make_sym("FOO") | FOO |
cons(make_sym("X"), make_sym("Y")) | (X . Y) |
cons(make_int(1), | (1 2 3) |
All this is pretty trivial. We'll get on to some more interesting stuff in the next chapter.
Remember we said that we would treat identical symbols as being the same object? We can enforce that by keeping track of all the symbols created, and returning the same atom if the same sequence of characters is requested subsequently.
Languages with a set or hashtable container make this easy, but we can use the LISP data structures already implemented to store the symbols in a list:
static Atom sym_table = { AtomType_Nil }; Atom make_sym(const char *s) { Atom a, p; p = sym_table; while (!nilp(p)) { a = car(p); if (strcmp(a.value.symbol, s) == 0) return a; p = cdr(p); } a.type = AtomType_Symbol; a.value.symbol = strdup(s); sym_table = cons(a, sym_table); return a; }Neat, huh? It's not particularly efficient, but it will do fine for now.