Implementing Extensible Patterns
One way to implement an extensible pattern matcher would be a long if-then-else conditional, e.g.,
(defun pat-match (pat form &optional (blists '(nil)))
(cond ((eql pat form) blists)
((or (atom pat) (atom form)) nil)
((eql (first pat) '?) (match-variable pat form blists))
((eql (first pat) '?is) (match-predicate pat form blists))
((eql (first pat) '?and) (match-and pat form blists))
(t match car and cdr)))
The problem is that pat-match will get pretty long,
but never long enough. If we or some programmer using our matcher
wants a new extension, they're going to have to edit this definition.
The right way to do this is to replace the above with
(defun pat-match (pat form &optional (blists '(nil)))
(cond ((pat-extension-p pat)
(match-extension pat form blists))
((eql pat form) blists)
((or (atom pat) (atom form)) nil)
(t (pat-match (cdr pat) (cdr form)
(pat-match (car pat) (car form) blists)))))
This code presumes that pat-extension-p can determine
if an item is a defined pattern extension. This is the approach taken
in
extend-match.lisp.
But wait! Don't we have to
implement pat-extension-p and
match-extension with conditionals?
The key is to define pat-extension-p and
match-extension to look in a table. The table associates
a symbol, e.g., ?NOT, with a function to call to
match patterns involving that symbol. How we implement this table
isn't important. We could use a-lists or hashtables or whatever.
For example, if the function (pat-function
symbol) returns the pattern
matcher for the symbol, if any, or NIL, then
pat-extension-p is pretty easy to define:
(defun pat-extension-p (name) (not (null (pat-function name)))
match-extension is only a little more complex:
(defun match-extension (pat form blists)
(let ((fn (pat-function (first pat))))
(when (null fn)
(error "Undefined pattern extension ~S" pat))
(funcall fn (rest pat) form blists)))
This will call the matching function attached to the pattern
extension, e.g., ?NOT, passing it the arguments to the
extension, e.g., the list (1) if the pattern were
(?NOT 1), the form to be matched against, and the
current list of binding lists. This should be enough for the matching
function to do its job.
Putting things in the table is also easy. We define a function add-extension to associate a pattern matching function with a pattern label. If MATCH- NOT is the function that does the matching work for ?NOT, we'd write
(add-extension '?not :single 'match-not)
The :single says that a ?NOT pattern matches one input pattern. Most patterns are of this type.
This approach is called data-driven programming because how our code runs, i.e., which code gets called, is driven by the input data, in this case the symbols in the pattern.
Data-driven programming is a very useful technique. Consider using it whenever you start writing aCONDorCASEthat chooses between various pieces of code based on some input data. It's a technique that simultaneously makes code simpler and more flexible.
Segment Matching Patterns
Segment matching is involved for any pattern that can match zero or more items. Two examples are:
(?* L)which can match any number of items(?optionalpat)which can zero or one occurrences of pat
Just adding segment variables, i.e, patterns of the form (?*
L), changes how things work in several ways.
First, we can have more than one answer:
> (pat-match '((?* l) (?* r)) '(a b c)) (((L) (R A B C)) ((L A) (R B C)) ((L A B) (R C)) ((L A B C) (R)))
There are four binding lists returned, each representing a different way that the two segment variables can be matched to the input list.
Second, if a segment variable appears twice, it must match the same sequence of items as it did before, e.g.,
> (pat-match '((?* l) (?* r) (?* l)) '(a b c))) (((R A B C) (L)))
There is only one answer now, because the only way the first
segment variable can match a segment before and after some other
segment in (A B C) is if the first segment variable
matches zero items. On the other hand,
> (pat-match '((?* l) (?* r) (?* l)) '(a b c a b))) (((R A B C A B) (L)) ((R C) (L A B)))
has two possible answers.
An important point about segment patterns is that they can only
occur inside lists. The following call to
pat-match makes no sense:
(pat-match '(?* l) '(a b c))
However, the following does:
(pat-match '((?* l)) '(a b c))
The latter example will bind L to the list (A B
C). It's not a particularly efficient way to do it, but it
does make sense conceptually.
Implementing Segment Matching
In order for the pattern matcher to match a segment pattern against zero or more input items, the matcher has to check to see if it has a pattern that is a list whose first element is a segment matching pattern.
Here's the code from extend-match.lisp:
(defun pat-match (pat form &optional (blists '(nil)))
(cond ((pat-extension-p pat)
(match-extension pat form blists))
((eql pat form) blists)
((atom pat) nil)
((segment-pat-extension-p (first pat))
(match-segment-extension pat form blists))
((atom form) nil)
(t (pat-match (cdr pat) (cdr form)
(pat-match (car pat) (car form) blists)))))
The fourth branch is the important one. It checks for a list
starting with a pattern labelled as a segment matching pattern. If
found, it calls match-segment-extension, the segment
counterpart to match-extension.
(defun match-segment-extension (pats form blists)
(let ((pat (first pats))
(rest-pats (rest pats)))
(let ((fn (pat-function (first pat))))
(when (null fn)
(error "Undefined pattern extension ~S" pat))
(funcall fn (rest pat) rest-pats form blists))))
The parameter pats is a list of a patterns, starting
with a segment pattern. form is a list of input items.
The code above gets the matching function and passes it
- the segment pattern's arguments, e.g.,
(L)for(?* L), - the remaining patterns in the list
- the list of input items
- the list of binding lists
The list of remaining patterns is needed because a segment matcher
like ?* will consider many different bindings, each
differing in how many input items are matched, but only the bindings
that allow the rest of the patterns to match will be kept.
Here's the code for the segment variable matcher:
(defun match-segment-variable (args pats input blists)
(destructuring-bind (&optional name) args
(loop for tail = input then (rest tail)
append (match-tail tail name pats input blists)
until (null tail))))
(defun match-tail (tail name pats input blists)
(let ((blists (pat-match pats tail blists)))
(if (null blists)
nil
(bind-variable name (ldiff input tail) blists))))
Basically, match-segment-variable calls
match-tail with the input list, then the input list
minus the first element, then the input list minus the first two
elements, and so on, up to and including the input list minus all its
elements. Successive cdr's of a list are called the
tails of the list.
match-tail simply calls pat-match to see
if the remaining patterns match the given tail of the input list. If
they do, then match-tail tries to bind the segment
variable to the elements in the input list preceding the tail.
ldiff is one of the great unknown functions of Common
Lisp. Given a list and a tail of that list (not a list that
has the same elements, but an actual tail), ldiff
returns a list of the items preceding the tail. This function turns
out to be surprisingly handy but often overlooked, even by veteran
Lisp hackers.
To see why this works, consider
(pat-match '((?* l) (? x)) '(a b c))
match-tail will be called with (A B C),
(B C), (C), and (). It will
try matching each of these with the remaining patterns, namely
((? X)). Following the normal rules of matching, the
only match that succeeds is the one that binds X to
C, i.e., where the tail was (C). Hence,
L will be bound to (A B).
The importance of bind-variable becomes clear when we
consider
(pat-match '((?* l) (?* r) (?* l)) '(a b c)))
The sequence of events is roughly this:
- The first occurrence of
(?* L)first tries passing the entire list(A B C)to the remaining patterns((?* R) (?* L)). - The
(?* R)in turn first tries passing(A B C)to((?* L)). - After numerous tries,
(?* L)binds itself to(A B C). The list of binding lists(((?* L)(A B C)))is returned to(?* R). (?* R)adds the binding(R)to each of these binding lists.(?* R)then tries(B C) to ((?* L))and gets back the answer(((?* L)(B C))).(?* R)adds the binding(R A)to each of these binding lists.- Eventually,
(?* R)has tried every tail, and returns the list of binding lists(((R) (L A B C)) ((R A) (L B C)) ((R A B) (L C)) ((R A B C) (L))). (?* L)now tries to add the binding(L)to each of these lists. This succeeds only on the last binding list.- The steps above now repeat with
(?* L)passing(B C)to the remaining patterns((?* R) (?* L)). - Eventually, the binding lists returned are
(((R) (L B C)) ((R B) (L C)) ((R B C) (L))). (?* L)is unable to add the binding(L A)to any of these lists. And similarly for the remaining passes.