Simple Pattern Matching
Consider a simple pattern matcher, such as the one defined in pat-match.lisp. This is basically the unifier in Artificial Intelligence Programming, simplified to handle only patterns against constant forms, rather than patterns against patterns. Some example calls:
> (pat-match '(?x b ?y) '(a b c)) (((?Y C) (?X A))) > (pat-match '(?x b ?x) '(a b c)) NIL > (pat-match '(?x b ?x) '(a b a)) (((?X A)))
Issues
Pattern matching is the basis of a great deal of AI programs. Patterns are easy to write, compact, easily extended, and so on.
A serious AI program will have hundreds or thousands of patterns, and may apply a significant fraction of those patterns while solving a problem. As a result, time spent matching patterns may dominate the run-time of an AI program.
There are two approaches to reducing this cost:
- use good indexing techniques, such as discrimination trees, to reduce the number of patterns matched
- compile patterns to speed up the matching process
These approaches are independent. It's usually best to do the first one first. Then, if pattern matching is still the bottleneck in your application, even with the reduced number of patterns to be matched, compile them.
Patterns versus Code
To see the magnitude of the costs involved in pattern matching, consider these two functions. The first calls the pattern matcher:
(defun using-pat (input) (pat-match '(?x b ?x) input))
The second produces the same results for any given input, but is coded quite differently:
(defun using-code (input) (and (consp input) (let ((blists (match-variable '?x (car input) '(nil)))) (and (consp (cdr input)) (eql 'b (cadr input)) (consp (cddr input)) (null (cdddr input)) (match-variable '?x (caddr input) blists)))))
using-pat
is clearly more compact, readable, and
maintainable.
using-code
however can be significantly more
efficient. Macintosh Common Lisp (version 3) gives the following
results:
> (time (dotimes (i 100) (using-pat '(a b c)))) (DOTIMES (I 100) (USING-PAT '(A B C))) took 164 milliseconds (0.164 seconds) to run. Of that, 9 milliseconds (0.009 seconds) were spent in The Cooperative Multitasking Experience. 6,400 bytes of memory allocated. NIL > (time (dotimes (i 100) (using-code '(a b c)))) (DOTIMES (I 100) (USING-CODE '(A B C))) took 13 milliseconds (0.013 seconds) to run. 3,200 bytes of memory allocated. NIL
For this simple 3-part pattern, using-code
is
over 10 times faster than using-pat
!
Why is using-code so much faster?
Much of the work that pat-match
does is looking
at the pattern. Effectively, pattern matching involves these
steps:
- check to see if the pattern is a variable
- if so, check to see if it can be bound to the input
- else, check to see if the pattern is an atom
- if so, check to see if it's equal to the input
- else, check to see if the pattern is a list
- if so, check to see if the input is a list
- if so, recursively apply the above steps to the
car
andcdr
of the pattern and input
The hand-written code in using-code
omits the
run-time checks in Steps 1, 2, and 3. Every call to
match-variable
is known in eql
advance.
Furthermore, there's no recursive calls to pat-match
because it's already known when the pattern is a list.
As a result, there are fewer conditional checks, fewer function calls, and the function calls are all explicit and compilable into machine code before the code is run.
Getting the best of both worlds
Looking at the run-times for using-pat
and
using-code
, it's clear which code we'd rather run.
On the other hand, looking at their definitions, it's also clear
which code we'd rather write.
Since we clearly don't want to have to write code like that in
using-code
, we get the computer to do it for us.
The idea behind compiling patterns is simple. In most cases, the patterns in a knowledge base are static, i.e., known in advance. Therefore, we can
- take those patterns,
- generate Lisp functions that would produce the same binding lists that the patterns would generate when applied to an input,
- compile the functions, and
- call the functions instead of the pattern matcher
expand-pat.lisp
defines a simple pattern compiler, capable of turning the
patterns accepted by pat-match.lisp
into
the compilable Lisp code.
We'll first look at how expand-pat
works, then at
several ways in which it can be used.
How to Compile Patterns
Compiling a pattern turns out to be quite simple. It's very much like defining a macro expansion.
As with macros, we need to keep track of two types of variables:
- compile-time variables, whose values are known when the
pattern is being compiled; we have one here:
pat
, the pattern being compiled. - run-time variables, whose values won't be known until the
expanded code is applied to some input; we have two here:
input
andblists
, bound to the input form and current list of binding lists, respectively.
Notice that pat-match
had the same three
variables. The difference is that pat
is now a
compile-time variable. It won't exist at run-time.
Now we need to figure out what Lisp code to generate for each kind of pattern elements. Lisp s-expression patterns have only three elements:
- atoms
- variables
- lists
The top-level compiler function, expand-pat
,
simply calls functions for each of these cases. ("One function to
a function" -- the function of expand-pat
is to
select what other function to run.)
Variables are easy to compile. They just become calls to
match-variable
, e.g.,
> (expand-pat '?x) (match-variable '?x input blists)
Atoms are equally easy to compile. They just become calls to
eql
that return blists
if
successful:
> (expand-pat 'a) (and (eql 'a input) blists)
The only tricky bit is with patterns that are lists.
Obviously, we can recursively expand the car
and
cdr
of the pattern into Lisp code, but how to we
combine the two pieces of code that result?
What we want is that, at run-time,
- the code from the
car
of the pattern will be executed withinput
bound to thecar
of the input, andblists
is bound to the current list of binding lists; - the code from the
cdr
of the pattern will be executed withinput
bound to thecdr
of the input, andblists
is bound to the list of binding lists produced by the code for thecar
of the pattern;
The following Lisp template achieves these two goals:
(let ((blists (let ((input (car input))) -- put pattern CAR code here --)) (input (cdr input))) -- put pattern CDR code here --)
This will evaluate the code from the two parts of the pattern
in contexts with the desired values for input
and
blists
.
Normally, it's a bad idea to reuse variables like this, but
the generated code is not meant for human eyes, and doing it this
way means the pattern expander can always count on
input
and blists
being bound to the
right things.
How to get compiled patterns into code
We can now see how to turn patterns into Lisp code. One big
problem remains, however. How do we turn Lisp code into callable
Lisp functions? In particular, expand-pat
doesn't
generate a function, it generates a list, e.g.,
> (expand-pat '?x) (MATCH-VARIABLE '?X INPUT BLISTS)
It's not in function form, and input
and
blists
free variables.
compile-pat
does a little better:
> (compile-pat '?x) #'(LAMBDA (INPUT &OPTIONAL (BLISTS '(NIL))) (MATCH-VARIABLE '?X INPUT BLISTS))
but this is still just a list, not a function.
It can be converted into a function using eval
,
e.g.,
> (eval (compile-pat '?x)) #<Anonymous Function #xB22986>
Unfortunately,
eval
is evil, and- this won't work in many Lisp's that build stand-alone
applications, because the first thing they delete is
eval
.
The solution is to look at where our patterns are being defined. For example, suppose we have a file of "if--then" rules, where we link patterns to code for a robot to execute if some input command matches the pattern, e.g.,
(defrule put-rule (put-in ?x ?y) => (move-hand-to ?x) (grasp) (move-hand-into ?y) (ungrasp))
Without compiling, we'd have a robot command interpreter that would look something like this:
(defun run-robot (input) (dolist (rule *rules*) (let ((blists (pat-match (rule-test rule) input))) (unless (null blists) (do-actions (instantiate (rule-actions rule) blists))))))
defrule
would simply be
(defmacro defrule (name test => &rest actions) `(add-rule :test ',test :actions ',actions))
To use a pattern compiler, we'd redefine defrule
as follows:
(defmacro defrule (name test => &rest actions) `(add-rule :test ,(compile-pat test) :actions ',actions))
Now, when our rule file is compiled, all the patterns become compiled Lisp functions.
Note: we also have to redefine the robot interpreter:
(defun run-robot (input) (dolist (rule *rules*) (let ((blists (funcall (rule-test rule) input))) (unless (null blists) (do-actions (instantiate (rule-actions rule) blists))))))
Increasing maintainability of compiled patterns
One problem with compiling patterns is that rules become hard to read and trace. We no longer see anything that tells us what the rule is looking for, because the patterns are now anonymous functions.
Therefore, it's usually a good idea to modify our
defrule
to store both the pattern and the function.
The former is for debugging, the latter is for efficiency.
(defmacro defrule (test => &rest actions) `(add-rule :test ,(compile-pat test) :pattern ',test :actions ',actions))