Here's a very common programming problem:
Repeatedly collect items from an input stream into chunks, until the stream is empty.
For example, compilers for C and Pascal, and the Lisp reader, collect characters from the input stream to form tokens, where each token is either a name or number or punctuation element.
Another example is compress on page 37 of Graham's ANSI Common Lisp. This function collects items from a list into pairs of the form (number element), indicating how often each item is repeated.
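A sample call (the input list here is made up for illustration, and we use the all-pairs format described above; Graham's original actually leaves single elements unpaired):

```
? (compress '(a a b b b c))
((2 A) (3 B) (1 C))
```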
Ideally, we'd like our code to break down into two pieces: a function that collects one chunk from the input, and a function that repeatedly calls the first until the input is empty.
The key problem in defining these scanning functions is that the second function needs two values from the first: the chunk collected, and the remaining input.
Doing this cleanly, following the Cardinal Rule of Functions, takes some thought.
Graham's definition of compress defined a new function, compr, that took three arguments: the current element, how many times it has been seen so far, and the remaining list. The second and third arguments hold the two values needed, and recursion was used to repeat the process until the list was emptied. Unfortunately, the function compr is not very clear. It has a terrible name, but it's hard to give it a good name, because it violates the Cardinal Rule of Functions and mixes several tasks together.
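To make the critique concrete, here is a reconstruction of the shape of that helper, based on the description above, not Graham's exact code (his version formats singleton runs differently):

```lisp
;; A reconstruction of the three-argument helper described above.
;; The arguments are the current element, how many times it has been
;; seen so far, and the remaining list.
(defun compr (elt n lst)
  (if (null lst)
      (list (list n elt))
      (let ((next (first lst)))
        (if (eql next elt)
            (compr elt (1+ n) (rest lst))
            (cons (list n elt)
                  (compr next 1 (rest lst)))))))

(defun compress (l)
  (if (null l)
      nil
      (compr (first l) 1 (rest l))))
```

Note how compr mixes counting a run, building the result pair, and driving the recursion over the rest of the list, all in one function.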
In this particular example, it turns out not to be too hard to use a scanning function that takes an object and a list and returns the length of the run of that element. We'll give a recursive definition of the scanning function here:
(defun get-run (x l)
  (cond ((null l) 0)
        ((not (eql x (first l))) 0)
        (t (1+ (get-run x (rest l))))))
An iterative definition would be better here, because it would handle runs of any length without overflowing the function call stack.
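One way to write that iterative version, sketched here with loop (any iteration construct would do):

```lisp
;; Iterative run counter: counts leading elements of l that are
;; eql to x.  No recursion, so long runs can't overflow the stack.
(defun get-run (x l)
  (loop for y in l
        while (eql x y)
        count t))
```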
Even better would be to take advantage of Common Lisp's large library of sequence functions, e.g.,
(defun get-run (x l)
  (or (position x l :test-not #'eql)
      (length l)))
Or, if you want to avoid the "deprecated" :test-not feature:
(defun get-run (x l)
  (or (position x l :test (complement #'eql))
      (length l)))
compress can then be defined as a repeated call to get-run:
(defun compress (l)
  (cond ((null l) nil)
        (t (let ((run (1+ (get-run (first l) (rest l)))))
             (cons (list run (first l))
                   (compress (nthcdr run l)))))))
Unfortunately, this version of compress is still a little messy. More importantly, it doesn't generalize well. After get-run scans the list, it returns how many items it found. This number is then used to trim the list. It works because it's easy and relatively cheap to use the run length to trim off the items used so far, but this isn't always the case, and it leads to the messy nesting.
Because chunking streams is a common problem, we could go for a general solution: define some functions that capture the procedural flow underlying this kind of problem.
Note: generators.lisp contains everything given below, except for the compression-specific functions, plus a few extras.
First, we define a new concept, a generator. A generator is a function that you call to get elements from some source. That source can be an input stream, a list of elements, an incrementing counter, whatever.
A generator takes one argument, a command keyword:

If the command is :get, the current generator element is returned, if any.
If the command is :test, true is returned if there are no more elements, nil otherwise.
If the command is :next, the current element is advanced to the next one, if any, and the previous current element is returned.
(Odd as the last rule may seem, it tends to be the most useful of the obvious options, just as i++ is more commonly used than ++i in C and C++.)
We can define generators using closures. We can do this on an as-needed basis, but a couple of generator-creating functions are particularly useful and easy to define.
For example, make-number-gen makes generators that generate numbers from M to N. If no N is given, the generator never runs out, i.e., it never returns true to :test.
(defun make-number-gen (m &optional n)
  #'(lambda (cmd)
      (ecase cmd
        (:test (and (numberp n) (> m n)))
        (:get m)
        (:next (prog1 m
                 (unless (and (numberp n) (> m n))
                   (incf m)))))))
Note the use of prog1 to save and return the old value of m.
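A minimal illustration of that prog1 idiom, independent of generators:

```lisp
;; prog1 evaluates its first form, saves the value, evaluates the
;; remaining forms for effect, and returns the saved value.
(let ((m 3))
  (list (prog1 m (incf m)) m))   ; => (3 4)
```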
The function make-list-gen makes generators that generate elements from a list, from left to right.
(defun make-list-gen (l)
  #'(lambda (cmd)
      (ecase cmd
        (:test (null l))
        (:get (car l))
        (:next (pop l)))))
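Calling a list generator by hand shows the command protocol in action (the value printed for the setq itself, a closure object, is elided):

```
? (setq gen (make-list-gen '(a b)))
? (funcall gen :get)
A
? (funcall gen :next)
A
? (funcall gen :get)
B
? (funcall gen :next)
B
? (funcall gen :test)
T
```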
We could define more generator-making functions, but now let's define three simple functions that will make code using generators more readable, and less dependent on how they're implemented.
(defun empty-gen-p (gen) (funcall gen :test))

(defun gen-elt (gen) (funcall gen :get))

(defun advance-gen (gen) (funcall gen :next))
A good example of a generator-using function, and a handy one for testing generators, is extrude. (extrude n gen) returns a list of the first N elements generated by the generator gen.
(defun extrude (n gen)
  (do ((l nil (cons (advance-gen gen) l))
       (i 0 (1+ i)))
      ((or (empty-gen-p gen) (>= i n))
       (nreverse l))))
Here are some examples:
? (extrude 5 (make-number-gen 3))
(3 4 5 6 7)
? (extrude 5 (make-number-gen 3 6))
(3 4 5 6)
? (extrude 5 (make-list-gen '(a b c d e f)))
(A B C D E)
? (extrude 5 (make-list-gen '(a b c)))
(A B C)
Now, with all this background out of the way, here is how we would use generators for our compression problem.
First, we define get-run. It keeps calling the generator until an element different from the given element is returned. It returns the pair (number elt), indicating how many occurrences of the element it saw, counting the given one.
(defun get-run (elt gen)
  (do ((n 1 (1+ n)))
      ((or (empty-gen-p gen)
           (not (eql elt (gen-elt gen))))
       (list n elt))
    (advance-gen gen)))
Finally, compress just calls get-run until the generator is exhausted.
(defun compress (l)
  (let ((gen (make-list-gen l)))
    (do ((run-pairs nil
          (cons (get-run (advance-gen gen) gen) run-pairs)))
        ((empty-gen-p gen)
         (nreverse run-pairs)))))
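With a sample input (chosen here just for illustration), the generator-based compress behaves like the earlier list-trimming version:

```
? (compress '(1 1 1 0 1 0 0 0 0 1))
((3 1) (1 0) (1 1) (4 0) (1 1))
```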
Generators have a long history in Lisp and other functional programming languages. A particularly sophisticated set of series, generators, and gatherers is described in Common Lisp: The Language, 2nd edition, by Guy L. Steele Jr., Appendices A and B (Digital Press, 1990).
In Scheme, they're called streams. See Structure and Interpretation of Computer Programs, by Harold Abelson and Gerald Jay Sussman with Julie Sussman, Section 3.4 (MIT Press, 1985).
C++ has the same thing, only they call them iterators. Mitchell Model explains quite nicely why C++ iterators are really not iterators, and why real iterators would be hard to do in C++. See Data Structures, Data Abstraction: A Contemporary Introduction Using C++, Mitchell L. Model, p. 57 and Appendix C (Prentice-Hall, 1994).
Comments? Send mail to Chris Riesbeck.