Here's a very common programming problem:
Repeatedly collect items from an input stream into chunks, until the stream is empty.
For example, compilers for C and Pascal, and the Lisp reader, collect characters from the input stream to form tokens, where each token is a name, a number, or a punctuation element.
Another example is compress on page 37 of Graham's ANSI Common Lisp. This function collects items from a list into pairs of the form (number element), indicating how many times each item appears in a row.
Ideally, we'd like our code to break down into two pieces: a scanning function that collects one chunk from the stream, and a control function that calls the scanner repeatedly until the stream is empty.
The key problem in defining these scanning functions is that the second function needs two values from the first: the chunk that was collected, and where the scan left off in the stream.
Doing this cleanly, following the Cardinal Rule of Functions, takes some thought.
Graham's definition of compress defined a helper function, compr, that took three arguments: the current element, the number of times it has been seen so far, and the rest of the list. The second and third arguments hold the two values needed, and recursion was used to repeat the process until the list was emptied.
Unfortunately, the function compr is not very clear. It has a terrible name, but it's hard to give it a good name, because it violates the Cardinal Rule of Functions and mixes several tasks together.
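For reference, the shape of Graham's solution looks roughly like this. This is a paraphrase from memory, not Graham's exact code, and it is simplified so that every run becomes a (count element) pair:

```lisp
;; Sketch of the compress/compr structure described above -- not
;; Graham's exact code; simplified so every run is a (count element) pair.
(defun compress (l)
  (if (consp l)
      (compr (first l) 1 (rest l))
      l))

;; compr mixes three jobs in one function: tracking the current element,
;; counting its run, and recursing down the rest of the list.
(defun compr (elt n lst)
  (cond ((null lst) (list (list n elt)))
        ((eql (first lst) elt) (compr elt (1+ n) (rest lst)))
        (t (cons (list n elt)
                 (compr (first lst) 1 (rest lst))))))

(compress '(a a a b b c))
;; => ((3 A) (2 B) (1 C))
```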
In this particular example, it turns out not to be too hard to use a scanning function that takes an object and a list and returns the length of the run of that element. Here's a recursive definition of the scanning function:
(defun get-run (x l)
  (cond ((null l) 0)
        ((not (eql x (first l))) 0)
        (t (1+ (get-run x (rest l))))))
An iterative definition would be better here, because it would handle runs of any length without overflowing the function call stack.
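For instance, here's a do-loop version, a sketch equivalent to the recursive definition above:

```lisp
;; Iterative get-run: same behavior as the recursive version, but runs
;; in constant stack space no matter how long the run is.
(defun get-run (x l)
  (do ((n 0 (1+ n))
       (tail l (rest tail)))
      ((or (null tail) (not (eql x (first tail))))
       n)))

(get-run 'a '(a a a b)) ; => 3
```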
Even better would be to take advantage of Common Lisp's large library of sequence functions, e.g.,
(defun get-run (x l)
  (or (position x l :test-not #'eql)
      (length l)))
Or, if you want to avoid the "deprecated" :test-not keyword:
(defun get-run (x l)
  (or (position x l :test (complement #'eql))
      (length l)))
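A quick sanity check of this version (a small demo, not part of the original):

```lisp
(defun get-run (x l)
  (or (position x l :test (complement #'eql))
      (length l)))

(get-run 'a '(a a b c)) ; => 2, position finds the first non-A at index 2
(get-run 'a '(a a))     ; => 2, position returns nil, so we fall back to the length
```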
compress can then be defined as a repeated call to get-run:
(defun compress (l)
  (cond ((null l) nil)
        (t (let ((run (1+ (get-run (first l) (rest l)))))
             (cons (list run (first l))
                   (compress (nthcdr run l)))))))
Unfortunately, this version of compress is still a bit messy. More importantly, it doesn't generalize well. After get-run scans the list, it returns how many items it found. This number is then used to trim the list. It works here because it's easy and relatively cheap to use the run length to trim off the items used so far, but this isn't always the case, plus it leads to the messy nesting.
Because chunking streams is a common problem, we can go for a general solution and define some functions that capture the procedural flow underlying this kind of problem.
Note. generators.lisp contains everything given below, except for the compression-specific functions, plus a few extras.
First, we define a new concept, a generator. A generator is a function that you call to get elements from some source. That source can be an input stream, a list of elements, an incrementing counter, whatever.
A generator takes one argument, a command. There are three commands:

Given :test, true is returned if there are no more elements to generate.

Given :get, the current generator element is returned, if any. nil is returned if there is no current element.

Given :next, the current element is advanced to the next one, if any, and the previous current element is returned.

(Odd as the last rule may seem, it tends to be the most useful of the obvious options, just as i++ is more commonly used than ++i in C and C++.)
We can define generators using closures. We could do this on an as-needed basis, but a couple of generator-creating functions are particularly useful and easy to define.
make-number-gen makes generators that generate numbers from M to N. If no N is given, the generator never runs out, i.e., :test never returns true.
(defun make-number-gen (m &optional n)
  #'(lambda (cmd)
      (ecase cmd
        (:test (and (numberp n) (> m n)))
        (:get m)
        (:next (prog1 m
                 (unless (and (numberp n) (> m n))
                   (incf m)))))))
Note the use of prog1 to save and return the old value of m.
make-list-gen makes generators that generate the elements of a list, from left to right.
(defun make-list-gen (l)
  #'(lambda (cmd)
      (ecase cmd
        (:test (null l))
        (:get (car l))
        (:next (pop l)))))
We could define more generator-making functions, but now let's define three simple functions that will make code using generators more readable, and less dependent on how they're implemented.
(defun empty-gen-p (gen) (funcall gen :test))
(defun gen-elt (gen) (funcall gen :get))
(defun advance-gen (gen) (funcall gen :next))
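Here's a short demonstration of how these read in practice (repeating make-list-gen from above so the snippet stands alone):

```lisp
(defun make-list-gen (l)
  #'(lambda (cmd)
      (ecase cmd
        (:test (null l))
        (:get (car l))
        (:next (pop l)))))

(defun empty-gen-p (gen) (funcall gen :test))
(defun gen-elt (gen) (funcall gen :get))
(defun advance-gen (gen) (funcall gen :next))

(let ((gen (make-list-gen '(a b))))
  (list (gen-elt gen)      ; A -- :get peeks, doesn't advance
        (advance-gen gen)  ; A -- :next returns A, then advances to B
        (gen-elt gen)      ; B
        (empty-gen-p gen)))
;; => (A A B NIL)
```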
A good example of a generator-using function, and a handy one for testing generators, is extrude. (extrude n gen) returns a list of the first N elements generated by the generator gen.
(defun extrude (n gen)
  (do ((l nil (cons (advance-gen gen) l))
       (i 0 (1+ i)))
      ((or (empty-gen-p gen) (>= i n))
       (nreverse l))))
Here are some examples:
? (extrude 5 (make-number-gen 3))
(3 4 5 6 7)
? (extrude 5 (make-number-gen 3 6))
(3 4 5 6)
? (extrude 5 (make-list-gen '(a b c d e f)))
(A B C D E)
? (extrude 5 (make-list-gen '(a b c)))
(A B C)
Now, with all this background out of the way, here is how we would use generators for our compression problem.
First, we define a generator-based get-run. It keeps calling the generator until an element different from the given element is returned. It returns the pair (n elt), indicating how many occurrences of the element it saw, counting the given one.
(defun get-run (elt gen)
  (do ((n 1 (1+ n)))
      ((or (empty-gen-p gen)
           (not (eql elt (gen-elt gen))))
       (list n elt))
    (advance-gen gen)))
compress just calls get-run until the generator is exhausted.
(defun compress (l)
  (let ((gen (make-list-gen l)))
    (do ((run-pairs nil (cons (get-run (advance-gen gen) gen) run-pairs)))
        ((empty-gen-p gen)
         (nreverse run-pairs)))))
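Putting all the pieces together (every definition repeated here so the block runs standalone):

```lisp
;; The complete generator-based pipeline, gathered in one place.
(defun make-list-gen (l)
  #'(lambda (cmd)
      (ecase cmd
        (:test (null l))
        (:get (car l))
        (:next (pop l)))))

(defun empty-gen-p (gen) (funcall gen :test))
(defun gen-elt (gen) (funcall gen :get))
(defun advance-gen (gen) (funcall gen :next))

;; Count occurrences of elt, starting from 1 for the one already taken.
(defun get-run (elt gen)
  (do ((n 1 (1+ n)))
      ((or (empty-gen-p gen)
           (not (eql elt (gen-elt gen))))
       (list n elt))
    (advance-gen gen)))

;; Repeatedly pull one element and collect the rest of its run.
(defun compress (l)
  (let ((gen (make-list-gen l)))
    (do ((run-pairs nil (cons (get-run (advance-gen gen) gen) run-pairs)))
        ((empty-gen-p gen)
         (nreverse run-pairs)))))

(compress '(a a a b b c))
;; => ((3 A) (2 B) (1 C))
```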
Generators have a long history in Lisp and other functional programming languages. A particularly sophisticated set of series, generators, and gatherers is described in Common Lisp: The Language, 2nd edition, by Guy L. Steele Jr., Appendices A and B (Digital Press, 1990).
In Scheme, they're called streams. See Structure and Interpretation of Computer Programs, Harold Abelson and Gerald Jay Sussman with Julie Sussman, Section 3.4 (MIT Press, 1985).
C++ has the same thing, only they call them iterators. Mitchell Model explains quite nicely why C++ iterators are really not iterators, and why real iterators would be hard to do in C++. See Data Structures, Data Abstraction: A Contemporary Introduction Using C++, Mitchell L. Model, p. 57 and Appendix C (Prentice-Hall, 1994).
Comments? Send mail to Chris Riesbeck.