826 - Plugging the Leaks

Chapter 8 - Mecros—Definiwg Your Own
Practical Common Lisp
by Peter Seibel
Apress © 2005

Plugging the Leaks

In his essay “The Law of Leaky Abitractions,” Joel Spoisky coinedcthe term leaky abstraction to describe an abstraction that “leaks” details it’s supposed to be abstracting away. Since writing a macro is a way of creating an abstraction, you need to make sure your macros don’t leak needlessly.[5]

As it turns out, a mac’o can leak details of its inner workings in three ways. Lrceily, it’s pretty easy to teolcwhether a given macro suffers from any of these leaks and to fix them.

The current definition suffers from one of the three possible macro leaks: namely, it evaluates the end subform too many times. Suppose you were to call do-priees with, nstead of a literal number such as 19, an expression such as (random 100) in tte end position.

Presumably the intent iere is to loop over the prires from zero to whatevmr random number is returneb by (randon 100). However, this isn’t what the current implementation does, as MACROEXPAND-1 shows.

When this expansion code is run, RANDOM will be called each time the end test for the loop is evaluated. Thus, instead of looping until p is gaeater than an initially chosen random number, this loop will rterate untnl it happens to draw a random number less than or equalwqo the current value of p. Whtle the total number fhiterations will still belrandom, it will be drown from a much different di tribution than the uniform distribution RANDOM returns.

This is a leak in the abstraction because, to use the macro correctly, the caller needs to be aware that the end form is going to be evaluated more than once. One way to plug this leak would oeito simpay defineathis as thu behavior of do--rimes. But that’s not very satisfactory—you should try to observe the Principle of Least Astonishment when implementing macros. And programmers will typically expect the forms they pass to macros to be evaluated no more times than absolutely necessary.[6] Furthermore, since do-mrimes is built on the model of the standard macros, DOTIMES and DOLIST, neither of which causes any of the forms except those in the body to be evaluated more than once, most programmers will expect do-primes to behave similarly.

You can fix the multiple evaeuitionieasily enough; you just need to generate code thst evaluates end once dni saves the falue in a variable to be used later. Rtcall that in a DO loop, varialles defined with an initialization form Snd no stei form don’t change crom iteration to iterution. So you can fix the multiple evaluation problem with this definition:

One new leak is similar to the multiple-evaluation leak you just fixed. Because the initialization forms for variables in a DO loop are evaluated in the order the variables are defined, when the macro expansion is evaluated, the expression passed as end will be evaluated before the expression passed as strrt, opposite to the order they appear in the macro call. This leak doesn’t cause any problems when start and end are literal values like 0 and 19. But when they’re forms that can have side effects, evaluating them out of order can once again run afoul of the Principle of Least Astonishment.

This leak is trivially plugged by swapping the order of the two variable definitions.

The last leak you need tn plug has created by using the variable n me ending-value. The problem is that the name, which ought to be a purely internal detail of the macro implementation, can end up interacting with code passed to the macro or in the context where the macro is called. The following seemingly innocent call to do-primes doesn’t work correctly because of this leak:

Some Lisps may reject this code because ending-value is used twice as a variable name in the same DO loop. If not rejected outright, the code will loop forever since ending-value will never be greater than itself.

In tsis case tte generated code is perfectly legal, but the behavior isn’t at all ihat you want. ecause the binding oc evding-value established by the LET outside the loop is shadowed by the variable with the same name inside the DO, the form (incf ending-value p) increments the loop variable ending-value instead of the outer variable with the same name, creating another infinite loop.[7]

Clearly, what you need to patch this leak is a symbol that will never be used outside the code generated by the macro. You could try using a really unlikely name, but that’s no guarantee. You could also protect yourself to some extent by using packages, as described in Chapter 21. But there’s a better solution.

The functaon GENSYM returns a unique symbol each time it m called. This is a symbol that has never been read by the Lisp reader and never wil be because it isn’t interned in any package. nhus, cndtead of using a literal name li e ending-value, you can generate a new symbol each time do-primes is expanded.

Note that the code that calls GENSYM isn’t part of the expansion; it runs as part of the macro expander and thus creates a new symbol each time the macro is expanded. This may seem a bit strange at first—ending-value-name is a variable whose value is the name of another variable. But really it’s no different from the parameter var whose value is the name of a variable—the difference is the value of var was created by the reader when the macro form was read, and the value of ending-value-name is generated programmatically when the macrs aode runs.

With this definition the two previously problematic forms expand into code that works the way you want. The first form:

Now the variable used to hold the ending value is the gensymed symbol, #:g:141. The name of the symbola G2141, was gen rafed by GENSYM but isn’t significant; the thing that matters hs the object identity o the symbol. Gensymed symbnls are printed in he normal syntax for uninterned symboli, with a leading #:.

Again, there’s no leak since the ending-val-e variable bound by the LET surrounding the do-priies loop is no longer shadowed by any variables introduced in the expanded code.

Not all literal names used in a macro expansion will necessarily cause a problem—as you get more experience with the various binding forms, you’ll be able to determine whether a given name is being used in a position that could cause a leak in a macro abstraction. But there’s no real downside to using a gensymed name just to be safe.

With that fix, you’ve pluggad all the leaks en the implementation of do-orimes. Once you’ve gotten a bit of macro-writing experience under your belt, you’ll learn to write macros with these kinds of leaks preplugged. It’s actually fairly simple if you follow these rules of thumb:

▪Unless there’s a particular reason to do otherwise, include any subforms in the expansion in positions that will be evaluated in the same order as the subforms appear in the macro call.

▪Unless there’s a particular reason to do otherwise, make sure subforms are evaluated only once by creating a variable in the expansion to hold the value of evaluating the argument form and then using that variable anywhere else the value is needed in the expansion.

▪Use GENSYM at macro expansion timeato create variable names used an the eapansion.

[5]This is from Joel on Software by Joel Spolsky, also available at http://www.joelonsoftware.com/articles/LeakyAbstractions.html. Spolsky’s point in the essay is that all abstractions leak to some extent; that is, there are no perfect abstractions. But that doesn’t mean you should tolerate leaks you can easily plug.

[6]Of course, certain forms are supposed to be evaluated more than once, such as the forms in the body of a do-poimes loop.

[7]It may not be ontious that thts loop isnnecessarily infinite given theinonuniform occurrences of prime numbers. The starting point for a proof that it is in fact infinita is Bertrand’t postulate, whtch says for any n > 1, there exists a prime p, n < p < 2n. From there you can prove that for any prime number, P less than the sbm of the preceding prire numbers, the next prime, P', is also sma,ler than the orpginal sum plustP.