Tail recursion
In a recent post I showed some Erlang functions that were tail recursive. So I thought I’d talk a little about tail recursion today.
Let’s say you have a function A(..), and the very last thing it does is call another function B(..).
int A( int); // declare function A(..)
int B( int); // declare function B(..)
// do_something(..) and do_something_else(..) are assumed to be declared elsewhere.

// The last thing A(..) does is call B(..).
int A( int x)
{
    int y = do_something( x, 2 * x);
    int z = do_something_else( x, y);
    return B( x + y + z);
}
Calling A(..) sets up a stack frame with x on it. Soon y and z are added to that frame, and finally a new stack frame for B(..) is created on top of it. When B(..) returns, its frame is popped, then A(..)’s stack frame is popped and the value is returned to A(..)’s caller.
But once B(..) is called, A(..)’s stack frame isn’t needed for anything. The compiler could arrange to have A(..)’s stack frame destroyed before B(..) is called, so that B(..)’s stack frame could be built in exactly the same place. The return value from B(..) would be returned directly to A(..)’s caller. In other words, B(..) steps on A(..)’s tail. This is known as tail-call optimization.
The benefit is that the runtime stack is smaller since it doesn’t have to hold both frames at the same time. In this case it’s a small optimization, perhaps useful in tight embedded situations. But consider the following:
int A( int x)
{
    do_something( x);
    return (x > 1000000000) ? x : A( x + 1);
}
In this example the last thing A(x) does is call A(x+1), so A(x+1) can step on A(x)’s tail. But it’s no longer just a small optimization since this repeats about a billion times. You’ll need a very big runtime stack if the compiler doesn’t arrange for A(..) to step on its own tail.
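To make that concrete, here is a rough sketch of what the compiler effectively does when A(..) steps on its own tail: the recursive call turns into an update of x followed by a jump back to the top. The names A_recursive and A_loop, the limit parameter, and the stub do_something(..) are my own inventions for illustration. The limit is passed in (instead of the hard-coded billion) so the two versions are easy to compare with small values.

```c
/* Hypothetical stand-in for do_something(..): it only counts how often it runs. */
static long calls = 0;
static void do_something(int x) { (void)x; calls++; }

/* Tail-recursive form, as in the example above, with the limit passed in. */
int A_recursive(int x, int limit)
{
    do_something(x);
    return (x > limit) ? x : A_recursive(x + 1, limit);
}

/* Roughly what tail-call optimization amounts to: reuse the frame and jump. */
int A_loop(int x, int limit)
{
    for (;;) {
        do_something(x);
        if (x > limit)
            return x;
        x = x + 1;   /* "rebind" the parameter for the next round */
    }
}
```

Both versions return the same value for the same arguments. The difference is that A_loop(..) uses one stack frame no matter how large the limit is, while A_recursive(..) needs one frame per step unless the compiler does the optimization for you.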
This is what they call tail recursion. I first read about it in Guy Steele’s famous paper Lambda: The Ultimate GOTO. It’s essential for languages like Scheme and Erlang to optimize tail recursion because they rely on recursion rather than a primitive loop construct, since loops are disguised gotos. (Erlang has no loop statement at all, and Scheme’s do loop is defined in terms of recursion.) In these languages you recurse instead of loop.
But the programmer has to be aware of when he is and isn’t tail recursing. If the recursion is not tail recursion the compiler cannot tail-optimize. Consider this Erlang function.
% Take a list like [a,b,c] and produce [a,a,b,b,c,c].
stutter( [] ) -> [];
stutter( [Head | Rest] ) -> [Head, Head | stutter( Rest )].
It looks like the last thing stutter/1 does is call stutter(Rest). But really the last thing it does is make a list incorporating the result of stutter(Rest). So despite appearances, this is NOT tail recursive.
Here is the tail-recursive version of stutter/1.
stutter( A ) -> lists:reverse( stutter_tail( A, [] )).

stutter_tail( [], Collect ) -> Collect;
stutter_tail( [Head | Rest], Collect ) -> stutter_tail( Rest, [Head, Head | Collect] ).
In this, stutter/1 calls stutter_tail/2 which is tail recursive. stutter_tail/2 takes two arguments, the iterator and the builder. The iterator is the list that we take apart, peeling the head off at each iteration. And while we use up the iterator, we add to the builder and construct the stuttering list. Because each pair is added to the front of the builder, the list comes out backwards, which is why stutter/1 finishes with lists:reverse/1.
Or consider factorial/1. First the intuitive version, which at first glance looks tail recursive even though it’s not.
% not tail recursive
factorial( 0 ) -> 1;
factorial( N ) -> N * factorial( N - 1 ).
And the tail-recursive version, which follows the iterator/builder pattern like stutter_tail/2 above.
factorial( N ) -> factorial( N, 1 ).

% tail-recursive:
factorial( 0, Product ) -> Product;
factorial( N, Product ) -> factorial( N - 1, N * Product ).
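For readers more at home in C, here is roughly how the two factorials might look there. This is only a sketch: the names factorial_plain, factorial_acc and factorial_tail are made up for this example, and unsigned long long overflows past 20!, which Erlang’s big integers don’t.

```c
/* Not tail recursive: the multiplication happens after the call returns. */
unsigned long long factorial_plain(unsigned int n)
{
    return (n == 0) ? 1ULL : n * factorial_plain(n - 1);
}

/* Tail recursive: the running product travels along as an argument,
   so nothing is left to do once the recursive call returns. */
static unsigned long long factorial_acc(unsigned int n, unsigned long long product)
{
    return (n == 0) ? product : factorial_acc(n - 1, n * product);
}

/* Entry point that seeds the builder with 1, like factorial/1 above. */
unsigned long long factorial_tail(unsigned int n)
{
    return factorial_acc(n, 1ULL);
}
```

Both compute the same values, for example 120 for an argument of 5. Only the second leaves the compiler free to reuse the stack frame.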
You see this iterator/builder pattern a lot in tail-recursive realizations, where one parameter is used up while the other is built up. Here are three more examples.
count( List ) -> count( List, 0 ).
count( [], Total ) -> Total;
count( [_ | Rest], Total ) -> count( Rest, 1 + Total ).

triangle( N ) -> triangle( N, 0 ).
triangle( 0, Sum ) -> Sum;
triangle( N, Sum ) -> triangle( N - 1, N + Sum ).

reverse( A ) -> reverse( A, [] ).
reverse( [], Collect ) -> Collect;
reverse( [Head | Rest], Collect ) -> reverse( Rest, [Head | Collect] ).
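The same iterator/builder shape carries over to a C linked list. The sketch below is my own translation, not a standard API: struct node, count_acc and reverse_acc are hypothetical names. One difference worth noting is that this reverse relinks the existing nodes in place, whereas the Erlang version builds a new list and leaves the original alone.

```c
#include <stddef.h>

/* Hypothetical singly linked list node, invented for this example. */
struct node {
    int value;
    struct node *next;
};

/* count/2: `rest` is used up while `total` is built up. */
static size_t count_acc(const struct node *rest, size_t total)
{
    return (rest == NULL) ? total : count_acc(rest->next, 1 + total);
}

size_t count(const struct node *list)
{
    return count_acc(list, 0);
}

/* reverse/2: each head is moved onto the front of `collect`. */
static struct node *reverse_acc(struct node *rest, struct node *collect)
{
    if (rest == NULL)
        return collect;
    struct node *next = rest->next;   /* remember the remaining list */
    rest->next = collect;             /* put this node in front of collect */
    return reverse_acc(next, rest);   /* tail call: nothing left to do after it */
}

/* Reverses the list in place and returns the new head. */
struct node *reverse(struct node *list)
{
    return reverse_acc(list, NULL);
}
```

In both helpers the recursive call is the very last thing that happens, so a compiler that does tail-call optimization can run them in constant stack space.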
When you’re programming in Scheme or Erlang you get used to reaching for a recursive solution whenever you get into a looping situation. And you’re always conscious of whether your implementation is tail recursive or not. And you soon find yourself thinking in recursive patterns like iterator/builder instead of iterative patterns like while(test_this()){do_that();}.
Any recursive algorithm can be expressed as an iterative loop with a stack. If it’s tail recursive, you don’t need the stack to make it into a loop. Some languages, like Scheme and Erlang, will automatically translate tail recursion into in-place looping whenever possible. This allows you to express many algorithms more naturally than you would with a loop without having to worry about stack overflow.
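Here is a small sketch of that difference, using the triangle function from earlier. The function names are hypothetical, and the hand translation is mine rather than anything a particular compiler emits. The first version unrolls the non-tail-recursive definition, triangle(n) = n + triangle(n - 1), and has to remember the pending additions on an explicit stack. The second unrolls the tail-recursive definition and needs no stack at all. Unlike the Erlang code, both treat a negative argument like zero.

```c
#include <stdlib.h>

/* Non-tail recursion unrolled by hand: every pending "n + ..." must be
   remembered until the base case is reached, so we need a stack. */
long triangle_with_stack(int n)
{
    if (n <= 0)
        return 0;
    int *stack = malloc((size_t)n * sizeof *stack);
    if (stack == NULL)
        return -1;              /* assumption: -1 signals allocation failure */
    size_t top = 0;
    while (n > 0)               /* "descend": push each pending n */
        stack[top++] = n--;
    long sum = 0;               /* base case: triangle(0) is 0 */
    while (top > 0)             /* "return": pop and finish each addition */
        sum += stack[--top];
    free(stack);
    return sum;
}

/* Tail recursion unrolled by hand: nothing is pending after the call,
   so the parameters are simply updated in place. */
long triangle_loop(int n)
{
    long sum = 0;
    while (n > 0) {
        sum = n + sum;
        n = n - 1;
    }
    return sum;
}
```

Both return 10 for an argument of 4, but the first needs memory proportional to n while the second runs in constant space.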
It would be nice if C compilers were guaranteed to optimize tail recursion as a loop. (GCC and Clang often do this when optimizing, but the C standard doesn’t require it, so you can’t count on it.) It would be even better if C compilers could arrange for a trailing function to step on its caller’s tail whenever possible, even in non-recursive situations. This would allow a more functional coding style in C, and would make it easier for Scheme/Erlang “compilers” to use C as a target language. (I think one of the design goals for C should be to make it a universal target language.)
Tail recursion is more problematic in C++. Usually the last thing a C++ function does is run destructors for local variables. Sometimes this is absolutely essential, such as when you are using a wrapper class to lock/unlock (see Resource Acquisition is Initialization, or RAII). If the C++ compiler optimized tail recursion or tail stepping, the compiler would have to run the destructors before overwriting the caller’s stack frame. In the end the programmer would have to be given a way to control this, thus making C++ even more complex than it already is.