Template code trimming
I’ve been talking about recursive templates and how to control code generation in the last three posts, and I thought of another technique. This shows how to use a separate template param to trim generated code for special cases. In this example the special case is caught by specializing the check_bounds<false> struct.
As usual I use the factorial function to illustrate. This is just an outline for a technique, so I did not pretty it up with namespaces and asserts.
# include <boost/cstdint.hpp>
typedef boost::uint_fast64_t uint_v;
const int max_n = 20;
template< bool XXX >
struct check_bounds {
template< int N >
static uint_v calc( ) { return calc< N - 1 >( ) * N; }
template<>
static uint_v calc< 0 >( ) { return 1; }
};
template<>
struct check_bounds< false > {
template< int N >
static uint_v calc( ) { return 0; }
};
template< int N >
uint_v factorial( )
{
return
check_bounds< (0 <= N) && (N <= max_n) >::calc< N >( );
}
This correctly calculates factorials 0..20, returns zero for all other values, and the compiler has no trouble with expressions like factorial< 1000000 >( ) and factorial< -1000000 >( ). It trims the code and avoids huge expansions. I talk about this more in yesterday’s Template recursion post.
Recursive inline expansion
Since I’m talking about recursive templates today I thought I’d mention recursive inline expansion. Suppose we define factorial (with no bounds checking) like this.
template< int N > inline int factorial( )
{
return N * factorial< N - 1 >( );
}
template<> inline int factorial< 0 >( )
{
return 1;
}
The template language will calculate values of N and only generate the necessary code. If the compiler chose to follow the “inline” request it would generate a good optimization. factorial< 5 >( ) would become the constant (5*4*3*2*1*1), or 120.
Now let’s do the same thing without templates.
inline int factorial( int N)
{
return (N == 0) ? 1 : (N * factorial( N - 1));
}
An inline expansion of this function could go forever unless N is a compile-time constant and the compiler trims the code by watching for (N==0). You can see something similar when using #define.
// You cannot recurse with #define, but this kludge works. # define FACTORIAL( N ) (((N)==0)?1:((N)*FACTORIAL_1((N)-1))) // Helper macros: # define FACTORIAL_1( N ) (((N)==0)?1:((N)*FACTORIAL_2((N)-1))) # define FACTORIAL_2( N ) (((N)==0)?1:((N)*FACTORIAL_3((N)-1))) # define FACTORIAL_3( N ) (((N)==0)?1:((N)*FACTORIAL_4((N)-1))) # define FACTORIAL_4( N ) (((N)==0)?1:((N)*FACTORIAL_5((N)-1))) # define FACTORIAL_5( N ) (((N)==0)?1:((N)*FACTORIAL_6((N)-1))) # define FACTORIAL_6( N ) (((N)==0)?1:((N)*FACTORIAL_7((N)-1))) # define FACTORIAL_7( N ) (((N)==0)?1:((N)*FACTORIAL_8((N)-1))) # define FACTORIAL_8( N ) 0
This only works though FACTORIAL( 7). FACTORIAL( 8) is zero. Something trivial like FACTORIAL( 2) expands into a huge mess.
FACTORIAL( 2 ) becomes: (((2)==0)?1:((2)* ((((2)-1)==0)?1:(((2)-1)* (((((2)-1)-1)==0)?1:((((2)-1)-1)* ((((((2)-1)-1)-1)==0)?1:(((((2)-1)-1)-1)* (((((((2)-1)-1)-1)-1)==0)?1:((((((2)-1)-1)-1)-1)* ((((((((2)-1)-1)-1)-1)-1)==0)?1:(((((((2)-1)-1)-1)-1)-1)* (((((((((2)-1)-1)-1)-1)-1)-1)==0)?1:((((((((2)-1)-1)-1)-1)-1)-1)* ((((((((((2)-1)-1)-1)-1)-1)-1)-1)==0)?1:(((((((((2)-1)-1)-1)-1)-1)-1)-1)* 0))))))))))))))))
Imagine how big that’d be if we defined it all the way through FACTORIAL( 20).
So you see that recursive inline expansion can get out of hand. That’s why the C++ preprocessor doesn’t allow it, and why compilers usually don’t attempt it. But it is not always a bad idea, as the template example shows. Templates force the compiler to look at values and trim code accordingly. Code generation is a wonderful thing, and intelligent code generation is much more useful than simple macro expansion.
Template specialization
In the last post I define a recursive template function to calculate factorial. But since only the first 21 factorials fit in an unsigned 64-bit int I can just define them all with specialization.
// all overflow calculations return zero
template< int N > factorial( ) { return 0; /* overflow */ }
// brute force
template<> factorial< 0 >( ) { return 1; }
template<> factorial< 1 >( ) { return 1; }
template<> factorial< 2 >( ) { return 2; }
template<> factorial< 3 >( ) { return 3*2; }
template<> factorial< 4 >( ) { return 4*3*2; }
.. etc ..
template<> factorial< 20 >( )
{
return 20ull *
19 * 18 * 17 * 16 * 15 * 14 * 13 * 12 * 11 *
10 * 9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1;
}
Factorial is probably the wrong example to use here, but this could be useful in other situations, such as defining specialized functions for all the values of an enum.