Template code trimming

I’ve been talking about recursive templates and how to control code generation in the last three posts, and I thought of another technique. This shows how to use a separate template param to trim generated code for special cases. In this example the special case is caught by specializing the check_bounds<false> struct.

As usual I use the factorial function to illustrate. This is just an outline for a technique, so I did not pretty it up with namespaces and asserts.

# include <boost/cstdint.hpp>

typedef boost::uint_fast64_t uint_v;
const int max_n = 20;

template< bool XXX >
struct check_bounds {
    template< int N >
    static uint_v calc( ) { return calc< N - 1 >( ) * N; }

    template<>
    static uint_v calc< 0 >( ) { return 1; }
};

template<>
struct check_bounds< false > {
    template< int N >
    static uint_v calc( ) { return 0; }
};

template< int N >
uint_v factorial( )
{
    return
        check_bounds< (0 <= N) && (N <= max_n) >::calc< N >( );
}

This correctly calculates factorials 0..20, returns zero for all other values, and the compiler has no trouble with expressions like factorial< 1000000 >( ) and factorial< -1000000 >( ). It trims the code and avoids huge expansions. I talk about this more in yesterday’s Template recursion post.

Recursive inline expansion

Since I’m talking about recursive templates today I thought I’d mention recursive inline expansion. Suppose we define factorial (with no bounds checking) like this.

template< int N > inline int factorial( )
{
    return N * factorial< N - 1 >( );
}
template<> inline int factorial< 0 >( )
{
    return 1;
}

The template language will calculate values of N and only generate the necessary code. If the compiler chose to follow the “inline” request it would generate a good optimization. factorial< 5 >( ) would become the constant (5*4*3*2*1*1), or 120.

Now let’s do the same thing without templates.

inline int factorial( int N)
{
    return (N == 0) ? 1 : (N * factorial( N - 1));
}

An inline expansion of this function could go forever unless N is a compile-time constant and the compiler trims the code by watching for (N==0). You can see something similar when using #define.

// You cannot recurse with #define, but this kludge works.
# define FACTORIAL( N )   (((N)==0)?1:((N)*FACTORIAL_1((N)-1)))
// Helper macros:
# define FACTORIAL_1( N ) (((N)==0)?1:((N)*FACTORIAL_2((N)-1)))
# define FACTORIAL_2( N ) (((N)==0)?1:((N)*FACTORIAL_3((N)-1)))
# define FACTORIAL_3( N ) (((N)==0)?1:((N)*FACTORIAL_4((N)-1)))
# define FACTORIAL_4( N ) (((N)==0)?1:((N)*FACTORIAL_5((N)-1)))
# define FACTORIAL_5( N ) (((N)==0)?1:((N)*FACTORIAL_6((N)-1)))
# define FACTORIAL_6( N ) (((N)==0)?1:((N)*FACTORIAL_7((N)-1)))
# define FACTORIAL_7( N ) (((N)==0)?1:((N)*FACTORIAL_8((N)-1)))
# define FACTORIAL_8( N ) 0

This only works though FACTORIAL( 7). FACTORIAL( 8) is zero. Something trivial like FACTORIAL( 2) expands into a huge mess.

FACTORIAL( 2 ) becomes:

(((2)==0)?1:((2)*
((((2)-1)==0)?1:(((2)-1)*
(((((2)-1)-1)==0)?1:((((2)-1)-1)*
((((((2)-1)-1)-1)==0)?1:(((((2)-1)-1)-1)*
(((((((2)-1)-1)-1)-1)==0)?1:((((((2)-1)-1)-1)-1)*
((((((((2)-1)-1)-1)-1)-1)==0)?1:(((((((2)-1)-1)-1)-1)-1)*
(((((((((2)-1)-1)-1)-1)-1)-1)==0)?1:((((((((2)-1)-1)-1)-1)-1)-1)*
((((((((((2)-1)-1)-1)-1)-1)-1)-1)==0)?1:(((((((((2)-1)-1)-1)-1)-1)-1)-1)*
0))))))))))))))))

Imagine how big that’d be if we defined it all the way through FACTORIAL( 20).

So you see that recursive inline expansion can get out of hand. That’s why the C++ preprocessor doesn’t allow it, and why compilers usually don’t attempt it. But it is not always a bad idea, as the template example shows. Templates force the compiler to look at values and trim code accordingly. Code generation is a wonderful thing, and intelligent code generation is much more useful than simple macro expansion.

Template specialization

In the last post I define a recursive template function to calculate factorial. But since only the first 21 factorials fit in an unsigned 64-bit int I can just define them all with specialization.

// all overflow calculations return zero
template< int N > factorial( )  { return 0; /* overflow */ }

// brute force
template<> factorial< 0 >( )  { return 1; }
template<> factorial< 1 >( )  { return 1; }
template<> factorial< 2 >( )  { return 2; }
template<> factorial< 3 >( )  { return 3*2; }
template<> factorial< 4 >( )  { return 4*3*2; }
.. etc ..
template<> factorial< 20 >( )
{
    return 20ull *
        19 * 18 * 17 * 16 * 15 * 14 * 13 * 12 * 11 *
        10 * 9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1;
}

Factorial is probably the wrong example to use here, but this could be useful in other situations, such as defining specialized functions for all the values of an enum.

← Previous PageNext Page →