The C++ preprocessor – recursion

I’ve been looking at how the C++ preprocessor handles recursive macros. Consider the following behavior, which is the same on MSVC++ 9.0 (Visual Studio 2008) and GNU C++ (MinGW Eclipse).

# define XX XX
# if defined( XX ) && (XX == 0)
#   // This happens
# endif

# define AA AB
# define AB AA
# if defined( AA ) && defined( AB ) && (AA == 0) && (AB == 0)
#   // This happens
# endif

This makes sense if you know the rules. They are:

  1. Be lazy. #define evaluates nothing. It records the symbol's replacement as a sequence of tokens to be expanded later.
  2. Evaluate when necessary. #if forces its expression to be expanded and evaluated, except that the symbol inside defined(..) is not expanded.
  3. No recursion. When a symbol is expanded, it generates a sequence of tokens which are scanned again for more macros. But the original symbol is not expanded a second time during that rescan; it acts as if it were temporarily undefined.
  4. Undefined evaluates to 0 (zero). Any identifier still left over after expansion becomes 0 when #if evaluates its expression.
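
Here is a minimal sketch of those four rules in a form you can compile. The names SELF, PING and PONG and the constants are made up for this illustration. Each #if records which branch it took in an ordinary constant, so the outcome can be inspected from C++ code.

```cpp
// Rule 1: the replacement is stored as tokens; nothing is evaluated here.
#define SELF SELF

// Rule 2: defined() looks at the name itself and does not expand it.
#if defined(SELF)
const int self_is_defined = 1;
#else
const int self_is_defined = 0;
#endif

// Rules 3 and 4: SELF expands to SELF once, is not expanded again,
// and the leftover identifier is then treated as 0.
#if SELF == 0
const int self_is_zero = 1;
#else
const int self_is_zero = 0;
#endif

// Mutual recursion stops the same way: PING -> PONG -> PING (stop) -> 0.
#define PING PONG
#define PONG PING

#if PING == 0 && PONG == 0
const int ping_pong_are_zero = 1;
#else
const int ping_pong_are_zero = 0;
#endif
```

If the rules hold as described, all three constants should come out as 1.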

Here’s another set of examples.

# define ITSELF ITSELF
# if defined( ITSELF ) && (ITSELF == 0)
#   // This happens
# endif

# define ITSELF2 ITSELF2 ITSELF2
// The following does not compile. "Missing right parenthesis"
// The expression becomes (0 0 == 0 0), which is not valid.
/*
# if defined( ITSELF2 ) && (ITSELF2 == ITSELF2)
#   error xxkx
# endif
*/

# define ITSELF3 (ITSELF3 + ITSELF3 + ITSELF3)
# if defined( ITSELF3 ) && (ITSELF3 == 0)
#   // This happens
# endif

The last #if expands something like this.

  1. # if defined( ITSELF3 ) && (ITSELF3 == 0)
  2. # if defined( ITSELF3 ) && ((ITSELF3 + ITSELF3 + ITSELF3) == 0)
  3. # if defined( ITSELF3 ) && ((0 + 0 + 0) == 0)
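
The replace-with-0 step only happens inside #if. In ordinary code the leftover name simply stays in the output as an identifier. The following sketch shows the difference, using a made-up constant and macro that are both called base.

```cpp
const int base = 10;        // an ordinary constant (hypothetical name)
#define base (base + 1)     // a self-referential macro with the same name

// In ordinary code this expands once to (base + 1). The inner `base` is
// not expanded again, so it refers to the constant declared above.
const int bumped = base;

// Inside #if the same expansion happens, but the leftover identifier is
// then replaced by 0, so the test is really (0 + 1) == 1.
#if base == 1
const int seen_by_if = 1;
#else
const int seen_by_if = 0;
#endif

#undef base                 // stop the macro from leaking into later code
```

So the same macro means "the constant plus one" to the compiler and "zero plus one" to #if.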

Now consider this recursive definition of factorial.

# define FACTORIAL( N ) \
        (((N) == 0) ? 1 : ((N) * FACTORIAL( N - 1 )))
// The following will not compile.
/*
# if (FACTORIAL( 3 ) == 6)
# endif
*/

You can see why it doesn’t work if you expand FACTORIAL(..). (The ?: operator works fine in macros, by the way.)

  1. FACTORIAL( 3 )
  2. (((3) == 0) ? 1 : ((3) * FACTORIAL( 3 - 1 )))
  3. (((3) == 0) ? 1 : ((3) * 0( 3 - 1 )))

In the last step, the temporarily undefined symbol FACTORIAL becomes 0 (zero). All of this happens before any arithmetic is done, so the ?: operator never gets the chance to skip the broken branch. The error message is “unmatched parenthesis: missing ‘)’”.
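
If what you actually want is a factorial computed at compile time, the usual route is a recursive template rather than a recursive macro. This is a sketch of that alternative, not something the preprocessor can do: the Factorial template and the typedef name are my own, and the result is visible to the compiler but still not to #if.

```cpp
// General case: N! = N * (N - 1)!
template <unsigned N>
struct Factorial {
    static const unsigned long value = N * Factorial<N - 1>::value;
};

// Base case: 0! = 1. This specialization ends the recursion.
template <>
struct Factorial<0> {
    static const unsigned long value = 1;
};

// Old-style compile-time check: the array size is -1, and the build
// fails, if the condition is false.
typedef char factorial_of_3_is_6[(Factorial<3>::value == 6) ? 1 : -1];
```

Unlike the macro, the template is allowed to refer to itself, because the compiler instantiates Factorial<2>, Factorial<1> and Factorial<0> as separate types.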

You see. It makes perfect sense. :)

The C++ preprocessor – undefined tokens

The C++ preprocessor (#define etc.) doesn’t get much attention, and programmers are told to avoid it. C++ inherited the preprocessor from C, and C got the concept from assembly language macros. And although C++ macros are discouraged, they are used everywhere.

The preprocessor can do some cool things, however, like generating documentation or even HTML (I saw that demonstrated somewhere). I’m just going to start by looking at simple constructs that run into odd behavior. Consider the following, where NOT_DEFINED is an undefined (not #define’d) token.

// The following compiles without complaint.
# if defined( NOT_DEFINED )
#   error This never happens
# endif

// You can check that a token is defined before using it.
# if defined( NOT_DEFINED ) && (NOT_DEFINED == 2)
#   error This never happens
# endif

// You can use NOT_DEFINED in other tests.
# if (4 == 5) && (NOT_DEFINED == 2)
#   error This never happens
# endif

In the last two #if lines, it looks like the preprocessor stops evaluating before it gets to NOT_DEFINED, because the left side of the && is already false. You’d think that the preprocessor would choke if it had to evaluate NOT_DEFINED, because “not defined” ought to mean “it is illegal to evaluate this”.

But this is not the case. Both MSVC++ 9.0 (Visual Studio 2008) and the GNU C++ compiler (MinGW) treat an undefined token as 0 (zero) when they are forced to evaluate it in a #if line. This is what the C and C++ standards require: any identifier left over after macro expansion is replaced by 0.

# if NOT_DEFINED == 0
#   // This DOES happen
#   define UNDEFINED_VARS_ARE_ZERO
# else
#   error This never happens
# endif

// The following make sense when you realize undefined
// vars are 0 (zero).
# if NOT_DEFINED != NOT_DEFINED
#   error This never happens
# endif

# if NOT_DEFINED != NOT_DEFINED_2
#   error This never happens
# endif

# if (1 + NOT_DEFINED + 5 * (2 * NOT_DEFINED * 2)) == 1
#   // This happens
# endif

A token defined as nothing (an empty replacement list) does not act like an undefined token. It does not act like zero; it simply disappears from the expression.

// D_NULL is an empty token.
# define D_NULL

# if (1 + 1 D_NULL) == 2
#   // Gets here, no complaints
# endif

// The preprocessor does not accept this
// because it forces NOT_DEFINED to be 0,
// which turns the test into (1 + 1 0) == 2.
/*
# if (1 + 1 NOT_DEFINED) == 2
# endif
*/
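
The difference between "defined as nothing" and "not defined" can be recorded in constants and checked. A small sketch follows; the constant names are mine, and NOT_DEFINED is assumed to be undefined at this point, as everywhere else in this post.

```cpp
#define D_NULL              // defined, but with an empty replacement list

// defined() reports an empty macro as defined.
#if defined(D_NULL)
const int d_null_is_defined = 1;
#else
const int d_null_is_defined = 0;
#endif

// The empty macro vanishes, leaving (1 + 1) == 2.
#if (1 + 1 D_NULL) == 2
const int d_null_vanishes = 1;
#else
const int d_null_vanishes = 0;
#endif

// An undefined token is not defined, yet it compares equal to 0.
#if !defined(NOT_DEFINED) && (NOT_DEFINED == 0)
const int not_defined_is_zero = 1;
#else
const int not_defined_is_zero = 0;
#endif
```

All three constants should end up as 1.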

A token that is #defined and later #undef'd goes back to acting like 0.

# define NOT_DEFINED_3 567
# if NOT_DEFINED_3 == 567
#   // This happens
# endif
#
# undef NOT_DEFINED_3
#
# if NOT_DEFINED_3 == 0
#   // This happens
# endif

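This behavior can be a problem. You might (wrongly) assume that since templates can understand compile-time constants, the preprocessor will too. It doesn't. The preprocessor runs before the compiler looks at any declarations, so a const int is just another undefined token to it.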

const int flag_telling_us_what_to_do = 4;
# if flag_telling_us_what_to_do == 4
#   error We never get here
# elif flag_telling_us_what_to_do == 0
#   // We DO get here!
# endif

// Another confusing example.
const int fifty_five = 55;
const int seventy_two = 72;
# if fifty_five == seventy_two
#   // We DO get here!
# endif

// Or let's say you misspell something.
# define FLAG_TELLING_ME_WHAT_TO_DO 7
# if FLAG_TELLING_me_WHAT_TO_DO == 7
#   error We never get to this error
# elif FLAG_TELLING_ME_WHAT_TO_DO == 0
#   error We never get to this error
# elif FLAG_TELLING_me_WHAT_TO_DO == 0
#   // We DO get here!
# endif
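
One way to defend against both traps is sketched below. It assumes you are willing to add a guard before each test: check with defined() that the flag really is a macro and stop with #error if it is not, and test C++ constants in C++ code instead of in #if. The constant chosen_branch and the typedef name are made up for the example. GCC also has a -Wundef option that warns when an undefined identifier is evaluated in a #if.

```cpp
#define FLAG_TELLING_ME_WHAT_TO_DO 7

// Guard: a misspelled or missing flag now stops the build here
// instead of silently evaluating to 0 further down.
#if !defined(FLAG_TELLING_ME_WHAT_TO_DO)
#  error FLAG_TELLING_ME_WHAT_TO_DO is not defined
#endif

#if FLAG_TELLING_ME_WHAT_TO_DO == 7
const int chosen_branch = 7;
#else
const int chosen_branch = 0;
#endif

// A const int is invisible to #if, so check it in the compiler proper.
// The array size is -1, and the build fails, if the condition is false.
const int flag_telling_us_what_to_do = 4;
typedef char flag_is_four[(flag_telling_us_what_to_do == 4) ? 1 : -1];
```

The guard costs three lines per flag, which is cheap compared with hunting for a misspelling that the preprocessor accepted without a word.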

We can treat an undefined token as both undefined and equal to zero in the same expression.

# if (! defined( NOT_DEFINED )) && (NOT_DEFINED == 0)
#   // The compiler lands here without complaint
# endif

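But on these compilers not all undefined tokens can be forced to act like zero. C++ keywords act like they really are undefined. (This is compiler behavior, not the language rule. The C++ standard says that in a #if, true and false keep their usual values and every other leftover keyword is replaced by 0, just like an ordinary identifier.)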

# if defined( false )
#   error This never happens
# endif
#
# if defined( true )
#   error This never happens
# endif

// The following #if's do not compile.
/*
# if true
# endif
#
# if defined( true ) && (true == 0)
# endif
#
# if defined( false) && false
# endif
#
# if false == false
# endif
*/

// But the above #if's are OK if you spell
// true and false a little differently.
# if true_
#   error This never happens
# endif
#
# if defined( true_ ) && (true_ == 0)
#   error This never happens
# endif
#
# if defined( false_) && false_
#   error This never happens
# endif
#
# if false_ == false_
# //error This does happen
# endif

This applies to all C++ keywords, not just true and false.

# if defined( if )
#   error This never happens
# endif
#
# if defined( template )
#   error This never happens
# endif
#
# if defined( int )
#   error This never happens
# endif

// None of these compile.
/*
# if (if == 0)
# endif
#
# if template
# endif
#
# if int == if
# endif
*/
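
For comparison, here is what I would expect from a compiler that follows the C++ standard's rule for #if, under which true and false evaluate to 1 and 0 and any other keyword is replaced by 0. This is a sketch under that assumption and the constant names are mine. The compilers tested above evidently behaved differently, so your results may not match.

```cpp
// `true` is expected to evaluate to 1 inside #if.
#if true
const int true_in_if = 1;
#else
const int true_in_if = 0;
#endif

// `false` is expected to evaluate to 0 inside #if.
#if false
const int false_in_if = 1;
#else
const int false_in_if = 0;
#endif

// Any other keyword is expected to be replaced by 0, like an
// undefined identifier.
#if int == 0
const int int_keyword_is_zero = 1;
#else
const int int_keyword_is_zero = 0;
#endif
```

If your compiler rejects this block, it is treating keywords the way the commented-out examples above describe.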

Even though the preprocessor knows what the keywords are, it still lets you commit terrible coding crimes with them.

# define false if
# define if 88
# define template 99
# define int 33
#
# if false
#   // This does happen
# endif
#
# if if
#   // This does happen
# endif
#
# if template
#   // This does happen
# endif
#
# if int
#   // This does happen
# endif
#
# undef int
# undef if
# undef template
# undef false

// When you undefine a keyword it goes back
// to how it was. It does not act like zero.
// The following does not compile.
/*
# if template
# endif
*/

I hope you’ve enjoyed this look into one of the preprocessor’s dimly-lit corners. I wrote this after I found a misspelling bug and was surprised the preprocessor didn’t choke. Next time I’ll find a bug like that quicker.
