The C++ preprocessor – undefined tokens

The C++ preprocessor (#define, #if and so on) doesn’t get much attention, and programmers are told to avoid it. C++ inherited the preprocessor from C, and C got the concept from assembly language macros. Although C++ macros are discouraged, they are used everywhere.

The preprocessor can do some cool things, however, like generating documentation or even HTML (I saw that demonstrated somewhere), but I’m just going to start by looking at simple constructs that run into odd behavior. Consider the following, where NOT_DEFINED is an undefined (not #define’d) token.

// The following compiles without complaint.
# if defined( NOT_DEFINED )
#   error This never happens
# endif

// You can check that a token is defined before using it.
# if defined( NOT_DEFINED ) && (NOT_DEFINED == 2)
#   error This never happens
# endif

// You can use NOT_DEFINED in other tests.
# if (4 == 5) && (NOT_DEFINED == 2)
#   error This never happens
# endif

In the last two #if lines, it looks like the preprocessor stops evaluating before it gets to NOT_DEFINED. You’d think that the preprocessor would choke if it had to evaluate NOT_DEFINED, because “not defined” sounds like “it is illegal to evaluate this”.

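But this is apparently not the case: both MSVC++ 9.0 (Visual Studio 2008) and the GNU C++ compiler (MinGW) treat an undefined token as 0 (zero) when they have to evaluate it in a #if line. This is not a compiler quirk. The C and C++ standards say that, after macro expansion, any identifier still left in a #if expression is replaced with 0.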

# if NOT_DEFINED == 0
#   // This DOES happen
#   define UNDEFINED_VARS_ARE_ZERO
# else
#   error This never happens
# endif

// The following make sense when you realize undefined
// vars are 0 (zero).
# if NOT_DEFINED != NOT_DEFINED
#   error This never happens
# endif

# if NOT_DEFINED != NOT_DEFINED_2
#   error This never happens
# endif

# if (1 + NOT_DEFINED + 5 * (2 * NOT_DEFINED * 2)) == 1
#   // This happens
# endif
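If you want to check this on your own compiler, one option is to turn the preprocessor's verdict into ordinary constants that the rest of the program can inspect. This is only a sketch: the constant names are made up for illustration, and it assumes NOT_DEFINED really is undefined at this point and that your compiler supports C++11 constexpr.

```cpp
// Assumes NOT_DEFINED has not been #define'd anywhere above.
# if NOT_DEFINED == 0
constexpr bool undefined_token_is_zero = true;
# else
constexpr bool undefined_token_is_zero = false;
# endif

// With NOT_DEFINED read as 0 this is 1 + 0 + 5 * (2 * 0 * 2), which is 1.
# if (1 + NOT_DEFINED + 5 * (2 * NOT_DEFINED * 2)) == 1
constexpr bool arithmetic_collapses_to_one = true;
# else
constexpr bool arithmetic_collapses_to_one = false;
# endif
```

Both constants should come out true on a compiler that follows the replace-with-0 rule.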

A token defined as nothing (an empty string) does not act like an undefined token. It does not act like zero.

// D_NULL is an empty token.
# define D_NULL

# if (1 + 1 D_NULL) == 2
#   // Gets here, no complaints
# endif

// The preprocessor does not accept this because it replaces
// NOT_DEFINED with 0, leaving the invalid expression (1 + 1 0) == 2.
/*
# if (1 + 1 NOT_DEFINED) == 2
# endif
*/
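Because an empty token vanishes instead of becoming 0, a test like "# if D_NULL == 0" is left with nothing on the left of the ==. A common workaround is to write "+ 0" after the token, so that an empty expansion leaves a valid unary plus. A small sketch follows; EMPTY_FLAG and VALUE_FLAG are hypothetical names invented for this example.

```cpp
// EMPTY_FLAG and VALUE_FLAG are made-up names for this sketch.
// EMPTY_FLAG is defined, but expands to nothing.
# define EMPTY_FLAG
# define VALUE_FLAG 5

// "EMPTY_FLAG + 0" expands to "+ 0", and unary plus is legal in #if.
# if (EMPTY_FLAG + 0) == 0
constexpr bool empty_flag_reads_as_zero = true;
# else
constexpr bool empty_flag_reads_as_zero = false;
# endif

// A macro with a real value is unaffected by the "+ 0".
# if (VALUE_FLAG + 0) == 5
constexpr bool value_flag_reads_as_five = true;
# else
constexpr bool value_flag_reads_as_five = false;
# endif
```

Note that this makes an empty token indistinguishable from 0 and from an undefined token, which may or may not be what you want.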

A token that is #define’d and later #undef’d goes back to acting like 0.

# define NOT_DEFINED_3 567
# if NOT_DEFINED_3 == 567
#   // This happens
# endif
#
# undef NOT_DEFINED_3
#
# if NOT_DEFINED_3 == 0
#   // This happens
# endif

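This behavior can be a problem. You might (wrongly) assume that since templates can understand compile-time constants, the preprocessor will too. It won’t: the preprocessor runs before the compiler knows about any variables, so the name of a const int is just another undefined token.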

const int flag_telling_us_what_to_do = 4;
# if flag_telling_us_what_to_do == 4
#   error We never get here
# elif flag_telling_us_what_to_do == 0
#   // We DO get here!
# endif

// Another confusing example.
const int fifty_five = 55;
const int seventy_two = 72;
# if fifty_five == seventy_two
#   // We DO get here!
# endif
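When the value lives in a C++ constant, the check has to be done by the compiler proper, not by #if. Here is a rough sketch of the difference, assuming a C++11 compiler; flag_for_the_compiler and what_to_do() are hypothetical names used only for this illustration.

```cpp
// A made-up constant, standing in for a flag like the one above.
constexpr int flag_for_the_compiler = 4;

// Ordinary C++ code sees the constant's real value at compile time.
constexpr int what_to_do()
{
    return (flag_for_the_compiler == 4) ? 4 : 0;
}

// static_assert is evaluated by the compiler, so it sees 4.
static_assert( flag_for_the_compiler == 4, "flag should be 4" );

// The preprocessor only sees an unknown identifier, which it reads as 0.
# if flag_for_the_compiler == 0
constexpr bool preprocessor_saw_zero = true;
# else
constexpr bool preprocessor_saw_zero = false;
# endif
```

The compiler and the preprocessor disagree about the same name here, which is exactly the trap.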

// Or let's say you misspell something.
# define FLAG_TELLING_ME_WHAT_TO_DO 7
# if FLAG_TELLING_me_WHAT_TO_DO == 7
#   error We never get to this error
# elif FLAG_TELLING_ME_WHAT_TO_DO == 0
#   error We never get to this error
# elif FLAG_TELLING_me_WHAT_TO_DO == 0
#   // We DO get here!
# endif
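One way to make a misspelling fail loudly is to write the flag as a function-like macro and always use it with parentheses. A misspelled name is then replaced with 0, the directive reads "0() == 7", and that is not a valid #if expression. This is a sketch of the idea, not the only fix; FEATURE_LEVEL is a hypothetical name. GCC's -Wundef option and MSVC's off-by-default warning C4668 can also report undefined identifiers used in #if.

```cpp
// FEATURE_LEVEL is a made-up flag, written as a function-like macro on purpose.
# define FEATURE_LEVEL() 7

# if FEATURE_LEVEL() == 7
constexpr int selected_feature_level = 7;
# elif FEATURE_LEVEL() == 0
constexpr int selected_feature_level = 0;
# else
#   error Unexpected value for FEATURE_LEVEL()
# endif

// A misspelling such as FEATURE_LEVLE() is not a macro, so the identifier
// becomes 0 and the test turns into "0() == 7", which the preprocessor
// rejects instead of quietly treating it as false.
```

The cost is the slightly odd-looking parentheses at every use.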

We can treat undefined tokens as both undefined and defined at the same time.

# if (! defined( NOT_DEFINED )) && (NOT_DEFINED == 0)
#   // The compiler lands here without complaint
# endif
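The flip side is that defined() is the only way to tell a token that was explicitly set to 0 from one that was never set at all. A minimal sketch, using the hypothetical names EXPLICIT_ZERO and NEVER_SET:

```cpp
// EXPLICIT_ZERO is a made-up flag that is deliberately set to 0.
# define EXPLICIT_ZERO 0
// NEVER_SET is a made-up name that is deliberately left undefined.

// Both tokens compare equal to 0, but only one of them passes defined().
# if defined( EXPLICIT_ZERO ) && (EXPLICIT_ZERO == 0)
constexpr bool explicit_zero_detected = true;
# else
constexpr bool explicit_zero_detected = false;
# endif

# if defined( NEVER_SET ) && (NEVER_SET == 0)
constexpr bool never_set_detected = true;
# else
constexpr bool never_set_detected = false;
# endif
```

Only the first test succeeds, so putting defined() in front of a comparison is a cheap guard against reading a missing flag as 0.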

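But in my tests, not all undefined tokens could be forced to act like zero. C++ keywords acted like they really were undefined. (This looks like a compiler quirk, not a language rule. The C++ standard says that in a #if expression true and false keep their Boolean values and every other leftover identifier or keyword is replaced with 0, so a conforming compiler should accept the commented-out lines below.)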

# if defined( false )
#   error This never happens
# endif
#
# if defined( true )
#   error This never happens
# endif

// The following #if's do not compile.
/*
# if true
# endif
#
# if defined( true ) && (true == 0)
# endif
#
# if defined( false) && false
# endif
#
# if false == false
# endif
*/

// But the above #if's are OK if you spell
// true and false a little differently.
# if true_
#   error This never happens
# endif
#
# if defined( true_ ) && (true_ == 0)
#   error This never happens
# endif
#
# if defined( false_) && false_
#   error This never happens
# endif
#
# if false_ == false_
#   // This does happen
# endif
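A quick way to see what your own compiler does with true and false in a #if is to capture the result in constants. This sketch assumes a compiler that follows the standard rule, where true evaluates to 1 and false to 0. On a compiler that behaves like the listing above, it will simply fail to build. The constant names are made up for this example.

```cpp
// Records how the preprocessor evaluates the keyword true.
# if true
constexpr int true_in_if = 1;
# else
constexpr int true_in_if = 0;
# endif

// Records how the preprocessor evaluates the keyword false.
# if false
constexpr int false_in_if = 1;
# else
constexpr int false_in_if = 0;
# endif
```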

In my tests this applied to the other C++ keywords I tried, not just true and false.

# if defined( if )
#   error This never happens
# endif
#
# if defined( template )
#   error This never happens
# endif
#
# if defined( int )
#   error This never happens
# endif

// None of these compile.
/*
# if (if == 0)
# endif
#
# if template
# endif
#
# if int == if
# endif
*/
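The same kind of check works for the other keywords. Under the standard rule, a keyword other than true or false that is left in a #if expression is treated like any other undefined token, so it should read as 0. This is a sketch for testing your own compiler under that assumption, not a claim about every compiler. The constant name is hypothetical, and the code assumes none of these keywords has been #define'd.

```cpp
// Assumes if, template and int have not been turned into macros.
// Each keyword is expected to be replaced with 0 before evaluation.
# if (if == 0) && (template == 0) && (int == 0)
constexpr bool keywords_read_as_zero = true;
# else
constexpr bool keywords_read_as_zero = false;
# endif
```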

Even though the preprocessor knows what the keywords are, it still lets you commit terrible coding crimes with them.

# define false if
# define if 88
# define template 99
# define int 33
#
# if false
#   // This does happen
# endif
#
# if if
#   // This does happen
# endif
#
# if template
#   // This does happen
# endif
#
# if int
#   // This does happen
# endif
#
# undef int
# undef if
# undef template
# undef false

// When you undefine a keyword it goes back
// to how it was. It does not act like zero.
// The following does not compile.
/*
# if template
# endif
*/
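Since defined() works on keywords, it can also be used to catch this kind of crime before it does any damage. Below is a minimal sketch of such a guard. The list of keywords is just the ones used above, and keywords_are_untouched is a made-up name.

```cpp
// Refuse to build if any of these keywords has been turned into a macro.
# if defined( if ) || defined( int ) || defined( template ) || defined( false )
#   error A C++ keyword has been redefined as a macro
# endif

// Only reached when none of the keywords above is a macro.
constexpr bool keywords_are_untouched = true;
```

A guard like this could sit at the top of a header that you suspect is being included after someone else's overly creative macros.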

I hope you’ve enjoyed this look into one of the preprocessor’s dimly-lit corners. I wrote this after I found a misspelling bug and was surprised the preprocessor didn’t choke. Next time I’ll find a bug like that quicker.

Comments

2 Responses to “The C++ preprocessor – undefined tokens”

  1. yonil on December 30th, 2008 2:20 am

    after running into this issue myself, and spending a couple hours hunting a bug related to a side effect of this NOT_DEFINED issue, I’m bewildered. what was the standards committee thinking when they allowed the preprocessor to substitute 0 for undefined tokens? it should emit a level 1 warning at minimum.

  2. Neal on December 30th, 2008 12:13 pm

    Yeah, seems like a bad idea to me too. I suspect it came about because some C compilers back in the loose-standards day were doing this. Dunno, maybe they wanted to avoid short-circuited evaluation of the #if test.

    But I’m pretty sure the standards guys thought and fought about it, and had good reasons at the time.

    Still, small consolation after you’ve wasted hours hunting down a bug.
